CRAN release: 2023-01-24
New family of consistent string separating functions:
separate_longer_position(). These functions are thorough refreshes of
extract(), featuring improved performance, greater consistency, a polished API, and a new approach for handling problems. They use stringr and supersede
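As a rough illustration of the new family (not part of the original notes), the sketch below assumes tidyr >= 1.3.0 and a made-up data frame `df`. `separate_wider_delim()` is assumed here to be one of the sibling functions in the family; only `separate_longer_position()` is named in the text above.

```r
library(tidyr)

# Hypothetical input: `pair` holds delimited strings, `code` holds fixed-width strings.
df <- tibble(id = 1:2, pair = c("a-b", "c-d"), code = c("xy", "z"))

# Split each string into new columns at the delimiter.
wide <- separate_wider_delim(df, pair, delim = "-", names = c("first", "second"))

# Split each string into one row per 1-character piece.
long <- separate_longer_position(df, code, width = 1)
```

Here `wide` gains `first` and `second` columns in place of `pair`, while `long` has one row per character of `code`.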
.by argument which allows you to specify the columns to nest by (rather than the columns to nest, i.e. through
...). Additionally, the
.key argument is no longer deprecated, and is used whenever
... isn’t specified (#1458).
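A minimal sketch of nesting by a column, assuming this entry refers to `nest()` in tidyr >= 1.3.0; the data frame and the `"rows"` key name are invented for illustration.

```r
library(tidyr)

# Hypothetical data: two groups identified by `g`.
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = 4:6)

# Nest everything except `g`; the list-column gets the default name "data".
by_g <- nest(df, .by = g)

# Same nesting, but name the list-column explicitly with `.key`.
by_g_named <- nest(df, .by = g, .key = "rows")
```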
cms_patient_care to demonstrate various tidying challenges (#1333).
... argument of both
pivot_wider() has been moved to the front of the function signature, after the required arguments but before the optional ones. Additionally,
build_wider_spec() have all gained
... arguments in a similar location. This change allows us to more easily add new features to the pivoting functions without breaking existing CRAN packages and user scripts.
pivot_wider() provides temporary backwards compatible support for the case of a single unnamed argument that previously was being positionally matched to
id_cols. This one special case still works, but will throw a warning encouraging you to explicitly name the
unnest_longer() now consistently drops rows with either
NULL or empty vectors (like
integer()) by default. Set the new
TRUE to retain them. Previously,
keep_empty = TRUE was implicitly being used for
keep_empty = FALSE was being used for empty vectors, which was inconsistent with all other tidyr verbs with this argument (#1363).
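The difference can be sketched as follows, assuming tidyr >= 1.3.0; the list-column `x` is a made-up example containing one `NULL` and one empty vector.

```r
library(tidyr)

# Hypothetical list-column with a regular element, a NULL, and an empty vector.
df <- tibble(id = 1:3, x = list(1:2, NULL, integer()))

# Default: rows whose element is NULL or empty are dropped.
dropped <- unnest_longer(df, x)

# keep_empty = TRUE: each such row is kept, with a missing value in `x`.
kept <- unnest_longer(df, x, keep_empty = TRUE)
```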
unnest_wider() now generates automatic names for partially unnamed vectors. Previously it only generated them for fully unnamed vectors, resulting in a strange mix of automatic names and name-repaired names (#1367).
All built in datasets are now standard tibbles (#1459).
R >= 3.4.0 is now required, in line with the tidyverse standard of supporting the previous 5 minor releases of R.
Removed dependency on ellipsis in favor of equivalent functions in rlang (#1314).
unpack() does a better job of reporting column name duplication issues and gives better advice about how to resolve them using
names_sep. This also improves errors from functions that use
pivot_longer() no longer supports interpreting
values_ptypes = list() and
names_ptypes = list() as
NULL. An empty
list() is now interpreted as a
<list> prototype to apply to all columns, which is consistent with how any other 0-length value is interpreted (#1296).
CRAN release: 2022-02-01
expand() no longer allow you to complete or expand on a grouping column. This was never well-defined since completion/expansion on a grouped data frame happens “within” each group and otherwise has the potential to produce erroneous results (#1299).
replace_na() no longer allows the type of
data to change when the replacement is applied.
replace will now always be cast to the type of
data before the replacement is made. For example, this means that using a replacement value of
1.5 on an integer column is no longer allowed. Similarly, replacing missing values in a list-column must now be done with
list("foo") rather than just
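A small sketch of the stricter casting, assuming tidyr >= 1.2.0; the vector `x` is invented, and the `tryCatch()` wrapper is only there to show that the lossy replacement fails.

```r
library(tidyr)

# Hypothetical integer vector with one missing value.
x <- c(1L, NA, 3L)

# An integer replacement keeps the type of `x`.
filled <- replace_na(x, 0L)

# 1.5 cannot be cast to integer without loss, so this raises an error.
lossy <- tryCatch(replace_na(x, 1.5), error = function(e) "error")
```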
id_expand arguments for turning implicit missing factor levels and variable combinations into explicit ones. This is similar to the
values_to arguments now accept a glue specification, which is useful when unnesting multiple columns.
unnest_wider() gains a new
strict argument which controls whether or not strict vctrs typing rules should be applied. It defaults to
FALSE for backwards compatibility, and because it is often more useful to be lax when unnesting JSON, which doesn’t always map one-to-one with R’s types (#1125).
fill() have been updated to utilize vctrs. This means that you can use these functions on a wider variety of column types, including lubridate’s Period types (#1094), data frame columns, and the rcrd type from vctrs.
@mgirlich is now a tidyr author in recognition of his significant and sustained contributions.
All lazyeval variants of tidyr verbs have been soft-deprecated. Expect them to move to the defunct stage in the next minor release of tidyr (#1294).
dplyr >= 1.0.0 is now required.
pivot_wider() now works correctly when
values_fill is a data frame.
values_from arguments are now required if their default values of
value don’t correspond to columns in
data. Additionally, they must identify at least 1 column in
The grouped data frame methods for
expand() now move the group columns to the front of the result (in addition to the columns you completed on or expanded, which were already moved to the front). This should make more intuitive sense, as you are completing or expanding “within” each group, so the group columns should be the first thing you see (#1289).
crossing() now silently apply name repair to automatically named inputs. This avoids a number of issues resulting from duplicate truncated names (#1116, #1221, #1092, #1037, #992).
crossing() now allow columns from unnamed data frames to be used in expressions after that data frame was specified, like
expand_grid(tibble(x = 1), y = x). This is more consistent with how
crossing() now return a 1 row data frame when no inputs are supplied, which is more consistent with
prod() == 1L and the idea that computations involving the number of combinations computed from an empty set should return 1 (#1258).
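Both behaviours can be sketched with `expand_grid()`, assuming tidyr >= 1.2.0; the second call reuses the expression quoted above.

```r
library(tidyr)

# No inputs: one row and no columns, mirroring prod() == 1L.
none <- expand_grid()

# A column from an unnamed data frame can be used in a later expression.
reused <- expand_grid(tibble(x = 1), y = x)
```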
CRAN release: 2021-09-27
unnest() no longer allows unnesting a list-col containing a mix of vector and data frame elements. Previously, this only worked by accident, and is considered an off-label usage of
unnest() that has now become an error.
CRAN release: 2021-03-03
tidyr verbs no longer have “default” methods for lazyeval fallbacks. This means that you’ll get clearer error messages (#1036).
CRAN release: 2020-08-27
CRAN release: 2020-07-31
CRAN release: 2020-05-20
transform arguments; these allow you to transform values “in flight”. They are partly needed because vctrs coercion rules have become stricter, but they give you greater flexibility than was available previously (#921).
Arguments that use tidy selection syntax are now clearly documented and have been updated to use tidyselect 1.1.0 (#872).
df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6)
df %>% pivot_longer(-id, names_to = character())
pivot_longer() no longer creates a
.copy variable in the presence of duplicate column names. This makes it more consistent with the handling of non-unique specs.
df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6)
df %>% pivot_longer(-id, names_pattern = "(.)_.")
df %>% pivot_longer(-id, names_sep = "_", names_to = c("name", NA))
df %>% pivot_longer(-id, names_sep = "_", names_to = c(".value", NA))
names_sort argument which allows you to sort column names in order. The default,
FALSE, orders columns by their first appearance (#839). In a future version, I’ll consider changing the default to
names_glue argument that allows you to construct output column names with a glue specification.
values_fill can now be single values; you now only need to use a named list if you want to use different values for different value columns (#739, #746). They also get improved errors if they’re not of the expected type.
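A brief sketch of `names_glue` and a single-value `values_fill`, assuming these entries describe `pivot_wider()` in tidyr >= 1.1.0; the data frame and the `"val_{key}"` pattern are made up.

```r
library(tidyr)

# Hypothetical long data: id 2 has no "b" observation.
df <- tibble(id = c(1, 1, 2), key = c("a", "b", "a"), val = c(10, 20, 30))

wide <- pivot_wider(
  df,
  names_from = key,
  values_from = val,
  names_glue = "val_{key}",  # build output column names from the key
  values_fill = 0            # one value fills every missing cell
)
```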
hoist() now automatically names pluckers that are a single string (#837). It errors if you use duplicated column names (@mgirlich, #834), and now uses
rlang::list2() behind the scenes (which means that you can now use
hoist() do a better job simplifying list-cols. They no longer add unneeded
unspecified() when the result is still a list (#806), and work when the list contains non-vectors (#810, #848).
unnest_wider(names_sep = "") now provides default names for unnamed inputs, suppressing the many previous name repair messages (#742).
.names_sep argument allows you to strip outer names from inner names, in a symmetrical way to how the same argument to
unnest() combines inner and outer names (#795, #797).
unnest_longer() can now unnest
list_of columns. This is important for unnesting columns created from
pivot_wider(), which will create
list_of columns if the id columns are non-unique (#741).
chop() now creates list-columns of class
vctrs::list_of(). This helps keep track of the type in case the chopped data frame is empty, allowing
unchop() to reconstitute a data frame with the correct number and types of column even when there are no observations.
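The round trip can be sketched like this, assuming tidyr >= 1.1.0; `df` is a made-up data frame, and the class name `"vctrs_list_of"` is the class that vctrs gives to `list_of` objects.

```r
library(tidyr)

# Hypothetical data with an integer column to chop.
df <- tibble(g = c(1, 1, 2), y = 1:3)

# Chopping collapses `y` into a typed list-column, one element per group.
chopped <- chop(df, y)
is_list_of <- inherits(chopped$y, "vctrs_list_of")

# Unchopping restores the original rows.
restored <- unchop(chopped, y)

# With zero rows, the stored element type still lets unchop() rebuild `y`.
empty <- unchop(chop(df[0, ], y), y)
```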
unite(na.rm = TRUE) now works for all types of variable, not just character vectors (#765).
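For example, with a character and a numeric column (a made-up data frame; tidyr >= 1.1.0 assumed):

```r
library(tidyr)

# Hypothetical data: `a` is character with a missing value, `b` is numeric.
df <- tibble(a = c("x", NA), b = c(1, 2))

# Missing values are dropped before pasting, so no "NA_2" appears.
out <- unite(df, "ab", a, b, na.rm = TRUE)
```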
CRAN release: 2019-09-11
vignette("in-packages") for a detailed transition guide.
unnest() have new syntax. The majority of existing usage should be automatically translated to the new syntax with a warning. If that doesn’t work, put this in your script to use the old versions until you can take a closer look and update your code:
The first argument of
nest() has changed from
nest_() and the lazyeval methods for
nest() are now defunct. They have been deprecated for some time, and, since the interface has changed, package authors will need to update to avoid deprecation warnings. I think one clean break should be less work for everyone.
All other lazyeval functions have been formally deprecated, and will be made defunct in the next major release. (See lifecycle vignette for details on deprecation stages).
nesting() now return 0-row outputs if any input is a length-0 vector. If you want to preserve the previous behaviour which silently dropped these inputs, you should convert empty vectors to
NULL. (More discussion on this general pattern at https://github.com/tidyverse/principles/issues/24)
pivot_wider() provide modern alternatives to
gather(). They have been carefully redesigned to be easier to learn and remember, and include many new features. Learn more in
These functions resolve multiple existing issues with
gather(). Both functions now handle multiple value columns (#149/#150), support more vector types (#333), use tidyverse conventions for duplicated column names (#496, #478), and are symmetric (#453).
pivot_longer() gracefully handles duplicated column names (#472), and can directly split column names into multiple variables.
pivot_wider() can now aggregate (#474), select keys (#572), and has control over generated column names (#208).
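A minimal round trip, assuming tidyr >= 1.0.0; the data frame and the new column names `var` and `num` are invented for illustration.

```r
library(tidyr)

# Hypothetical wide data whose column names encode two variables.
df <- tibble(id = 1:2, x_1 = c(1, 2), x_2 = c(3, 4))

# Lengthen, splitting each column name into two variables at "_".
long <- pivot_longer(df, -id, names_to = c("var", "num"), names_sep = "_")

# Widen again; the names are pasted back together with "_" by default.
wide <- pivot_wider(long, names_from = c(var, num), values_from = value)
```

The symmetry mentioned above shows up in `wide` having the same column names as `df`.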
To demonstrate how these functions work in practice, tidyr has gained several new datasets:
Finally, tidyr demos have been removed. They are dated, and have been superseded by
tidyr contains four new functions to support rectangling, turning a deeply nested list into a tidy tibble:
hoist(). They are documented in a new vignette:
unnest_wider() make it easier to unnest list-columns of vectors into either rows or columns (#418).
unnest_auto() automatically picks between
_wider() using heuristics based on the presence of common names.
hoist() provides a convenient way of plucking components of a list-column out into their own top-level columns (#341). This is particularly useful when you are working with deeply nested JSON, because it provides a convenient shortcut for the
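A short sketch of hoisting, assuming tidyr >= 1.0.0; the nested `user` column stands in for parsed JSON and is entirely made up.

```r
library(tidyr)

# Hypothetical list-column, as might come from parsed JSON.
df <- tibble(user = list(
  list(name = "ann", langs = list("r", "sql")),
  list(name = "bob", langs = list("python"))
))

# Pull one top-level field and one nested element into their own columns.
out <- hoist(df, user, name = "name", first_lang = list("langs", 1L))
```

Whatever is not hoisted stays behind in the `user` list-column.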
unnest() have been updated with new interfaces that are more closely aligned to evolving tidyverse conventions. They use the theory developed in vctrs to more consistently handle mixtures of input types, and their arguments have been overhauled based on the last few years of experience. They are supported by a new
vignette("nest"), which outlines some of the main ideas of nested data (it’s still very rough, but will get better over time).
The biggest change is to their operation with multiple columns:
df %>% unnest(x, y, z) becomes
df %>% unnest(c(x, y, z)) and
df %>% nest(x, y, z) becomes
df %>% nest(data = c(x, y, z)).
I have done my best to ensure that common uses of
unnest() will continue to work, generating an informative warning telling you precisely how you need to update your code. Please file an issue if I’ve missed an important use case.
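The new multi-column syntax can be sketched as follows, assuming tidyr >= 1.0.0; both data frames are made up.

```r
library(tidyr)

# Hypothetical data: nest `x` and `y` within each value of `g`.
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = c(1.5, 2.5, 3.5))

# New syntax: name the list-column and wrap the nested columns in c().
nested <- nest(df, data = c(x, y))
flat <- unnest(nested, data)

# Unnesting several list-columns at once also uses c().
df2 <- tibble(g = 1, x = list(1:2), y = list(c("a", "b")))
both <- unnest(df2, c(x, y))
```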
unnest() has been overhauled:
keep_empty parameter ensures that every row in the input gets at least one row in the output, inserting missing values as needed (#358).
names_sep argument to control how inner and outer column names are combined.
Uses standard tidyverse name-repair rules, so by default you will get an error if the output would contain multiple columns with the same name. You can override by using
Packing and chopping are interesting primarily because they are the atomic operations underlying nesting (and similarly, unchopping and unpacking underlie unnesting), and I don’t expect them to be used directly very often.
expand() have been rewritten to use the vctrs package. This should not affect much existing code, but considerably simplifies the implementation and ensures that these functions work consistently across all generalised vectors (#557). As part of this alignment, these functions now only drop
NULL inputs, not any 0-length vector.
full_seq() now also works when gaps between observations are shorter than the given
period, but are within the tolerance given by
tol. Previously, gaps between consecutive observations had to be in the range [
period + tol]; gaps can now be in the range [
period - tol,
period + tol] (@ha0ye, #657).
tidyr now re-exports
tribble(), as well as the tidyselect helpers (
ends_with(), …). This makes generating documentation, reprexes, and tests easier, and makes tidyr easier to use without also attaching dplyr.
All functions that take
... have been instrumented with functions from the ellipsis package to warn if you’ve supplied arguments that are ignored (typically because you’ve misspelled an argument name) (#573).
CRAN release: 2019-03-01
nest() is compatible with dplyr 0.8.0.
CRAN release: 2018-10-28
CRAN release: 2018-05-18
CRAN release: 2018-01-29
There are no deliberate breaking changes in this release. However, a number of packages are failing with errors related to numbers of elements in columns, and row names. It is possible that these are accidental API changes or new bugs. If you see such an error in your package, I would sincerely appreciate a minimal reprex.
separate() now correctly uses -1 to refer to the far right position, instead of -2. If you depended on this behaviour, you’ll need to switch on
packageVersion("tidyr") > "0.7.2"
nest() is now faster, especially when a long data frame is collapsed into a nested data frame with few rows.
gather() (#347) now replace existing variables rather than creating an invalid data frame with duplicated variable names (matching the semantics of mutate).
unnest(df) now works if
df contains no list-cols (#344)
CRAN release: 2017-10-16
CRAN release: 2017-09-01
This is a hotfix release to account for some tidyselect changes in the unit tests.
Note that the upcoming version of tidyselect backtracks on some of the changes announced for 0.7.0. The special evaluation semantics for selection have been changed back to the old behaviour because the new rules were causing too much trouble and confusion. From now on data expressions (symbols and calls to
c()) can refer to both registered variables and to objects from the context.
However the semantics for context expressions (any calls other than to
c()) remain the same. Those expressions are evaluated in the context only and cannot refer to registered variables. If you’re writing functions and refer to contextual objects, it is still a good idea to avoid data expressions by following the advice of the 0.7.0 release notes.
CRAN release: 2017-08-16
This release includes important changes to tidyr internals. Tidyr now supports the new tidy evaluation framework for quoting (NSE) functions. It also uses the new tidyselect package as selecting backend.
If you see error messages about objects or functions not found, it is likely because the selecting functions are now stricter in their arguments. An example of a selecting function is
... argument. This change makes the code more robust by disallowing ambiguous scoping. Consider the following code:
Does it select the first three columns (using the
x defined in the global environment), or does it select the first two columns (using the column named
To solve this ambiguity, we now make a strict distinction between data and context expressions. A data expression is either a bare name or an expression like
c(x, y). In a data expression, you can only refer to columns from the data frame. Everything else is a context expression in which you can only refer to objects that you have defined with
In practice this means that you can no longer refer to contextual objects like this:
mtcars %>% gather(var, value, 1:ncol(mtcars))
x <- 3
mtcars %>% gather(var, value, 1:x)
mtcars %>% gather(var, value, -(1:x))
You now have to be explicit about where to find objects. To do so, you can use the quasiquotation operator
!! which will evaluate its argument early and inline the result:
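One possible shape of such a call is sketched below; it is an illustration rather than the example from the original notes, and it uses `seq_len()` only to keep the unquoted expression unambiguous.

```r
library(tidyr)

x <- 3

# `!!` evaluates seq_len(x) immediately, so the selection receives the
# positions 1, 2, 3 rather than an expression that mentions `x`.
out <- gather(mtcars, var, value, !!seq_len(x))
```

The first three columns of `mtcars` are gathered, and the remaining columns are left as they were.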
Following the switch to tidy evaluation, you might see warnings about the “variable context not set”. This is most likely caused by supplying helpers like
everything() to underscored versions of tidyr verbs. Helpers should always be evaluated lazily. To fix this, just quote the helper with a formula:
The selecting functions are now stricter when you supply integer positions. If you see an error along the lines of
`-0.949999999999999`, `-0.940000000000001`, ... must resolve to integer column positions, not a double vector
please round the positions before supplying them to tidyr. Double vectors are fine as long as they are rounded.
tidyr is now a tidy evaluation grammar. See the programming vignette in dplyr for practical information about tidy evaluation.
The tidyr port is a bit special. While the philosophy of tidy evaluation is that R code should refer to real objects (from the data frame or from the context), we had to make some exceptions to this rule for tidyr. The reason is that several functions accept bare symbols to specify the names of new columns to create (
gather() being a prime example). This is not tidy because the symbols do not represent any actual object. Our workaround is to capture these arguments using
rlang::quo_name() (so they still support quasiquotation and you can unquote symbols or strings). This type of NSE is now discouraged in the tidyverse: symbols in R code should represent real objects.
Following the switch to tidy eval the underscored variants are softly deprecated. However they will remain around for some time and without warning for backward compatibility.
The selecting backend of dplyr has been extracted in a standalone package tidyselect which tidyr now uses for selecting variables. It is used for selecting multiple variables (in
drop_na()) as well as single variables (the
col argument of
separate(), and the
value arguments of
spread()). This implies the following changes:
The arguments for selecting a single variable now support all features from
dplyr::pull(). You can supply a name or a position, including negative positions.
Multiple variables are now selected a bit differently. We now make a strict distinction between data and context expressions. A data expression is either a bare name or an expression like
c(x, y). In a data expression, you can only refer to columns from the data frame. Everything else is a context expression in which you can only refer to objects that you have defined with
You can still refer to contextual objects in a data expression by being explicit. One way of being explicit is to unquote a variable from the environment with the tidy eval operator
On the other hand, select helpers like
starts_with() are context expressions. It is therefore easy to refer to objects and they will never be ambiguous with data columns:
x <- "d"
drop_na(df, starts_with(x))
While these special rules are in contrast to most dplyr and tidyr verbs (where both the data and the context are in scope) they make sense for selecting functions and should provide more robust and helpful semantics.
CRAN release: 2017-05-04
Register C functions
Added package docs
Patch tests to be compatible with dev dplyr.
CRAN release: 2017-01-10
CRAN release: 2016-08-12
table4b to make their connection more clear. The
table2 have been renamed to
CRAN release: 2016-06-14
CRAN release: 2016-06-12
Moved in useful sample datasets from the DSR package.
Made compatible with both dplyr 0.4 and 0.5.
tidyr functions that create new columns are more aggressive about re-encoding the column names as UTF-8.
CRAN release: 2016-02-05
CRAN release: 2016-01-18
unnest() have been overhauled to support a useful way of structuring data frames: the nested data frame. In a grouped data frame, you have one row per observation, and additional metadata define the groups. In a nested data frame, you have one row per group, and the individual observations are stored in a column that is a list of data frames. This is a useful structure when you have lists of other objects (like models) with one element per group.
nest() now produces a single list of data frames called “data” rather than a list column for each variable. Nesting variables are not included in nested data frames. It also works with grouped data frames made by
dplyr::group_by(). You can override the default column name with
.drop argument which controls what happens to other list columns. By default, they’re kept if the output doesn’t require row duplication; otherwise they’re dropped.
full_seq(x, period) creates the full sequence of values from
NULLs in list-columns.
gather() now stores the key column as character, by default. To revert to the previous behaviour of using a factor (which allows you to preserve the ordering of the columns), use
factor_key = TRUE (#96).
seq_range() has been removed. It was never used or announced.
spread() once again creates columns of mixed type when
convert = TRUE (#118, @jennybc).
drop = FALSE handles zero-length factors (#56).
spread()ing a data frame with only key and value columns creates a one row output (#41).
CRAN release: 2015-09-10
- Fixed bug where attributes of non-gather columns were lost (#104)
CRAN release: 2015-09-08
replace_na() makes it easy to replace missing values with something meaningful for your data.
unnest() can now work with multiple list-columns at the same time. If you don’t supply any column names, it will unlist all list-columns (#44).
unnest() can also handle columns that are lists of data frames (#58).
tidyr no longer depends on reshape2. This should fix issues if you also try to load reshape (#88).
%>% is re-exported from magrittr.
expand_() does SE evaluation correctly so you can pass it a character vector of column names (or list of formulas etc) (#70).
separate() only displays the first 20 failures (#50). It has finer control over what happens if there are too few matches: you can fill with missing values on either the “left” or the “right” (#49).
separate() no longer throws an error if the number of pieces isn’t as expected; instead it drops extra values, fills on the right, and gives a warning.
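The fill direction can be sketched as below, using the `fill` argument as it exists in current tidyr (where `separate()` has since been superseded); the data frame is made up.

```r
library(tidyr)

# Hypothetical column where the second value has too few pieces.
df <- tibble(x = c("a-b", "c"))

# Pad the missing piece on the right ...
right <- separate(df, x, c("l", "r"), sep = "-", fill = "right")

# ... or on the left.
left <- separate(df, x, c("l", "r"), sep = "-", fill = "left")
```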
unnest() method for lists has been removed.
CRAN release: 2014-12-05
extra argument which lets you control what happens to extra pieces. The default is to throw an “error”, but you can also “merge” or “drop”.
drop argument, which allows you to preserve missing factor levels (#25). It converts factor value variables to character vectors, instead of embedding a matrix inside the data frame (#35).