[Maturing]

Chopping and unchopping preserve the width of a data frame, changing its length. chop() makes df shorter by converting rows within each group into list-columns. unchop() makes df longer by expanding list-columns so that each element of the list-column gets its own row in the output. chop() and unchop() are building blocks for more complicated functions (like unnest(), unnest_longer(), and unnest_wider()) and are generally more suitable for programming than interactive data analysis.

chop(data, cols)

unchop(data, cols, keep_empty = FALSE, ptype = NULL)

Arguments

data

A data frame.

cols

<tidy-select> Columns to chop or unchop (automatically quoted).

For unchop(), each column should be a list-column containing generalised vectors (e.g. any mix of NULLs, atomic vectors, S3 vectors, lists, or data frames).

keep_empty

By default, you get one row of output for each element of the list you're unchopping/unnesting. This means that if there's a size-0 element (like NULL or an empty data frame), that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty = TRUE to replace size-0 elements with a single row of missing values.

ptype

Optionally, supply a data frame prototype for the output cols, overriding the default that will be guessed from the combination of individual values.

Details

Generally, unchopping is more useful than chopping because it simplifies a complex data structure, and nest()ing is usually more appropriate than chop()ing since it better preserves the connections between observations.

chop() creates list-columns of class vctrs::list_of() to ensure consistent behaviour when the chopped data frame is emptied. For instance, this helps recover the original column types after a chop() and unchop() roundtrip. Because <list_of> keeps track of the type of its elements, unchop() is able to reconstitute the correct vector type even for empty list-columns.
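As a small illustration of the roundtrip described above, the sketch below chops a zero-row data frame and unchops it again. The data frame df and its columns are made up for this example, and the behaviour is assumed to match the tidyr version documented here. The point is only that y should come back as an integer column even though there are no elements left to infer a type from.

```r
library(tidyr)

# Hypothetical example data; any data frame would do
df <- tibble(x = c(1, 1, 2), y = 1:3)

# Chopping a zero-row slice should still yield a typed <list_of> column
empty_chopped <- chop(df[0, ], y)
inherits(empty_chopped$y, "vctrs_list_of")

# Unchopping relies on that stored element type to rebuild the column
empty_roundtrip <- unchop(empty_chopped, y)
typeof(empty_roundtrip$y)

# The same roundtrip on the full data keeps the original column types
roundtrip <- unchop(chop(df, y), y)
vapply(roundtrip, typeof, character(1))
```

With a plain list-column instead of a <list_of>, the empty case would carry no information about the element type.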

Examples

# Chop ==============================================================
df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
# Note that we get one row of output for each unique combination of
# non-chopped variables
df %>% chop(c(y, z))
#> # A tibble: 3 x 3
#>       x           y           z
#>   <dbl> <list<int>> <list<int>>
#> 1     1         [3]         [3]
#> 2     2         [2]         [2]
#> 3     3         [1]         [1]
# cf nest
df %>% nest(data = c(y, z))
#> # A tibble: 3 x 2
#>       x data
#>   <dbl> <list>
#> 1     1 <tibble [3 × 2]>
#> 2     2 <tibble [2 × 2]>
#> 3     3 <tibble [1 × 2]>

# Unchop ============================================================
df <- tibble(x = 1:4, y = list(integer(), 1L, 1:2, 1:3))
df %>% unchop(y)
#> # A tibble: 6 x 2
#>       x     y
#>   <int> <int>
#> 1     2     1
#> 2     3     1
#> 3     3     2
#> 4     4     1
#> 5     4     2
#> 6     4     3
df %>% unchop(y, keep_empty = TRUE)
#> # A tibble: 7 x 2
#>       x     y
#>   <int> <int>
#> 1     1    NA
#> 2     2     1
#> 3     3     1
#> 4     3     2
#> 5     4     1
#> 6     4     2
#> 7     4     3

# Incompatible types -------------------------------------------------
# If the list-col contains types that can not be natively
df <- tibble(x = 1:2, y = list("1", 1:3))
try(df %>% unchop(y))
#> Error : Can't combine `..1` <character> and `..2` <integer>.

# Unchopping data frames -----------------------------------------------------
# Unchopping a list-col of data frames must generate a df-col because
# unchop leaves the column names unchanged
df <- tibble(x = 1:3, y = list(NULL, tibble(x = 1), tibble(y = 1:2)))
df %>% unchop(y)
#> # A tibble: 3 x 2
#>       x   y$x    $y
#>   <int> <dbl> <int>
#> 1     2     1    NA
#> 2     3    NA     1
#> 3     3    NA     2
df %>% unchop(y, keep_empty = TRUE)
#> # A tibble: 4 x 2
#>       x   y$x    $y
#>   <int> <dbl> <int>
#> 1     1    NA    NA
#> 2     2     1    NA
#> 3     3    NA     1
#> 4     3    NA     2