Maturing lifecycle

pivot_longer() "lengthens" data, increasing the number of rows and decreasing the number of columns. The inverse transformation is pivot_wider()

Learn more in vignette("pivot").

pivot_longer(data, cols, names_to = "name", names_prefix = NULL,
  names_sep = NULL, names_pattern = NULL, names_ptypes = list(),
  names_repair = "check_unique", values_to = "value",
  values_drop_na = FALSE, values_ptypes = list())

Arguments

data

A data frame to pivot.

cols

Columns to pivot into longer format. This takes a tidyselect specification.

names_to

A string specifying the name of the column to create from the data stored in the column names of data.

Can be a character vector, creating multiple columns, if names_sep or names_pattern is provided.

names_prefix

A regular expression used to remove matching text from the start of each variable name.

names_sep, names_pattern

If names_to contains multiple values, these arguments control how the column name is broken up.

names_sep takes the same specification as separate(), and can either be a numeric vector (specifying positions to break on), or a single string (specifying a regular expression to split on).

names_pattern takes the same specification as extract(), a regular expression containing matching groups (()).

If these arguments do not give you enough control, use pivot_longer_spec() to create a spec object and process manually as needed.

names_ptypes, values_ptypes

A list of column name-prototype pairs. A prototype (or ptype for short) is a zero-length vector (like integer() or numeric()) that defines the type, class, and attributes of a vector.

If not specified, the type of the columns generated from names_to will be character, and the type of the variables generated from values_to will be the common type of the input columns used to generate them.

names_repair

What happen if the output has invalid column names? The default, "check_unique" is to error if the columns are duplicated. Use "minimal" to allow duplicates in the output, or "unique" to de-duplicated by adding numeric suffixes. See vctrs::vec_as_names() for more options.

values_to

A string specifying the name of the column to create from the data stored in cell values. If names_to is a character containing the special .value sentinel, this value will be ignored, and the name of the value column will be derived from part of the existing column names.

values_drop_na

If TRUE, will drop rows that contain only NAs in the value_to column. This effectively converts explicit missing values to implicit missing values, and should generally be used only when missing values in data were created by its structure.

Details

pivot_longer() is an updated approach to gather(), designed to be both simpler to use and to handle more use cases. We recommend you use pivot_longer() for new code; gather() isn't going away but is no longer under active development.

Examples

# See vignette("pivot") for examples and explanation # Simplest case where column names are character data relig_income
#> # A tibble: 18 x 11 #> religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k` #> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 Agnostic 27 34 60 81 76 137 122 #> 2 Atheist 12 27 37 52 35 70 73 #> 3 Buddhist 27 21 30 34 33 58 62 #> 4 Catholic 418 617 732 670 638 1116 949 #> 5 Don’t k… 15 14 15 11 10 35 21 #> 6 Evangel… 575 869 1064 982 881 1486 949 #> 7 Hindu 1 9 7 9 11 34 47 #> 8 Histori… 228 244 236 238 197 223 131 #> 9 Jehovah… 20 27 24 24 21 30 15 #> 10 Jewish 19 19 25 25 30 95 69 #> 11 Mainlin… 289 495 619 655 651 1107 939 #> 12 Mormon 29 40 48 51 56 112 85 #> 13 Muslim 6 7 9 10 9 23 16 #> 14 Orthodox 13 17 23 32 32 47 38 #> 15 Other C… 9 7 11 13 13 14 18 #> 16 Other F… 20 33 40 46 49 63 46 #> 17 Other W… 5 2 3 4 2 7 3 #> 18 Unaffil… 217 299 374 365 341 528 407 #> # … with 3 more variables: `$100-150k` <dbl>, `>150k` <dbl>, `Don't #> # know/refused` <dbl>
relig_income %>% pivot_longer(-religion, names_to = "income", values_to = "count")
#> # A tibble: 180 x 3 #> religion income count #> <chr> <chr> <dbl> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # … with 170 more rows
# Slightly more complex case where columns have common prefix, # and missing missings are structural so should be dropped. billboard
#> # A tibble: 317 x 79 #> artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8 #> <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 NA #> 2 2Ge+h… The … 2000-09-02 91 87 92 NA NA NA NA NA #> 3 3 Doo… Kryp… 2000-04-08 81 70 68 67 66 57 54 53 #> 4 3 Doo… Loser 2000-10-21 76 76 72 69 67 65 55 59 #> 5 504 B… Wobb… 2000-04-15 57 34 25 17 17 31 36 49 #> 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 2 #> 7 A*Tee… Danc… 2000-07-08 97 97 96 95 100 NA NA NA #> 8 Aaliy… I Do… 2000-01-29 84 62 51 41 38 35 35 38 #> 9 Aaliy… Try … 2000-03-18 59 53 38 28 21 18 16 14 #> 10 Adams… Open… 2000-08-26 76 76 74 69 68 67 61 58 #> # … with 307 more rows, and 68 more variables: wk9 <dbl>, wk10 <dbl>, #> # wk11 <dbl>, wk12 <dbl>, wk13 <dbl>, wk14 <dbl>, wk15 <dbl>, wk16 <dbl>, #> # wk17 <dbl>, wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>, #> # wk23 <dbl>, wk24 <dbl>, wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>, #> # wk29 <dbl>, wk30 <dbl>, wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>, #> # wk35 <dbl>, wk36 <dbl>, wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>, #> # wk41 <dbl>, wk42 <dbl>, wk43 <dbl>, wk44 <dbl>, wk45 <dbl>, wk46 <dbl>, #> # wk47 <dbl>, wk48 <dbl>, wk49 <dbl>, wk50 <dbl>, wk51 <dbl>, wk52 <dbl>, #> # wk53 <dbl>, wk54 <dbl>, wk55 <dbl>, wk56 <dbl>, wk57 <dbl>, wk58 <dbl>, #> # wk59 <dbl>, wk60 <dbl>, wk61 <dbl>, wk62 <dbl>, wk63 <dbl>, wk64 <dbl>, #> # wk65 <dbl>, wk66 <lgl>, wk67 <lgl>, wk68 <lgl>, wk69 <lgl>, wk70 <lgl>, #> # wk71 <lgl>, wk72 <lgl>, wk73 <lgl>, wk74 <lgl>, wk75 <lgl>, wk76 <lgl>
billboard %>% pivot_longer( cols = starts_with("wk"), names_to = "week", names_prefix = "wk", values_to = "rank", values_drop_na = TRUE )
#> # A tibble: 5,307 x 5 #> artist track date.entered week rank #> <chr> <chr> <date> <chr> <dbl> #> 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 1 87 #> 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 2 82 #> 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 3 72 #> 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 4 77 #> 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 5 87 #> 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 6 94 #> 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 7 99 #> 8 2Ge+her The Hardest Part Of ... 2000-09-02 1 91 #> 9 2Ge+her The Hardest Part Of ... 2000-09-02 2 87 #> 10 2Ge+her The Hardest Part Of ... 2000-09-02 3 92 #> # … with 5,297 more rows
# Multiple variables stored in colum names who %>% pivot_longer( cols = new_sp_m014:newrel_f65, names_to = c("diagnosis", "gender", "age"), names_pattern = "new_?(.*)_(.)(.*)", values_to = "count" )
#> # A tibble: 405,440 x 8 #> country iso2 iso3 year diagnosis gender age count #> <chr> <chr> <chr> <int> <chr> <chr> <chr> <int> #> 1 Afghanistan AF AFG 1980 sp m 014 NA #> 2 Afghanistan AF AFG 1980 sp m 1524 NA #> 3 Afghanistan AF AFG 1980 sp m 2534 NA #> 4 Afghanistan AF AFG 1980 sp m 3544 NA #> 5 Afghanistan AF AFG 1980 sp m 4554 NA #> 6 Afghanistan AF AFG 1980 sp m 5564 NA #> 7 Afghanistan AF AFG 1980 sp m 65 NA #> 8 Afghanistan AF AFG 1980 sp f 014 NA #> 9 Afghanistan AF AFG 1980 sp f 1524 NA #> 10 Afghanistan AF AFG 1980 sp f 2534 NA #> # … with 405,430 more rows
# Multiple observations per row anscombe
#> x1 x2 x3 x4 y1 y2 y3 y4 #> 1 10 10 10 8 8.04 9.14 7.46 6.58 #> 2 8 8 8 8 6.95 8.14 6.77 5.76 #> 3 13 13 13 8 7.58 8.74 12.74 7.71 #> 4 9 9 9 8 8.81 8.77 7.11 8.84 #> 5 11 11 11 8 8.33 9.26 7.81 8.47 #> 6 14 14 14 8 9.96 8.10 8.84 7.04 #> 7 6 6 6 8 7.24 6.13 6.08 5.25 #> 8 4 4 4 19 4.26 3.10 5.39 12.50 #> 9 12 12 12 8 10.84 9.13 8.15 5.56 #> 10 7 7 7 8 4.82 7.26 6.42 7.91 #> 11 5 5 5 8 5.68 4.74 5.73 6.89
anscombe %>% pivot_longer(everything(), names_to = c(".value", "set"), names_pattern = "(.)(.)" )
#> # A tibble: 44 x 3 #> set x y #> <chr> <dbl> <dbl> #> 1 1 10 8.04 #> 2 2 10 9.14 #> 3 3 10 7.46 #> 4 4 8 6.58 #> 5 1 8 6.95 #> 6 2 8 8.14 #> 7 3 8 6.77 #> 8 4 8 5.76 #> 9 1 13 7.58 #> 10 2 13 8.74 #> # … with 34 more rows