pivot_wider()
"widens" data, increasing the number of columns and
decreasing the number of rows. The inverse transformation is
pivot_longer()
.
Learn more in vignette("pivot")
.
Usage
pivot_wider(
data,
id_cols = NULL,
id_expand = FALSE,
names_from = name,
names_prefix = "",
names_sep = "_",
names_glue = NULL,
names_sort = FALSE,
names_vary = "fastest",
names_expand = FALSE,
names_repair = "check_unique",
values_from = value,
values_fill = NULL,
values_fn = NULL,
unused_fn = NULL,
...
)
Arguments
- data
A data frame to pivot.
- id_cols
<
tidy-select
> A set of columns that uniquely identifies each observation. Defaults to all columns indata
except for the columns specified innames_from
andvalues_from
. Typically used when you have redundant variables, i.e. variables whose values are perfectly correlated with existing variables.- id_expand
Should the values in the
id_cols
columns be expanded byexpand()
before pivoting? This results in more rows, the output will contain a complete expansion of all possible values inid_cols
. Implicit factor levels that aren't represented in the data will become explicit. Additionally, the row values corresponding to the expandedid_cols
will be sorted.- names_from, values_from
<
tidy-select
> A pair of arguments describing which column (or columns) to get the name of the output column (names_from
), and which column (or columns) to get the cell values from (values_from
).If
values_from
contains multiple values, the value will be added to the front of the output column.- names_prefix
String added to the start of every variable name. This is particularly useful if
names_from
is a numeric vector and you want to create syntactic variable names.- names_sep
If
names_from
orvalues_from
contains multiple variables, this will be used to join their values together into a single string to use as a column name.- names_glue
Instead of
names_sep
andnames_prefix
, you can supply a glue specification that uses thenames_from
columns (and special.value
) to create custom column names.- names_sort
Should the column names be sorted? If
FALSE
, the default, column names are ordered by first appearance.- names_vary
When
names_from
identifies a column (or columns) with multiple unique values, and multiplevalues_from
columns are provided, in what order should the resulting column names be combined?"fastest"
variesnames_from
values fastest, resulting in a column naming scheme of the form:value1_name1, value1_name2, value2_name1, value2_name2
. This is the default."slowest"
variesnames_from
values slowest, resulting in a column naming scheme of the form:value1_name1, value2_name1, value1_name2, value2_name2
.
- names_expand
Should the values in the
names_from
columns be expanded byexpand()
before pivoting? This results in more columns, the output will contain column names corresponding to a complete expansion of all possible values innames_from
. Implicit factor levels that aren't represented in the data will become explicit. Additionally, the column names will be sorted, identical to whatnames_sort
would produce.- names_repair
What happens if the output has invalid column names? The default,
"check_unique"
is to error if the columns are duplicated. Use"minimal"
to allow duplicates in the output, or"unique"
to de-duplicated by adding numeric suffixes. Seevctrs::vec_as_names()
for more options.- values_fill
Optionally, a (scalar) value that specifies what each
value
should be filled in with when missing.This can be a named list if you want to apply different fill values to different value columns.
- values_fn
Optionally, a function applied to the value in each cell in the output. You will typically use this when the combination of
id_cols
andnames_from
columns does not uniquely identify an observation.This can be a named list if you want to apply different aggregations to different
values_from
columns.- unused_fn
Optionally, a function applied to summarize the values from the unused columns (i.e. columns not identified by
id_cols
,names_from
, orvalues_from
).The default drops all unused columns from the result.
This can be a named list if you want to apply different aggregations to different unused columns.
id_cols
must be supplied forunused_fn
to be useful, since otherwise all unspecified columns will be consideredid_cols
.This is similar to grouping by the
id_cols
then summarizing the unused columns usingunused_fn
.- ...
Additional arguments passed on to methods.
Details
pivot_wider()
is an updated approach to spread()
, designed to be both
simpler to use and to handle more use cases. We recommend you use
pivot_wider()
for new code; spread()
isn't going away but is no longer
under active development.
See also
pivot_wider_spec()
to pivot "by hand" with a data frame that
defines a pivotting specification.
Examples
# See vignette("pivot") for examples and explanation
fish_encounters
#> # A tibble: 114 × 3
#> fish station seen
#> <fct> <fct> <int>
#> 1 4842 Release 1
#> 2 4842 I80_1 1
#> 3 4842 Lisbon 1
#> 4 4842 Rstr 1
#> 5 4842 Base_TD 1
#> 6 4842 BCE 1
#> 7 4842 BCW 1
#> 8 4842 BCE2 1
#> 9 4842 BCW2 1
#> 10 4842 MAE 1
#> # … with 104 more rows
fish_encounters %>%
pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 × 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE
#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 1 1
#> 2 4843 1 1 1 1 1 1 1 1 1 1
#> 3 4844 1 1 1 1 1 1 1 1 1 1
#> 4 4845 1 1 1 1 1 NA NA NA NA NA
#> 5 4847 1 1 1 NA NA NA NA NA NA NA
#> 6 4848 1 1 1 1 NA NA NA NA NA NA
#> 7 4849 1 1 NA NA NA NA NA NA NA NA
#> 8 4850 1 1 NA 1 1 1 1 NA NA NA
#> 9 4851 1 1 NA NA NA NA NA NA NA NA
#> 10 4854 1 1 NA NA NA NA NA NA NA NA
#> 11 4855 1 1 1 1 1 NA NA NA NA NA
#> 12 4857 1 1 1 1 1 1 1 1 1 NA
#> 13 4858 1 1 1 1 1 1 1 1 1 1
#> 14 4859 1 1 1 1 1 NA NA NA NA NA
#> 15 4861 1 1 1 1 1 1 1 1 1 1
#> 16 4862 1 1 1 1 1 1 1 1 1 NA
#> 17 4863 1 1 NA NA NA NA NA NA NA NA
#> 18 4864 1 1 NA NA NA NA NA NA NA NA
#> 19 4865 1 1 1 NA NA NA NA NA NA NA
#> # … with 1 more variable: MAW <int>
# Fill in missing values
fish_encounters %>%
pivot_wider(names_from = station, values_from = seen, values_fill = 0)
#> # A tibble: 19 × 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE
#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 1 1
#> 2 4843 1 1 1 1 1 1 1 1 1 1
#> 3 4844 1 1 1 1 1 1 1 1 1 1
#> 4 4845 1 1 1 1 1 0 0 0 0 0
#> 5 4847 1 1 1 0 0 0 0 0 0 0
#> 6 4848 1 1 1 1 0 0 0 0 0 0
#> 7 4849 1 1 0 0 0 0 0 0 0 0
#> 8 4850 1 1 0 1 1 1 1 0 0 0
#> 9 4851 1 1 0 0 0 0 0 0 0 0
#> 10 4854 1 1 0 0 0 0 0 0 0 0
#> 11 4855 1 1 1 1 1 0 0 0 0 0
#> 12 4857 1 1 1 1 1 1 1 1 1 0
#> 13 4858 1 1 1 1 1 1 1 1 1 1
#> 14 4859 1 1 1 1 1 0 0 0 0 0
#> 15 4861 1 1 1 1 1 1 1 1 1 1
#> 16 4862 1 1 1 1 1 1 1 1 1 0
#> 17 4863 1 1 0 0 0 0 0 0 0 0
#> 18 4864 1 1 0 0 0 0 0 0 0 0
#> 19 4865 1 1 1 0 0 0 0 0 0 0
#> # … with 1 more variable: MAW <int>
# Generate column names from multiple variables
us_rent_income
#> # A tibble: 104 × 5
#> GEOID NAME variable estimate moe
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 01 Alabama income 24476 136
#> 2 01 Alabama rent 747 3
#> 3 02 Alaska income 32940 508
#> 4 02 Alaska rent 1200 13
#> 5 04 Arizona income 27517 148
#> 6 04 Arizona rent 972 4
#> 7 05 Arkansas income 23789 165
#> 8 05 Arkansas rent 709 5
#> 9 06 California income 29454 109
#> 10 06 California rent 1358 3
#> # … with 94 more rows
us_rent_income %>%
pivot_wider(
names_from = variable,
values_from = c(estimate, moe)
)
#> # A tibble: 52 × 6
#> GEOID NAME estimate_income estimate_rent moe_income moe_rent
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 747 136 3
#> 2 02 Alaska 32940 1200 508 13
#> 3 04 Arizona 27517 972 148 4
#> 4 05 Arkansas 23789 709 165 5
#> 5 06 California 29454 1358 109 3
#> 6 08 Colorado 32401 1125 109 5
#> 7 09 Connecticut 35326 1123 195 5
#> 8 10 Delaware 31560 1076 247 10
#> 9 11 District of Co… 43198 1424 681 17
#> 10 12 Florida 25952 1077 70 3
#> # … with 42 more rows
# You can control whether `names_from` values vary fastest or slowest
# relative to the `values_from` column names using `names_vary`.
us_rent_income %>%
pivot_wider(
names_from = variable,
values_from = c(estimate, moe),
names_vary = "slowest"
)
#> # A tibble: 52 × 6
#> GEOID NAME estimate_income moe_income estimate_rent moe_rent
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 136 747 3
#> 2 02 Alaska 32940 508 1200 13
#> 3 04 Arizona 27517 148 972 4
#> 4 05 Arkansas 23789 165 709 5
#> 5 06 California 29454 109 1358 3
#> 6 08 Colorado 32401 109 1125 5
#> 7 09 Connecticut 35326 195 1123 5
#> 8 10 Delaware 31560 247 1076 10
#> 9 11 District of Co… 43198 681 1424 17
#> 10 12 Florida 25952 70 1077 3
#> # … with 42 more rows
# When there are multiple `names_from` or `values_from`, you can use
# use `names_sep` or `names_glue` to control the output variable names
us_rent_income %>%
pivot_wider(
names_from = variable,
names_sep = ".",
values_from = c(estimate, moe)
)
#> # A tibble: 52 × 6
#> GEOID NAME estimate.income estimate.rent moe.income moe.rent
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 747 136 3
#> 2 02 Alaska 32940 1200 508 13
#> 3 04 Arizona 27517 972 148 4
#> 4 05 Arkansas 23789 709 165 5
#> 5 06 California 29454 1358 109 3
#> 6 08 Colorado 32401 1125 109 5
#> 7 09 Connecticut 35326 1123 195 5
#> 8 10 Delaware 31560 1076 247 10
#> 9 11 District of Co… 43198 1424 681 17
#> 10 12 Florida 25952 1077 70 3
#> # … with 42 more rows
us_rent_income %>%
pivot_wider(
names_from = variable,
names_glue = "{variable}_{.value}",
values_from = c(estimate, moe)
)
#> # A tibble: 52 × 6
#> GEOID NAME income_estimate rent_estimate income_moe rent_moe
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 747 136 3
#> 2 02 Alaska 32940 1200 508 13
#> 3 04 Arizona 27517 972 148 4
#> 4 05 Arkansas 23789 709 165 5
#> 5 06 California 29454 1358 109 3
#> 6 08 Colorado 32401 1125 109 5
#> 7 09 Connecticut 35326 1123 195 5
#> 8 10 Delaware 31560 1076 247 10
#> 9 11 District of Co… 43198 1424 681 17
#> 10 12 Florida 25952 1077 70 3
#> # … with 42 more rows
# Can perform aggregation with `values_fn`
warpbreaks <- as_tibble(warpbreaks[c("wool", "tension", "breaks")])
warpbreaks
#> # A tibble: 54 × 3
#> wool tension breaks
#> <fct> <fct> <dbl>
#> 1 A L 26
#> 2 A L 30
#> 3 A L 54
#> 4 A L 25
#> 5 A L 70
#> 6 A L 52
#> 7 A L 51
#> 8 A L 26
#> 9 A L 67
#> 10 A M 18
#> # … with 44 more rows
warpbreaks %>%
pivot_wider(
names_from = wool,
values_from = breaks,
values_fn = mean
)
#> # A tibble: 3 × 3
#> tension A B
#> <fct> <dbl> <dbl>
#> 1 L 44.6 28.2
#> 2 M 24 28.8
#> 3 H 24.6 18.8
# Can pass an anonymous function to `values_fn` when you
# need to supply additional arguments
warpbreaks$breaks[1] <- NA
warpbreaks %>%
pivot_wider(
names_from = wool,
values_from = breaks,
values_fn = ~mean(.x, na.rm = TRUE)
)
#> # A tibble: 3 × 3
#> tension A B
#> <fct> <dbl> <dbl>
#> 1 L 46.9 28.2
#> 2 M 24 28.8
#> 3 H 24.6 18.8