gather() takes multiple columns and collapses them into key-value pairs, duplicating all other columns as needed. You use gather() when you notice that you have columns that are not variables.

gather(data, key, value, ..., na.rm = FALSE, convert = FALSE,
  factor_key = FALSE)

Arguments

data

A data frame.

key, value

Names of key and value columns to create in output.

...

Specification of columns to gather. Use bare variable names. Select all variables between x and z with x:z, exclude y with -y. For more options, see the select documentation.
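The selection forms described above can be sketched as follows. This is a minimal illustration, not part of the original examples: the data frame `wide` and its columns `id`, `x`, `y`, `z` are made up for the purpose.

```r
library(tidyr)

# Hypothetical wide data: one row per id, one column per measurement.
wide <- data.frame(id = 1:2, x = c(1, 2), y = c(3, 4), z = c(5, 6))

# Gather the range x:z; id is duplicated as needed.
by_range <- gather(wide, key, value, x:z)

# Gather everything except id; here this selects the same three columns.
by_exclusion <- gather(wide, key, value, -id)

# Gather the range x:z but exclude y, so y stays as an ordinary column.
without_y <- gather(wide, key, value, x:z, -y)

nrow(by_range)    # 2 rows x 3 gathered columns = 6 rows
nrow(without_y)   # 2 rows x 2 gathered columns = 4 rows
names(without_y)  # id and y are kept, followed by key and value
```

Excluding the identifier columns with a minus sign is usually the shorter form when most columns are to be gathered.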

na.rm

If TRUE, will remove rows from output where the value column is NA.
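A small sketch of the effect of na.rm, assuming a made-up data frame `scores` with one missing measurement:

```r
library(tidyr)

# Hypothetical data with a single missing value in column a.
scores <- data.frame(id = 1:2, a = c(1, NA), b = c(3, 4))

# Default: the missing measurement becomes a row whose value is NA.
kept <- gather(scores, key, value, a, b)

# na.rm = TRUE: that row is dropped from the output.
dropped <- gather(scores, key, value, a, b, na.rm = TRUE)

nrow(kept)     # 4
nrow(dropped)  # 3
```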

convert

If TRUE, will automatically run type.convert() on the key column. This is useful if the column names are actually numeric, integer, or logical.
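To illustrate convert, the sketch below assumes a made-up data frame `yearly` whose gathered column names are years. `check.names = FALSE` is used only so that base R keeps the numeric-looking names unchanged.

```r
library(tidyr)

# Hypothetical data whose column names are really integer values.
yearly <- data.frame(
  id = 1:2,
  `2009` = c(10, 20),
  `2010` = c(30, 40),
  check.names = FALSE
)

# Default: the key column holds the column names as strings.
as_chr <- gather(yearly, year, value, -id)

# convert = TRUE: the key column is passed through type.convert().
as_int <- gather(yearly, year, value, -id, convert = TRUE)

class(as_chr$year)  # "character"
class(as_int$year)  # "integer"
```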

factor_key

If FALSE, the default, the key values will be stored as a character vector. If TRUE, will be stored as a factor, which preserves the original ordering of the columns.
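The difference can be seen in a short sketch, assuming a made-up data frame `wide` whose columns `z` and `a` are deliberately not in alphabetical order:

```r
library(tidyr)

# Hypothetical data: column z comes before column a.
wide <- data.frame(id = 1:2, z = c(1, 2), a = c(3, 4))

# Default: the key is a character vector.
chr_key <- gather(wide, key, value, z, a)

# factor_key = TRUE: the key is a factor whose levels follow column order.
fct_key <- gather(wide, key, value, z, a, factor_key = TRUE)

class(chr_key$key)   # "character"
levels(fct_key$key)  # "z" "a"
```

Keeping the original column order as factor levels can be convenient when the gathered data is later plotted or summarised in that order.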

See also

gather_ for a version that uses regular evaluation and is suitable for programming with.

Examples

library(dplyr)
# From http://stackoverflow.com/questions/1181060
stocks <- data_frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

gather(stocks, stock, price, -time)
#> # A tibble: 30 × 3
#>          time stock       price
#>        <date> <chr>       <dbl>
#> 1  2009-01-01     X -1.63098940
#> 2  2009-01-02     X  0.51242695
#> 3  2009-01-03     X -1.86301149
#> 4  2009-01-04     X -0.52201251
#> 5  2009-01-05     X -0.05260191
#> 6  2009-01-06     X  0.54299634
#> 7  2009-01-07     X -0.91407483
#> 8  2009-01-08     X  0.46815442
#> 9  2009-01-09     X  0.36295126
#> 10 2009-01-10     X -1.30454355
#> # ... with 20 more rows
stocks %>% gather(stock, price, -time)
#> # A tibble: 30 × 3
#>          time stock       price
#>        <date> <chr>       <dbl>
#> 1  2009-01-01     X -1.63098940
#> 2  2009-01-02     X  0.51242695
#> 3  2009-01-03     X -1.86301149
#> 4  2009-01-04     X -0.52201251
#> 5  2009-01-05     X -0.05260191
#> 6  2009-01-06     X  0.54299634
#> 7  2009-01-07     X -0.91407483
#> 8  2009-01-08     X  0.46815442
#> 9  2009-01-09     X  0.36295126
#> 10 2009-01-10     X -1.30454355
#> # ... with 20 more rows

# get first observation for each Species in iris data -- base R
mini_iris <- iris[c(1, 51, 101), ]
# gather Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
gather(mini_iris, key = flower_att, value = measurement,
       Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
#>       Species   flower_att measurement
#> 1      setosa Sepal.Length         5.1
#> 2  versicolor Sepal.Length         7.0
#> 3   virginica Sepal.Length         6.3
#> 4      setosa  Sepal.Width         3.5
#> 5  versicolor  Sepal.Width         3.2
#> 6   virginica  Sepal.Width         3.3
#> 7      setosa Petal.Length         1.4
#> 8  versicolor Petal.Length         4.7
#> 9   virginica Petal.Length         6.0
#> 10     setosa  Petal.Width         0.2
#> 11 versicolor  Petal.Width         1.4
#> 12  virginica  Petal.Width         2.5
# same result but less verbose
gather(mini_iris, key = flower_att, value = measurement, -Species)
#>       Species   flower_att measurement
#> 1      setosa Sepal.Length         5.1
#> 2  versicolor Sepal.Length         7.0
#> 3   virginica Sepal.Length         6.3
#> 4      setosa  Sepal.Width         3.5
#> 5  versicolor  Sepal.Width         3.2
#> 6   virginica  Sepal.Width         3.3
#> 7      setosa Petal.Length         1.4
#> 8  versicolor Petal.Length         4.7
#> 9   virginica Petal.Length         6.0
#> 10     setosa  Petal.Width         0.2
#> 11 versicolor  Petal.Width         1.4
#> 12  virginica  Petal.Width         2.5

# repeat iris example using dplyr and the pipe operator
library(dplyr)
mini_iris <-
  iris %>%
  group_by(Species) %>%
  slice(1)
mini_iris %>% gather(key = flower_att, value = measurement, -Species)
#> Source: local data frame [12 x 3]
#> Groups: Species [3]
#>
#>       Species   flower_att measurement
#>        <fctr>        <chr>       <dbl>
#> 1      setosa Sepal.Length         5.1
#> 2  versicolor Sepal.Length         7.0
#> 3   virginica Sepal.Length         6.3
#> 4      setosa  Sepal.Width         3.5
#> 5  versicolor  Sepal.Width         3.2
#> 6   virginica  Sepal.Width         3.3
#> 7      setosa Petal.Length         1.4
#> 8  versicolor Petal.Length         4.7
#> 9   virginica Petal.Length         6.0
#> 10     setosa  Petal.Width         0.2
#> 11 versicolor  Petal.Width         1.4
#> 12  virginica  Petal.Width         2.5