library(tidyr)
library(dplyr)
library(purrr)

Basics

A nested data frame is a data frame in which one (or more) columns are lists of data frames. You can create simple nested data frames by hand, by passing a list of data frames as one of the columns of tibble().
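As a minimal sketch (the names `df1`, `g`, `x`, and `y` are illustrative choices, not taken from the text), here is a tibble whose `data` column holds three small tibbles of different lengths:

```r
library(tibble)

# `g` is an ordinary column; `data` is a list-column whose elements
# are themselves tibbles (of 1, 2, and 1 rows respectively).
df1 <- tibble(
  g = c(1, 2, 3),
  data = list(
    tibble(x = 1, y = 2),
    tibble(x = 4:5, y = 6:7),
    tibble(x = 10)
  )
)
df1
```

Each element of `data` is a complete data frame, so the inner tibbles do not need to share a length or even the same set of columns.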

(It is possible to create list-columns in regular data frames, not just in tibbles, but it’s considerably more work because the default behaviour of data.frame() is to treat lists as lists of columns.)
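A minimal base-R sketch of that extra work (the name `df_base` is illustrative): wrapping the list in I() stops data.frame() from spreading its elements across separate columns.

```r
# Without I(), data.frame() would try to turn each element of the list
# into its own column(s). I() ("as is") keeps the list intact as a
# single list-column whose elements are data frames.
df_base <- data.frame(
  g = 1:2,
  data = I(list(data.frame(x = 1), data.frame(x = 2:3)))
)
df_base$data[[2]]
```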

But more commonly you’ll create them with tidyr::nest().
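For example, a sketch with a small made-up tibble (`df2` and its columns are illustrative), where `g` takes the value 2 in two rows:

```r
library(tibble)
library(tidyr)

# A flat tibble in which g = 2 appears twice.
df2 <- tribble(
  ~g, ~x, ~y,
   1,  1,  2,
   2,  4,  6,
   2,  5,  7,
   3, 10,  NA
)

# Pack x and y into a list-column named `data`: one row per value of g.
nested <- nest(df2, data = c(x, y))
nested
```

The argument name (`data` here) becomes the name of the new list-column, and the two rows with g = 2 end up together in a single two-row inner tibble.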

With nest() you specify which variables should be nested inside; alternatively, you can use dplyr::group_by() to describe which variables should be kept outside.
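A sketch of the group_by() route on a small illustrative tibble (`df2` and `nested_grouped` are hypothetical names): here the grouping column is named, and nest() packs everything else.

```r
library(dplyr)
library(tidyr)

df2 <- tibble(g = c(1, 2, 2, 3), x = c(1, 4, 5, 10), y = c(2, 6, 7, NA))

# group_by() names the column(s) to keep *outside*; nest() then moves
# every remaining column into the `data` list-column.
nested_grouped <- df2 %>% 
  group_by(g) %>% 
  nest()
nested_grouped
```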

I think nesting is easiest to understand in connection with grouped data: each row in the output corresponds to one group in the input. We’ll see shortly that this is particularly convenient when you have other per-group objects.

The opposite of nest() is unnest(). You give it the name of a list-column containing data frames, and it row-binds the data frames together, repeating the outer columns the right number of times to line up.
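A minimal sketch, using a small hand-built nested tibble (names illustrative) whose inner tibbles differ in length and columns:

```r
library(tibble)
library(tidyr)

df1 <- tibble(
  g = c(1, 2, 3),
  data = list(
    tibble(x = 1, y = 2),
    tibble(x = 4:5, y = 6:7),
    tibble(x = 10)
  )
)

# Row-bind the inner tibbles; g is repeated once per inner row
# (1 + 2 + 1 = 4 rows in total).
flat <- unnest(df1, data)
flat
```

Columns missing from an inner data frame, like `y` in the last one, are filled with NA when the pieces are row-bound.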

Nested data and models

Nested data is a great fit for problems where you have one of something for each group. A common place this arises is when you’re fitting multiple models.

mtcars_nested <- mtcars %>% 
  group_by(cyl) %>% 
  nest()

mtcars_nested
#> # A tibble: 3 × 2
#> # Groups:   cyl [3]
#>     cyl data
#>   <dbl> <list>
#> 1     6 <tibble [7 × 10]>
#> 2     4 <tibble [11 × 10]>
#> 3     8 <tibble [14 × 10]>

Once you have a list of data frames, it’s very natural to produce a list of models:

mtcars_nested <- mtcars_nested %>% 
  mutate(model = map(data, function(df) lm(mpg ~ wt, data = df)))
mtcars_nested
#> # A tibble: 3 × 3
#> # Groups:   cyl [3]
#>     cyl data               model
#>   <dbl> <list>             <list>
#> 1     6 <tibble [7 × 10]>  <lm>
#> 2     4 <tibble [11 × 10]> <lm>
#> 3     8 <tibble [14 × 10]> <lm>

And then you could even produce a list of predictions:

mtcars_nested <- mtcars_nested %>% 
  mutate(pred = map(model, predict))
mtcars_nested
#> # A tibble: 3 × 4
#> # Groups:   cyl [3]
#>     cyl data               model  pred
#>   <dbl> <list>             <list> <list>
#> 1     6 <tibble [7 × 10]>  <lm>   <dbl [7]>
#> 2     4 <tibble [11 × 10]> <lm>   <dbl [11]>
#> 3     8 <tibble [14 × 10]> <lm>   <dbl [14]>

This workflow works particularly well in conjunction with broom, which makes it easy to turn models into tidy data frames that can then be unnest()ed back into flat data frames. You can see a bigger example in the broom and dplyr vignette.
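A minimal sketch of that round trip, assuming the broom package is installed; broom::tidy() converts a fitted lm into a tibble with one row per coefficient. The column name `tidied` is an arbitrary choice.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)  # assumed to be installed

coefs <- mtcars %>% 
  group_by(cyl) %>% 
  nest() %>% 
  # one lm(mpg ~ wt) fit per cylinder group
  mutate(model = map(data, function(df) lm(mpg ~ wt, data = df))) %>% 
  # one tibble of coefficients per model
  mutate(tidied = map(model, tidy)) %>% 
  select(cyl, tidied) %>% 
  # back to a flat data frame: one row per (cyl, term) pair
  unnest(tidied)
coefs
```

With three groups and two terms per model (the intercept and `wt`), the result has six rows, ready for ordinary dplyr verbs such as filter(term == "wt").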