Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don't match, or the input is NA, the output will be NA.

extract(data, col, into, regex = "([[:alnum:]]+)", remove = TRUE,
  convert = FALSE, ...)



data: A data frame.


col: Column name or position. This is passed to tidyselect::vars_pull().

This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).


into: Names of the new variables to create, as a character vector.


regex: A regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.


remove: If TRUE, remove the input column from the output data frame.


convert: If TRUE, will run type.convert() with as.is = TRUE on the new columns. This is useful if the component columns are integer, numeric or logical.


...: Other arguments passed on to regexec() to control how the regular expression is processed.
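
The effect of remove and convert can be sketched as follows. This is only an illustration: the data frame and the column names key and value are made up for the example and do not come from the package documentation.

```r
library(tidyr)

# Hypothetical input: a letter and a number joined by a hyphen
df <- data.frame(x = c("a-1", "b-2", NA), stringsAsFactors = FALSE)

# remove = FALSE keeps the input column x in the result.
# convert = TRUE runs type.convert() on the new columns, so the
# digits captured by the second group are stored as integers.
res <- extract(
  df, x, c("key", "value"),
  regex = "([[:alnum:]]+)-([[:alnum:]]+)",
  remove = FALSE, convert = TRUE
)

str(res)
```

With convert = FALSE (the default), value would stay a character column, and with remove = TRUE (the default), x would be dropped from the result.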


library(tidyr)
library(dplyr)
df <- data.frame(x = c(NA, "a-b", "a-d", "b-c", "d-e"))
df %>% extract(x, "A")
#>      A
#> 1 <NA>
#> 2    a
#> 3    a
#> 4    b
#> 5    d
df %>% extract(x, c("A", "B"), "([[:alnum:]]+)-([[:alnum:]]+)")
#>      A    B
#> 1 <NA> <NA>
#> 2    a    b
#> 3    a    d
#> 4    b    c
#> 5    d    e
# If no match, NA:
df %>% extract(x, c("A", "B"), "([a-d]+)-([a-d]+)")
#>      A    B
#> 1 <NA> <NA>
#> 2    a    b
#> 3    a    d
#> 4    b    c
#> 5 <NA> <NA>
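
Because col is passed by expression and supports quasiquotation, the column can also be given by position or unquoted from a variable. The sketch below assumes the column name is held in a string; col_name is a hypothetical variable introduced only for this illustration.

```r
library(tidyr)

df <- data.frame(x = c(NA, "a-b", "a-d"), stringsAsFactors = FALSE)
pattern <- "([[:alnum:]]+)-([[:alnum:]]+)"

# Select the input column by position
by_pos <- extract(df, 1, c("A", "B"), pattern)

# Select the input column by unquoting a symbol built from a string
col_name <- "x"  # hypothetical variable holding the column name
by_name <- extract(df, !!as.name(col_name), c("A", "B"), pattern)

# Both calls refer to the same column, so the results should agree
identical(by_pos, by_name)
```

Unquoting is mainly useful when extract() is called from inside another function that receives the column name as an argument.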