
Extract a character column into multiple columns using regular expression groups
Source: R/extract.R
Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don't match, or the input is NA, every new column is NA for that row.
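To make the group-to-column mapping concrete, here is a minimal base-R sketch of the idea. It is not tidyr's implementation: the helper name extract_groups() is hypothetical, and it assumes the regex contains exactly one capturing group per name in into.

```r
# Hypothetical helper (not part of tidyr): a base-R sketch of extract()'s idea.
# Assumes `regex` has exactly one capturing group per element of `into`.
extract_groups <- function(x, regex, into) {
  x <- as.character(x)
  out <- matrix(NA_character_, nrow = length(x), ncol = length(into),
                dimnames = list(NULL, into))
  for (i in seq_along(x)) {
    if (is.na(x[i])) next                 # NA input: the row stays NA
    m <- regexec(regex, x[i])[[1]]
    if (m[1] == -1L) next                 # no match: the row stays NA
    starts <- m[-1]                       # drop the whole-match position
    lens <- attr(m, "match.length")[-1]   # lengths of each captured group
    out[i, ] <- substring(x[i], starts, starts + lens - 1L)
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Row 1 is NA input and row 3 ("d-e") does not match, so both rows become NA.
extract_groups(c(NA, "a-b", "d-e"), "([a-d]+)-([a-d]+)", c("A", "B"))
```

Applied to df$x from the examples with the same pattern, this sketch should return the same values as the "If no match" example.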
Arguments
- data
A data frame.
- col
Column name or position. This is passed to tidyselect::vars_pull().
This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).
- into
Names of the new variables to create, as a character vector. Use NA to omit a variable from the output.
- regex
A string representing a regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into. The default, "([[:alnum:]]+)", captures the first run of alphanumeric characters.
- remove
If TRUE, remove the input column from the output data frame.
- convert
If TRUE, run type.convert() with as.is = TRUE on the new columns. This is useful if the component columns are integer, numeric or logical.
NB: this will cause the string "NA" to be converted to NA.
Additional arguments passed on to methods.
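The convert option relies on base R's type.convert(). The sketch below calls it directly on character vectors like the ones extract() produces, to show the resulting types and the "NA" caveat; it does not call extract() itself, and the variable names are only illustrative.

```r
# Extracted pieces always start out as character vectors.
# as.is = TRUE keeps non-convertible strings as character rather than factor.
as_int <- type.convert(c("1", "22", "NA"), as.is = TRUE)  # integer: 1 22 NA
as_lgl <- type.convert(c("TRUE", "FALSE"), as.is = TRUE)  # logical: TRUE FALSE
as_chr <- type.convert(c("a", "NA"), as.is = TRUE)        # character, but "NA" became NA

str(as_int)
str(as_lgl)
str(as_chr)
```

Note the last case: even when a column stays character, the literal string "NA" is turned into a missing value.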
See also
separate() to split up by a separator.
Examples
library(tidyr)
df <- data.frame(x = c(NA, "a-b", "a-d", "b-c", "d-e"))
df %>% extract(x, "A")
#>      A
#> 1 <NA>
#> 2    a
#> 3    a
#> 4    b
#> 5    d
df %>% extract(x, c("A", "B"), "([[:alnum:]]+)-([[:alnum:]]+)")
#>      A    B
#> 1 <NA> <NA>
#> 2    a    b
#> 3    a    d
#> 4    b    c
#> 5    d    e
# If there is no match (row 5: "e" is not in [a-d]), the row is NA:
df %>% extract(x, c("A", "B"), "([a-d]+)-([a-d]+)")
#>      A    B
#> 1 <NA> <NA>
#> 2    a    b
#> 3    a    d
#> 4    b    c
#> 5 <NA> <NA>