dask.dataframe.Series.str.extract


dataframe.Series.str.extract(*args, **kwargs)

Extract capture groups in the regex pat as columns in a DataFrame.

This docstring was copied from pandas.core.strings.accessor.StringMethods.extract.

Some inconsistencies with the Dask version may exist.

For each subject string in the Series, extract groups from the first match of regular expression pat.
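A minimal sketch of the "first match" rule, written against plain pandas so it runs without Dask. The Dask accessor is assumed to apply the same logic to each partition. The sample strings are hypothetical: each of the first two contains two letter-digit pairs, and only the first pair is returned.

```python
import pandas as pd

# Hypothetical subjects: two letter-digit pairs each, plus one string with no digit.
s = pd.Series(["a1b2", "c3d4", "xyz"])

# extract() scans each string and keeps only the FIRST match of the pattern.
first = s.str.extract(r"([a-z])(\d)")

print(first)  # row 0 -> a, 1; row 1 -> c, 3; row 2 (no match) -> NaN, NaN
```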

Parameters
pat : str (Not supported in Dask)

Regular expression pattern with capturing groups.

flags : int, default 0 (no flags) (Not supported in Dask)

Flags from the re module, such as re.IGNORECASE, that modify regular expression matching (for example, case sensitivity or whitespace handling). For more details, see the re module documentation.

expand : bool, default True (Not supported in Dask)

If True, return DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group or DataFrame if there are multiple capture groups.
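The flags and expand parameters can be exercised together in a short sketch. Because both are marked as not supported in Dask above, it calls pandas directly; the sample Series is hypothetical. It shows re.IGNORECASE widening a lowercase character class, and that expand=False still yields a DataFrame when the pattern has more than one capture group.

```python
import re

import pandas as pd

# Hypothetical subjects with mixed-case letters.
s = pd.Series(["A1", "b2", "C3"])

# Without flags, [ab] matches only lowercase a/b, so "A1" and "C3" do not match.
plain = s.str.extract(r"([ab])(\d)")

# re.IGNORECASE lets [ab] also match "A" and "B" ("C3" still fails).
ci = s.str.extract(r"([ab])(\d)", flags=re.IGNORECASE)

# Two capture groups: expand=False still returns a DataFrame, not a Series.
two_groups = s.str.extract(r"([a-z])(\d)", flags=re.IGNORECASE, expand=False)
```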

Returns
DataFrame or Series or Index

A DataFrame with one row for each subject string, and one column for each group. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used. The dtype of each result column is always object, even when no match is found. If expand=False and pat has only one capture group, then return a Series (if subject is a Series) or Index (if subject is an Index).
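The Index case mentioned in the return description is not covered by the examples below, so here is a minimal pandas sketch of it (the Index values are hypothetical). With a single capture group, expand=False returns an Index of the same length, while expand=True returns a one-column DataFrame.

```python
import pandas as pd

# Hypothetical Index used as the subject instead of a Series.
idx = pd.Index(["a1", "b2", "c3"])

# One capture group + expand=False: the result is an Index ("c3" does not match).
digits = idx.str.extract(r"[ab](\d)", expand=False)

# Same pattern with expand=True: the result is a one-column DataFrame.
frame = idx.str.extract(r"[ab](\d)", expand=True)
```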

See also

extractall

Returns all matches (not just the first match).
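To contrast extract with extractall, a small pandas sketch on hypothetical data: extract yields one row per subject string, while extractall yields one row per match, indexed by the original index plus a "match" level.

```python
import pandas as pd

# Hypothetical subjects: the first string has two matches, the second has one.
s = pd.Series(["a1b2", "c3"])
pat = r"([a-z])(\d)"

# extract: one row per subject string, first match only.
first_only = s.str.extract(pat)

# extractall: one row per match, with a MultiIndex of (original index, match number).
every_match = s.str.extractall(pat)
```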

Examples

A pattern with two groups will return a DataFrame with two columns. Non-matches will be NaN.

>>> s = pd.Series(['a1', 'b2', 'c3'])  
>>> s.str.extract(r'([ab])(\d)')  
     0    1
0    a    1
1    b    2
2  NaN  NaN

A pattern may contain optional groups.

>>> s.str.extract(r'([ab])?(\d)')  
     0  1
0    a  1
1    b  2
2  NaN  3

Named groups will become column names in the result.

>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')  
  letter digit
0      a     1
1      b     2
2    NaN   NaN

A pattern with one group will return a DataFrame with one column if expand=True.

>>> s.str.extract(r'[ab](\d)', expand=True)  
     0
0    1
1    2
2  NaN

A pattern with one group will return a Series if expand=False.

>>> s.str.extract(r'[ab](\d)', expand=False)  
0      1
1      2
2    NaN
dtype: object