dask.dataframe.DataFrame.query

DataFrame.query(expr, **kwargs)

Filter dataframe with a complex expression

Blocked version of pd.DataFrame.query

Parameters:
expr: str

The query string to evaluate. You can refer to column names that are not valid Python variable names by surrounding them in backticks. Dask does not fully support referring to variables using the '@' character; use f-strings or the local_dict keyword argument instead.

Examples

>>> import pandas as pd
>>> import dask.dataframe as dd
>>> df = pd.DataFrame({'x': [1, 2, 1, 2],
...                    'y': [1, 2, 3, 4],
...                    'z z': [4, 3, 2, 1]})
>>> ddf = dd.from_pandas(df, npartitions=2)

Refer to column names directly:

>>> ddf.query('y > x').compute()
   x  y  z z
2  1  3    2
3  2  4    1

Refer to column name using backticks:

>>> ddf.query('`z z` > x').compute()
   x  y  z z
0  1  1    4
1  2  2    3
2  1  3    2

Refer to variable name using f-strings:

>>> value = 1
>>> ddf.query(f'x == {value}').compute()
   x  y  z z
0  1  1    4
2  1  3    2

Refer to variable name using local_dict:

>>> ddf.query('x == @value', local_dict={"value": value}).compute()
   x  y  z z
0  1  1    4
2  1  3    2