dask_expr._groupby.GroupBy.aggregate
- GroupBy.aggregate(arg=None, split_every=8, split_out=None, shuffle_method=None, **kwargs)
Aggregate using one or more specified operations
Based on pd.core.groupby.DataFrameGroupBy.agg
- Parameters
- arg : callable, str, list or dict, optional
Aggregation spec. Accepted combinations are:
callable function
string function name
list of functions and/or function names, e.g. [np.sum, 'mean']
dict of column names -> function, function name or list of such.
None only if named aggregation syntax is used
- split_every : int, optional
Number of intermediate partitions that may be aggregated at once. This defaults to 8. If your intermediate partitions are likely to be small (either due to a small number of groups or a small initial partition size), consider increasing this number for better performance.
- split_out : int, optional
Number of output partitions. Default is 1.
- shuffle_method : bool or str, optional
Whether a shuffle-based algorithm should be used. A specific algorithm name may also be specified (e.g. "tasks" or "p2p"). The shuffle-based algorithm is likely to be more efficient than shuffle_method=False when split_out > 1 and the number of unique groups is large (high cardinality). Default is False when split_out = 1. When split_out > 1, it chooses the algorithm set by the shuffle option in the dask config system, or "tasks" if nothing is set.
- kwargs : tuple or pd.NamedAgg, optional
Used for named aggregations where the keywords are the output column names and the values are tuples where the first element is the input column name and the second element is the aggregation function.
pandas.NamedAgg can also be used as the value. To use the named aggregation syntax, arg must be set to None; short usage sketches follow this parameter list.
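
To make the accepted aggregation specs concrete, here is a minimal sketch on a small dask DataFrame. The column names "group" and "value" and the chosen functions are illustrative assumptions, not part of the API.

import pandas as pd
import dask.dataframe as dd

# Illustrative data; "group" and "value" are made-up column names.
pdf = pd.DataFrame({"group": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
ddf = dd.from_pandas(pdf, npartitions=2)

# arg as a dict of column -> list of function names
spec_result = ddf.groupby("group").aggregate({"value": ["sum", "mean"]})

# Named aggregation: arg stays None, keywords name the output columns
named_result = ddf.groupby("group").aggregate(
    total=("value", "sum"),
    average=pd.NamedAgg(column="value", aggfunc="mean"),
)

print(spec_result.compute())
print(named_result.compute())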
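The next sketch, continuing from the same ddf, shows how split_out and shuffle_method might be combined for an aggregation with many unique groups; split_out=2 and shuffle_method="tasks" are illustrative choices, not recommended defaults.

# Request multiple output partitions plus a shuffle-based algorithm,
# which the docstring suggests for high-cardinality groupbys.
shuffled_result = ddf.groupby("group").aggregate(
    {"value": "sum"},
    split_out=2,              # number of output partitions
    shuffle_method="tasks",   # or "p2p"; a bool is also accepted
)
print(shuffled_result.compute())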