dask_expr._groupby.GroupBy.aggregate
- GroupBy.aggregate(arg=None, split_every=8, split_out=None, shuffle_method=None, **kwargs)
Aggregate using one or more specified operations
Based on pd.core.groupby.DataFrameGroupBy.agg
- Parameters
- arg : callable, str, list or dict, optional
Aggregation spec. Accepted combinations are:
callable function
string function name
list of functions and/or function names, e.g. [np.sum, 'mean']
dict of column names -> function, function name or list of such.
None only if named aggregation syntax is used
- split_every : int, optional
Number of intermediate partitions that may be aggregated at once. This defaults to 8. If your intermediate partitions are likely to be small (either due to a small number of groups or a small initial partition size), consider increasing this number for better performance.
- split_out : int, optional
Number of output partitions. Default is 1.
- shuffle_method : bool or str, optional
Whether a shuffle-based algorithm should be used. A specific algorithm name may also be specified (e.g. "tasks" or "p2p"). The shuffle-based algorithm is likely to be more efficient than shuffle_method=False when split_out > 1 and the number of unique groups is large (high cardinality). Default is False when split_out = 1. When split_out > 1, it chooses the algorithm set by the shuffle option in the dask config system, or "tasks" if nothing is set.
- kwargs : tuple or pd.NamedAgg, optional
Used for named aggregations where the keywords are the output column names and the values are tuples where the first element is the input column name and the second element is the aggregation function.
pandas.NamedAgg can also be used as the value. To use the named aggregation syntax, arg must be set to None; short usage sketches follow this parameter list.
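
To make the accepted aggregation specs concrete, here is a minimal sketch on a small dask DataFrame. The column names "group" and "value" and the chosen functions are illustrative assumptions, not part of the API.

import pandas as pd
import dask.dataframe as dd

# Illustrative data; "group" and "value" are made-up column names.
pdf = pd.DataFrame({"group": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
ddf = dd.from_pandas(pdf, npartitions=2)

# arg as a dict of column -> list of function names
spec_result = ddf.groupby("group").aggregate({"value": ["sum", "mean"]})

# Named aggregation: arg stays None, keywords name the output columns
named_result = ddf.groupby("group").aggregate(
    total=("value", "sum"),
    average=pd.NamedAgg(column="value", aggfunc="mean"),
)

print(spec_result.compute())
print(named_result.compute())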
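The next sketch, continuing from the same ddf, shows how split_out and shuffle_method might be combined for an aggregation with many unique groups; split_out=2 and shuffle_method="tasks" are illustrative choices, not recommended defaults.

# Request multiple output partitions plus a shuffle-based algorithm,
# which the docstring suggests for high-cardinality groupbys.
shuffled_result = ddf.groupby("group").aggregate(
    {"value": "sum"},
    split_out=2,              # number of output partitions
    shuffle_method="tasks",   # or "p2p"; a bool is also accepted
)
print(shuffled_result.compute())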