dask.dataframe.api.SeriesGroupBy.aggregate
- SeriesGroupBy.aggregate(arg=None, split_every=8, split_out=None, shuffle_method=None, **kwargs)
Aggregate using one or more specified operations.
Based on pd.core.groupby.DataFrameGroupBy.agg
- Parameters:
- arg: callable, str, list or dict, optional
Aggregation spec. Accepted combinations are:
  - callable function
  - string function name
  - list of functions and/or function names, e.g. [np.sum, 'mean']
  - dict of column names -> function, function name or list of such
  - None, only if named aggregation syntax is used
- split_every: int >= 2 or dict(axis: int), optional
Number of intermediate partitions that may be aggregated at once. This defaults to 8. Determines the depth of the recursive aggregation. If set to the number of input chunks or more, the aggregation will be performed in two steps: one chunk function per input chunk and a single aggregate function at the end. If set to less than that, an intermediate combine function will be used, so that any one combine or aggregate function has no more than split_every inputs. The depth of the aggregation graph will be \(\log_{\text{split\_every}}(\text{input chunks along reduced axes})\). Setting to a low value can reduce cache size and network transfers, at the cost of more CPU and a larger dask graph.
- split_out: int, optional
Number of output results in group-by like aggregations (defaults to 1)
- shuffle_method: bool or str, optional
Whether a shuffle-based algorithm should be used. A specific algorithm name may also be specified (e.g. "tasks" or "p2p"). The shuffle-based algorithm is likely to be more efficient than shuffle_method=False when split_out > 1 and the number of unique groups is large (high cardinality). The default is False when split_out = 1. When split_out > 1, it chooses the algorithm set by the shuffle option in the dask config system, or "tasks" if nothing is set.
- kwargs: tuple or pd.NamedAgg, optional
Used for named aggregations where the keywords are the output column names and the values are tuples where the first element is the input column name and the second element is the aggregation function.
pandas.NamedAgg can also be used as the value. To use the named aggregation syntax, arg must be set to None.
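
The effect of split_every described above can be illustrated with a small pure-Python sketch of a tree reduction for a grouped sum. This is not dask's implementation: the helper names chunk_sum, merge_sums and tree_reduce are hypothetical, and the same merge function stands in for both the combine and aggregate steps. It only shows how limiting each merge to at most split_every inputs determines the number of reduction levels.

```python
def chunk_sum(rows):
    """Chunk step: partial sum of values per key within one input chunk."""
    out = {}
    for key, value in rows:
        out[key] = out.get(key, 0) + value
    return out


def merge_sums(partials):
    """Combine/aggregate step: merge several partial per-key sums into one."""
    out = {}
    for partial in partials:
        for key, value in partial.items():
            out[key] = out.get(key, 0) + value
    return out


def tree_reduce(chunks, split_every=8):
    """Grouped sum where no merge call receives more than split_every inputs.

    Returns the final per-key sums and the number of merge levels applied
    after the chunk step (intermediate combine levels plus the final
    aggregate).
    """
    if split_every < 2:
        raise ValueError("split_every must be >= 2")
    partials = [chunk_sum(rows) for rows in chunks]
    levels = 0
    # Intermediate combine levels are needed only while there are more
    # partial results than a single merge call is allowed to take.
    while len(partials) > split_every:
        partials = [
            merge_sums(partials[i:i + split_every])
            for i in range(0, len(partials), split_every)
        ]
        levels += 1
    # Final aggregate over at most split_every partial results.
    return merge_sums(partials), levels + 1


# 20 input chunks, each holding (key, value) rows.
chunks = [[("a", i), ("b", 1)] for i in range(20)]
result, levels = tree_reduce(chunks, split_every=8)
```

With 20 input chunks, split_every=8 needs one intermediate combine level before the final aggregate, split_every=2 needs four, and any value of 20 or more goes straight from the chunk step to the final aggregate. The result is the same in every case; only the shape of the reduction changes.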