dask_expr._collection.Index
- class dask_expr._collection.Index(expr)
Index-like Expr Collection.
The constructor takes the expression that represents the query as input. The class is not meant to be instantiated directly. Instead, use one of the IO connectors from Dask.
- __init__(expr)
Methods
__init__
(expr)
abs
()Return a Series/DataFrame with absolute numeric value of each element.
add
(other[, level, fill_value, axis])
add_prefix
(prefix)Prefix labels with string prefix.
add_suffix
(suffix)Suffix labels with string suffix.
align
(other[, join, axis, fill_value])Align two objects on their axes with the specified join method.
all
([axis, skipna, split_every])Return whether all elements are True, potentially over an axis.
analyze
([filename, format])Outputs statistics about every node in the expression.
any
([axis, skipna, split_every])Return whether any element is True, potentially over an axis.
apply
(function, *args[, meta, axis])Parallel version of pandas.Series.apply
astype
(dtypes)Cast a pandas object to a specified dtype dtype.
autocorr
([lag, split_every])Compute the lag-N autocorrelation.
between
(left, right[, inclusive])Return boolean Series equivalent to left <= series <= right.
bfill
([axis, limit])Fill NA/NaN values by using the next valid observation to fill the gap.
case_when
(caselist)Replace values where the conditions are True.
Forget division information.
clip
([lower, upper, axis])Trim values at input threshold(s).
combine
(other, func[, fill_value])Combine the Series with a Series or scalar according to func.
combine_first
(other)Update null elements with value in the same location in other.
compute
([fuse, concatenate])Compute this DataFrame.
compute_current_divisions
([col, set_divisions])Compute the current divisions of the DataFrame.
copy
([deep])Make a copy of the dataframe
corr
(other[, method, min_periods, split_every])Compute correlation with other Series, excluding missing values.
count
([split_every])Count non-NA cells for each column or row.
cov
(other[, min_periods, split_every])Compute covariance with Series, excluding missing values.
cummax
([axis, skipna])Return cumulative maximum over a DataFrame or Series axis.
cummin
([axis, skipna])Return cumulative minimum over a DataFrame or Series axis.
cumprod
([axis, skipna])Return cumulative product over a DataFrame or Series axis.
cumsum
([axis, skipna])Return cumulative sum over a DataFrame or Series axis.
describe
([split_every, percentiles, ...])Generate descriptive statistics.
diff
([periods, axis])First discrete difference of element.
div
(other[, level, fill_value, axis])
divide
(other[, level, fill_value, axis])
dot
(other[, meta])Compute the dot product between the Series and the columns of other.
drop_duplicates
([ignore_index, split_every, ...])
dropna
()Return a new Series with missing values removed.
enforce_runtime_divisions
()Enforce the current divisions at runtime.
eq
(other[, level, fill_value, axis])
explain
([stage, format])Create a graph representation of the Expression.
explode
()Transform each element of a list-like to a row.
ffill
([axis, limit])Fill NA/NaN values by propagating the last valid observation to next valid.
fillna
([value, axis])Fill NA/NaN values using the specified method.
floordiv
(other[, level, fill_value, axis])
from_dict
(data, *[, npartitions, orient, ...])Construct a Dask DataFrame from a Python Dictionary
ge
(other[, level, fill_value, axis])
Get a dask DataFrame/Series representing the nth partition.
groupby
(by, **kwargs)Group Series using a mapper or by a Series of columns.
gt
(other[, level, fill_value, axis])
head
([n, npartitions, compute])First n rows of the dataset
idxmax
(*args, **kwargs)Return index of first occurrence of maximum over requested axis.
idxmin
(*args, **kwargs)Return index of first occurrence of minimum over requested axis.
isin
(values)Whether each element in the DataFrame is contained in values.
isna
()Detect missing values.
isnull
()DataFrame.isnull is an alias for DataFrame.isna.
kurt
([axis, fisher, bias, nan_policy, ...])Return unbiased kurtosis over requested axis.
kurtosis
([axis, fisher, bias, nan_policy, ...])Return unbiased kurtosis over requested axis.
le
(other[, level, fill_value, axis])
lower_once
()
lt
(other[, level, fill_value, axis])
map
(arg[, na_action, meta, is_monotonic])Map values using an input mapping or function.
map_overlap
(func, before, after, *args[, ...])Apply a function to each partition, sharing rows with adjacent partitions.
map_partitions
(func, *args[, meta, ...])Apply a Python function to each partition
mask
(cond[, other])Replace values where the condition is True.
max
([axis, skipna, numeric_only, split_every])Return the maximum of the values over the requested axis.
mean
(*args, **kwargs)Return the mean of the values over the requested axis.
median
()Return the median of the values over the requested axis.
median_approximate
([method])Return the approximate median of the values over the requested axis.
memory_usage
([deep])Memory usage of the values.
memory_usage_per_partition
([index, deep])Return the memory usage of each partition
min
([axis, skipna, numeric_only, split_every])Return the minimum of the values over the requested axis.
mod
(other[, level, fill_value, axis])
mode
([dropna, split_every])Return the mode(s) of the Series.
mul
(other[, level, fill_value, axis])
ne
(other[, level, fill_value, axis])
nlargest
([n, split_every])Return the largest n elements.
notnull
()DataFrame.notnull is an alias for DataFrame.notna.
nsmallest
([n, split_every])Return the smallest n elements.
nunique
([dropna, split_every, split_out])Return number of unique elements in the object.
nunique_approx
([split_every])Approximate number of unique rows.
optimize
([fuse])Optimizes the DataFrame.
persist
([fuse])Persist this dask collection into memory
pipe
(func, *args, **kwargs)Apply chainable functions that expect Series or DataFrames.
pow
(other[, level, fill_value, axis])
pprint
()Outputs a string representation of the DataFrame.
prod
(*args, **kwargs)Return the product of the values over the requested axis.
product
([axis, skipna, numeric_only, ...])Return the product of the values over the requested axis.
quantile
([q, method])Approximate quantiles of Series
radd
(other[, level, fill_value, axis])
random_split
(frac[, random_state, shuffle])Pseudorandomly split dataframe into different pieces row-wise
rdiv
(other[, level, fill_value, axis])
reduction
(chunk[, aggregate, combine, meta, ...])Generic row-wise reductions.
rename
(index[, sorted_index])Alter Series index labels or name
rename_axis
([mapper, index, columns, axis])Set the name of the axis for the index or columns.
repartition
([divisions, npartitions, ...])Repartition a collection
replace
([to_replace, value, regex])Replace values given in to_replace with value.
resample
(rule[, closed, label])Resample time-series data.
reset_index
([drop])Reset the index to the default index.
rfloordiv
(other[, level, fill_value, axis])
rmod
(other[, level, fill_value, axis])
rmul
(other[, level, fill_value, axis])
rolling
(window, **kwargs)Provides rolling transformations.
round
([decimals])Round a DataFrame to a variable number of decimal places.
rpow
(other[, level, fill_value, axis])
rsub
(other[, level, fill_value, axis])
rtruediv
(other[, level, fill_value, axis])
sample
([n, frac, replace, random_state])Random sample of items
sem
([axis, skipna, ddof, split_every, ...])Return unbiased standard error of the mean over requested axis.
shift
([periods, freq])Shift index by desired number of periods with an optional time freq.
shuffle
([on, ignore_index, npartitions, ...])Rearrange DataFrame into new partitions
simplify
()
skew
([axis, bias, nan_policy, numeric_only])Return unbiased skew over requested axis.
squeeze
()Squeeze 1 dimensional axis objects into scalars.
std
(*args, **kwargs)Return sample standard deviation over requested axis.
sub
(other[, level, fill_value, axis])
sum
(*args, **kwargs)Return the sum of the values over the requested axis.
tail
([n, compute])Last n rows of the dataset
to_backend
([backend])Move to a new DataFrame backend
to_bag
([index, format])Create a Dask Bag from a Series
to_csv
(filename, **kwargs)See dd.to_csv docstring for more information
to_dask_array
([lengths, meta, optimize])Convert a dask DataFrame to a dask array.
to_dask_dataframe
(*args, **kwargs)Convert to a legacy dask-dataframe collection
to_delayed
([optimize_graph])Convert into a list of dask.delayed objects, one per partition.
to_frame
([index, name])Create a DataFrame with a column containing the Index.
to_hdf
(path_or_buf, key[, mode, append])See dd.to_hdf docstring for more information
to_json
(filename, *args, **kwargs)See dd.to_json docstring for more information
to_legacy_dataframe
([optimize])Convert to a legacy dask-dataframe collection
to_orc
(path, *args, **kwargs)See dd.to_orc docstring for more information
to_records
([index, lengths])
to_series
([index, name])Create a Series with both index and values equal to the index keys.
to_sql
(name, uri[, schema, if_exists, ...])
to_string
([max_rows])Render a string representation of the Series.
to_timestamp
([freq, how])Cast to DatetimeIndex of timestamps, at beginning of period.
truediv
(other[, level, fill_value, axis])
unique
([split_every, split_out, shuffle_method])Return Series of unique values in the object.
value_counts
([sort, ascending, dropna, ...])Return a Series containing counts of unique values.
var
(*args, **kwargs)Return unbiased variance over requested axis.
visualize
([tasks])Visualize the expression or task graph
where
(cond[, other])Replace values where the condition is False.
Attributes
axes
columns
dask
divisions
Tuple of npartitions + 1 values, in ascending order, marking the lower/upper bounds of each partition's index.
dtypes
Return data types
expr
index
Return dask Index instance
Return boolean if values in the object are monotonically decreasing.
Return boolean if values in the object are monotonically increasing.
Whether the divisions are known.
Purely label-location based indexer for selection by label.
name
Number of bytes
Return dimensionality
npartitions
Return number of partitions
partitions
Slice dataframe by partitions
Return a tuple representing the dimensionality of the DataFrame.
Size of the Series or DataFrame as a Delayed object.
Return a dask.array of the values of this dataframe
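To make the `divisions` attribute above more concrete, here is a small standard-library sketch of how a tuple of `npartitions + 1` ascending boundaries can be used to find the partition holding a label. The helper `partition_for_label` and the boundary values are hypothetical; the sketch follows the common description of divisions (each boundary starts a partition, and the final boundary is inclusive) and is not Dask's own lookup code.

```python
from bisect import bisect_right


def partition_for_label(divisions, label):
    """Hypothetical helper: return the position of the partition holding label."""
    if label < divisions[0] or label > divisions[-1]:
        raise KeyError(label)
    npartitions = len(divisions) - 1
    # bisect_right counts the boundaries <= label; subtract one to get a position.
    position = bisect_right(divisions, label) - 1
    # The final boundary is inclusive, so clamp it into the last partition.
    return min(position, npartitions - 1)


# Made-up boundaries describing 3 partitions: [10, 30), [30, 50), [50, 60].
divisions = (10, 30, 50, 60)
locations = {
    label: partition_for_label(divisions, label) for label in (10, 29, 30, 60)
}
```

When divisions are unknown, such a lookup is not possible, which is why the attribute reporting whether the divisions are known matters for label-based selection.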