dask_expr._collection.DataFrame.align

dask_expr._collection.DataFrame.align

DataFrame.align(other, join='outer', axis=None, fill_value=None)

Align two objects on their axes with the specified join method.

This docstring was copied from pandas.core.frame.DataFrame.align.

Some inconsistencies with the Dask version may exist.

Join method is specified for each axis Index.

Parameters
otherDataFrame or Series
join{‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’

Type of alignment to be performed.

  • left: use only keys from left frame, preserve key order.

  • right: use only keys from right frame, preserve key order.

  • outer: use union of keys from both frames, sort keys lexicographically.

  • inner: use intersection of keys from both frames, preserve the order of the left keys.

axisallowed axis of the other object, default None

Align on index (0), columns (1), or both (None).

levelint or level name, default None (Not supported in Dask)

Broadcast across a level, matching Index values on the passed MultiIndex level.

copybool, default True (Not supported in Dask)

Always returns new objects. If copy=False and no reindexing is required then original objects are returned.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

fill_valuescalar, default np.nan

Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

method{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None (Not supported in Dask)

Method to use for filling holes in reindexed Series:

  • pad / ffill: propagate last valid observation forward to next valid.

  • backfill / bfill: use NEXT valid observation to fill gap.

Deprecated since version 2.1.

limitint, default None (Not supported in Dask)

If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

Deprecated since version 2.1.

fill_axis{0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame, default 0 (Not supported in Dask)

Filling axis, method and limit.

Deprecated since version 2.1.

broadcast_axis{0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame, default None (Not supported in Dask)

Broadcast values along this axis, if aligning two objects of different dimensions.

Deprecated since version 2.1.

Returns
tuple of (Series/DataFrame, type of other)

Aligned objects.

Examples

>>> df = pd.DataFrame(  
...     [[1, 2, 3, 4], [6, 7, 8, 9]], columns=["D", "B", "E", "A"], index=[1, 2]
... )
>>> other = pd.DataFrame(  
...     [[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
...     columns=["A", "B", "C", "D"],
...     index=[2, 3, 4],
... )
>>> df  
   D  B  E  A
1  1  2  3  4
2  6  7  8  9
>>> other  
    A    B    C    D
2   10   20   30   40
3   60   70   80   90
4  600  700  800  900

Align on columns:

>>> left, right = df.align(other, join="outer", axis=1)  
>>> left  
   A  B   C  D  E
1  4  2 NaN  1  3
2  9  7 NaN  6  8
>>> right  
    A    B    C    D   E
2   10   20   30   40 NaN
3   60   70   80   90 NaN
4  600  700  800  900 NaN

We can also align on the index:

>>> left, right = df.align(other, join="outer", axis=0)  
>>> left  
    D    B    E    A
1  1.0  2.0  3.0  4.0
2  6.0  7.0  8.0  9.0
3  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN
>>> right  
    A      B      C      D
1    NaN    NaN    NaN    NaN
2   10.0   20.0   30.0   40.0
3   60.0   70.0   80.0   90.0
4  600.0  700.0  800.0  900.0

Finally, the default axis=None will align on both index and columns:

>>> left, right = df.align(other, join="outer", axis=None)  
>>> left  
     A    B   C    D    E
1  4.0  2.0 NaN  1.0  3.0
2  9.0  7.0 NaN  6.0  8.0
3  NaN  NaN NaN  NaN  NaN
4  NaN  NaN NaN  NaN  NaN
>>> right  
       A      B      C      D   E
1    NaN    NaN    NaN    NaN NaN
2   10.0   20.0   30.0   40.0 NaN
3   60.0   70.0   80.0   90.0 NaN
4  600.0  700.0  800.0  900.0 NaN