dask.dataframe.Index.drop_duplicates

dask.dataframe.Index.drop_duplicates

Index.drop_duplicates(split_every=None, split_out=1, shuffle_method=None, **kwargs)[source]

Return Index with duplicate values removed.

This docstring was copied from pandas.core.indexes.base.Index.drop_duplicates.

Some inconsistencies with the Dask version may exist.

Known inconsistencies:

keep=False will raise a NotImplementedError

Parameters
keep{‘first’, ‘last’, False}, default ‘first’ (Not supported in Dask)
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
Index

See also

Series.drop_duplicates

Equivalent method on Series.

DataFrame.drop_duplicates

Equivalent method on DataFrame.

Index.duplicated

Related method on Index, indicating duplicate Index values.

Examples

Generate an pandas.Index with duplicate values.

>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])  

The keep parameter controls which duplicate values are removed. The value ‘first’ keeps the first occurrence for each set of duplicated entries. The default value of keep is ‘first’.

>>> idx.drop_duplicates(keep='first')  
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')

The value ‘last’ keeps the last occurrence for each set of duplicated entries.

>>> idx.drop_duplicates(keep='last')  
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')

The value False discards all sets of duplicated entries.

>>> idx.drop_duplicates(keep=False)  
Index(['cow', 'beetle', 'hippo'], dtype='object')