dask.bag.Bag.distinct

Bag.distinct(key=None)

Return the distinct elements of the collection.

The result contains no repeats, and its order is not guaranteed.

Parameters
key: {callable,str}

Defines uniqueness of items in the bag by calling key on each item. If a string is passed, key is treated as lambda x: x[key]. With the default of None, the items themselves are compared.

Examples

>>> import dask.bag as db
>>> b = db.from_sequence(['Alice', 'Bob', 'Alice'])
>>> sorted(b.distinct())
['Alice', 'Bob']
>>> b = db.from_sequence([{'name': 'Alice'}, {'name': 'Bob'}, {'name': 'Alice'}])
>>> b.distinct(key=lambda x: x['name']).compute()
[{'name': 'Alice'}, {'name': 'Bob'}]
>>> b.distinct(key='name').compute()
[{'name': 'Alice'}, {'name': 'Bob'}]
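
The way key determines uniqueness can be sketched in plain Python. This is a single-machine illustration of the semantics only, not dask's implementation: distinct_by is a hypothetical helper, and both the requirement that keys be hashable and the choice to keep the first item seen for each key are assumptions of the sketch.

```python
from operator import itemgetter


def distinct_by(items, key=None):
    """Hypothetical helper: drop items whose key has already been seen."""
    if key is None:
        key_func = lambda x: x          # compare the items themselves
    elif isinstance(key, str):
        key_func = itemgetter(key)      # same as lambda x: x[key]
    else:
        key_func = key                  # any callable

    seen = set()                        # assumes keys are hashable
    result = []
    for item in items:
        k = key_func(item)
        if k not in seen:               # keep only the first item per key
            seen.add(k)
            result.append(item)
    return result


records = [{'name': 'Alice'}, {'name': 'Bob'}, {'name': 'Alice'}]
by_callable = distinct_by(records, key=lambda x: x['name'])
by_string = distinct_by(records, key='name')
names = distinct_by(['Alice', 'Bob', 'Alice'])
```

Here by_callable and by_string hold the same two records, which mirrors the equivalence between key='name' and key=lambda x: x['name'] shown in the examples above.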