API

API

The dask.delayed interface consists of one function, delayed:

  • delayed wraps functions

    Wraps functions. Can be used as a decorator, or around function calls directly (i.e. delayed(foo)(a, b, c)). Outputs from functions wrapped in delayed are proxy objects of type Delayed that contain a graph of all operations done to get to this result.

  • delayed wraps objects

    Wraps objects. Used to create Delayed proxies directly.

Delayed objects can be thought of as representing a key in the dask task graph. A Delayed supports most python operations, each of which creates another Delayed representing the result:

  • Most operators (*, -, and so on)

  • Item access and slicing (a[0])

  • Attribute access (a.size)

  • Method calls (a.index(0))

Operations that aren’t supported include:

  • Mutating operators (a += 1)

  • Mutating magics such as __setitem__/__setattr__ (a[0] = 1, a.foo = 1)

  • Iteration. (for i in a: ...)

  • Use as a predicate (if a: ...)

The last two points in particular mean that Delayed objects cannot be used for control flow, meaning that no Delayed can appear in a loop or if statement. In other words you can’t iterate over a Delayed object, or use it as part of a condition in an if statement, but Delayed object can be used in a body of a loop or if statement (i.e. the example above is fine, but if data was a Delayed object it wouldn’t be). Even with this limitation, many workflows can easily be parallelized.

delayed([obj, name, pure, nout, traverse])

Wraps a function or object to produce a Delayed.

Delayed(key, dsk[, length, layer])

Represents a value to be computed by dask.

dask.delayed.delayed(obj='__no__default__', name=None, pure=None, nout=None, traverse=True)[source]

Wraps a function or object to produce a Delayed.

Delayed objects act as proxies for the object they wrap, but all operations on them are done lazily by building up a dask graph internally.

Parameters
objobject

The function or object to wrap

nameDask key, optional

The key to use in the underlying graph for the wrapped object. Defaults to hashing content. Note that this only affects the name of the object wrapped by this call to delayed, and not the output of delayed function calls - for that use dask_key_name= as described below.

Note

Because this name is used as the key in task graphs, you should ensure that it uniquely identifies obj. If you’d like to provide a descriptive name that is still unique, combine the descriptive name with dask.base.tokenize() of the array_like. See Task Graphs for more.

purebool, optional

Indicates whether calling the resulting Delayed object is a pure operation. If True, arguments to the call are hashed to produce deterministic keys. If not provided, the default is to check the global delayed_pure setting, and fallback to False if unset.

noutint, optional

The number of outputs returned from calling the resulting Delayed object. If provided, the Delayed output of the call can be iterated into nout objects, allowing for unpacking of results. By default iteration over Delayed objects will error. Note, that nout=1 expects obj to return a tuple of length 1, and consequently for nout=0, obj should return an empty tuple.

traversebool, optional

By default dask traverses builtin python collections looking for dask objects passed to delayed. For large collections this can be expensive. If obj doesn’t contain any dask objects, set traverse=False to avoid doing this traversal.

Examples

Apply to functions to delay execution:

>>> from dask import delayed
>>> def inc(x):
...     return x + 1
>>> inc(10)
11
>>> x = delayed(inc, pure=True)(10)
>>> type(x) == Delayed
True
>>> x.compute()
11

Can be used as a decorator:

>>> @delayed(pure=True)
... def add(a, b):
...     return a + b
>>> add(1, 2).compute()
3

delayed also accepts an optional keyword pure. If False, then subsequent calls will always produce a different Delayed. This is useful for non-pure functions (such as time or random).

>>> from random import random
>>> out1 = delayed(random, pure=False)()
>>> out2 = delayed(random, pure=False)()
>>> out1.key == out2.key
False

If you know a function is pure (output only depends on the input, with no global state), then you can set pure=True. This will attempt to apply a consistent name to the output, but will fallback on the same behavior of pure=False if this fails.

>>> @delayed(pure=True)
... def add(a, b):
...     return a + b
>>> out1 = add(1, 2)
>>> out2 = add(1, 2)
>>> out1.key == out2.key
True

Instead of setting pure as a property of the callable, you can also set it contextually using the delayed_pure setting. Note that this influences the call and not the creation of the callable:

>>> @delayed
... def mul(a, b):
...     return a * b
>>> import dask
>>> with dask.config.set(delayed_pure=True):
...     print(mul(1, 2).key == mul(1, 2).key)
True
>>> with dask.config.set(delayed_pure=False):
...     print(mul(1, 2).key == mul(1, 2).key)
False

The key name of the result of calling a delayed object is determined by hashing the arguments by default. To explicitly set the name, you can use the dask_key_name keyword when calling the function:

>>> add(1, 2)   
Delayed('add-3dce7c56edd1ac2614add714086e950f')
>>> add(1, 2, dask_key_name='three')
Delayed('three')

Note that objects with the same key name are assumed to have the same result. If you set the names explicitly you should make sure your key names are different for different results.

>>> add(1, 2, dask_key_name='three')
Delayed('three')
>>> add(2, 1, dask_key_name='three')
Delayed('three')
>>> add(2, 2, dask_key_name='four')
Delayed('four')

delayed can also be applied to objects to make operations on them lazy:

>>> a = delayed([1, 2, 3])
>>> isinstance(a, Delayed)
True
>>> a.compute()
[1, 2, 3]

The key name of a delayed object is hashed by default if pure=True or is generated randomly if pure=False (default). To explicitly set the name, you can use the name keyword. To ensure that the key is unique you should include the tokenized value as well, or otherwise ensure that it’s unique:

>>> from dask.base import tokenize
>>> data = [1, 2, 3]
>>> a = delayed(data, name='mylist-' + tokenize(data))
>>> a  
Delayed('mylist-55af65871cb378a4fa6de1660c3e8fb7')

Delayed results act as a proxy to the underlying object. Many operators are supported:

>>> (a + [1, 2]).compute()
[1, 2, 3, 1, 2]
>>> a[1].compute()
2

Method and attribute access also works:

>>> a.count(2).compute()
1

Note that if a method doesn’t exist, no error will be thrown until runtime:

>>> res = a.not_a_real_method() 
>>> res.compute()  
AttributeError("'list' object has no attribute 'not_a_real_method'")

“Magic” methods (e.g. operators and attribute access) are assumed to be pure, meaning that subsequent calls must return the same results. This behavior is not overridable through the delayed call, but can be modified using other ways as described below.

To invoke an impure attribute or operator, you’d need to use it in a delayed function with pure=False:

>>> class Incrementer:
...     def __init__(self):
...         self._n = 0
...     @property
...     def n(self):
...         self._n += 1
...         return self._n
...
>>> x = delayed(Incrementer())
>>> x.n.key == x.n.key
True
>>> get_n = delayed(lambda x: x.n, pure=False)
>>> get_n(x).key == get_n(x).key
False

In contrast, methods are assumed to be impure by default, meaning that subsequent calls may return different results. To assume purity, set pure=True. This allows sharing of any intermediate values.

>>> a.count(2, pure=True).key == a.count(2, pure=True).key
True

As with function calls, method calls also respect the global delayed_pure setting and support the dask_key_name keyword:

>>> a.count(2, dask_key_name="count_2")
Delayed('count_2')
>>> import dask
>>> with dask.config.set(delayed_pure=True):
...     print(a.count(2).key == a.count(2).key)
True
class dask.delayed.Delayed(key, dsk, length=None, layer=None)[source]

Represents a value to be computed by dask.

Equivalent to the output from a single key in a dask graph.