dask.dataframe.to_orc

dask.dataframe.to_orc(df, path, engine='pyarrow', write_index=True, storage_options=None, compute=True, compute_kwargs=None)

Store a Dask DataFrame to ORC files

Parameters
df : dask.dataframe.DataFrame
path : string or pathlib.Path

Destination directory for data. Prepend with protocol like s3:// or hdfs:// for remote data.

engine : 'pyarrow' or ORCEngine

Backend ORC engine to use for I/O. Default is “pyarrow”.

write_index : boolean, default True

Whether to write the index.

storage_options : dict, default None

Key/value pairs to be passed on to the file-system backend, if any.

compute : bool, default True

If True (the default), the result is computed immediately. If False, a dask.delayed object is returned for later computation; see the sketch after this parameter list.

compute_kwargs : dict, default None

Options to be passed on to the compute method.
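
For illustration, a minimal sketch of a deferred write using compute=False; the toy frame and the local output directory ./orc-out are hypothetical.

>>> import pandas as pd
>>> import dask.dataframe as dd
>>> pdf = pd.DataFrame({"x": range(10)})
>>> df = dd.from_pandas(pdf, npartitions=2)
>>> delayed_write = df.to_orc('./orc-out', compute=False)  # no files written yet
>>> delayed_write.compute()  # triggers the actual write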

See also

read_orc

Read ORC data into a Dask DataFrame

Notes

Each partition will be written to a separate file.
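
Because the number of output files tracks the number of partitions, repartitioning before the write is a simple way to control the file count. A minimal sketch, assuming a hypothetical target of four files:

>>> df = df.repartition(npartitions=4)  # four partitions -> four ORC files
>>> df.to_orc('./orc-out')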

Examples

>>> df = dd.read_csv(...)  
>>> df.to_orc('/path/to/output/', ...)
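
A fuller, self-contained round trip, assuming hypothetical paths data/*.csv and ./orc-out:

>>> import dask.dataframe as dd
>>> df = dd.read_csv('data/*.csv')             # load the hypothetical input
>>> df.to_orc('./orc-out', write_index=False)  # one ORC file per partition
>>> df2 = dd.read_orc('./orc-out')             # read the ORC files back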