dask.dataframe.to_orc
- dask.dataframe.to_orc(df, path, engine='pyarrow', write_index=True, storage_options=None, compute=True, compute_kwargs=None)
Store a Dask DataFrame to ORC files
- Parameters
- df : dask.dataframe.DataFrame
- path : string or pathlib.Path
Destination directory for data. Prepend with protocol like s3:// or hdfs:// for remote data.
- engine : 'pyarrow' or ORCEngine
Backend ORC engine to use for I/O. Default is “pyarrow”.
- write_index : boolean, default True
Whether or not to write the index.
- storage_options : dict, default None
Key/value pairs to be passed on to the file-system backend, if any.
- compute : bool, default True
If True (default) then the result is computed immediately. If False then a dask.delayed object is returned for future computation.
- compute_kwargs : dict, default None
Options to be passed on to the compute method.
See also
read_orc
Read ORC data to dask.dataframe
Notes
Each partition will be written to a separate file.
Examples
>>> df = dd.read_csv(...)
>>> df.to_orc('/path/to/output/', ...)