dask.dataframe.Series.to_csv

Series.to_csv(filename, **kwargs)

Store Dask DataFrame to CSV files

One filename per partition will be created. You can specify the filenames in a variety of ways.

Use a globstring:

>>> df.to_csv('/path/to/data/export-*.csv')  

The * will be replaced by the increasing sequence 0, 1, 2, …

/path/to/data/export-0.csv
/path/to/data/export-1.csv
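
The substitution rule can be sketched in plain Python. The helper `expand_globstring` below is hypothetical and only illustrates the naming described above; it is not Dask's internal implementation, and the partition count of 3 is an arbitrary assumption.

```python
def expand_globstring(template, npartitions, name_function=str):
    """Return one path per partition by replacing '*' with a per-partition name.

    Hypothetical helper for illustration only; not part of the Dask API.
    """
    if "*" not in template:
        raise ValueError("template must contain a '*' placeholder")
    return [template.replace("*", name_function(i)) for i in range(npartitions)]


# Assumed example: a collection with three partitions.
paths = expand_globstring("/path/to/data/export-*.csv", 3)
print(paths)
```

Passing a different `name_function` to this sketch changes only the text that replaces the asterisk, which mirrors the role of the name_function= keyword.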

Use a globstring and a name_function= keyword argument. The name_function function should expect an integer and produce a string. Strings produced by name_function must preserve the order of their respective partition indices.

>>> from datetime import date, timedelta
>>> def name(i):
...     return str(date(2015, 1, 1) + i * timedelta(days=1))
>>> name(0)
'2015-01-01'
>>> name(15)
'2015-01-16'
>>> df.to_csv('/path/to/data/export-*.csv', name_function=name)  
/path/to/data/export-2015-01-01.csv
/path/to/data/export-2015-01-02.csv
...
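
The ordering requirement is easy to violate with a custom function: an unpadded `str(i)` sorts '10' before '2' once there are ten or more partitions. One way to check a candidate function, sketched here with a hypothetical helper `preserves_order` (not part of Dask), is to compare the generated names with their sorted order.

```python
from datetime import date, timedelta


def preserves_order(name_function, npartitions):
    """Return True if the names sort lexicographically in partition order.

    Hypothetical helper for illustration; not part of the Dask API.
    """
    names = [name_function(i) for i in range(npartitions)]
    return names == sorted(names)


def iso_date_name(i):
    # ISO dates are zero-padded, so string order matches date order.
    return str(date(2015, 1, 1) + i * timedelta(days=1))


def padded_name(i):
    # Zero-padding to three digits keeps '002' ahead of '010'.
    return f"{i:03d}"


print(preserves_order(iso_date_name, 400))  # date-based names
print(preserves_order(padded_name, 12))     # zero-padded integers
print(preserves_order(str, 12))             # unpadded integers
```

In this sketch the date-based and zero-padded functions pass the check, while the unpadded one fails as soon as two-digit indices appear.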

You can also provide an explicit list of paths:

>>> paths = ['/path/to/data/alice.csv', '/path/to/data/bob.csv', ...]  
>>> df.to_csv(paths) 

You can also provide a directory name:

>>> df.to_csv('/path/to/data') 

The files will be numbered 0, 1, 2, and so on, each with the suffix ‘.part’:

/path/to/data/0.part
/path/to/data/1.part

Parameters

df : dask.DataFrame

Data to save

filename : string or list

Absolute or relative filepath(s). Prefix with a protocol like s3:// to save to remote filesystems.

single_file : bool, default False

Whether to save everything into a single CSV file. Under the single file mode, each partition is appended at the end of the specified CSV file.

encoding : string, default ‘utf-8’

A string representing the encoding to use in the output file.

mode : str, default ‘w’

Python file mode. The default is ‘w’ (or ‘wt’), for writing a new file or overwriting an existing file in text mode. ‘a’ (or ‘at’) will append to an existing file in text mode or create a new file if it does not already exist. See open().

name_function : callable, default None

Function accepting an integer (partition index) and producing a string to replace the asterisk in the given filename globstring. Should preserve the lexicographic order of partitions. Not supported when single_file is True.

compression : string, optional

A string representing the compression to use in the output file. Allowed values are ‘gzip’, ‘bz2’, and ‘xz’. Only used when the first argument is a filename.

compute : bool, default True

If True, immediately executes. If False, returns a set of delayed objects, which can be computed at a later time.

storage_options : dict

Parameters passed on to the backend filesystem class.

header_first_partition_only : bool, default None

If set to True, only write the header row in the first output file. By default, headers are written to all partitions under the multiple file mode (single_file is False) and written only once under the single file mode (single_file is True). It must be True under the single file mode.

compute_kwargs : dict, optional

Options to pass to the compute method.

kwargs : dict, optional

Additional parameters to pass to pandas.DataFrame.to_csv().

Returns
The names of the files written, if they were computed right away.
If not, the delayed tasks associated with writing the files.
Raises
ValueError

If header_first_partition_only is set to False or name_function is specified when single_file is True.
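
The interaction between single_file, header_first_partition_only and name_function can be restated as a small check. The function `check_single_file_options` below is a hypothetical sketch of the documented rules, written for illustration; it is not Dask's own validation code, and the error messages are assumptions.

```python
def check_single_file_options(single_file, header_first_partition_only=None,
                              name_function=None):
    """Raise ValueError for the option combinations the documentation rejects.

    Hypothetical helper restating the documented rules; not Dask's own code.
    Returns the effective value of header_first_partition_only.
    """
    if single_file and name_function is not None:
        raise ValueError("name_function is not supported when single_file is True")
    if single_file and header_first_partition_only is False:
        raise ValueError(
            "header_first_partition_only cannot be False when single_file is True"
        )
    # Documented defaults: the header is written once in single-file mode
    # and in every output file in multiple-file mode.
    if header_first_partition_only is None:
        header_first_partition_only = bool(single_file)
    return header_first_partition_only


print(check_single_file_options(single_file=True))
print(check_single_file_options(single_file=False))
```

Under these assumptions, leaving header_first_partition_only unset resolves to a single header in single-file mode and to a header per file otherwise.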