Re: [Python] why does write_feather drop index by default?

Joris Van den Bossche Fri, 16 Jul 2021 04:25:07 -0700

There has been a lot of interesting discussion on this thread, but to still
answer your original question about the DataFrame index being dropped: I
don't know the historical reason for doing so, but the easy workaround
(without needing to rely on private modules) is to convert the DataFrame to
a Table first (making sure your index is preserved in this step) and pass
that to write_feather, which accepts a pyarrow.Table as well as a
pandas.DataFrame:


import pyarrow as pa
from pyarrow import feather

table = pa.Table.from_pandas(df, preserve_index=True)
feather.write_feather(table, dest, ...)


On Tue, 13 Jul 2021 at 19:06, Arun Joseph wrote:

> Hi,
>
> I've noticed that if I pass a pandas dataframe to write_feather
> <https://github.com/apache/arrow/blob/release-4.0.1/python/pyarrow/feather.py#L152>
> (hyperlink to relevant part of code), it will automatically drop the index.
> Was this behavior intentionally chosen to always drop the index rather than
> let the user decide? I assumed the behavior would match the default
> behavior of converting a pandas DataFrame to an Arrow table, as
> described in the docs
> <https://arrow.apache.org/docs/python/pandas.html#handling-pandas-indexes>.
>
> Is the best way around this to do the following?
>
> ```python3
> import pyarrow.lib as ext
> from pyarrow.lib import Table
>
> table = Table.from_pandas(df)
> ext.write_feather(table, dest,
>                   compression=compression,
>                   compression_level=compression_level,
>                   chunksize=chunksize, version=version)
> ```
> Thank You,
> --
> Arun Joseph
>
>
