Re: Efficient Pandas serialization for mixed object and numeric DataFrames

2018-10-19 Thread Wes McKinney
Hi Mitar -- to Robert's point, we aren't sure which code path you are referring to. Perhaps related: I'm interested in handling Python pickling for "other" kinds of Python objects when converting to or from the Arrow format. So "Python object" would be defined as a user-defined type that's embedde

Re: Efficient Pandas serialization for mixed object and numeric DataFrames

2018-10-19 Thread Antoine Pitrou
Slightly off-topic, but the recent work on PEP 574 (*) should allow efficient serialization of Pandas dataframes (**) with standard pickle (or the pickle5 backport). Experimental support for pickle5 has already been merged in Arrow and NumPy (and Pandas uses NumPy as its storage backend). My pe
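
For readers unfamiliar with PEP 574, here is a minimal sketch of the out-of-band buffer mechanism it adds to pickle (protocol 5), assuming Python 3.8 or later and using only the standard library. The `frame` dict and its column names are hypothetical stand-ins for a mixed DataFrame; this is not how Pandas, NumPy, or Arrow implement their support, only an illustration of the idea that numeric memory can travel separately from the pickled objects.

```python
import pickle
from array import array

# Hypothetical stand-in for a mixed DataFrame: one numeric column backed by
# a contiguous buffer, one column of ordinary Python objects.
numeric_column = array("q", range(1000))  # 1000 signed 64-bit integers
frame = {
    "ints": pickle.PickleBuffer(numeric_column),  # eligible for out-of-band transfer
    "labels": ["a", "b", "c"],                    # pickled in-band as usual
}

# In-band: the numeric bytes are copied into the pickle stream.
in_band = pickle.dumps(frame, protocol=5)

# Out-of-band: pickle hands each PickleBuffer to the callback instead of
# copying it, so the stream holds only the object column plus placeholders.
buffers = []
payload = pickle.dumps(frame, protocol=5, buffer_callback=buffers.append)

# The receiver must supply the same buffers, in the same order.
restored = pickle.loads(payload, buffers=buffers)
restored_ints = memoryview(restored["ints"]).tolist()

print(len(in_band), len(payload), len(buffers))
```

With `buffer_callback` set, the payload stays small regardless of how large the numeric column is; moving the buffers themselves (shared memory, a socket, a file) is left to the caller.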

Re: Efficient Pandas serialization for mixed object and numeric DataFrames

2018-10-18 Thread Robert Nishihara
How are you serializing the dataframe? If you use *pyarrow.serialize(df)*, then each column should be serialized separately and numeric columns will be handled efficiently.

On Thu, Oct 18, 2018 at 9:10 PM Mitar wrote:
> Hi!
>
> It seems that if a DataFrame contains both numeric and object column

Efficient Pandas serialization for mixed object and numeric DataFrames

2018-10-18 Thread Mitar
Hi!

It seems that if a DataFrame contains both numeric and object columns, the whole DataFrame is pickled, rather than only the object columns. Is this right? Are there any plans to improve this?

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m