Seems a bit buggy, can you open a Jira issue? Thanks
On Wed, Nov 4, 2020 at 5:05 PM Jason Sachs <[email protected]> wrote:
>
> It looks like pyarrow.Table.from_pydict() cuts off binary data after an
> embedded 00 byte. Is this a known bug?
>
> (py3) C:\>python
> Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] ::
> Anaconda, Inc. on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy as np
> >>> import pyarrow as pa
> >>>
> >>> data = np.array([b'', b'', b'', b'Foo!!', b'Bar!!',
> ...     b'\x00Baz!', b'half\x00baked', b''], dtype='|S13')
> >>> t = pa.Table.from_pydict({'data':data})
> >>> t.to_pandas()
>        data
> 0       b''
> 1       b''
> 2       b''
> 3  b'Foo!!'
> 4  b'Bar!!'
> 5       b''
> 6   b'half'
> 7       b''
> >>> import pandas as pd
> >>> pd.DataFrame(data)
>                   0
> 0               b''
> 1               b''
> 2               b''
> 3          b'Foo!!'
> 4          b'Bar!!'
> 5      b'\x00Baz!'
> 6  b'half\x00baked'
> 7               b''
> >>>
