Thomas Buhrmann created ARROW-5353:
--------------------------------------

             Summary: 0-row table can be written but not read
                 Key: ARROW-5353
                 URL: https://issues.apache.org/jira/browse/ARROW-5353
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.13.0, 0.12.0, 0.11.0
            Reporter: Thomas Buhrmann

I can serialize a table with 0 rows, but I cannot read it back. The following code

{code}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'x': [0,1,2]})[:0]
fnm = "tbl.arr"

tbl = pa.Table.from_pandas(df)
print(tbl.schema)

writer = pa.RecordBatchStreamWriter(fnm, tbl.schema)
writer.write_table(tbl)

reader = pa.RecordBatchStreamReader(fnm)
tbl2 = reader.read_all()
{code}

...results in the following output:

{code}
x: int64
metadata
--------
OrderedDict([(b'pandas',
              b'{"index_columns": [{"kind": "range", "name": null, "start": '
              b'0, "stop": 0, "step": 1}], "column_indexes": [{"name": null,'
              b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
              b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
              b'{"name": "x", "field_name": "x", "pandas_type": "int64", "nu'
              b'mpy_type": "int64", "metadata": null}], "creator": {"library'
              b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-3-8869ad9b37c6> in <module>
     11 writer.write_table(tbl)
     12
---> 13 reader = pa.RecordBatchStreamReader(fnm)
     14 tbl2 = reader.read_all()

~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.py in __init__(self, source)
     56     """
     57     def __init__(self, source):
---> 58         self._open(source)
     59
     60

~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.pxi in pyarrow.lib._RecordBatchStreamReader._open()

~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Expected schema message in stream, was null or length 0
{code}

Since the schema alone should be sufficient to build a table, even one that holds no actual data, I would expect the read not to fail but to return the same 0-row table that was written.