Dima Ryazanov created ARROW-2592:
------------------------------------

Summary: [Python] AssertionError in to_pandas()
Key: ARROW-2592
URL: https://issues.apache.org/jira/browse/ARROW-2592
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.9.0, 0.8.0
Reporter: Dima Ryazanov

Pyarrow 0.8 and 0.9 raise an AssertionError in to_pandas() for one of my datasets (created using an older version of pyarrow).

Repro steps:

{{In [1]: from pyarrow.parquet import ParquetDataset}}
{{In [2]: d = ParquetDataset(['bug.parq'])}}
{{In [3]: t = d.read()}}
{{In [4]: t.to_pandas()}}
{{---------------------------------------------------------------------------}}
{{AssertionError Traceback (most recent call last)}}
{{<ipython-input-4-d17c9e2818f1> in <module>()}}
{{----> 1 t.to_pandas()}}
{{table.pxi in pyarrow.lib.Table.to_pandas()}}
{{~/envs/cli3/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, memory_pool, nthreads, categories)}}
{{ 529 # There must be the same number of field names and physical names}}
{{ 530 # (fields in the arrow Table)}}
{{--> 531 assert len(logical_index_names) == len(index_columns_set)}}
{{ 532 }}
{{ 533 # It can never be the case in a released version of pyarrow that}}
{{AssertionError: }}

Here's the file: [https://www.dropbox.com/s/oja3khjsc5tycfh/bug.parq]

(I was not able to attach it here due to a "missing token" error.)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
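One possible way to narrow this down, sketched under assumptions: the failing assertion compares index names against the index columns, so it may help to check whether the index columns declared in the file's `pandas` schema metadata are actually described in its `columns` list and present in the table. The helper name `index_name_mismatch` and the sample metadata below are hypothetical. The sketch assumes the old-style metadata layout in which `index_columns` is a list of strings, and it has not been run against bug.parq.

```python
import json


def index_name_mismatch(pandas_metadata, table_column_names):
    """Report index columns that the pandas metadata declares but that are
    not described in its 'columns' list or not present in the table.

    Hypothetical diagnostic helper; it only approximates the check behind
    the failing assertion and is not pyarrow's own code.
    """
    # Assumption: index columns are plain strings (old-style metadata).
    index_columns = {
        c for c in pandas_metadata.get("index_columns", []) if isinstance(c, str)
    }
    # Fall back on 'name' when 'field_name' is absent (older writers).
    described = {
        c.get("field_name", c.get("name"))
        for c in pandas_metadata.get("columns", [])
    }
    return {
        "declared_index_columns": sorted(index_columns),
        "not_described_in_columns": sorted(index_columns - described),
        "not_in_table": sorted(index_columns - set(table_column_names)),
    }


# With a real table `t` the inputs could be obtained roughly like this:
#   meta = json.loads(t.schema.metadata[b'pandas'].decode('utf8'))
#   names = t.schema.names
# Made-up metadata for illustration only:
example_meta = json.loads(
    '{"index_columns": ["__index_level_0__"], "columns": [{"name": "a"}]}'
)
report = index_name_mismatch(example_meta, ["a"])
print(report)
```

A non-empty `not_described_in_columns` or `not_in_table` entry would suggest that the metadata written by the older pyarrow version disagrees with the columns actually stored, which is one plausible (unconfirmed) trigger for the assertion.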