[ https://issues.apache.org/jira/browse/ARROW-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe L. Korn reassigned ARROW-4267:
----------------------------------

    Assignee: Uwe L. Korn

> [Python/C++] Segfault when reading rowgroups with duplicated columns
> --------------------------------------------------------------------
>
>                 Key: ARROW-4267
>                 URL: https://issues.apache.org/jira/browse/ARROW-4267
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.11.1
>            Reporter: Florian Jetter
>            Assignee: Uwe L. Korn
>            Priority: Minor
>             Fix For: 0.13.0
>
>
> When reading a row group using duplicated columns I receive a segfault.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({
>     "col": ["A", "B"]
> })
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> parquet_file = pq.ParquetFile(buf.getvalue())
> parquet_file.read_row_group(0)
> parquet_file.read_row_group(0, columns=["col"])
> # boom
> parquet_file.read_row_group(0, columns=["col", "col"])
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)