[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17660957#comment-17660957 ]
Rok Mihevc commented on ARROW-3933:
-----------------------------------

This issue has been migrated to [issue #20542|https://github.com/apache/arrow/issues/20542] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] Segfault reading Parquet files from GNOMAD
> ---------------------------------------------------
>
>                 Key: ARROW-3933
>                 URL: https://issues.apache.org/jira/browse/ARROW-3933
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>         Environment: Ubuntu 18.04 or Mac OS X
>            Reporter: David Konerding
>            Assignee: Wes McKinney
>            Priority: Minor
>              Labels: parquet, pull-request-available
>             Fix For: 0.15.0
>
>         Attachments: part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I am getting a segfault when running a basic program on an Ubuntu 18.04 VM (AWS).
> The error also occurs out of the box on Mac OS X.
>
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
>
> test.py:
>
> import pyarrow.parquet as pq
> path = "part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
>
> gdb output:
>
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x00007fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, unsigned long*) () from /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
>
> I tested fastparquet; it reads the file just fine.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
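
For anyone re-running the reproduction from the report, here is a minimal sketch (not part of the original issue) that executes the same `pq.read_table` call in a child interpreter, so a segfault shows up as a return code instead of killing the calling process. The helper names `run_isolated` and `describe` are hypothetical, and the signal decoding assumes a POSIX system such as the Ubuntu and Mac OS X environments mentioned in the report.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=300.0):
    """Run a Python snippet in a separate interpreter process.

    A crash in native code (e.g. SIGSEGV) then terminates only the child.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(result):
    """Summarize how the child process ended (POSIX semantics)."""
    rc = result.returncode
    if rc < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        return "killed by " + signal.Signals(-rc).name
    if rc == 0:
        return "ok"
    return "exited with status {}".format(rc)


# The read from test.py in the report, wrapped as a snippet for the child.
READ_SNIPPET = """
import pyarrow.parquet as pq
path = "part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
pq.read_table(path)
"""

if __name__ == "__main__":
    outcome = run_isolated(READ_SNIPPET)
    print(describe(outcome))
    if outcome.stderr:
        print(outcome.stderr, file=sys.stderr)
```

Run from the directory containing the downloaded file, this only reports how the child ended. Whether it crashes depends on the installed pyarrow version, and the gdb backtrace from the report is still needed to locate the faulting frame.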