[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17660957#comment-17660957 ]

Rok Mihevc commented on ARROW-3933:
-----------------------------------

This issue has been migrated to [issue #20542|https://github.com/apache/arrow/issues/20542] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] Segfault reading Parquet files from GNOMAD
> ---------------------------------------------------
>
>                 Key: ARROW-3933
>                 URL: https://issues.apache.org/jira/browse/ARROW-3933
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>         Environment: Ubuntu 18.04 or Mac OS X
>            Reporter: David Konerding
>            Assignee: Wes McKinney
>            Priority: Minor
>              Labels: parquet, pull-request-available
>             Fix For: 0.15.0
>
>         Attachments: part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I am getting a segfault when trying to run a basic program on an Ubuntu 18.04 VM (AWS).
> The error also occurs out of the box on Mac OS X.
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
> test.py:
> import pyarrow.parquet as pq
> path = "part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
> gdb output:
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x00007fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, unsigned long*) () from /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
> I tested fastparquet; it reads the file just fine.
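
Because a segfault kills the whole interpreter, one way to rerun the reported reproduction without losing the calling session is to execute the read in a child process and inspect its exit status. The sketch below is an illustrative assumption, not part of the original report: `run_isolated` and `describe` are hypothetical helper names, and the approach relies on the POSIX convention that Python's `subprocess` reports death by a signal as a negative return code (for example -11 for SIGSEGV on Linux).

```python
import signal
import subprocess
import sys

# Hypothetical helper: run a Python snippet in a fresh interpreter so that a
# native crash (e.g. a segfault inside libparquet) cannot kill the caller.
def run_isolated(code: str, timeout: float = 120.0) -> int:
    proc = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    # On POSIX, a child killed by signal N yields returncode == -N.
    return proc.returncode

# Hypothetical helper: turn a return code into a short human-readable status.
def describe(returncode: int) -> str:
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"

# The same read as test.py in the report, run in a child process.
READ_SNIPPET = (
    "import pyarrow.parquet as pq\n"
    "pq.read_table('part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet')\n"
)

if __name__ == "__main__":
    print(describe(run_isolated(READ_SNIPPET)))
```

Run it from the directory containing the downloaded file. On an affected pyarrow build the crash would presumably surface as "killed by SIGSEGV" rather than as a Python traceback, while an ordinary Python error (such as a missing file or missing module) shows up as a positive exit status.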



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
