[ 
https://issues.apache.org/jira/browse/ARROW-12547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Baywatch closed ARROW-12547.
--------------------------------
    Resolution: Not A Problem

This isn't an Arrow issue. I can reproduce it with other kinds of memory-mapped 
files on NFS, and I cannot reproduce it with Arrow when mmap is turned off. 

> Sigbus when using mmap in multiprocessing env over netapp
> ---------------------------------------------------------
>
>                 Key: ARROW-12547
>                 URL: https://issues.apache.org/jira/browse/ARROW-12547
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 3.0.0
>            Reporter: Jay Baywatch
>            Priority: Minor
>
> We have noticed a condition where using Arrow to read Parquet files that 
> reside on our NetApp from Slurm (via Python) occasionally raises signal 7 
> (SIGBUS).
> We haven't tried disabling memory mapping yet, although we do expect that 
> turning memory mapping off in read_table will resolve the issue.
> This seems to occur when we read a file that has just been written, even 
> though we write Parquet files to a transient location and then swap the 
> file in using os.rename.
>  
> All that said, we were not sure whether this was a known issue or whether 
> team pyarrow is interested in the stack trace.
>  
>  
> Thread 1 (Thread 0x7fafa7dff700 (LWP 44408)):
> #0  __memcpy_avx_unaligned () at 
> ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:238
> #1  0x00007fafb9c40aba in snappy::RawUncompress(snappy::Source*, char*) () 
> from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
> #2  0x00007fafb9c41131 in snappy::RawUncompress(char const*, unsigned long, 
> char*) () from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
> #3  0x00007fafb942abbe in arrow::util::internal::(anonymous 
> namespace)::SnappyCodec::Decompress(long, unsigned char const*, long, 
> unsigned char*) () from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
> #4  0x00007fafb4d0965e in parquet::(anonymous 
> namespace)::SerializedPageReader::DecompressIfNeeded(std::shared_ptr<arrow::Buffer>,
>  int, int, int) () from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #5  0x00007fafb4d2bc2d in parquet::(anonymous 
> namespace)::SerializedPageReader::NextPage() () from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #6  0x00007fafb4d330c3 in parquet::(anonymous 
> namespace)::ColumnReaderImplBase<parquet::PhysicalType<(parquet::Type::type)5>
>  >::HasNextInternal() [clone .part.0] () from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #7  0x00007fafb4d33eb8 in parquet::internal::(anonymous 
> namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)5> 
> >::ReadRecords(long) () from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #8  0x00007fafb4d21bb8 in parquet::arrow::(anonymous 
> namespace)::LeafReader::LoadBatch(long) () from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #9  0x00007fafb4d489c8 in parquet::arrow::ColumnReaderImpl::NextBatch(long, 
> std::shared_ptr<arrow::ChunkedArray>*) () from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #10 0x00007fafb4d32db9 in arrow::internal::FnOnce<void 
> ()>::FnImpl<std::_Bind<arrow::detail::ContinueFuture 
> (arrow::Future<arrow::detail::Empty>, parquet::arrow::(anonymous 
> namespace)::FileReaderImpl::GetRecordBatchReader(std::vector<int, 
> std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, 
> std::unique_ptr<arrow::RecordBatchReader, 
> std::default_delete<arrow::RecordBatchReader> 
> >*)::{lambda()#1}::operator()()::{lambda(int)#1}, int)> >::invoke() () from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #11 0x00007fafb9444ddd in 
> std::thread::_State_impl<std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::{lambda()#1}>
>  > >::_M_run() () from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
> #12 0x00007fafb9dd3580 in execute_native_thread_routine () from 
> /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
> #13 0x00007fafefcdc6ba in start_thread (arg=0x7fafa7dff700) at 
> pthread_create.c:333
> #14 0x00007fafefa1241d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
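The write-to-a-transient-location-then-swap pattern described above can be sketched as follows, using only the standard library. `os.replace` is the cross-platform atomic variant of the `os.rename` call the report mentions; the `atomic_write` helper name and the fsync-before-rename step are my additions, not taken from the reporter's code.

```python
import os
import tempfile

def atomic_write(path, data):
    # Write to a transient file in the same directory, flush and fsync it,
    # then atomically swap it into place -- the pattern the report describes.
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX, like os.rename on the same fs
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise

# Illustrative use: overwrite a file and read the result back.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "table.parquet")
    atomic_write(target, b"old")
    atomic_write(target, b"new")
    with open(target, "rb") as f:
        result = f.read()
```

Note that even a genuinely atomic rename does not protect a reader that already has the old file mmap'd: NFS clients can invalidate the cached pages underneath the mapping, and a subsequent access then faults with SIGBUS, which is consistent with the trace above.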



--
This message was sent by Atlassian Jira
(v8.3.4#803005)