[ https://issues.apache.org/jira/browse/ARROW-12547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jay Baywatch closed ARROW-12547.
--------------------------------
    Resolution: Not A Problem

This isn't an Arrow issue. I can reproduce it with other types of memory-mapped files on NFS, and I am unable to reproduce it with Arrow when mmap is turned off.

> Sigbus when using mmap in multiprocessing env over netapp
> ---------------------------------------------------------
>
>                 Key: ARROW-12547
>                 URL: https://issues.apache.org/jira/browse/ARROW-12547
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 3.0.0
>            Reporter: Jay Baywatch
>            Priority: Minor
>
> We have noticed that using Arrow from Python, in jobs running under Slurm,
> to read Parquet files that reside on our NetApp occasionally raises
> signal 7 (SIGBUS).
> We haven't tried disabling memory mapping yet, although we expect that
> turning memory mapping off in read_table will resolve the issue.
> This seems to occur when we read a file that has just been written, even
> though we write Parquet files to a transient location and then swap each
> file in using os.rename.
>
> All that said, we were not sure whether this is a known issue or whether
> team pyarrow is interested in the stack trace.
>
> Thread 1 (Thread 0x7fafa7dff700 (LWP 44408)):
> #0  __memcpy_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:238
> #1  0x00007fafb9c40aba in snappy::RawUncompress(snappy::Source*, char*) () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
> #2  0x00007fafb9c41131 in snappy::RawUncompress(char const*, unsigned long, char*) () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
> #3  0x00007fafb942abbe in arrow::util::internal::(anonymous namespace)::SnappyCodec::Decompress(long, unsigned char const*, long, unsigned char*) () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
> #4  0x00007fafb4d0965e in parquet::(anonymous namespace)::SerializedPageReader::DecompressIfNeeded(std::shared_ptr<arrow::Buffer>, int, int, int) () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #5  0x00007fafb4d2bc2d in parquet::(anonymous namespace)::SerializedPageReader::NextPage() () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #6  0x00007fafb4d330c3 in parquet::(anonymous namespace)::ColumnReaderImplBase<parquet::PhysicalType<(parquet::Type::type)5> >::HasNextInternal() [clone .part.0] () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #7  0x00007fafb4d33eb8 in parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)5> >::ReadRecords(long) () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #8  0x00007fafb4d21bb8 in parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch(long) () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #9  0x00007fafb4d489c8 in parquet::arrow::ColumnReaderImpl::NextBatch(long, std::shared_ptr<arrow::ChunkedArray>*) () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #10 0x00007fafb4d32db9 in arrow::internal::FnOnce<void ()>::FnImpl<std::_Bind<arrow::detail::ContinueFuture (arrow::Future<arrow::detail::Empty>, parquet::arrow::(anonymous namespace)::FileReaderImpl::GetRecordBatchReader(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::unique_ptr<arrow::RecordBatchReader, std::default_delete<arrow::RecordBatchReader> >*)::{lambda()#1}::operator()()::{lambda(int)#1}, int)> >::invoke() () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libparquet.so.300
> #11 0x00007fafb9444ddd in std::thread::_State_impl<std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::{lambda()#1}> > >::_M_run() () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
> #12 0x00007fafb9dd3580 in execute_native_thread_routine () from /home/svc_backtest/portfolio_analytics/prod/pyenv/lib/python3.7/site-packages/pyarrow/libarrow.so.300
> #13 0x00007fafefcdc6ba in start_thread (arg=0x7fafa7dff700) at pthread_create.c:333
> #14 0x00007fafefa1241d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

--
This message was sent by Atlassian Jira
(v8.3.4#803005)