Martin Morgan created ARROW-14677:
-------------------------------------

             Summary: macOS R package arrow segfault on `open_dataset()`
                 Key: ARROW-14677
                 URL: https://issues.apache.org/jira/browse/ARROW-14677
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 6.0.0
            Reporter: Martin Morgan


Following a Slack post 
(https://ropensci.slack.com/archives/C026GCWKA/p1636588933095400), accessing a 
public bucket with the R client
{code:java}
df <- arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
{code}
leads to a segfault
{code:java}
  *** caught segfault ***
address 0x0, cause 'unknown'
Traceback:
1: dataset__DatasetFactory_Finish1(self, unify_schemas)
2: factory$Finish(schema, isTRUE(unify_schemas))
3: doTryCatch(return(expr), name, parentenv, handler)
4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
5: tryCatchList(expr, classes, parentenv, handlers)
6: tryCatch(factory$Finish(schema, isTRUE(unify_schemas)), error = function(e) { handle_parquet_io_error(e, format)})
7: arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
{code}
The arrow portion of the lldb traceback is
{code:java}
(lldb) thread backtrace

thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
(code=EXC_I386_GPFLT)
frame #0: 0x000000012ab2029c 
libthrift-0.15.0.dylib`std::__1::shared_ptr<apache::thrift::async::TAsyncProcessor>::~shared_ptr()
 + 46
frame #1: 0x0000000128bb6ac2 arrow.so`void 
parquet::DeserializeThriftUnencryptedMsg<parquet::format::FileMetaData>(unsigned
 char const*, unsigned int*, parquet::format::FileMetaData*) + 309
frame #2: 0x0000000128bb5f49 
arrow.so`parquet::FileMetaData::FileMetaDataImpl::FileMetaDataImpl(void const*, 
unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 517
frame #3: 0x0000000128bace0d arrow.so`parquet::FileMetaData::FileMetaData(void 
const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 
85
frame #4: 0x0000000128bacd1b arrow.so`parquet::FileMetaData::Make(void const*, 
unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 89
frame #5: 0x0000000128b9cb4a 
arrow.so`parquet::SerializedFile::ParseUnencryptedFileMetadata(std::__1::shared_ptr<arrow::Buffer>
 const&, unsigned int) + 118
frame #6: 0x0000000128b9df43 arrow.so`parquet::SerializedFile::ParseMetaData() 
+ 607
frame #7: 0x0000000128b9dc6c 
arrow.so`parquet::ParquetFileReader::Contents::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>,
 parquet::ReaderProperties const&, std::__1::shared_ptr<parquet::FileMetaData>) 
+ 214
frame #8: 0x0000000128b9eb72 
arrow.so`parquet::ParquetFileReader::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>,
 parquet::ReaderProperties const&, std::__1::shared_ptr<parquet::FileMetaData>) 
+ 58
frame #9: 0x0000000128c8a988 
arrow.so`arrow::dataset::ParquetFileFormat::GetReader(arrow::dataset::FileSource
 const&, arrow::dataset::ScanOptions*) const + 286
frame #10: 0x0000000128c8a72e 
arrow.so`arrow::dataset::ParquetFileFormat::Inspect(arrow::dataset::FileSource 
const&) const + 44
frame #11: 0x0000000128c0b994 
arrow.so`arrow::dataset::FileSystemDatasetFactory::InspectSchemas(arrow::dataset::InspectOptions)
 + 336
frame #12: 0x0000000128c09079 
arrow.so`arrow::dataset::DatasetFactory::Inspect(arrow::dataset::InspectOptions)
 + 43
frame #13: 0x0000000128c0c1cf 
arrow.so`arrow::dataset::FileSystemDatasetFactory::Finish(arrow::dataset::FinishOptions)
 + 541
frame #14: 0x0000000128a66805 
arrow.so`dataset__DatasetFactory_Finish1(std::__1::shared_ptr<arrow::dataset::DatasetFactory>
 const&, bool) + 69
frame #15: 0x0000000128a105aa arrow.so`arrow_dataset_DatasetFactory_Finish1 + 
154
{code}
arrow was installed from source on
{code:java}
> sessionInfo()
R Under development (unstable) (2021-10-28 r81109)
Platform: x86_64-apple-darwin19.6.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /Users/ma38727/bin/R-devel/lib/libRblas.dylib
LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] arrow_6.0.0.2
loaded via a namespace (and not attached):
[1] tidyselect_1.1.1 bit_4.0.4 compiler_4.2.0
[4] BiocManager_1.30.16 magrittr_2.0.1 assertthat_0.2.1
[7] R6_2.5.1 glue_1.5.0 bit64_4.0.5
[10] vctrs_0.3.8 rlang_0.4.12 purrr_0.3.4
{code}
During package installation, the one step that was 'new' to me was the use of 
autobrew
{code:java}
*** Downloading apache-arrow
Using autobrew bundle: apache-arrow-6.0.0-high_sierra.tar.xz{code}
I'm not sure how to verify that using the autobrew bundle is consistent with 
my existing Homebrew installation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
