We run our tests on Linux with flags that help discover illegal
memory access (e.g. MALLOC_PERTURB_=90). Any test that imports pyarrow
segfaults because of an issue that was discovered and fixed in the AWS SDK.
It is fixed as of 1.9.214, but Arrow is currently built with version 1.8.133
(per
https://github.com/apache/arrow/blob/ea3480033e57947ae59e7862ed35b8bf3335ea9f/cpp/thirdparty/versions.txt
).

My question is more procedural than technical. Is there a way to request
that pyarrow/arrow be built with the fixed version of the AWS SDK for the
builds published to PyPI in the next week or two, or is this unrealistic? If
so, is the next best option for us to clone the source and build against
the fixed version of the AWS SDK?
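If building from source ends up being the answer, here is a sketch of what
we have in mind. The ARROW_AWSSDK_BUILD_VERSION variable name is based on a
reading of the linked versions.txt; treat the exact variable and the rest of
the build steps as assumptions, not a verified recipe:

```shell
# Workaround sketch: build Arrow C++ (and then pyarrow) against a fixed
# aws-sdk-cpp by bumping the pinned bundled-dependency version before
# building. Variable name below is an assumption based on
# cpp/thirdparty/versions.txt; the matching SHA256 checksum line would
# need to be updated as well.
git clone https://github.com/apache/arrow.git
cd arrow
sed -i 's/ARROW_AWSSDK_BUILD_VERSION=.*/ARROW_AWSSDK_BUILD_VERSION=1.9.214/' \
    cpp/thirdparty/versions.txt
# ...then build Arrow C++ with S3 enabled and pyarrow on top of it,
# following the project's "building from source" documentation.
```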

Thanks in advance for your advice,
-Joe

*Technical details:*
- This occurs because of incorrect resource handling in the AWS SDK that
segfaults on shutdown
- pyarrow's fs module calls arrow::fs::InitializeS3() on import, which sets
up the conditions for a segfault on exit (depending on how the memory
allocations work out, which is why this is easier to reproduce with
MALLOC_PERTURB_)
- The fix for this is
https://github.com/aws/aws-sdk-cpp/commit/a2512bd02addd77515430ac74d7ee5f37343ec99
- The first release tag I see for this commit in aws-sdk-cpp is in version
1.9.214
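For context, a minimal sketch of the kind of invocation that surfaces the
crash. It uses "import json" as a stand-in so the sketch runs anywhere;
substituting "import pyarrow.fs" (with a pyarrow build linking the buggy
SDK) is what actually segfaults at interpreter exit:

```python
import os
import subprocess
import sys

# Run a child Python process with glibc's MALLOC_PERTURB_ set. The flag
# poisons freed heap memory, so use-after-free bugs that would otherwise
# pass silently crash much more reliably.
env = dict(os.environ, MALLOC_PERTURB_="90")
proc = subprocess.run(
    [sys.executable, "-c", "import json"],  # stand-in for "import pyarrow.fs"
    env=env,
)
# A clean import exits 0; the buggy SDK path dies with SIGSEGV
# (negative return code) during shutdown.
print("child exit code:", proc.returncode)
```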
