Hi Valentyn,
Thank you for the information and details. It all makes sense! I think we can
wait for the 2.53.0 release and apply the hotfix in the meantime.
Best
Wiśniowski Piotr
On 10.11.2023 20:27, Valentyn Tymofieiev via user wrote:
From https://pypi.org/project/pyarrow-hotfix/ :
pyarrow_hotfix must be imported in your application or library code
for it to take effect.
Just installing the package is not sufficient.
For Beam users, that means the pipeline code must import this module on
every worker, for example by adding `import pyarrow_hotfix` to DoFn.setup
or to the main session (if the pipeline consists of a single file AND
uses the dill pickler with the --save_main_session flag).
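A minimal sketch of the DoFn.setup approach described above, assuming pyarrow-hotfix is installed on the workers. The names `apply_pyarrow_hotfix` and `HotfixedDoFn` are hypothetical, and the fallback base class is there only so the sketch can be read and run without Beam installed.

```python
import importlib


def apply_pyarrow_hotfix() -> bool:
    """Import pyarrow_hotfix so it takes effect; return False if it is missing."""
    try:
        importlib.import_module("pyarrow_hotfix")
    except ImportError:
        return False
    return True


try:
    import apache_beam as beam
    _DoFnBase = beam.DoFn
except ImportError:
    # Assumption for illustration only: lets the sketch run without Beam.
    _DoFnBase = object


class HotfixedDoFn(_DoFnBase):
    """Hypothetical DoFn that applies the hotfix before touching pyarrow."""

    def setup(self):
        # Beam calls setup() on the worker before processing elements,
        # so the import happens inside the worker process.
        if not apply_pyarrow_hotfix():
            raise RuntimeError("pyarrow-hotfix is not installed on this worker")

    def process(self, element):
        # A real pipeline would use pyarrow here; this sketch passes data through.
        yield element
```

Other DoFns that touch pyarrow could subclass this one or call `apply_pyarrow_hotfix()` from their own setup().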
We will continue addressing this in
https://github.com/apache/beam/issues/29392.
On Fri, Nov 10, 2023 at 10:23 AM Valentyn Tymofieiev
<valen...@google.com> wrote:
Hi Piotr, thanks for bringing this to the list.
There is a feature request to support newer pyarrow versions,
https://github.com/apache/beam/issues/28410 . I looked into it
briefly in https://github.com/apache/beam/pull/28437 but saw some
test failures, and it has been on the back burner. Given the news about
the vulnerability, it would make sense to prioritize this.
I think we could decouple this from the 2.52.0 release since:
1) there is a workaround
2) new versions of pyarrow haven't been fully tested with Beam
3) Beam 2.52.0 fixes some other issues that are known to
affect users, e.g. https://github.com/apache/beam/issues/28246
From
https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
:
> If you cannot upgrade to PyArrow 14.0.1, you can use the
> pyarrow-hotfix package to disable the vulnerability on older
> versions of PyArrow. However, this is not a permanent solution,
> and you should upgrade to PyArrow 14.0.1 as soon as possible.
We could consider adding pyarrow-hotfix to the containers for the 2.52.0
release. CC: @Danny McCormick
<mailto:dannymccorm...@google.com> (release manager).
Beam users can also install this additional dependency via one of
the ways described in
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
.
On Fri, Nov 10, 2023 at 4:42 AM Wiśniowski Piotr
<contact.wisniowskipi...@gmail.com> wrote:
Hi,
A few days ago this vulnerability was disclosed:
https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
I see that Beam 2.51.0 has `pyarrow<=12.0.0` in its
requirements.
1. Is there a reason for not allowing newer versions of pyarrow?
2. Is there any planned effort to update this to `14.0.1`? Is it
possible to push the update into the Beam `2.52.0` release? I know
the release is almost there.
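For reference, a minimal sketch of checking whether an installed pyarrow falls in the affected range. The range used here (0.14.0 through 14.0.0) comes from the CVE-2023-47248 advisory rather than from this thread, and the helper names are hypothetical. Pre-release suffixes are ignored, so this is a rough check only.

```python
import re
from importlib import metadata

# Assumption: affected range per the CVE-2023-47248 advisory.
FIRST_AFFECTED = (0, 14, 0)
FIRST_FIXED = (14, 0, 1)


def parse_version(version: str) -> tuple:
    """Return the leading (major, minor, patch) numbers of a version string."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    # Missing components (e.g. "12.0") are treated as zero.
    return tuple(int(group or 0) for group in match.groups())


def is_affected(version: str) -> bool:
    """True if the version lies in [FIRST_AFFECTED, FIRST_FIXED)."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED


def installed_pyarrow_is_affected():
    """Check the installed pyarrow; return None if pyarrow is not installed."""
    try:
        return is_affected(metadata.version("pyarrow"))
    except metadata.PackageNotFoundError:
        return None
```

With a `pyarrow<=12.0.0` pin, `is_affected("12.0.0")` returns True, which is why the hotfix matters until the pin is raised.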
Best
Wiśniowski Piotr