+Valentyn Tymofieiev <valen...@google.com>

This sounds like it's related to ARROW-8983 [1] (pyarrow takes a long time
to download after 0.16.0), discussed on the arrow dev list [2]. I'm not sure
what would've triggered this to start happening for you today, though.

[1] https://issues.apache.org/jira/browse/ARROW-8983
[2] https://lists.apache.org/thread.html/r9baa48a9d1517834c285f0f238f29fcf54405cb7cf1e681314239d7f%40%3Cdev.arrow.apache.org%3E
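
To check whether the source-only download is what's slow, the command from
the quoted `ps` output can be rerun by hand and timed. This is only a rough
sketch: `build_download_cmd` and `time_download` are hypothetical helper
names, not part of Beam, and the paths and flags are copied from the `ps`
line below. It assumes pip is installed in the active venv.

```python
import subprocess
import sys
import time


def build_download_cmd(requirements="requirements.txt",
                       dest="/tmp/dataflow-requirements-cache",
                       source_only=True):
    """Build the `pip download` command seen in the quoted ps output.

    Hypothetical helper for illustration; not part of Beam.
    """
    cmd = [sys.executable, "-m", "pip", "download",
           "--dest", dest,
           "-r", requirements,
           "--exists-action", "i"]
    if source_only:
        # This is the flag that forces source distributions for everything.
        cmd += ["--no-binary", ":all:"]
    return cmd


def time_download(**kwargs):
    """Run the download and return the elapsed wall-clock seconds."""
    cmd = build_download_cmd(**kwargs)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start
```

Calling `time_download()` and then `time_download(source_only=False)` from the
directory holding requirements.txt should show how much of the delay comes
from `--no-binary :all:`.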

On Fri, Oct 9, 2020 at 12:10 PM Ross Vandegrift <
ross.vandegr...@cleardata.com> wrote:

> Hello,
>
> Starting today, running a beam pipeline triggers a large reinstallation of
> python modules.  For some reason, it forces full rebuilds from source. Since
> beam depends on numpy, this takes a long time.
>
> There's nothing strange about my python setup.  I'm using python3.7 on debian
> buster with the dataflow runner.  My venv is set up like this:
>  python3 -m venv ~/.venvs/beam
>  . ~/.venvs/beam/bin/activate
>  python3 -m pip install --upgrade wheel
>  python3 -m pip install --upgrade pip setuptools
>  python3 -m pip install -r requirements.txt
>
> My requirements.txt has:
>   apache-beam[gcp]==2.23.0
>   boto3==1.15.0
>
> When it's building, `ps ax | grep python` shows me this:
>   /home/ross/.venvs/beam/bin/python -m pip download --dest /tmp/dataflow-requirements-cache -r requirements.txt --exists-action i --no-binary :all:
>
> How do I prevent this?  It's far too slow to develop with, and our compliance
> folks are likely to prohibit a tool that silently downloads & builds unknown
> code.
>
> Ross
>
