+Valentyn Tymofieiev <valen...@google.com>

This sounds like it's related to ARROW-8983 [1] (pyarrow takes a long time to download after 0.16.0), which was discussed on the arrow dev list [2]. I'm not sure what would have triggered this to start happening for you today, though.

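For context on why this ends up compiling things: the `pip download` command in the quoted message passes `--no-binary :all:`, which tells pip to ignore wheels and fetch source distributions for every package. Below is a minimal Python sketch that checks a command line for that flag. The helper name `forces_source_builds` is made up for illustration and is not part of Beam or pip.

```python
import shlex

# The command reported by `ps ax` in the quoted message, joined onto one line.
CMD = (
    "/home/ross/.venvs/beam/bin/python -m pip download "
    "--dest /tmp/dataflow-requirements-cache -r requirements.txt "
    "--exists-action i --no-binary :all:"
)


def forces_source_builds(command: str) -> bool:
    """Return True if a pip command line passes --no-binary with :all:.

    Hypothetical helper for illustration only.
    """
    args = shlex.split(command)
    for i, arg in enumerate(args):
        if arg == "--no-binary" and i + 1 < len(args):
            value = args[i + 1]          # form: --no-binary :all:
        elif arg.startswith("--no-binary="):
            value = arg.split("=", 1)[1]  # form: --no-binary=:all:
        else:
            continue
        # pip accepts a comma-separated list; :all: disables every wheel.
        if ":all:" in value.split(","):
            return True
    return False


if __name__ == "__main__":
    print(forces_source_builds(CMD))
```

Running the same check against a plain `pip download -r requirements.txt` returns False, since without the flag pip is free to pick wheels.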
[1] https://issues.apache.org/jira/browse/ARROW-8983
[2] https://lists.apache.org/thread.html/r9baa48a9d1517834c285f0f238f29fcf54405cb7cf1e681314239d7f%40%3Cdev.arrow.apache.org%3E

On Fri, Oct 9, 2020 at 12:10 PM Ross Vandegrift <ross.vandegr...@cleardata.com> wrote:

> Hello,
>
> Starting today, running a beam pipeline triggers a large reinstallation of
> python modules. For some reason, it forces full rebuilds from source - since
> beam depends on numpy, this takes a long time.
>
> There's nothing strange about my python setup. I'm using python3.7 on debian
> buster with the dataflow runner. My venv is setup like this:
>   python3 -m venv ~/.venvs/beam
>   . ~/.venvs/beam/bin/activate
>   python3 -m pip install --upgrade wheel
>   python3 -m pip install --upgrade pip setuptools
>   python3 -m pip install -r requirements.txt
>
> My requirements.txt has:
>   apache-beam[gcp]==2.23.0
>   boto3==1.15.0
>
> When it's building, `ps ax | grep python` shows me this:
>   /home/ross/.venvs/beam/bin/python -m pip download --dest /tmp/dataflow-requirements-cache -r requirements.txt --exists-action i --no-binary :all:
>
> How do I prevent this? It's far too slow to develop with, and our compliance
> folks are likely to prohibit a tool that silently downloads & builds unknown
> code.
>
> Ross