FYI, I tried to install a psycopg2 wheel from a file using the "extra_packages" argument (wheel installation is apparently still an experimental feature), but this ran into a UCS-2 vs UCS-4 compatibility problem: the Python on the Dataflow workers appears to be a UCS-2 build, while wheels for Linux are generally built for UCS-4.
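For anyone who wants to check this kind of mismatch before submitting a job, here is a minimal sketch. It assumes the usual CPython 2.x convention where a trailing "u" in a wheel's ABI tag (e.g. cp27mu) marks a wide (UCS-4) build. The wheel filename and the helper names are hypothetical, chosen only for illustration.

```python
import sys


def interpreter_unicode_width(maxunicode=sys.maxunicode):
    """Return "UCS-4" for a wide unicode build, "UCS-2" for a narrow one."""
    return "UCS-4" if maxunicode > 0xFFFF else "UCS-2"


def wheel_abi_tag(wheel_filename):
    """Extract the ABI tag, the second-to-last dash-separated field of a wheel name."""
    stem = wheel_filename
    if stem.endswith(".whl"):
        stem = stem[:-len(".whl")]
    return stem.split("-")[-2]


def wheel_unicode_width(wheel_filename):
    """Return "UCS-2"/"UCS-4" for CPython 2.x ABI tags, None for anything else."""
    abi = wheel_abi_tag(wheel_filename)
    if not abi.startswith("cp2"):
        # Tags such as "none" or "abi3" carry no unicode-width information.
        return None
    return "UCS-4" if abi.endswith("u") else "UCS-2"


def wheel_matches_interpreter(wheel_filename, maxunicode=sys.maxunicode):
    """True if the wheel's unicode width (when it has one) matches the interpreter's."""
    width = wheel_unicode_width(wheel_filename)
    return width is None or width == interpreter_unicode_width(maxunicode)


if __name__ == "__main__":
    # Hypothetical filename, used only to illustrate the check.
    name = "psycopg2-2.7.1-cp27-cp27mu-manylinux1_x86_64.whl"
    print(interpreter_unicode_width(), wheel_unicode_width(name))
```

Running the same check with the worker's Python (rather than the local one) is what actually matters. The maxunicode parameter is there so the worker's value can be passed in explicitly.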
What ultimately worked for me, though, is an approach similar to juliaset, with a few small differences: https://gist.github.com/doubleyou/27bf3abb0fc77a2bc9257e6adc5cfe8f

Note two things here:

1. We import the "install" class from setuptools, not from distutils. This, in fact, has been the core problem for me. I haven't yet checked whether the juliaset example works for me at all, but I strongly suspect that it may fail because of exactly this issue.
2. We handle commands in a simpler fashion, using just a single class.

I'll make a Jira ticket later today or tomorrow to reflect my findings, and maybe a pull request if I confirm that juliaset is not universally working either, if that's fine.

On Tue, Jun 6, 2017 at 8:46 PM, Dmitry Demeshchuk <dmi...@postmates.com> wrote:

> Yeah, I wasn't really pinning it myself, it's one of the dependency
> packages that depends on that specific version.
>
> Thanks for the information, I'll try to explicitly install 33.1.1 and see
> if it changes anything.
>
> On Tue, Jun 6, 2017 at 7:13 PM, Ahmet Altay <al...@google.com> wrote:
>
>> Pinning setuptools is generally not a good practice. The reason is that at
>> installation time it might cause removal of the setuptools that is
>> being used to install packages.
>>
>> FWIW, dataflow workers should have setuptools 33.1.1, which was released
>> on 2017/01/16.
>>
>> Ahmet
>>
>> On Tue, Jun 6, 2017 at 6:53 PM, Dmitry Demeshchuk <dmi...@postmates.com>
>> wrote:
>>
>>> Thanks, Ahmet, it really turned out that Stackdriver had more logs than
>>> just the Dataflow logs section.
>>>
>>> So, I ended up seeing this code that fails constantly:
>>>
>>> I Running setup.py install for dataflow: started
>>> I Running setup.py install for dataflow: finished with status 'error'
>>> I Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-sHw6oI-record/install-record.txt --single-version-externally-managed --compile:
>>> I usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
>>> I or: -c --help [cmd1 cmd2 ...]
>>> I or: -c --help-commands
>>> I or: -c cmd --help
>>> I
>>> I error: option --single-version-externally-managed not recognized
>>> I
>>> I ----------------------------------------
>>> I Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-sHw6oI-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-bXyST4-build/
>>> I /usr/local/bin/pip failed with exit status 1
>>>
>>> This seems to mean that the natively installed setuptools are too old,
>>> and the new command has been generated with a newer version of setuptools
>>> (specifically, my project has setuptools==36.0.1 as a dependency of some
>>> package). I'm still digging more through the Stackdriver logs but so far
>>> couldn't find out the exact reason of the failure.
>>>
>>> Also talking to the Dataflow folks, maybe they'll have a better idea.
>>> I'll also try to compare this to the output of successful pipelines and see
>>> if it gives me any ideas.
>>>
>>> Thank you.
>>>
>>> On Tue, Jun 6, 2017 at 4:40 PM, Ahmet Altay <al...@google.com> wrote:
>>>
>>>> On Tue, Jun 6, 2017 at 2:07 PM, Dmitry Demeshchuk <dmi...@postmates.com> wrote:
>>>>
>>>>> Hi Ahmet,
>>>>>
>>>>> Thanks a lot for pointing out that doc, I somehow missed it from the
>>>>> official Python SDK page!
>>>>>
>>>>> One thing that comes to my mind is that generally one should probably
>>>>> use the 'install' command in setuptools, not 'build', like it's done in
>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py#L113.
>>>>> Reason being, the 'build' step seems to be executed on the original
>>>>> machine, not inside the runner's containers, while 'install' will be
>>>>> triggered inside of them. If I run a pipeline that uses setup.py with a
>>>>> "build" step, it fails due to being unable to "apt-get install libpq-dev"
>>>>> on a mac.
>>>>
>>>> Thank you. This example should similarly work in install commands I
>>>> believe. Also, if possible please file a JIRA issue with your ideas and we
>>>> can work on improving things.
>>>>
>>>>> I'm still trying to make it work with either build or install steps,
>>>>> talking to the Dataflow folks in parallel to get more understanding of
>>>>> what I'm doing wrong (Dataflow doesn't send out installation failure logs
>>>>> to Stackdriver, only runtime logs, so it seems).
>>>>
>>>> Have you tried looking at the worker-startup logs? All of the logs should
>>>> be in Stackdriver.
>>>>
>>>>> On Tue, Jun 6, 2017 at 9:21 AM, Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Please see Managing Python Pipeline Dependencies [1] for various ways
>>>>>> of installing additional dependencies. The section on non-python
>>>>>> dependencies is relevant to your question.
>>>>>>
>>>>>> Thank you,
>>>>>> Ahmet
>>>>>>
>>>>>> [1] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
>>>>>>
>>>>>> On Mon, Jun 5, 2017 at 11:52 PM, Morand, Sebastien <sebastien.mor...@veolia.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Interested too. It would be nice, for instance, to add an SFTP
>>>>>>> BoundedSource, but that requires compiling paramiko against the SSL
>>>>>>> library (and so installing ssl-dev).
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Sébastien MORAND
>>>>>>> Team Lead Solution Architect
>>>>>>> Technology & Operations / Digital Factory
>>>>>>> Veolia - Group Information Systems & Technology (IS&T)
>>>>>>>
>>>>>>> On 6 June 2017 at 08:01, Dmitry Demeshchuk <dmi...@postmates.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi again, folks,
>>>>>>>>
>>>>>>>> How should I go about installing Python packages that need to be
>>>>>>>> built and/or require native dependencies like shared libraries or such?
>>>>>>>>
>>>>>>>> I guess I could potentially build the C-based modules using the
>>>>>>>> same version of kernel and glibc that Dataflow is running, but it doesn't
>>>>>>>> seem like there's any way to install shared libraries on these boxes, right?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Dmitry Demeshchuk.
--
Best regards,
Dmitry Demeshchuk.