Thanks for all your help, Ahmet! Comments inline.
On Thu, Jun 8, 2017 at 6:32 PM, Ahmet Altay <[email protected]> wrote:

> Thank you for the update, some questions inline.
>
> On Thu, Jun 8, 2017 at 6:21 PM, Dmitry Demeshchuk <[email protected]> wrote:
>
>> FYI, I tried to install a psycopg2 wheel from a file using the
>> "extra_packages" argument (although wheel installation is apparently
>> still an experimental feature), but this led to a problem with ECS-2 vs
>> ECS-4 compatibility issues (looks like the Dataflow version of Python is
>> using ECS-2, while wheels for Linux generally use ECS-4).
>
> What is ECS-2 vs ECS-4 problem, and what is the compatibility issue?

Sorry, I meant UCS-2 vs UCS-4. Basically, when I tried to import psycopg2
inside a module, the pipeline would die with the following error:

/usr/local/lib/python2.7/dist-packages/psycopg2/_psycopg.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8

This issue is explained in the official Python FAQ:
https://docs.python.org/2.7/faq/extending.html#when-importing-module-x-why-do-i-get-undefined-symbol-pyunicodeucs2

During Python compilation, there's a ./configure option that specifies how
many bytes are used for each Unicode character. Per that FAQ entry, a
missing PyUnicodeUCS2 symbol means the extension was built against a 2-byte
Python while the interpreter importing it uses 4 bytes. So my guess is that
Dataflow's Python uses 4 and the wheel I picked was built for 2 (the
opposite of what I wrote above).

>> What ended up working for me ultimately, though, is an approach similar
>> to juliaset, with a few small differences:
>> https://gist.github.com/doubleyou/27bf3abb0fc77a2bc9257e6adc5cfe8f
>>
>> Note two things here:
>>
>> 1. We import the "install" class from setuptools, not from distutils.
>> This, in fact, has been the core problem for me. I haven't yet tried if the
>> juliaset example works for me at all, but I strongly suspect that it may
>> not work exactly because of this issue.
>
> Please let us know if juliaset does not work for you as is.

Will do! I'll try to find some time to test it out tomorrow.

>> 2. We handle commands in a simpler fashion, by just using one single
>> class.
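To make the two points above concrete, here is a minimal sketch of a
setup.py in that style. It is not the gist verbatim: the class name
CustomInstall, the run_commands helper, the package name and the apt-get
commands are placeholders I'm assuming for illustration.

```python
# setup.py (sketch): run native-dependency commands during "install".
import subprocess

import setuptools
# Import "install" from setuptools, not distutils, so that the extra
# options pip passes (e.g. --single-version-externally-managed) are known.
from setuptools.command.install import install

# Hypothetical commands; replace with whatever your package needs.
CUSTOM_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "--assume-yes", "install", "libpq-dev"],
]


def run_commands(commands):
    """Run each command in order; raise if any exits with non-zero status."""
    for command in commands:
        subprocess.check_call(command)


class CustomInstall(install):
    """One single command class: run custom commands, then install."""

    def run(self):
        run_commands(CUSTOM_COMMANDS)
        install.run(self)


# Placeholder metadata, assumed for this sketch only.
SETUP_KWARGS = dict(
    name="dataflow",
    version="0.0.1",
    install_requires=["psycopg2"],
    cmdclass={"install": CustomInstall},
)


def main():
    """Entry point; a real setup.py would call this at module level."""
    setuptools.setup(packages=setuptools.find_packages(), **SETUP_KWARGS)
```

A real setup.py would finish by calling main() at module level; it is left
out here so the file can be imported without triggering an install. The
commands only make sense where apt-get exists, i.e. on the workers, not on
a Mac.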
>>
>> I'll make a Jira ticket later today or tomorrow to reflect my findings,
>> maybe make a pull request if I confirm that juliaset is not universally
>> working either, if that's fine.
>
> It would be great if you can share this information in a JIRA issue.
> Juliaset is only an example of running commands at setup time, it does not
> globally solve all possible issues.

Sounds good. I'll create a JIRA when I have enough input information to
provide and a clean reproduction case.

> Ahmet
>
>> On Tue, Jun 6, 2017 at 8:46 PM, Dmitry Demeshchuk <[email protected]> wrote:
>>
>>> Yeah, I wasn't really pinning it myself, it's one of the dependency
>>> packages that depends on that specific version.
>>>
>>> Thanks for the information, I'll try to explicitly install 33.1.1 and
>>> see if it changes anything.
>>>
>>> On Tue, Jun 6, 2017 at 7:13 PM, Ahmet Altay <[email protected]> wrote:
>>>
>>>> Pinning setuptools is generally not a good practice. The reason is that
>>>> at installation time it might cause removal of the setuptools that is
>>>> being used to install packages.
>>>>
>>>> FWIW, Dataflow workers should have setuptools 33.1.1, which was
>>>> released on 2017/01/16.
>>>>
>>>> Ahmet
>>>>
>>>> On Tue, Jun 6, 2017 at 6:53 PM, Dmitry Demeshchuk <[email protected]> wrote:
>>>>
>>>>> Thanks, Ahmet, it really turned out that Stackdriver had more logs
>>>>> than just the Dataflow logs section.
>>>>>
>>>>> So, I ended up seeing this install step that fails constantly:
>>>>>
>>>>> I Running setup.py install for dataflow: started
>>>>> I Running setup.py install for dataflow: finished with status 'error'
>>>>> I Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-sHw6oI-record/install-record.txt --single-version-externally-managed --compile:
>>>>> I usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
>>>>> I or: -c --help [cmd1 cmd2 ...]
>>>>> I or: -c --help-commands
>>>>> I or: -c cmd --help
>>>>> I
>>>>> I error: option --single-version-externally-managed not recognized
>>>>> I
>>>>> I ----------------------------------------
>>>>> I Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-sHw6oI-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-bXyST4-build/
>>>>> I /usr/local/bin/pip failed with exit status 1
>>>>>
>>>>> This seems to mean that the natively installed setuptools is too old,
>>>>> and the new command has been generated with a newer version of
>>>>> setuptools (specifically, my project has setuptools==36.0.1 as a
>>>>> dependency of some package). I'm still digging more through the
>>>>> Stackdriver logs but so far couldn't find out the exact reason for the
>>>>> failure.
>>>>>
>>>>> Also talking to the Dataflow folks, maybe they'll have a better idea.
>>>>> I'll also try to compare this to the output of successful pipelines
>>>>> and see if it gives me any ideas.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> On Tue, Jun 6, 2017 at 4:40 PM, Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> On Tue, Jun 6, 2017 at 2:07 PM, Dmitry Demeshchuk <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Ahmet,
>>>>>>>
>>>>>>> Thanks a lot for pointing out that doc, I somehow missed it from the
>>>>>>> official Python SDK page!
>>>>>>>
>>>>>>> One thing that comes to my mind is that generally one should
>>>>>>> probably use the 'install' command in setuptools, not 'build', like
>>>>>>> it's done in
>>>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py#L113.
>>>>>>> Reason being, the 'build' step seems to be executed on the original
>>>>>>> machine, not inside the runner's containers, while 'install' will be
>>>>>>> triggered inside of them. If I run a pipeline that uses setup.py
>>>>>>> with a "build" step, it fails due to being unable to
>>>>>>> "apt-get install libpq-dev" on a mac.
>>>>>>
>>>>>> Thank you. I believe this example should work similarly with install
>>>>>> commands. Also, if possible please file a JIRA issue with your ideas
>>>>>> and we can work on improving things.
>>>>>>
>>>>>>> I'm still trying to make it work with either build or install steps,
>>>>>>> talking to the Dataflow folks in parallel to get more understanding
>>>>>>> of what I'm doing wrong (Dataflow doesn't send out installation
>>>>>>> failure logs to Stackdriver, only runtime logs, so it seems).
>>>>>>
>>>>>> Have you tried looking at the worker-startup logs? All of the logs
>>>>>> should be in Stackdriver.
>>>>>>
>>>>>>> On Tue, Jun 6, 2017 at 9:21 AM, Ahmet Altay <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Please see Managing Python Pipeline Dependencies [1] for various
>>>>>>>> ways of installing additional dependencies.
>>>>>>>> The section on non-Python dependencies is relevant to your question.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Ahmet
>>>>>>>>
>>>>>>>> [1] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
>>>>>>>>
>>>>>>>> On Mon, Jun 5, 2017 at 11:52 PM, Morand, Sebastien <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Interested too. It would be nice, for instance, to add an SFTP
>>>>>>>>> BoundedSource, but that requires compiling paramiko against the
>>>>>>>>> SSL library (and therefore installing ssl-dev).
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> *Sébastien MORAND*
>>>>>>>>> Team Lead Solution Architect
>>>>>>>>> Technology & Operations / Digital Factory
>>>>>>>>> Veolia - Group Information Systems & Technology (IS&T)
>>>>>>>>> Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08
>>>>>>>>> Bureau 0144C (Ouest)
>>>>>>>>> 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
>>>>>>>>> www.veolia.com
>>>>>>>>>
>>>>>>>>> On 6 June 2017 at 08:01, Dmitry Demeshchuk <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi again, folks,
>>>>>>>>>>
>>>>>>>>>> How should I go about installing Python packages that need to
>>>>>>>>>> be built and/or require native dependencies like shared
>>>>>>>>>> libraries or such?
>>>>>>>>>>
>>>>>>>>>> I guess I could potentially build the C-based modules using the
>>>>>>>>>> same version of kernel and glibc that Dataflow is running, but
>>>>>>>>>> it doesn't seem like there's any way to install shared libraries
>>>>>>>>>> on these boxes, right?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best regards,
>>>>>>>>>> Dmitry Demeshchuk.
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Dmitry Demeshchuk.
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Dmitry Demeshchuk.
>>>
>>> --
>>> Best regards,
>>> Dmitry Demeshchuk.
>>
>> --
>> Best regards,
>> Dmitry Demeshchuk.

--
Best regards,
Dmitry Demeshchuk.
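
P.S. For anyone hitting the same undefined-symbol error: a quick way to
check which Unicode width an interpreter was built with is sys.maxunicode.
Below is a minimal sketch; the helper names unicode_build and
expected_abi_tag are my own, not part of any library. The idea is to run it
on the machine in question and compare the tag with the one in the wheel's
file name.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Return 'UCS-4' for wide builds and 'UCS-2' for narrow builds."""
    # Narrow (2-byte) builds report 0xFFFF; wide (4-byte) builds, and every
    # Python 3.3+ interpreter, report 0x10FFFF.
    return "UCS-4" if maxunicode > 0xFFFF else "UCS-2"


def expected_abi_tag(maxunicode=sys.maxunicode):
    """Return the CPython 2.7 wheel ABI tag matching the given width."""
    # Wide builds carry the extra "u" flag in the ABI tag.
    return "cp27mu" if maxunicode > 0xFFFF else "cp27m"


print("%s %s" % (unicode_build(), expected_abi_tag()))
```

A wheel whose file name carries cp27m will not import on a cp27mu
interpreter, and vice versa. The tag is only meaningful on Python 2.7.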
