FYI, I tried to install a psycopg2 wheel from a file using the
"extra_packages" argument (although wheel installation is apparently
still an experimental feature), but this ran into UCS-2 vs UCS-4
compatibility issues (it looks like the Dataflow build of Python uses
UCS-2, while wheels for Linux generally use UCS-4).

What ended up working for me ultimately, though, is an approach similar to
juliaset, with a few small differences:
https://gist.github.com/doubleyou/27bf3abb0fc77a2bc9257e6adc5cfe8f

Note two things here:

1. We import the "install" class from setuptools, not from distutils. This,
in fact, has been the core problem for me. I haven't yet checked whether the
juliaset example works for me at all, but I strongly suspect that it does
not, precisely because of this issue.

2. We handle commands in a simpler fashion, by just using one single class.

I'll make a Jira ticket later today or tomorrow to reflect my findings,
maybe make a pull request if I confirm that juliaset is not universally
working either, if that's fine.

On Tue, Jun 6, 2017 at 8:46 PM, Dmitry Demeshchuk <dmi...@postmates.com>
wrote:

> Yeah, I wasn't really pinning it myself, it's one of the dependency
> packages that depends on that specific version.
>
> Thanks for the information, I'll try to explicitly install 33.1.1 and see
> if it changes anything.
>
> On Tue, Jun 6, 2017 at 7:13 PM, Ahmet Altay <al...@google.com> wrote:
>
>> Pinning setuptools is generally not a good practice. The reason is that
>> at installation time it might cause removal of the setuptools that is
>> being used to install packages.
>>
>> FWIW, Dataflow workers should have setuptools 33.1.1, which was released
>> on 2017/01/16.
>>
>> Ahmet
>>
>> On Tue, Jun 6, 2017 at 6:53 PM, Dmitry Demeshchuk <dmi...@postmates.com>
>> wrote:
>>
>>> Thanks, Ahmet, it really turned out that Stackdriver had more logs than
>>> just the Dataflow logs section.
>>>
>>> So, I ended up seeing this install step that fails consistently:
>>>
>>> I    Running setup.py install for dataflow: started
>>> I      Running setup.py install for dataflow: finished with status 'error'
>>> I      Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-sHw6oI-record/install-record.txt --single-version-externally-managed --compile:
>>> I      usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
>>> I         or: -c --help [cmd1 cmd2 ...]
>>> I         or: -c --help-commands
>>> I         or: -c cmd --help
>>> I
>>> I      error: option --single-version-externally-managed not recognized
>>> I
>>> I      ----------------------------------------
>>> I  Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-sHw6oI-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-bXyST4-build/
>>> I  /usr/local/bin/pip failed with exit status 1
>>>
>>>
>>> This seems to mean that the natively installed setuptools are too old,
>>> and the new command has been generated with a newer version of setuptools
>>> (specifically, my project has setuptools==36.0.1 as a dependency of some
>>> package). I'm still digging through the Stackdriver logs, but so far I
>>> couldn't find the exact reason for the failure.
>>>
>>> Also talking to the Dataflow folks, maybe they'll have a better idea.
>>> I'll also try to compare this to the output of successful pipelines and see
>>> if it gives me any ideas.
>>>
>>> Thank you.
>>>
>>> On Tue, Jun 6, 2017 at 4:40 PM, Ahmet Altay <al...@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jun 6, 2017 at 2:07 PM, Dmitry Demeshchuk <dmi...@postmates.com> wrote:
>>>>
>>>>> Hi Ahmet,
>>>>>
>>>>> Thanks a lot for pointing out that doc, I somehow missed it from the
>>>>> official Python SDK page!
>>>>>
>>>>> One thing that comes to my mind is that generally one should probably
>>>>> use the 'install' command in setuptools, not 'build' as is done in
>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py#L113.
>>>>> The reason is that the 'build' step seems to be executed on the original
>>>>> machine, not inside the runner's containers, while 'install' will be
>>>>> triggered inside of them. If I run a pipeline that uses setup.py with a
>>>>> "build" step, it fails due to being unable to "apt-get install libpq-dev"
>>>>> on a mac.
>>>>>
>>>>
>>>> Thank you. I believe this example should work similarly with install
>>>> commands. Also, if possible, please file a JIRA issue with your ideas
>>>> and we can work on improving things.
>>>>
>>>>
>>>>>
>>>>> I'm still trying to make it work with either build or install steps,
>>>>> talking to the Dataflow folks in parallel to get more understanding of
>>>>> what I'm doing wrong (Dataflow doesn't send out installation failure
>>>>> logs to Stackdriver, only runtime logs, so it seems).
>>>>>
>>>>
>>>> Have you tried looking at the worker-startup logs? All of the logs
>>>> should be in Stackdriver.
>>>>
>>>>
>>>>>
>>>>> On Tue, Jun 6, 2017 at 9:21 AM, Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Please see Managing Python Pipeline Dependencies [1] for various ways
>>>>>> of installing additional dependencies. The section on non-Python
>>>>>> dependencies is relevant to your question.
>>>>>>
>>>>>> Thank you,
>>>>>> Ahmet
>>>>>>
>>>>>> [1] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
>>>>>>
>>>>>> On Mon, Jun 5, 2017 at 11:52 PM, Morand, Sebastien <
>>>>>> sebastien.mor...@veolia.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Interested too. It would be nice, for instance, to add an SFTP
>>>>>>> BoundedSource, but that requires compiling paramiko against the SSL
>>>>>>> library (and therefore installing ssl-dev).
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> *Sébastien MORAND*
>>>>>>> Team Lead Solution Architect
>>>>>>> Technology & Operations / Digital Factory
>>>>>>> Veolia - Group Information Systems & Technology (IS&T)
>>>>>>> Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08
>>>>>>> Bureau 0144C (Ouest)
>>>>>>> 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
>>>>>>> *www.veolia.com <http://www.veolia.com>*
>>>>>>>
>>>>>>> On 6 June 2017 at 08:01, Dmitry Demeshchuk <dmi...@postmates.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi again, folks,
>>>>>>>>
>>>>>>>> How should I go about installing Python packages that need to be
>>>>>>>> built and/or require native dependencies such as shared libraries?
>>>>>>>>
>>>>>>>> I guess I could potentially build the C-based modules using the same
>>>>>>>> versions of the kernel and glibc that Dataflow is running, but it
>>>>>>>> doesn't seem like there's any way to install shared libraries on
>>>>>>>> these boxes, right?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Dmitry Demeshchuk.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Dmitry Demeshchuk.
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Dmitry Demeshchuk.
>>>
>>
>>
>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>



-- 
Best regards,
Dmitry Demeshchuk.
