On Tue, Jul 23, 2019 at 3:45 PM Chad Dombrova <chad...@gmail.com> wrote:

> Our specific situation is pretty unique, but I think it fits a more
> general pattern.  We use a number of media applications and each comes with
> its own built-in python interpreter (Autodesk Maya and SideFX Houdini, for
> example), and the core modules for each application can only be imported
> within their respective interpreter.  We want to be able to create
> pipelines where certain transforms are hosted within different application
> interpreters, so that we can avoid the ugly workarounds that we have to do
> now.
>
> I can imagine a similar scenario where a user wants to use a number of
> different libraries for different transforms, but the libraries’
> requirements conflict with each other, or perhaps some require python3 and
> others are stuck on python2.
>

Thanks for the details.


>
> Where can I find documentation on the expansion service?  I found a design
> doc which was helpful, but it seems to hew toward the hypothetical, so I
> think there have been a number of concrete steps taken since it was
> written:
>
> https://docs.google.com/document/d/1veiDF2dVH_5y56YxCcri2elCtYJgJnqw9aKPWAgddH8/mobilebasic
>

For now, I suggest following one of the existing examples.

We start an expansion service here:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_xlang.py#L123
Expansion jar file generation and the Gradle build logic are here:
https://github.com/apache/beam/blob/master/sdks/python/build.gradle#L408

Currently this is only supported by the Flink runner. Support for the
Dataflow runner is in the works.

There have been discussions about automatically starting expansion services,
and we should be adding documentation around the current solution in the
future.

Thanks,
Cham



>
> -chad
>
>
>
> On Tue, Jul 23, 2019 at 1:39 PM Chamikara Jayalath <chamik...@google.com>
> wrote:
>
>> I think we have primarily focused on the ability to run transforms from
>> multiple SDKs in the same pipeline (cross-language) so far, but as Robert
>> mentioned, the framework currently in development should also be usable for
>> running pipelines that use multiple environments with the same SDK
>> installed. I'd love to get more clarity on the exact use case here (for
>> example, details on why you couldn't run all Python transforms in a
>> single environment) and to know whether others have the same requirement.
>>
>> Thanks,
>> Cham
>>
>>
>> On Mon, Jul 22, 2019 at 12:31 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> Yes, for sure. Support for this is available in some runners (like the
>>> Python Universal Local Runner and Flink) and actively being added to
>>> others (e.g. Dataflow). There are still some rough edges, however: one
>>> currently must run an expansion service to define a pipeline step in
>>> an alternative environment (e.g. by registering your transforms and
>>> running
>>> https://github.com/apache/beam/blob/release-2.14.0/sdks/python/apache_beam/runners/portability/expansion_service_test.py
>>> ).
>>> We'd like to make this process a lot smoother (and feedback would be
>>> appreciated).
>>>
>>> On Sat, Jul 20, 2019 at 7:57 PM Chad Dombrova <chad...@gmail.com> wrote:
>>> >
>>> > Hi all,
>>> > I'm interested to know if others on the list would find value in the
>>> ability to use multiple environments (e.g. docker images) within a single
>>> pipeline, using some mechanism to identify the environment(s) that a
>>> transform should use. It would be quite useful for us, since our transforms
>>> can have conflicting python requirements, or worse, conflicting interpreter
>>> requirements.  Currently to solve this we have to break the pipeline up
>>> into multiple pipelines and use pubsub to communicate between them, which
>>> is not ideal.
>>> >
>>> > -chad
>>> >
>>>
>>
