Re: Portable Runner performance optimisation

Kyle Weaver Fri, 15 May 2020 09:02:27 -0700

> Yes, you can start docker containers beforehand using the worker_pool
> option:


However, the worker_pool option only works for the Python SDK. The Java SDK
doesn't support it yet:
https://issues.apache.org/jira/browse/BEAM-8137
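
For the Python side, a minimal sketch of how one might check that a pre-started
worker pool is reachable before launching the pipeline, and then build the two
options quoted below. It uses only the Python standard library. The helper names
external_environment_args and worker_pool_is_up are hypothetical and not part of
Beam; localhost:50000 is simply the address used in this thread.

```python
import socket


def external_environment_args(host="localhost", port=50000):
    """Build the two pipeline options from this thread for an external worker pool.

    Hypothetical helper, not a Beam API: it only formats the flag strings.
    """
    return [
        "--environment_type=EXTERNAL",
        f"--environment_config={host}:{port}",
    ]


def worker_pool_is_up(host="localhost", port=50000, timeout=2.0):
    """Return True if something accepts TCP connections at host:port.

    This only checks that the port is open; it does not verify that the
    listener really is a Beam worker pool.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if worker_pool_is_up():
        # Print the flags so they can be appended to the pipeline's arguments.
        print(" ".join(external_environment_args()))
    else:
        print("Nothing is listening on localhost:50000; start the container first.")
```

The returned list can be appended to whatever arguments the pipeline already
receives. The connectivity check is optional; it just fails fast when the
container was not started.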

On Fri, May 15, 2020 at 12:00 PM Kyle Weaver <[email protected]> wrote:

> > 2. Is it possible to pre-run SDK Harness containers and reuse them for
> every Portable Runner pipeline? I could save quite a lot of time on this for
> more complicated pipelines.
>
> Yes, you can start docker containers beforehand using the worker_pool
> option:
>
> docker run -p=50000:50000 apachebeam/python3.7_sdk --worker_pool
> (or publish the port in some other way)
>
> and then in your pipeline options set:
>
> --environment_type=EXTERNAL --environment_config=localhost:50000
>
> On Fri, May 15, 2020 at 11:47 AM Alexey Romanenko <
> [email protected]> wrote:
>
>> Hello,
>>
>> I’m trying to optimize the runtime of my pipeline on the Portable
>> Runner, and I have some related questions.
>>
>> This is a cross-language pipeline, written with the Java SDK, which
>> executes some Python code through the “External.of()” transform and my custom
>> Python Expansion Service. I use Docker-based SDK Harnesses for Java and
>> Python. In a primitive form, the pipeline would look like this:
>>
>>
>> [Source (Java)] -> [MyTransform1 (Java)] -> [External (Execute Python
>> code with Python SDK)] -> [MyTransform2 (Java SDK)]
>>
>>
>>
>> While running this pipeline with the Portable Spark Runner, I see that we
>> spend quite a lot of time on artifact staging (in our case, the real
>> pipeline has quite a lot of artifacts) and on launching a Docker container
>> for every Spark stage. So, my questions are the following:
>>
>> 1. Is there any internal Beam functionality to pre-stage, or at least
>> cache, already staged artifacts? Since the same pipeline will be executed
>> many times in a row, there is no reason to stage the same artifacts on every
>> run.
>>
>> 2. Is it possible to pre-run SDK Harness containers and reuse them for
>> every Portable Runner pipeline? I could save quite a lot of time on this for
>> more complicated pipelines.
>>
>>
>>
>> Well, I guess I can find some workarounds for that, but I wanted to ask
>> first whether there is a better way to do that in Beam.
>>
>>
>> Regards,
>> Alexey
>
>
