> Yes, you can start Docker containers beforehand using the worker_pool option:
However, it only works for Python. Java doesn't have it yet: https://issues.apache.org/jira/browse/BEAM-8137

On Fri, May 15, 2020 at 12:00 PM Kyle Weaver wrote:
>
> > 2. Is it possible to pre-run SDK Harness containers and reuse them for
> > every Portable Runner pipeline? I could save quite a lot of time on this
> > for more complicated pipelines.
>
> Yes, you can start Docker containers beforehand using the worker_pool
> option:
>
> docker run -p=50000:50000 apachebeam/python3.7_sdk --worker_pool # or some other port publishing
>
> and then in your pipeline options set:
>
> --environment_type=EXTERNAL --environment_config=localhost:50000
>
> On Fri, May 15, 2020 at 11:47 AM Alexey Romanenko wrote:
>
>> Hello,
>>
>> I’m trying to optimize my pipeline runtime while running it with the
>> Portable Runner, and I have some related questions.
>>
>> This is a cross-language pipeline, written with the Java SDK, which
>> executes some Python code through the “External.of()” transform and my
>> custom Python Expansion Service. I use Docker-based SDK Harnesses for Java
>> and Python. In a primitive form the pipeline would look like this:
>>
>> [Source (Java)] -> [MyTransform1 (Java)] -> [External (Execute Python code with Python SDK)] -> [MyTransform2 (Java SDK)]
>>
>> While running this pipeline with the Portable Spark Runner, I see that we
>> spend quite a lot of time on artifact staging (the real pipeline has quite
>> a lot of artifacts) and on launching a Docker container for every Spark
>> stage. So, my questions are the following:
>>
>> 1. Is there any internal Beam functionality to pre-stage, or at least
>> cache, already staged artifacts? Since the same pipeline will be executed
>> many times in a row, there is no reason to stage the same artifacts on
>> every run.
>>
>> 2. Is it possible to pre-run SDK Harness containers and reuse them for
>> every Portable Runner pipeline? I could save quite a lot of time on this
>> for more complicated pipelines.
>>
>> Well, I guess I can find some workarounds for that, but I wanted to ask
>> first in case there is a better way to do that in Beam.
>>
>> Regards,
>> Alexey
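A minimal sketch of the client side of Kyle's answer follows. It assumes a Python SDK worker pool has already been started with the docker command above and is published on localhost:50000. The helper names `external_environment_args` and `worker_pool_reachable` are hypothetical, not part of Beam. The first helper builds the two pipeline flags from the thread. The second does a plain TCP check so that a missing container fails fast instead of surfacing later as a runner error. Per the reply above, this only covers the Python SDK harness; the Java equivalent is tracked in BEAM-8137.

```python
import socket
from typing import List


# Hypothetical helper (not a Beam API): build the flags quoted in the thread
# that point a portable pipeline at an already running SDK worker pool.
def external_environment_args(host: str = "localhost", port: int = 50000) -> List[str]:
    """Return the EXTERNAL-environment flags for a worker pool at host:port."""
    return [
        "--environment_type=EXTERNAL",
        f"--environment_config={host}:{port}",
    ]


# Hypothetical helper (not a Beam API): only checks that something accepts
# TCP connections on the port; it does not verify it is a Beam worker pool.
def worker_pool_reachable(host: str = "localhost", port: int = 50000,
                          timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unresolvable host, ...
        return False


if __name__ == "__main__":
    args = external_environment_args()
    print(" ".join(args))
    if not worker_pool_reachable():
        print("Nothing is listening on localhost:50000; start the worker pool container first.")
```

In a real pipeline, the returned list would be combined with the runner flags (for example `--runner=PortableRunner`) and passed to `apache_beam.options.pipeline_options.PipelineOptions`. Beam is left out here so the sketch runs on its own. Because the container is started once, outside any pipeline, later runs can point at the same address instead of each launching its own SDK harness container.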
