I don't think this is being worked on, but given that Java already supports the LOOPBACK environment (which is a special case of EXTERNAL), it would just be a matter of properly parsing the flags.
On Fri, May 15, 2020 at 9:52 AM Alexey Romanenko <[email protected]> wrote:

> Thanks! It looks like this is exactly what I need, though mostly for the
> Java SDK. Do you know if anyone is working on this Jira?
>
> On 15 May 2020, at 18:01, Kyle Weaver <[email protected]> wrote:
>
> Yes, you can start docker containers beforehand using the worker_pool
> option:
>
> However, it only works for Python. Java doesn't have it yet:
> https://issues.apache.org/jira/browse/BEAM-8137
>
> On Fri, May 15, 2020 at 12:00 PM Kyle Weaver <[email protected]> wrote:
>
>> > 2. Is it possible to pre-run SDK Harness containers and reuse them for
>> > every Portable Runner pipeline? I could save quite a lot of time on this
>> > for more complicated pipelines.
>>
>> Yes, you can start docker containers beforehand using the worker_pool
>> option:
>>
>> docker run -p=50000:50000 apachebeam/python3.7_sdk --worker_pool # or
>> some other port publishing
>>
>> and then in your pipeline options set:
>>
>> --environment_type=EXTERNAL --environment_config=localhost:50000
>>
>> On Fri, May 15, 2020 at 11:47 AM Alexey Romanenko <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I’m trying to optimize my pipeline runtime while using it with the
>>> Portable Runner, and I have some related questions.
>>>
>>> This is a cross-language pipeline, written in the Java SDK, which
>>> executes some Python code through the “External.of()” transform and my
>>> custom Python Expansion Service. I use a Docker-based SDK Harness for
>>> Java and Python. In a primitive form the pipeline would look like this:
>>>
>>> [Source (Java)] -> [MyTransform1 (Java)] -> [External (Execute Python
>>> code with Python SDK)] -> [MyTransform2 (Java SDK)]
>>>
>>> While running this pipeline with the Portable Spark Runner, I see that
>>> quite a lot of time is spent on artifact staging (in our case, we have
>>> quite a lot of artifacts in the real pipeline) and on launching a Docker
>>> container for every Spark stage. So, my questions are the following:
>>>
>>> 1. Is there any internal Beam functionality to pre-stage, or at least
>>> cache, already staged artifacts? Since the same pipeline will be
>>> executed many times in a row, there is no reason to stage the same
>>> artifacts every run.
>>>
>>> 2. Is it possible to pre-run SDK Harness containers and reuse them for
>>> every Portable Runner pipeline? I could save quite a lot of time on this
>>> for more complicated pipelines.
>>>
>>> Well, I guess I can find some workarounds for that, but I wanted to ask
>>> first in case there is a better way to do this in Beam.
>>>
>>> Regards,
>>> Alexey
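
For the Python SDK (Java doesn't support EXTERNAL yet, per the thread above), the worker_pool setup described above boils down to pointing the pipeline options at the pre-started pool. A minimal sketch, assuming the worker pool from the docker command above is listening on localhost:50000 and a portable job server is reachable at localhost:8099 (the job endpoint address is an assumption, not from the thread):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Sketch only: reuse a pre-started SDK worker pool instead of launching
# a new Docker container for every stage.
options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",          # assumed job server address
    "--environment_type=EXTERNAL",            # use an externally managed worker
    "--environment_config=localhost:50000",   # the worker pool started above
])

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(["reuse", "the", "same", "workers"])
     | beam.Map(str.upper)
     | beam.Map(print))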
