Thanks! It looks like this is exactly what I need, though mostly for the Java SDK. Do you know if anyone is working on this Jira?
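
In the meantime, here is a minimal, untested sketch of what I would expect the Java driver side to look like once a pre-started worker pool can be used from Java. It assumes the existing PortablePipelineOptions setters and mirrors the EXTERNAL endpoint from the Python example quoted below; since the Java SDK harness container has no worker pool entry point yet (BEAM-8137), this is only the intended shape, not a working setup:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.PortablePipelineOptions;

public class ExternalJavaEnvironmentSketch {
  public static void main(String[] args) {
    PortablePipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(PortablePipelineOptions.class);

    // Assumption: point the runner at a pre-started SDK harness worker pool
    // instead of launching a new Docker container per Spark stage. The EXTERNAL
    // environment type exists in the options today, but the Java container
    // cannot yet run as a worker pool, so the endpoint below is hypothetical.
    options.setDefaultEnvironmentType("EXTERNAL");
    options.setDefaultEnvironmentConfig("localhost:50000"); // hypothetical pre-started worker pool

    Pipeline pipeline = Pipeline.create(options);
    // ... build the cross-language pipeline here ...
    // pipeline.run().waitUntilFinish();
  }
}
```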
> On 15 May 2020, at 18:01, Kyle Weaver <[email protected]> wrote:
>
> > Yes, you can start docker containers beforehand using the worker_pool option:
>
> However, it only works for Python. Java doesn't have it yet:
> https://issues.apache.org/jira/browse/BEAM-8137
>
> On Fri, May 15, 2020 at 12:00 PM Kyle Weaver <[email protected]> wrote:
>
> > 2. Is it possible to pre-run SDK Harness containers and reuse them for every Portable Runner pipeline? I could save quite a lot of time on this for more complicated pipelines.
>
> Yes, you can start docker containers beforehand using the worker_pool option:
>
> docker run -p=50000:50000 apachebeam/python3.7_sdk --worker_pool # or some other port publishing
>
> and then in your pipeline options set:
>
> --environment_type=EXTERNAL --environment_config=localhost:50000
>
> On Fri, May 15, 2020 at 11:47 AM Alexey Romanenko <[email protected]> wrote:
>
> Hello,
>
> I’m trying to optimize my pipeline runtime while using it with the Portable Runner, and I have some related questions.
>
> This is a cross-language pipeline, written with the Java SDK, which executes some Python code through the “External.of()” transform and my custom Python Expansion Service. I use a Docker-based SDK Harness for Java and Python. In a primitive form, the pipeline looks like this:
>
> [Source (Java)] -> [MyTransform1 (Java)] -> [External (Execute Python code with Python SDK)] -> [MyTransform2 (Java SDK)]
>
> While running this pipeline with the Portable Spark Runner, I see that we spend quite a lot of time on artifact staging (in our case, the real pipeline has quite a lot of artifacts) and on launching a Docker container for every Spark stage. So, my questions are the following:
>
> 1. Is there any internal Beam functionality to pre-stage or, at least, cache already staged artifacts? Since the same pipeline will be executed many times in a row, there is no reason to stage the same artifacts on every run.
>
> 2. Is it possible to pre-run SDK Harness containers and reuse them for every Portable Runner pipeline? I could save quite a lot of time on this for more complicated pipelines.
>
> Well, I guess I can find some workarounds for this, but I wanted to ask first in case there is a better way to do it in Beam.
>
> Regards,
> Alexey
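
For anyone else following along, the cross-language step from the quoted diagram would look roughly like the snippet below on the Java side. This is only an untested sketch: the URN, payload, and expansion service address are placeholders that must match whatever the custom Python expansion service actually registers and listens on, and the exact generics of External.of may differ between Beam versions.

```java
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.transforms.External;
import org.apache.beam.sdk.values.PCollection;

public class CrossLanguageStepSketch {

  // Roughly the [External (Execute Python code with Python SDK)] step from the
  // diagram above. All names below are hypothetical placeholders.
  static PCollection<String> applyPythonStep(PCollection<String> myTransform1Output) {
    String urn = "my:python:transform:v1";                  // must match the expansion service registration
    byte[] payload = "{}".getBytes(StandardCharsets.UTF_8); // transform-specific configuration, if any
    String expansionService = "localhost:8097";             // address of the custom Python expansion service

    return myTransform1Output.apply(
        "ExecutePythonCode",
        External.<PCollection<String>, String>of(urn, payload, expansionService));
  }
}
```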
