Re: Limit the concurrency of a Beam Step (or all the steps)

2021-09-24 Thread Cristian Constantinescu
Hey Marco, Other more senior people can correct me here. About limiting the concurrency aspect of things. Beam/Runners split PCollections of by Key. So as long as all your items have the same key, I think it will create only one executor for that ParDo. So that's what I did recently: 1. create

Re: Importing dependencies of Python Pipeline

2021-09-24 Thread Jan Lukavský
+dev I hit very similar issue even with standard module (math). No matter where I put the import statement (even one line preceding the use), the module cannot be found and causes NameError: name 'math' is not defined I therefore think, that the --setup_file

Re: Limit the concurrency of a Beam Step (or all the steps)

2021-09-24 Thread Evan Galpin
This has been mentioned a few times and seems to me to be a fairly common requirement. I think that a rate limit could be accomplished through stateful processing, using a combination of bagState and Timers. GroupIntoBatches.java would be a good example. I wonder if this would be a good built-in

Limit the concurrency of a Beam Step (or all the steps)

2021-09-24 Thread Sofia’s World
Hello i was wondering if it's somehow possible to limit the concurrency of a beam Step? i have a workflow which involves a Webclient that uses an API for which my account has a max of 300/requests per minute... Alternatively, will i have to go through a combine and custom ParDo ? Has anyone

Re: Beam/Flink's netty versions seems to clash (2.32.0 / 1.13.1)

2021-09-24 Thread Kaymak, Tobias
Ok, thank you! On Thu, Sep 23, 2021 at 6:06 PM Reuven Lax wrote: > The bug will cause these logs and might also cause some performance > problems. It should no cause any data correctness through. > > The bug fix will be available in Beam 2.3.4, ETA probably this November. > > On Thu, Sep 23,