Re: Portable runner bundle scheduling (Streaming/Python/Flink)

2019-12-05 Thread Thomas Weise
PR for this is now open: https://github.com/apache/beam/pull/10313 Hey Max, Thanks for the feedback. --> On Sun, Nov 24, 2019 at 2:04 PM Maximilian Michels wrote: > Load-balancing the worker selection for bundle execution sounds like the > solution to uneven work distribution across the

Re: Portable runner bundle scheduling (Streaming/Python/Flink)

2019-11-24 Thread Maximilian Michels
Load-balancing the worker selection for bundle execution sounds like the solution to uneven work distribution across the workers. Some comments: (1) I could imagine that in case of long-running bundle execution (e.g. model execution), this could stall upstream operators because their busy

Re: Portable runner bundle scheduling (Streaming/Python/Flink)

2019-11-23 Thread Thomas Weise
JIRA: https://issues.apache.org/jira/browse/BEAM-8816 On Thu, Nov 21, 2019 at 10:44 AM Thomas Weise wrote: > Hi Luke, > > Thanks for the background and it is exciting to see the progress on the > SDF side. It will help with this use case and many other challenges. I > imagine the Python user

Re: Portable runner bundle scheduling (Streaming/Python/Flink)

2019-11-21 Thread Thomas Weise
Hi Luke, Thanks for the background and it is exciting to see the progress on the SDF side. It will help with this use case and many other challenges. I imagine the Python user code would be able to determine that it is bogged down with high latency record processing (based on the duration it

Re: Portable runner bundle scheduling (Streaming/Python/Flink)

2019-11-20 Thread Luke Cwik
Dataflow has run into this issue as well. Dataflow has "work items" that are converted into bundles that are executed on the SDK. Each work item does a greedy assignment to the SDK worker with the fewest work items assigned. As you surmised, we use SDF splitting in batch pipelines to balance work.

Portable runner bundle scheduling (Streaming/Python/Flink)

2019-11-20 Thread Thomas Weise
We found a problem with uneven utilization of SDK workers causing excessive latency with Streaming/Python/Flink. Remember that with Python, we need to execute multiple worker processes on a machine instead of relying on threads in a single worker, which requires the runner to make a decision to