In general, runners like to schedule more than one task per worker (to take
advantage of multiple cores, etc.). The mitigation to this is likely to be
runner-specific. E.g. For Dataflow the number of tasks/threads per
machine is by default chosen to be the number of cores of that VM. I think
Flink and Spark have flags that can be set to control this as well.

Another option would be to control the resource usage with a global lock.
Your DoFn would acquire this lock before starting up the program, and other
workers would sit idly by for their turn.

I think trying to run on machines with lots of memory is the easiest
solution, unless this is truly infeasible (depends on what your setup is).


On Fri, Apr 10, 2020 at 4:24 PM Valentyn Tymofieiev <[email protected]>
wrote:

> I don't think there is a silver bullet solution to avoid an OOM but there
> are mitigations you can employ if there is a problem, such as:
>  - sizing the workers appropriately,
>  - avoiding memory leaks in the user code,
>  - limiting worker-level parallelism, if necessary.
>

Reply via email to