One avenue is to adjust --worker_machine_type when you start a pipeline,
and pass a custom machine type[1], with a small number of cores and a lot
of RAM.
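
For example, with the Python SDK the custom machine type name follows the
custom-<vCPUs>-<memory_MB> pattern, so something like the following sketch
(project, region, and bucket names are placeholders, and the exact memory
figure is whatever your job needs) would ask for 2-vCPU, 13 GB workers:

    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        '--runner=DataflowRunner',
        '--project=my-project',                # placeholder
        '--region=us-central1',                # placeholder
        '--temp_location=gs://my-bucket/tmp',  # placeholder
        # custom-<vCPUs>-<memory_MB>: here 2 vCPUs with 13 GB of RAM
        '--worker_machine_type=custom-2-13312',
    ])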

[1] https://cloud.google.com/custom-machine-types

On Sat, Apr 11, 2020 at 3:20 AM Tadas Šubonis <[email protected]>
wrote:

> Thanks.
>
> The program itself is heavy on memory usage, but its CPU and I/O usage are
> low to medium (let's say at most 2 cores per task). Previously I used 16 GB
> RAM workers (so they would fit 2-6 tasks at once), but as long as there is
> a way to limit worker-level parallelism, I think I should be fine. Could
> you point me to the right place in the docs to read about that (I am
> planning to use Dataflow)?
>
> On Sat, Apr 11, 2020 at 1:48 AM Robert Bradshaw <[email protected]>
> wrote:
>
>> In general, runners like to schedule more than one task per worker (to
>> take advantage of multiple cores, etc.), so the mitigation is likely to
>> be runner-specific. E.g., for Dataflow, the number of tasks/threads per
>> machine is by default chosen to be the number of cores of that VM. I think
>> Flink and Spark have flags that can be set to control this as well.
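>>
>> For instance, on Dataflow the Python SDK exposes a debug option that caps
>> how many work items a worker processes concurrently (worth double-checking
>> the exact name against your SDK version); you would pass it along with the
>> rest of your pipeline options:
>>
>>     --number_of_worker_harness_threads=2
>>
>> Flink and Spark have analogous settings (taskmanager.numberOfTaskSlots and
>> spark.executor.cores, respectively).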
>>
>> Another option would be to control resource usage with a global lock.
>> Your DoFn would acquire this lock before starting the program, and other
>> workers would sit idle waiting for their turn.
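>>
>> A rough sketch of that in the Python SDK (this is a per-process lock, so
>> it only serializes DoFn instances sharing a worker process; a truly global
>> lock across workers would need an external coordination service, and
>> run_heavy_program below is just a placeholder for whatever you launch):
>>
>>     import threading
>>
>>     import apache_beam as beam
>>
>>     # One permit per worker process; raise the count to allow a few
>>     # concurrent runs per machine.
>>     _HEAVY_PROGRAM_LOCK = threading.Semaphore(1)
>>
>>     class RunHeavyProgramFn(beam.DoFn):
>>         def process(self, element):
>>             with _HEAVY_PROGRAM_LOCK:
>>                 result = run_heavy_program(element)  # placeholder
>>             yield result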
>>
>> I think trying to run on machines with lots of memory is the easiest
>> solution, unless this is truly infeasible (depends on what your setup is).
>>
>>
>> On Fri, Apr 10, 2020 at 4:24 PM Valentyn Tymofieiev <[email protected]>
>> wrote:
>>
>>> I don't think there is a silver bullet solution to avoid an OOM but
>>> there are mitigations you can employ if there is a problem, such as:
>>>  - sizing the workers appropriately,
>>>  - avoiding memory leaks in the user code,
>>>  - limiting worker-level parallelism, if necessary.
>>>
>>
>
> --
>
> Kind Regards,
> Tadas Šubonis
>
