Re: Dataflow v2 runner scaling behaviour

David Sánchez Wed, 24 Mar 2021 09:19:58 -0700

Hi Pablo,

This is the input data we are testing with:


Elements added: 38,792,932
Estimated size: 3.14 GB

On Wed, Mar 24, 2021 at 5:09 PM Pablo Estrada <[email protected]> wrote:

> Hi David,
> Thanks for sharing. I've been investigating something like this recently. What's
> the size of your data?
> Best
> -P.
>
> On Wed, Mar 24, 2021, 7:52 AM David Sánchez <[email protected]> wrote:
>
>> Hi folks!
>>
>> I'm testing the Dataflow v2 runner in a batch pipeline (Apache Beam
>> Python 3.7 SDK 2.27.0) that reads many millions of rows from BigQuery and
>> writes to PubSub and BigQuery using the flag "--experiments=use_runner_v2".
>>
>> The same job used to scale up immediately to over 50 workers, but in v2
>> it never scales up further than 5-6 workers, thus it's way slower. I can
>> see, however, that the total vCPU and memory are about half of what they were before,
>> which is promising. Any clue about why the scaling is behaving differently?
>>
>> Many thanks
>>
>
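
For anyone trying to reproduce the comparison described in the original message, here is a minimal sketch of how the two runs might be launched with identical settings apart from the runner v2 experiment. The helper name `build_dataflow_args` and the project, region, and bucket values are hypothetical, not taken from the thread. The flags themselves are standard Beam/Dataflow pipeline options. Setting `--max_num_workers` is only a way to keep the ceiling the same across both runs; nothing in this thread confirms that it changes the v2 scaling behaviour.

```python
from typing import List, Optional


def build_dataflow_args(
    project: str,
    region: str,
    temp_location: str,
    use_runner_v2: bool = True,
    max_num_workers: Optional[int] = None,
) -> List[str]:
    """Assemble command-line flags for a Dataflow batch job (hypothetical helper)."""
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
    ]
    if use_runner_v2:
        # The flag quoted in the original message.
        args.append("--experiments=use_runner_v2")
    if max_num_workers is not None:
        # Same autoscaling ceiling for both runs, so only the runner differs.
        args.append(f"--max_num_workers={max_num_workers}")
    return args


# Placeholder values; substitute your own project, region, and bucket.
v1_args = build_dataflow_args(
    "my-project", "us-central1", "gs://my-bucket/tmp",
    use_runner_v2=False, max_num_workers=50,
)
v2_args = build_dataflow_args(
    "my-project", "us-central1", "gs://my-bucket/tmp",
    use_runner_v2=True, max_num_workers=50,
)
```

Either list can then be passed to `PipelineOptions` from `apache_beam.options.pipeline_options` when constructing the pipeline, which makes it easier to compare worker counts between the two jobs in the Dataflow console.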
