Judging only from the documentation, I do not think WithKeys enforces a reshuffle:
https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html?highlight=withkeys#apache_beam.transforms.util.WithKeys

Have you tried adding a Reshuffle right after PubsubLiteIO?

On Thu, Jan 18, 2024 at 8:54 PM hsy...@gmail.com <hsy...@gmail.com> wrote:

> Hey guys,
>
> I have a question: does the WithKeys transform enforce a reshuffle?
>
> My pipeline basically looks like this: PubsubLiteIO -> ParDo(..) -> ParDo()
> -> BigqueryIO.write()
>
> The problem is that PubsubLiteIO -> ParDo(..) -> ParDo() are always fused
> together. But the ParDo is expensive, and I want Dataflow to use more
> workers for it. What's the best way to do that?
>
> Regards,
>
>
