Also, I looked at the code, and Reshuffle seems to do some GroupBy work internally. But I don't really need a GroupBy.

On Fri, Jan 19, 2024 at 9:35 AM hsy...@gmail.com <hsy...@gmail.com> wrote:

> ReShuffle is deprecated
>
> On Fri, Jan 19, 2024 at 8:25 AM XQ Hu via user <user@beam.apache.org> wrote:
>
>> I do not think it enforces a reshuffle by just checking the doc here:
>> https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html?highlight=withkeys#apache_beam.transforms.util.WithKeys
>>
>> Have you tried to just add ReShuffle after PubsubLiteIO?
>>
>> On Thu, Jan 18, 2024 at 8:54 PM hsy...@gmail.com <hsy...@gmail.com> wrote:
>>
>>> Hey guys,
>>>
>>> I have a question, does withkeys transformation enforce a reshuffle?
>>>
>>> My pipeline basically look like this PubsubLiteIO -> ParDo(..) ->
>>> ParDo() -> BigqueryIO.write()
>>>
>>> The problem is PubsubLiteIO -> ParDo(..) -> ParDo() always fused
>>> together. But The ParDo is expensive and I want dataflow to have more
>>> workers to work on that, what's the best way to do that?
>>>
>>> Regards,