Also, I looked at the code, and Reshuffle seems to do some group-by work
internally. But I don't really need a group-by.
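
For context, here is a toy model in plain Python of why a group-by can sit
inside a reshuffle without the caller ever needing one. It is not Beam's
code. The names `reshuffle_model`, `num_buckets`, and `seed` are made up for
illustration, and the sketch only assumes that the transform pairs each
element with a random key, groups by that key, and then drops the keys again.
Under that assumption the group-by is just the mechanism that forces a
shuffle boundary, and the same elements come out the other side.

```python
import random
from collections import defaultdict


def reshuffle_model(elements, num_buckets=4, seed=None):
    """Toy model of a reshuffle: key randomly, group by key, drop the keys.

    Hypothetical helper for illustration only; not part of Apache Beam.
    """
    rng = random.Random(seed)

    # Step 1: pair every element with a random key.
    keyed = [(rng.randrange(num_buckets), element) for element in elements]

    # Step 2: group by key. In a real runner this is the point where data
    # is redistributed, so the steps before and after it cannot be fused.
    groups = defaultdict(list)
    for key, element in keyed:
        groups[key].append(element)

    # Step 3: discard the keys and emit the plain elements again.
    return [element for key in sorted(groups) for element in groups[key]]


messages = ["msg-%d" % i for i in range(10)]
redistributed = reshuffle_model(messages, seed=0)

# The output holds exactly the input elements; only the order may differ.
assert sorted(redistributed) == sorted(messages)
```

In other words, the downstream ParDo still receives individual elements, not
keyed groups, so needing or not needing a group-by in the pipeline logic does
not decide whether a reshuffle is usable.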

On Fri, Jan 19, 2024 at 9:35 AM [email protected] <[email protected]> wrote:

> ReShuffle is deprecated
>
> On Fri, Jan 19, 2024 at 8:25 AM XQ Hu via user <[email protected]>
> wrote:
>
>> Just from checking the doc, I do not think it enforces a reshuffle:
>> https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html?highlight=withkeys#apache_beam.transforms.util.WithKeys
>>
>> Have you tried to just add ReShuffle after PubsubLiteIO?
>>
>> On Thu, Jan 18, 2024 at 8:54 PM [email protected] <[email protected]>
>> wrote:
>>
>>> Hey guys,
>>>
>>> I have a question: does the WithKeys transform enforce a reshuffle?
>>>
>>> My pipeline basically looks like this: PubsubLiteIO -> ParDo(..) ->
>>> ParDo() -> BigqueryIO.write()
>>>
>>> The problem is that PubsubLiteIO -> ParDo(..) -> ParDo() are always
>>> fused together. But the ParDo is expensive and I want Dataflow to put
>>> more workers on it. What's the best way to do that?
>>>
>>> Regards,
>>>
>>>
