Re: Is there any way to set the parallelism of operators like group by, join?

Reuven Lax via user Sat, 15 Apr 2023 10:42:10 -0700

The maximum parallelism is always bounded by the parallelism available in
your data. If you do a GroupByKey, for example, the number of distinct keys
in your data determines the maximum parallelism.
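
To make the key-count limit concrete, here is a rough sketch in plain
Python. It is not Beam code: group_by_key and max_useful_workers are
hypothetical helpers made up for this illustration, and the sketch assumes
that all values for one key are processed together as a single unit.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect all values for each key, roughly what a GroupByKey produces."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def max_useful_workers(pairs, available_workers):
    """Upper bound on workers that can process groups at the same time.

    Assumption of this sketch: each key's group is one unit of work, so
    workers beyond the number of distinct keys have nothing to do.
    """
    distinct_keys = len(group_by_key(pairs))
    return min(available_workers, distinct_keys)


# Five elements, but only three distinct keys.
events = [("us", 1), ("eu", 2), ("us", 3), ("asia", 4), ("eu", 5)]

print(group_by_key(events))                               # {'us': [1, 3], 'eu': [2, 5], 'asia': [4]}
print(max_useful_workers(events, available_workers=100))  # 3
```

Even with 100 workers available, only three groups exist here, so at most
three workers can be busy with the grouped output at once.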

Beyond the limitations in your data, it depends on your execution engine.
If you're using Dataflow, it is designed to determine the parallelism
automatically (e.g. work is dynamically split and moved between workers,
the number of workers autoscales, etc.), so there's no need to set the
parallelism of the execution explicitly.

On Sat, Apr 15, 2023 at 1:12 AM Jeff Zhang <[email protected]> wrote:

> Besides the global parallelism of a Beam job, is there any way to set the
> parallelism of individual operators like group by and join? I understand
> that the parallelism setting depends on the underlying execution engine,
> but it is very common to set the parallelism of operators like group by
> and join in both Spark and Flink.
>
> --
> Best Regards
>
> Jeff Zhang
>
