Re: Individual Parallelism support for Flink Runner

2020-06-29 Thread Akshay Iyangar
set can be used to set it’s parallelism or can default to the global one when the runner is flink? Thanks Akshay I. From: amit kumar Reply-To: "dev@beam.apache.org" Date: Monday, June 29, 2020 at 2:47 PM To: "dev@beam.apache.org" Subject: Re: Individual Paralleli

Re: Individual Parallelism support for Flink Runner

2020-06-29 Thread amit kumar
Looks like https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html#operator-level Regards, Amit On Mon, Jun 29, 2020 at 12:59 PM Kenneth Knowles wrote: > This exact issue has been discussed before, though I can't find the older > threads. Basically, specifying parallelism is a

Re: Individual Parallelism support for Flink Runner

2020-06-29 Thread Kenneth Knowles
This exact issue has been discussed before, though I can't find the older threads. Basically, specifying parallelism is a workaround (aka a cost), not a feature (aka a benefit). Sometimes you have to pay that cost as it is the only solution currently understood or implemented. It depends on what yo

Re: Individual Parallelism support for Flink Runner

2020-06-29 Thread Luke Cwik
Check out this thread[1] about adding "runner determined sharding" as a general concept. This could be used to enhance the reshuffle implementation significantly and might remove the need for per transform parallelism from that specific use case and likely from most others. 1: https://lists.apache

Re: Individual Parallelism support for Flink Runner

2020-06-29 Thread Maximilian Michels
We could allow parameterizing transforms by using transform identifiers from the pipeline, e.g. options = ['--parameterize=MyTransform;parallelism=5'] with Pipeline.create(PipelineOptions(options)) as p: p | Create(1, 2, 3) | 'MyTransform' >> ParDo(..) Those hints should always be opt

Re: Individual Parallelism support for Flink Runner

2020-06-28 Thread Reuven Lax
However such a parameter would be specific to a single transform, whereas maxNumWorkers is a global parameter today. On Sat, Jun 27, 2020 at 10:31 PM Daniel Collins wrote: > I could imagine for example, a 'parallelismHint' field in the base > parameters that could be set to maxNumWorkers when ru

Re: Individual Parallelism support for Flink Runner

2020-06-27 Thread Daniel Collins
I could imagine for example, a 'parallelismHint' field in the base parameters that could be set to maxNumWorkers when running on dataflow or an equivalent parameter when running on flink. It would be useful to get a default value for the sharding in the Reshuffle changes here https://github.com/apa

Re: Individual Parallelism support for Flink Runner

2020-06-26 Thread Reuven Lax
It's an interesting question - this parameter is clearly very runner specific (e.g. it would be meaningless for the Dataflow runner, where parallelism is not a static constant). How should we go about passing runner-specific options per transform? On Fri, Jun 26, 2020 at 1:14 PM Akshay Iyangar wr

Individual Parallelism support for Flink Runner

2020-06-26 Thread Akshay Iyangar
Hi beam community, So I had brought this issue in our slack channel but I guess this warrants a deeper discussion and if we do go about what is the POA for it. So basically currently for Flink Runner we don’t support operator level parallelism which native Flink provides OOTB. So I was wonderin