set can be used to set
it’s parallelism or can default to the global one when the runner is flink?
Thanks
Akshay I.
From: amit kumar
Reply-To: "dev@beam.apache.org"
Date: Monday, June 29, 2020 at 2:47 PM
To: "dev@beam.apache.org"
Subject: Re: Individual Paralleli
Looks like
https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html#operator-level
Regards,
Amit
On Mon, Jun 29, 2020 at 12:59 PM Kenneth Knowles wrote:
> This exact issue has been discussed before, though I can't find the older
> threads. Basically, specifying parallelism is a
This exact issue has been discussed before, though I can't find the older
threads. Basically, specifying parallelism is a workaround (aka a cost),
not a feature (aka a benefit). Sometimes you have to pay that cost as it is
the only solution currently understood or implemented. It depends on what
yo
Check out this thread[1] about adding "runner determined sharding" as a
general concept. This could be used to enhance the reshuffle implementation
significantly and might remove the need for per transform parallelism from
that specific use case and likely from most others.
1:
https://lists.apache
We could allow parameterizing transforms by using transform identifiers
from the pipeline, e.g.
options = ['--parameterize=MyTransform;parallelism=5']
with Pipeline.create(PipelineOptions(options)) as p:
p | Create(1, 2, 3) | 'MyTransform' >> ParDo(..)
Those hints should always be opt
However such a parameter would be specific to a single transform,
whereas maxNumWorkers is a global parameter today.
On Sat, Jun 27, 2020 at 10:31 PM Daniel Collins
wrote:
> I could imagine for example, a 'parallelismHint' field in the base
> parameters that could be set to maxNumWorkers when ru
I could imagine for example, a 'parallelismHint' field in the base
parameters that could be set to maxNumWorkers when running on dataflow or
an equivalent parameter when running on flink. It would be useful to get a
default value for the sharding in the Reshuffle changes here
https://github.com/apa
It's an interesting question - this parameter is clearly very runner
specific (e.g. it would be meaningless for the Dataflow runner, where
parallelism is not a static constant). How should we go about passing
runner-specific options per transform?
On Fri, Jun 26, 2020 at 1:14 PM Akshay Iyangar wr
Hi beam community,
So I had brought this issue in our slack channel but I guess this warrants a
deeper discussion and if we do go about what is the POA for it.
So basically currently for Flink Runner we don’t support operator level
parallelism which native Flink provides OOTB. So I was wonderin