The exact mechanism for controlling parallelism is runner specific. It looks like Flink allows users to specify the amount of parallelism (per job) using the following option: https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java#L65

I'm not sure if Flink allows finer-grained control.

On Wed, May 16, 2018 at 5:48 PM Harshvardhan Agrawal <[email protected]> wrote:

> How do we control the parallelism of a particular step then? Is there a
> recommended approach to solving this problem?
>
> On Wed, May 16, 2018 at 20:45 Chamikara Jayalath <[email protected]> wrote:
>
>> I don't think this can be specified through the Beam API, but the Flink
>> runner might have additional configurations that I'm not aware of. Also,
>> many runners fuse steps to improve execution performance, so simply
>> specifying the parallelism of a single step will not work.
>>
>> Thanks,
>> Cham
>>
>> On Tue, May 15, 2018 at 11:21 AM Harshvardhan Agrawal
>> <[email protected]> wrote:
>>
>>> Hi Guys,
>>>
>>> I am currently in the process of developing a pipeline using Apache Beam
>>> with Flink as the execution engine. As part of the process, I read data
>>> from Kafka and perform a number of transformations that involve joins,
>>> aggregations, and lookups to an external DB.
>>>
>>> The idea is that we want higher parallelism in Flink while performing
>>> the aggregations, but eventually coalesce the data and have a smaller
>>> number of processes writing to the DB so that the target DB can handle
>>> the load (for example, a parallelism of 40 for aggregations but only 10
>>> when writing to the target DB).
>>>
>>> Is there any way we could do that in Beam?
>>>
>>> Regards,
>>>
>>> Harsh
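For reference, the job-wide setting linked above can be used roughly as follows. This is a minimal sketch, assuming the Beam Java SDK and the Flink runner are on the classpath; the class name `ParallelismExample` and the pipeline contents are placeholders, and note this sets parallelism for the whole job, not per step:

```java
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class ParallelismExample {
  public static void main(String[] args) {
    // Parse standard Beam options, then view them as Flink-specific options.
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);

    // Job-wide default parallelism; there is no per-transform knob here.
    options.setParallelism(40);

    Pipeline pipeline = Pipeline.create(options);
    // ... apply Kafka reads, joins, aggregations, DB writes here ...
    pipeline.run();
  }
}
```

Equivalently, the same option can be passed on the command line as `--parallelism=40` when launching the job.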
