How do we control the parallelism of a particular step, then? Is there a recommended approach to this problem?
On Wed, May 16, 2018 at 20:45 Chamikara Jayalath <[email protected]> wrote:

> I don't think this can be specified through the Beam API, but the Flink
> runner might have additional configurations that I'm not aware of. Also,
> many runners fuse steps to improve execution performance, so simply
> specifying the parallelism of a single step will not work.
>
> Thanks,
> Cham
>
> On Tue, May 15, 2018 at 11:21 AM Harshvardhan Agrawal <
> [email protected]> wrote:
>
>> Hi Guys,
>>
>> I am currently developing a pipeline using Apache Beam with Flink as the
>> execution engine. As part of the process I read data from Kafka and
>> perform a bunch of transformations that involve joins, aggregations, and
>> lookups to an external DB.
>>
>> The idea is that we want higher parallelism in Flink while performing
>> the aggregations, but eventually coalesce the data and have a smaller
>> number of processes writing to the DB, so that the target DB can handle
>> the load (for example, a parallelism of 40 for aggregations but only 10
>> when writing to the target DB).
>>
>> Is there any way we could do that in Beam?
>>
>> Regards,
>>
>> Harsh
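(Not from the thread above, just a sketch of one common workaround.) Since runners fuse steps and Beam itself has no per-step parallelism knob, a frequently used pattern is to bound the number of concurrent writers by keying each aggregated element into a fixed number of shards and grouping by that key before the write: after a GroupByKey, at most one bundle per key runs downstream, so at most N shards means at most N writers. In a real pipeline this would be roughly `beam.Map(assign_shard) | beam.GroupByKey() | beam.ParDo(write_fn)`; the snippet below illustrates only the sharding logic in plain Python, and the names `NUM_WRITE_SHARDS` and `assign_shard` are illustrative, not Beam API.

```python
import random
from collections import defaultdict

NUM_WRITE_SHARDS = 10  # hypothetical: desired max number of concurrent DB writers


def assign_shard(element, num_shards=NUM_WRITE_SHARDS):
    """Pair each element with a random shard key in [0, num_shards)."""
    return (random.randrange(num_shards), element)


def group_by_key(keyed_elements):
    """Stand-in for Beam's GroupByKey: collect values per shard key."""
    groups = defaultdict(list)
    for key, value in keyed_elements:
        groups[key].append(value)
    return dict(groups)


# Simulated aggregated output that should be written by at most 10 workers.
aggregated = [f"record-{i}" for i in range(1000)]
sharded = group_by_key(assign_shard(e) for e in aggregated)

# No more than NUM_WRITE_SHARDS groups exist, so at most that many
# bundles (and hence writers) can run downstream of the GroupByKey.
assert len(sharded) <= NUM_WRITE_SHARDS
```

Note that the GroupByKey introduces a shuffle boundary, which is exactly what prevents the runner from fusing the high-parallelism aggregation with the low-parallelism write; the cost is the extra shuffle itself.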
