Controlling parallelism of a ParDo Transform while writing to DB

Harshvardhan Agrawal Tue, 15 May 2018 11:21:12 -0700

Hi Guys,

I am developing a pipeline using Apache Beam with Flink as the execution
engine. As part of the pipeline, I read data from Kafka and perform a
series of transformations that involve joins, aggregations, and lookups
to an external DB.


The idea is that we want higher parallelism in Flink while performing the
aggregations, but then coalesce the data so that fewer processes write to
the DB and the target DB can handle the load (for example, a parallelism
of 40 for the aggregations but only 10 when writing to the target DB).
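
To make the coalescing step concrete, here is a minimal sketch in plain
Python (not Beam code) of one way to bound the number of writers: give
every record a stable shard id in a fixed range and group by that id, so
that there are at most that many groups to write. The names
NUM_WRITE_SHARDS, shard_for, group_by_shard and write_shard are
hypothetical, and write_shard is only a stand-in for a real DB write. In
Beam the analogous shape would be keying each element by such a shard id
and applying GroupByKey before the writing ParDo; whether that gives
exactly this many concurrent writers on the Flink runner is an assumption
here, not something confirmed in this thread.

```python
import zlib
from collections import defaultdict

# Assumed target number of DB writers, taken from the example above.
NUM_WRITE_SHARDS = 10


def shard_for(key: str, num_shards: int = NUM_WRITE_SHARDS) -> int:
    """Map a record key to a stable shard id in [0, num_shards)."""
    # crc32 is deterministic across processes, unlike the built-in hash().
    return zlib.crc32(key.encode("utf-8")) % num_shards


def group_by_shard(records, num_shards: int = NUM_WRITE_SHARDS):
    """Group (key, value) records by shard id; yields at most num_shards groups."""
    shards = defaultdict(list)
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))
    return dict(shards)


def write_shard(shard_id: int, rows) -> int:
    """Hypothetical stand-in for a batched DB write; returns the row count."""
    return len(rows)


if __name__ == "__main__":
    # Pretend these are the aggregated results produced at high parallelism.
    records = [(f"account-{i}", i) for i in range(1000)]
    shards = group_by_shard(records)
    written = sum(write_shard(sid, rows) for sid, rows in shards.items())
    print(f"{len(shards)} shards, {written} rows written")
```

Because each record lands in exactly one shard, the number of groups, and
so the number of independent write tasks, never exceeds NUM_WRITE_SHARDS,
regardless of how many workers produced the records.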

Is there any way we could do that in Beam?

Regards,

Harsh
-- 

*Regards, Harshvardhan Agrawal*
*267.991.6618 | LinkedIn <https://www.linkedin.com/in/harshvardhanagr/>*
