How do we control the parallelism of a particular step, then? Is there a recommended approach to this problem?
On Wed, May 16, 2018 at 20:45 Chamikara Jayalath <[email protected]> wrote:

> I don't think this can be specified through the Beam API, but the Flink
> runner might have additional configurations that I'm not aware of. Also,
> many runners fuse steps to improve execution performance, so simply
> specifying the parallelism of a single step will not work.
>
> Thanks,
> Cham
>
> On Tue, May 15, 2018 at 11:21 AM Harshvardhan Agrawal <
> [email protected]> wrote:
>
>> Hi Guys,
>>
>> I am currently developing a pipeline using Apache Beam with Flink as the
>> execution engine. As part of the process I read data from Kafka and
>> perform a bunch of transformations that involve joins, aggregations, and
>> lookups to an external DB.
>>
>> The idea is that we want higher parallelism in Flink while performing
>> the aggregations, but eventually coalesce the data and have a smaller
>> number of processes writing to the DB, so that the target DB can handle
>> the load (for example, a parallelism of 40 for aggregations but only 10
>> when writing to the target DB).
>>
>> Is there any way we could do that in Beam?
>>
>> Regards,
>>
>> Harsh
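(Not from the thread above, just a sketch of one common workaround.) Since runners fuse steps and Beam itself has no per-step parallelism knob, a frequently used pattern is to bound the number of concurrent writers by keying each aggregated element into a fixed number of shards and grouping by that key before the write: after a GroupByKey, at most one bundle per key runs downstream, so at most N shards means at most N writers. In a real pipeline this would be roughly `beam.Map(assign_shard) | beam.GroupByKey() | beam.ParDo(write_fn)`; the snippet below illustrates only the sharding logic in plain Python, and the names `NUM_WRITE_SHARDS` and `assign_shard` are illustrative, not Beam API.

```python
import random
from collections import defaultdict

NUM_WRITE_SHARDS = 10  # hypothetical: desired max number of concurrent DB writers


def assign_shard(element, num_shards=NUM_WRITE_SHARDS):
    """Pair each element with a random shard key in [0, num_shards)."""
    return (random.randrange(num_shards), element)


def group_by_key(keyed_elements):
    """Stand-in for Beam's GroupByKey: collect values per shard key."""
    groups = defaultdict(list)
    for key, value in keyed_elements:
        groups[key].append(value)
    return dict(groups)


# Simulated aggregated output that should be written by at most 10 workers.
aggregated = [f"record-{i}" for i in range(1000)]
sharded = group_by_key(assign_shard(e) for e in aggregated)

# No more than NUM_WRITE_SHARDS groups exist, so at most that many
# bundles (and hence writers) can run downstream of the GroupByKey.
assert len(sharded) <= NUM_WRITE_SHARDS
```

Note that the GroupByKey introduces a shuffle boundary, which is exactly what prevents the runner from fusing the high-parallelism aggregation with the low-parallelism write; the cost is the extra shuffle itself.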
