The exact mechanism for controlling parallelism is runner specific. It looks like Flink allows users to specify the amount of parallelism (per job) using the following option: https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java#L65

I'm not sure if Flink allows finer-grained control.

On Wed, May 16, 2018 at 5:48 PM Harshvardhan Agrawal <[email protected]> wrote:

> How do we control the parallelism of a particular step then? Is there a
> recommended approach to solving this problem?
>
> On Wed, May 16, 2018 at 20:45 Chamikara Jayalath <[email protected]> wrote:
>
>> I don't think this can be specified through the Beam API, but the Flink
>> runner might have additional configurations that I'm not aware of. Also,
>> many runners fuse steps to improve execution performance, so simply
>> specifying the parallelism of a single step will not work.
>>
>> Thanks,
>> Cham
>>
>> On Tue, May 15, 2018 at 11:21 AM Harshvardhan Agrawal
>> <[email protected]> wrote:
>>
>>> Hi Guys,
>>>
>>> I am currently in the process of developing a pipeline using Apache Beam
>>> with Flink as the execution engine. As part of the process, I read data
>>> from Kafka and perform a number of transformations that involve joins,
>>> aggregations, and lookups to an external DB.
>>>
>>> The idea is that we want higher parallelism in Flink while performing
>>> the aggregations, but eventually coalesce the data and have a smaller
>>> number of processes writing to the DB so that the target DB can handle
>>> the load (for example, a parallelism of 40 for aggregations but only 10
>>> when writing to the target DB).
>>>
>>> Is there any way we could do that in Beam?
>>>
>>> Regards,
>>>
>>> Harsh
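For reference, the job-wide setting linked above can be used roughly as follows. This is a minimal sketch, assuming the Beam Java SDK and the Flink runner are on the classpath; the class name `ParallelismExample` and the pipeline contents are placeholders, and note this sets parallelism for the whole job, not per step:

```java
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class ParallelismExample {
  public static void main(String[] args) {
    // Parse standard Beam options, then view them as Flink-specific options.
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);

    // Job-wide default parallelism; there is no per-transform knob here.
    options.setParallelism(40);

    Pipeline pipeline = Pipeline.create(options);
    // ... apply Kafka reads, joins, aggregations, DB writes here ...
    pipeline.run();
  }
}
```

Equivalently, the same option can be passed on the command line as `--parallelism=40` when launching the job.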
