Re: Parallelism in Production: Best Practices

Yaroslav Tkachenko Sat, 22 May 2021 11:33:41 -0700

Hi Robert,

Thanks for the advice! Checking Flink Forward talks seems like a good idea,
will do 👍


On Sat, May 22, 2021 at 4:19 AM Robert Metzger <rmetz...@apache.org> wrote:

> Hi Yaroslav,
>
> My recommendation is to go with the 2nd pattern you've described, but I
> only have limited insight into real-world production workloads.
>
> Besides the parallelism configuration, I also recommend looking into slot
> sharing groups, and maybe disabling operator chaining.
> I'm pretty sure some of Flink's large production users have shared
> information about this in past Flink Forward talks, but it is difficult
> to find answers (unless you spend a lot of time on YouTube).
>
> Best,
> Robert
>
>
> On Wed, May 19, 2021 at 8:01 PM Yaroslav Tkachenko <
> yaroslav.tkache...@shopify.com> wrote:
>
>> Hi everyone,
>>
>> I'd love to learn more about how different companies approach specifying
>> Flink parallelism. I'm specifically interested in real, production
>> workloads.
>>
>> I can see a few common patterns:
>>
>> - Rely on default parallelism, scale by changing parallelism for the
>> whole pipeline. I guess it only works if the pipeline doesn't have obvious
>> bottlenecks. Also, it looks like the new reactive mode makes specifying
>> parallelism for an operator obsolete (
>> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/elastic_scaling/#configuration
>> )
>>
>> - Rely on default parallelism for most of the operators, but override it
>> for some. For example, it doesn't make sense for a Kafka source to have
>> parallelism higher than the number of partitions it consumes. Some custom
>> sinks could choose lower parallelism to avoid overloading their
>> destinations. Some transformation steps could choose higher parallelism to
>> distribute the work better, etc.
>>
>> - Don't rely on default parallelism and configure parallelism
>> explicitly for each operator. This requires very good knowledge of each
>> operator in the pipeline, but it could lead to very good performance.
>>
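The second pattern above (a default with a few overrides) can be sketched in plain Java. This is only an illustration under stated assumptions: `ParallelismPlan`, `capAtPartitions`, and the operator names are hypothetical and not part of Flink, and the numbers (default 8, 6 Kafka partitions, sink limited to 2) are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not a Flink class): a default parallelism plus per-operator overrides. */
final class ParallelismPlan {
    private final int defaultParallelism;
    private final Map<String, Integer> overrides = new LinkedHashMap<>();

    ParallelismPlan(int defaultParallelism) {
        if (defaultParallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        this.defaultParallelism = defaultParallelism;
    }

    /** Registers an explicit parallelism for one operator; returns this for chaining. */
    ParallelismPlan override(String operatorName, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        overrides.put(operatorName, parallelism);
        return this;
    }

    /** Returns the override if one exists, otherwise the pipeline default. */
    int resolve(String operatorName) {
        return overrides.getOrDefault(operatorName, defaultParallelism);
    }

    /** A Kafka source gains nothing from more subtasks than partitions, so cap it. */
    static int capAtPartitions(int desired, int partitionCount) {
        return Math.max(1, Math.min(desired, partitionCount));
    }

    public static void main(String[] args) {
        ParallelismPlan plan = new ParallelismPlan(8)
            .override("kafka-source", capAtPartitions(8, 6)) // assumed: topic has 6 partitions
            .override("custom-sink", 2);                     // assumed: destination tolerates 2 writers
        for (String op : new String[] {"kafka-source", "enrich-map", "custom-sink"}) {
            System.out.println(op + " -> " + plan.resolve(op));
        }
    }
}
```

In an actual job, each resolved value would presumably be passed to `setParallelism(...)` on the corresponding operator, with the default set once on the execution environment.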
>> Is there a different pattern that I'm missing? What do you use? Feel free to
>> share any resources.
>>
>> If you do specify it explicitly, what do you think about the reactive
>> mode? Will you use it?
>>
>> Also, how often do you change parallelism? Do you set it once and forget
>> about it once the pipeline is stable? Or do you keep re-evaluating it?
>>
>> Thanks.
>>
>
