Hi Robert, Thanks for the advice! Checking Flink Forward talks seems like a good idea, will do 👍
On Sat, May 22, 2021 at 4:19 AM Robert Metzger <rmetz...@apache.org> wrote: > Hi Yaroslav, > > My recommendation is to go with the 2nd pattern you've described, but I > only have limited insights into real world production workloads. > > Besides the parallelism configuration, I also recommend looking into slot > sharing groups, and maybe disabling operator chaining. > I'm pretty sure some of Flink's large production users have shared > information about this in past Flink Forward talks .. but it is difficult > to find answers (unless you spend a lot of time on YouTube). > > Best, > Robert > > > On Wed, May 19, 2021 at 8:01 PM Yaroslav Tkachenko < > yaroslav.tkache...@shopify.com> wrote: > >> Hi everyone, >> >> I'd love to learn more about how different companies approach specifying >> Flink parallelism. I'm specifically interested in real, production >> workloads. >> >> I can see a few common patterns: >> >> - Rely on default parallelism, scale by changing parallelism for the >> whole pipeline. I guess it only works if the pipeline doesn't have obvious >> bottlenecks. Also, it looks like the new reactive mode makes specifying >> parallelism for an operator obsolete ( >> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/elastic_scaling/#configuration >> ) >> >> - Rely on default parallelism for most of the operators, but override it >> for some. For example, it doesn't make sense for a Kafka source to have >> parallelism higher than the number of partitions it consumes. Some custom >> sinks could choose lower parallelism to avoid overloading their >> destinations. Some transformation steps could choose higher parallelism to >> distribute the work better, etc. >> >> - Don't rely on default parallelism and configure parallelism >> explicitly for each operator. This requires very good knowledge of each >> operator in the pipeline, but it could lead to very good performance. >> >> Is there a different pattern that I miss? What do you use? Feel free to >> share any resources. >> >> If you do specify it explicitly, what do you think about the reactive >> mode? Will you use it? >> >> Also, how often do you change parallelism? Do you set it once and forget >> once the pipeline is stable? Do you keep re-evaluating it? >> >> Thanks. >> >