Zhijiang,

Thanks for your suggestions. We will keep them in mind!

Kumar

From: Zhijiang <wangzhijiang...@aliyun.com>
Reply-To: Zhijiang <wangzhijiang...@aliyun.com>
Date: Tuesday, May 12, 2020 at 10:10 PM
To: Senthil Kumar <senthi...@vmware.com>, "user@flink.apache.org" 
<user@flink.apache.org>
Subject: Re: Flink Streaming Job Tuning help

Hi Kumar,


I can give some general ideas for further analysis.

> We are finding that flink lags seriously behind when we introduce the keyBy 
> (presumably because of shuffle across the network)
`keyBy` breaks operator chaining, so it can have a noticeable performance impact 
in practice. If your previous pipeline without keyBy could take advantage of 
chaining, the downstream operator consumed the emitted records from the 
upstream operator directly, with no buffer serialization -> network shuffle 
-> buffer deserialization involved. This matters especially since your ~10 KB 
record size is fairly large.
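For a rough sense of scale, here is a back-of-envelope sketch of the data volume a full shuffle has to move, using the numbers from your mail (taking 8.5 million records/min as the midpoint of the reported 8-9 million is my assumption):

```python
# Back-of-envelope estimate of the volume a keyBy shuffle must move,
# based on the figures from the original mail (8-9M records/min, ~10 KB each).

records_per_minute = 8.5e6      # midpoint of the reported 8-9 million
record_size_bytes = 10 * 1024   # ~10 KB per record (assumed binary KB)

bytes_per_second = records_per_minute * record_size_bytes / 60
gb_per_second = bytes_per_second / 1024**3

print(f"~{gb_per_second:.2f} GB/s crossing the network on a full shuffle")
```

At roughly 1.35 GB/s of sustained shuffle traffic, network bandwidth and serialization cost can easily dominate, which is consistent with the slowdown you observe after adding keyBy.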

If the keyBy is necessary in your case, the next step is to locate the current 
bottleneck. E.g., check whether there is backpressure, which you can monitor 
from the web UI. If so, identify which task is the bottleneck causing the 
backpressure; you can trace it via the network-related metrics.
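If you prefer scripting this instead of clicking through the web UI, Flink's monitoring REST API exposes a per-vertex backpressure endpoint. A minimal sketch (the host, job ID, and vertex ID below are placeholders for your cluster, and the default REST port 8081 is assumed):

```python
# Sketch: query Flink's monitoring REST API for the backpressure status
# of a job vertex. Host, job ID, and vertex ID are placeholders; the
# default REST port 8081 is assumed unless overridden in flink-conf.yaml.
import json
from urllib.request import urlopen

def backpressure_url(host: str, job_id: str, vertex_id: str,
                     port: int = 8081) -> str:
    # Endpoint: GET /jobs/:jobid/vertices/:vertexid/backpressure
    return f"http://{host}:{port}/jobs/{job_id}/vertices/{vertex_id}/backpressure"

def fetch_backpressure(host: str, job_id: str, vertex_id: str) -> dict:
    # Requires a running cluster; the response includes the measured
    # backpressure level for each subtask of the vertex.
    with urlopen(backpressure_url(host, job_id, vertex_id)) as resp:
        return json.load(resp)

# Example usage (against a live cluster):
# stats = fetch_backpressure("jobmanager-host", "<job-id>", "<vertex-id>")
```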

Also check whether there is data skew in your case, i.e. some tasks processing 
many more records than others. If so, increasing the parallelism may help 
balance the load.
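A quick offline way to check for skew is to sample the keys you use in keyBy and see how evenly hash partitioning would spread them across subtasks. This is only an illustration: the sample data is made up, and Python's built-in `hash` is a stand-in for Flink's actual key-group partitioning (which uses a murmur hash):

```python
# Offline skew check: how evenly would sampled keys spread over N subtasks?
# Python's hash() is a stand-in here; Flink actually murmur-hashes keys into
# key groups, so real assignments will differ, but hot keys skew either way.
from collections import Counter

def subtask_load(keys, parallelism):
    """Count how many sampled records land on each subtask (hash partitioning)."""
    counts = Counter(hash(k) % parallelism for k in keys)
    return [counts.get(i, 0) for i in range(parallelism)]

# Toy sample: one hot key dominates, so one subtask carries most of the load.
sample = ["hot"] * 900 + [f"key-{i}" for i in range(100)]
loads = subtask_load(sample, parallelism=4)
skew = max(loads) / (sum(loads) / len(loads))  # max load / mean load
```

If the max/mean ratio is far above 1 on a real sample of your Kinesis records, raising parallelism alone won't fully help, since all records for a hot key still go to one subtask.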

Best,
Zhijiang
------------------------------------------------------------------
From: Senthil Kumar <senthi...@vmware.com>
Send Time: Wednesday, May 13, 2020 00:49
To: user@flink.apache.org <user@flink.apache.org>
Subject: Re: Flink Streaming Job Tuning help

I forgot to mention, we are consuming said records from AWS Kinesis and writing 
out to S3.

From: Senthil Kumar <senthi...@vmware.com>
Date: Tuesday, May 12, 2020 at 10:47 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Flink Streaming Job Tuning help

Hello Flink Community!

We have a fairly intensive Flink streaming application, processing 8-9 million 
records a minute, with each record being ~10 KB.
One of our steps is a keyBy operation. We are finding that Flink lags seriously 
behind when we introduce the keyBy (presumably because of the shuffle across 
the network).

We are trying to tune it ourselves (size of nodes, memory, network buffers, 
etc.), but before we spend way too much time on
this, would it be better to hire some “Flink tuning expert” to get us through?

If so what resources are recommended on this list?

Cheers
Kumar
