Zhijiang,

Thanks for your suggestions. We will keep them in mind!
Kumar

From: Zhijiang <wangzhijiang...@aliyun.com>
Reply-To: Zhijiang <wangzhijiang...@aliyun.com>
Date: Tuesday, May 12, 2020 at 10:10 PM
To: Senthil Kumar <senthi...@vmware.com>, "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: Flink Streaming Job Tuning help

Hi Kumar,

I can give some general ideas for further analysis.

> We are finding that flink lags seriously behind when we introduce the keyBy
> (presumably because of shuffle across the network)

The `keyBy` breaks the operator chain, so it can have a noticeable performance impact in practice. I guess that if your previous version without keyBy could use the chaining mechanism, the follow-up operator could consume the records emitted by the preceding operator directly, with no need for the buffer serialization -> network shuffle -> buffer deserialization steps. That matters especially because your record size of 10K is fairly large.

If the keyBy is necessary in your case, then you can look further for the current bottleneck. For example, check whether there is back pressure, which you can monitor from the web UI. If so, find which task is the bottleneck causing the back pressure; you can trace it with the network-related metrics.

Also check whether there is data skew in your case, meaning that some tasks process more records than others. If so, maybe we can increase the parallelism to balance the load.

Best,
Zhijiang

------------------------------------------------------------------
From: Senthil Kumar <senthi...@vmware.com>
Send Time: May 13, 2020 (Wednesday) 00:49
To: user@flink.apache.org <user@flink.apache.org>
Subject: Re: Flink Streaming Job Tuning help

I forgot to mention, we are consuming said records from AWS Kinesis and writing out to S3.

From: Senthil Kumar <senthi...@vmware.com>
Date: Tuesday, May 12, 2020 at 10:47 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Flink Streaming Job Tuning help

Hello Flink Community!

We have a fairly intensive flink streaming application, processing 8-9 million records a minute, with each record being 10k.
One of our steps is a keyBy operation. We are finding that flink lags seriously behind when we introduce the keyBy (presumably because of shuffle across the network).

We are trying to tune it ourselves (size of nodes, memory, network buffers, etc.), but before we spend way too much time on this, would it be better to hire a "flink tuning expert" to get us through? If so, what resources are recommended on this list?

Cheers
Kumar
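
For a rough sense of scale, the figures in the original message (8-9 million records a minute, 10k each) can be turned into an approximate shuffle volume. The sketch below is only back-of-envelope arithmetic in plain Java, not Flink code. It assumes that "10k" means 10,000 bytes per record and that every record crosses the network once at the keyBy; the class name `ShuffleEstimate` and the parallelism of 64 are hypothetical and do not come from the thread.

```java
// Back-of-envelope estimate of the data volume a keyBy shuffle has to move.
// Assumptions (not from the thread): "10k" = 10,000 bytes per record, and
// every record is sent over the network exactly once at the keyBy.
class ShuffleEstimate {

    // Total bytes per second for a per-minute record rate and a record size.
    static long bytesPerSecond(long recordsPerMinute, long bytesPerRecord) {
        return recordsPerMinute * bytesPerRecord / 60;
    }

    // Even share per parallel subtask; this ignores any data skew.
    static long bytesPerSecondPerSubtask(long totalBytesPerSecond, int parallelism) {
        return totalBytesPerSecond / parallelism;
    }

    public static void main(String[] args) {
        long bytesPerRecord = 10_000L; // assumed meaning of "10k"
        int parallelism = 64;          // hypothetical value, for illustration only

        for (long recordsPerMinute : new long[] {8_000_000L, 9_000_000L}) {
            long total = bytesPerSecond(recordsPerMinute, bytesPerRecord);
            long perSubtask = bytesPerSecondPerSubtask(total, parallelism);
            System.out.printf(
                "%d records/min: %.2f GB/s total, %.1f MB/s per subtask at parallelism %d%n",
                recordsPerMinute, total / 1e9, perSubtask / 1e6, parallelism);
        }
    }
}
```

Under these assumptions the job has to move roughly 1.3 to 1.5 GB per second across the network once the keyBy is introduced, before any serialization overhead. That is consistent with the shuffle becoming the dominant cost compared with a fully chained pipeline.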