Hi,
I appreciate your comments, thank you. My original question still stands,
though: why did the very same job, after only the settings changes described
above, show increased CPU usage and degraded performance when we would have
expected the opposite behaviour?

thanks again,
Oscar

On Mon, 15 Apr 2024 at 15:11, Zhanghao Chen <zhanghao.c...@outlook.com>
wrote:

> The exception basically says the remote TM is unreachable, probably
> terminated for some other reason. This may not be the root cause. Are
> there any other exceptions in the log? Also, since the overall resource
> usage is almost full, could you try allocating more CPUs and see if the
> instability persists?
>
> Best,
> Zhanghao Chen
> ------------------------------
> *From:* Oscar Perez <oscarfernando.pe...@n26.com>
> *Sent:* Monday, April 15, 2024 19:24
> *To:* Zhanghao Chen <zhanghao.c...@outlook.com>
> *Cc:* Oscar Perez via user <user@flink.apache.org>
> *Subject:* Re: Flink job performance
>
> Hei, ok that is weird. Let me resend them.
>
> Regards,
> Oscar
>
> On Mon, 15 Apr 2024 at 14:00, Zhanghao Chen <zhanghao.c...@outlook.com>
> wrote:
>
> Hi, there seems to be something wrong with the two images attached in the
> latest email. I cannot open them.
>
> Best,
> Zhanghao Chen
> ------------------------------
> *From:* Oscar Perez via user <user@flink.apache.org>
> *Sent:* Monday, April 15, 2024 15:57
> *To:* Oscar Perez via user <user@flink.apache.org>; pi-team <
> pi-t...@n26.com>; Hermes Team <hermes-t...@n26.com>
> *Subject:* Flink job performance
>
> Hi community!
>
> We have an interesting problem with Flink after increasing parallelism in
> a certain way. Here is the summary:
>
> 1) We identified that our job bottleneck was a set of co-keyed process
> operators that were backpressuring the upstream operators.
> 2) We increased the parallelism of all operators from 6 to 12, but kept
> parallelism 6 for the operators that read from Kafka. The main reason was
> that all our topics have 6 partitions, so increasing source parallelism
> beyond that would not yield better performance.
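
The reasoning in point 2 can be sketched with a quick back-of-the-envelope check. The helper below is purely illustrative (not Flink API, and not from this thread): each Kafka partition is consumed by exactly one source subtask, so with 6 partitions any source parallelism above 6 only adds idle subtasks.

```java
// Hypothetical illustration of the partition/parallelism reasoning above.
public class PartitionMath {
    static int idleSubtasks(int partitions, int parallelism) {
        // A Kafka partition is read by exactly one source subtask, so any
        // subtask beyond the partition count receives no data and sits idle.
        return Math.max(0, parallelism - partitions);
    }

    public static void main(String[] args) {
        System.out.println(idleSubtasks(6, 6));   // -> 0 idle subtasks
        System.out.println(idleSubtasks(6, 12));  // -> 6 idle subtasks
    }
}
```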
>
> See attached job layout prior and after the changes:
> What happened is that some operations that used to be chained in the same
> operator, such as read - filter - map - filter, are now rebalanced, and the
> overall performance of the job suffers (it keeps throwing exceptions now
> and then).
>
> Does the rebalance operation go over the network, or does it happen within
> the same node? How can we effectively improve the performance of this job
> with the given resources?
>
> Thanks for the input!
> Regards,
> Oscar
>
>
>
