Hi Guowei,

Thanks a lot for your reply.

I’m using 1.14.0. The timeout happens at job deployment time. A subtask runs 
for a short period, roughly the length of `akka.ask.timeout`, before failing 
with the timeout.
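
For reference, a minimal sketch of how this option is set in `flink-conf.yaml`. 
The 120 s value is the one suggested in the posts mentioned in my original 
message below; it only illustrates the setting and is not a recommended value:

```yaml
# flink-conf.yaml (Flink 1.14.x)
# The default is 10 s. 120 s is the value suggested in the posts quoted
# below, shown for illustration only, not as a recommendation.
akka.ask.timeout: 120 s
```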

I noticed that the JobManager has very high CPU usage at that moment, around 
2000%. I’m profiling it to find the cause.

Best,
Paul Lam

> On Jan 21, 2022, at 09:56, Guowei Ma <guowei....@gmail.com> wrote:
> 
> Hi, Paul 
> 
> Could you share some information, such as the Flink version you are using 
> and the memory of the TM and JM?
> And when does the timeout happen: at the beginning of the job or while the 
> job is running?
> 
> Best,
> Guowei
> 
> 
> On Thu, Jan 20, 2022 at 4:45 PM Paul Lam <paullin3...@gmail.com> wrote:
> Hi,
> 
> I’m tuning a Flink job with a parallelism of 1000+, which frequently fails 
> with an Akka TimeOutException (it was fine with a parallelism of 200). 
> 
> I see some posts recommend increasing `akka.ask.timeout` to 120s. I’m not 
> familiar with Akka, but that seems very long for a response timeout, 
> especially compared to the default of 10s.
> 
> So I’m wondering what a reasonable range for this option is. And why would 
> the Actor fail to respond in time (was the message dropped due to pressure)?
> 
> Any input would be appreciated! Thanks a lot.
> 
> Best,
> Paul Lam
> 
