Hi Guowei,

Thanks a lot for your reply.

I’m using 1.14.0. The timeout happens at job deployment time. A subtask runs for a short period (the length of `akka.ask.timeout`) before failing due to the timeout.

I noticed that the JobManager has very high CPU usage at that moment, around 2000%. I’m profiling it to find the cause.

Best,
Paul Lam

> On Jan 21, 2022, at 09:56, Guowei Ma <guowei....@gmail.com> wrote:
>
> Hi, Paul
>
> Would you like to share some information, such as the Flink version you used
> and the memory of the TM and JM?
> And when does the timeout happen? For example, at the beginning of the job or
> while the job is running?
>
> Best,
> Guowei
>
>
> On Thu, Jan 20, 2022 at 4:45 PM Paul Lam <paullin3...@gmail.com> wrote:
> Hi,
>
> I’m tuning a Flink job with 1000+ parallelism, which frequently fails with
> Akka TimeOutException (it was fine with 200 parallelism).
>
> I see some posts recommend increasing `akka.ask.timeout` to 120s. I’m not
> familiar with Akka, but that looks like a very long time for a response
> timeout compared to the default 10s.
>
> So I’m wondering what a reasonable range for this option is. And why would
> the Actor fail to respond in time (was the message dropped due to pressure)?
>
> Any input would be appreciated! Thanks a lot.
>
> Best,
> Paul Lam