subject:"TM heartbeat timeout due to ResourceManager being busy"

Re: TM heartbeat timeout due to ResourceManager being busy

2020-10-12 Thread Xintong Song

No worries :) Thank you~ Xintong Song On Mon, Oct 12, 2020 at 2:48 PM Paul Lam wrote: > Sorry for the misspelled name, Xintong > > Best, > Paul Lam > > On Oct 12, 2020, at 14:46, Paul Lam wrote: > > Hi Xingtong, > > Thanks a lot for the pointer! > > It’s good to see there would be a new IO executor

Re: TM heartbeat timeout due to ResourceManager being busy

2020-10-12 Thread Paul Lam

Sorry for the misspelled name, Xintong Best, Paul Lam > On Oct 12, 2020, at 14:46, Paul Lam wrote: > > Hi Xingtong, > > Thanks a lot for the pointer! > > It’s good to see there would be a new IO executor to take care of the TM > contexts. Looking forward to the 1.12 release! > > Best, > Paul Lam > >>

Re: TM heartbeat timeout due to ResourceManager being busy

2020-10-12 Thread Paul Lam

Hi Xingtong, Thanks a lot for the pointer! It’s good to see there would be a new IO executor to take care of the TM contexts. Looking forward to the 1.12 release! Best, Paul Lam > On Oct 12, 2020, at 14:18, Xintong Song wrote: > > Hi Paul, > > Thanks for reporting this. > > Indeed, Flink's RM

Re: TM heartbeat timeout due to ResourceManager being busy

2020-10-12 Thread Xintong Song

FYI, I just created FLINK-19568 for tracking this issue. Thank you~ Xintong Song [1] https://issues.apache.org/jira/browse/FLINK-19568 On Mon, Oct 12, 2020 at 2:18 PM Xintong Song wrote: > Hi Paul, > > Thanks for reporting this. > > Indeed, Flink's RM currently performs several HDFS

Re: TM heartbeat timeout due to ResourceManager being busy

2020-10-12 Thread Xintong Song

Hi Paul, Thanks for reporting this. Indeed, Flink's RM currently performs several HDFS operations in the RPC main thread when preparing the TM context, which may block the main thread when HDFS is slow. Unfortunately, I don't see any out-of-the-box approach that fixes the problem at the moment,

TM heartbeat timeout due to ResourceManager being busy

2020-10-11 Thread Paul Lam

Hi, After FLINK-13184 is implemented (even with Flink 1.11), occasionally there would still be jobs with high parallelism getting TM-RM heartbeat timeouts when RM is busy creating TM contexts on cluster initialization and HDFS is slow at that moment. Apart from increasing the TM heartbeat

6 matches
