Re: TM heartbeat timeout due to ResourceManager being busy

2020-10-11 Thread Xintong Song
No worries :) Thank you~ Xintong Song On Mon, Oct 12, 2020 at 2:48 PM Paul Lam wrote: > Sorry for the misspelled name, Xintong > > Best, > Paul Lam > > On Oct 12, 2020, at 14:46, Paul Lam wrote: > > Hi Xingtong, > > Thanks a lot for the pointer! > > It’s good to see there would be a new IO executor to

Re: why we need keyed state and operate state when we already have checkpoint?

2020-10-11 Thread Arvid Heise
Hi 大森林, You can always resume from checkpoints independent of the usage of keyed or non-keyed state of operators. 1 checkpoint contains the state of all operators at a given point in time. Each operator may have keyed state, raw state, or non-keyed state. As long as you are not changing the operat

Re: TM heartbeat timeout due to ResourceManager being busy

2020-10-11 Thread Paul Lam
Sorry for the misspelled name, Xintong Best, Paul Lam > On Oct 12, 2020, at 14:46, Paul Lam wrote: > > Hi Xingtong, > > Thanks a lot for the pointer! > > It’s good to see there would be a new IO executor to take care of the TM > contexts. Looking forward to the 1.12 release! > > Best, > Paul Lam > >>

Re: TM heartbeat timeout due to ResourceManager being busy

2020-10-11 Thread Paul Lam
Hi Xingtong, Thanks a lot for the pointer! It’s good to see there would be a new IO executor to take care of the TM contexts. Looking forward to the 1.12 release! Best, Paul Lam > On Oct 12, 2020, at 14:18, Xintong Song wrote: > > Hi Paul, > > Thanks for reporting this. > > Indeed, Flink's RM currentl

Re: [PyFlink] register udf functions with different versions of the same library in the same job

2020-10-11 Thread Sharipov, Rinat
Hi Xingbo! Thx a lot for such a detailed reply, it is very useful. Mon, Oct 12, 2020 at 09:32, Xingbo Huang : > Hi, > I will do my best to provide pyflink related content, I hope it helps you. > > >>> each udf function is a separate process, that is managed by Beam (but > I'm not sure I got it

Re: ConnectionPool to DB and parallelism of operator question

2020-10-11 Thread Arvid Heise
Hi Vijay, If you implement the SinkFunction yourself, you can share the OkHttpClient.Builder across all instances in the same taskmanager by using a static field and initializing it only once (ideally in RichSinkFunction#open). On Tue, Oct 6, 2020 at 9:37 AM Aljoscha Krettek wrote: > Hi, > > si
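
The static-field suggestion in this reply could be sketched roughly as follows. This is a plain-Java illustration only, with no Flink or OkHttp dependency: ExpensiveClient is a hypothetical stand-in for the shared client/builder, SharedClientSink is a hypothetical stand-in for a RichSinkFunction subclass, and its open() method merely mimics the RichSinkFunction#open lifecycle hook. Note that a static field is shared per class loader, so the sharing applies only to instances loaded by the same class loader in the same JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive-to-create client (e.g. an HTTP client).
final class ExpensiveClient {
    // Counts how many clients were actually constructed in this JVM.
    static final AtomicInteger CREATED = new AtomicInteger();

    ExpensiveClient() {
        CREATED.incrementAndGet();
    }
}

// Hypothetical stand-in for a sink function; open() mimics the lifecycle hook
// that runs once per parallel instance before any records are processed.
final class SharedClientSink {
    // One client per class loader, shared by every sink instance.
    private static ExpensiveClient sharedClient;

    // Synchronized so that concurrent open() calls create the client only once.
    private static synchronized ExpensiveClient getOrCreateClient() {
        if (sharedClient == null) {
            sharedClient = new ExpensiveClient();
        }
        return sharedClient;
    }

    // Per-instance handle, assigned in open() rather than in the constructor.
    private transient ExpensiveClient client;

    void open() {
        client = getOrCreateClient();
    }

    ExpensiveClient client() {
        return client;
    }
}

class SharedClientDemo {
    public static void main(String[] args) {
        SharedClientSink first = new SharedClientSink();
        SharedClientSink second = new SharedClientSink();
        first.open();
        second.open();
        // Both sink instances hold the same client object.
        System.out.println("same client: " + (first.client() == second.client()));
        System.out.println("clients created: " + ExpensiveClient.CREATED.get());
    }
}
```

Initializing in open() rather than in the constructor keeps the client out of the serialized function object; the synchronized accessor is one simple way to guard against several parallel instances opening at the same time.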

Re: TM heartbeat timeout due to ResourceManager being busy

2020-10-11 Thread Xintong Song
FYI, I just created FLINK-19568 for tracking this issue. Thank you~ Xintong Song [1] https://issues.apache.org/jira/browse/FLINK-19568 On Mon, Oct 12, 2020 at 2:18 PM Xintong Song wrote: > Hi Paul, > > Thanks for reporting this. > > Indeed, Flink's RM currently performs several HDFS operati

Re: flink checkpoint timeout

2020-10-11 Thread Arvid Heise
Hi Omkar, I don't see anything suspicious in regards to how Flink handles checkpointing; it simply took longer than 10m (configured checkpointing timeout) to checkpoint. The usual reason for long checkpointing times is backpressure. And indeed looking at your thread dump, I see that you have a sl

Re: [PyFlink] register udf functions with different versions of the same library in the same job

2020-10-11 Thread Xingbo Huang
Hi, I will do my best to provide pyflink related content, I hope it helps you. >>> each udf function is a separate process, that is managed by Beam (but I'm not sure I got it right). Strictly speaking, it is not true that every UDF is in a different python process. For example, the two python fu

Re: TM heartbeat timeout due to ResourceManager being busy

2020-10-11 Thread Xintong Song
Hi Paul, Thanks for reporting this. Indeed, Flink's RM currently performs several HDFS operations in the rpc main thread when preparing the TM context, which may block the main thread when HDFS is slow. Unfortunately, I don't see any out-of-box approach that fixes the problem at the moment, exce

TM heartbeat timeout due to ResourceManager being busy

2020-10-11 Thread Paul Lam
Hi, After FLINK-13184 is implemented (even with Flink 1.11), occasionally there would still be jobs with high parallelism getting TM-RM heartbeat timeouts when RM is busy creating TM contexts on cluster initialization and HDFS is slow at that moment. Apart from increasing the TM heartbeat ti

Re: state access causing segmentation fault

2020-10-11 Thread Arvid Heise
Hi Edward, could you try adding the static keyword to ExecQueue and RingBufferExec? As is they hold a reference to the MyKeyedProcessFunction, which has unforeseen consequences. On Sun, Oct 11, 2020 at 5:38 AM Colletta, Edward wrote: > Tried to attach tar file but it got blocked. Resending wi
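
The effect of the static keyword mentioned in this reply can be illustrated with a small plain-Java sketch. The class names here are hypothetical stand-ins (OuterFunction for MyKeyedProcessFunction, InnerQueue and NestedQueue for the helper classes); the original code is not shown in the thread. A non-static inner class is constructed against an enclosing instance, whereas a static nested class is not.

```java
import java.lang.reflect.Modifier;

// Hypothetical stand-in for the enclosing process function.
class OuterFunction {
    // Inner (non-static) class: every instance is created against an
    // OuterFunction instance, which is passed to its constructor.
    class InnerQueue {}

    // Static nested class: independent of any OuterFunction instance.
    static class NestedQueue {}
}

class NestedClassDemo {
    // True if the member class was declared with the static keyword.
    static boolean isStaticNested(Class<?> c) {
        return Modifier.isStatic(c.getModifiers());
    }

    // Number of parameters of the (single, default) constructor; an inner
    // class receives its enclosing instance as an extra parameter.
    static int constructorParams(Class<?> c) {
        return c.getDeclaredConstructors()[0].getParameterCount();
    }

    public static void main(String[] args) {
        OuterFunction outer = new OuterFunction();
        // An inner instance can only be created through an outer instance.
        OuterFunction.InnerQueue inner = outer.new InnerQueue();
        // A static nested instance needs no outer instance at all.
        OuterFunction.NestedQueue nested = new OuterFunction.NestedQueue();

        System.out.println("inner is static: " + isStaticNested(inner.getClass()));
        System.out.println("nested is static: " + isStaticNested(nested.getClass()));
        System.out.println("inner ctor params: " + constructorParams(inner.getClass()));
        System.out.println("nested ctor params: " + constructorParams(nested.getClass()));
    }
}
```

Declaring helper classes static avoids tying them to the enclosing function object, which is presumably the point of the suggestion above.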