Re: Documentation for deep diving into flink (data-streaming) job restart process

2021-09-12 Thread Puneet Duggal
Hi Robert, Any solution or alternative approach to the above issue would be appreciated, as going live with new jobs will be unreliable if a task manager goes down. On Fri, Sep 10, 2021 at 1:17 PM Puneet Duggal wrote: > Hi Robert, > > Thanks for taking out time to go through the logs. > > Problem: >

Re: Documentation for deep diving into flink (data-streaming) job restart process

2021-09-10 Thread Puneet Duggal
Hi Robert, Thanks for taking the time to go through the logs. Problem: The reason for restarting all the task managers was to apply an increased JVM metaspace size to each existing task manager. Currently each TaskManager has 32 slots, but the JVM metaspace size was 256 MB, which used to get fil

Re: Documentation for deep diving into flink (data-streaming) job restart process

2021-09-09 Thread Robert Metzger
Thanks for the log. From the partial log that you shared with me, my assumption is that some external resource manager is shutting down your cluster. Multiple TaskManagers are disconnecting, and finally the job is switching into failed state. It seems that you are not stopping only one TaskManager

Re: Documentation for deep diving into flink (data-streaming) job restart process

2021-09-09 Thread Puneet Duggal
Hi, Please find attached the log file for the job not getting restarted on another task manager once the existing task manager was restarted. Just FYI: we are using the fixed-delay restart strategy (5 attempts, 10 s delay). On Thu, Sep 9, 2021 at 4:29 PM Robert Metzger wrote: > Hi Puneet, > > Can you provide us with
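
[Editor's sketch] To illustrate why the fixed-delay settings mentioned above (5 attempts, 10 s delay) can still end in a failed job when several task managers go down in quick succession, here is a toy model of a fixed-delay restart budget. It is not Flink's implementation; the class `FixedDelayRestart` and the helper `simulate` are hypothetical names, and the model assumes each failure consumes one attempt and the counter is never reset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixedDelayRestart:
    """Toy model of a fixed-delay restart budget (assumption: not Flink's code)."""
    max_attempts: int = 5        # "5 times" from the message above
    delay_seconds: float = 10.0  # "10s delay" from the message above
    attempts_used: int = 0

    def on_failure(self) -> Optional[float]:
        """Record one failure.

        Returns the delay to wait before restarting, or None once the
        restart budget is exhausted.
        """
        self.attempts_used += 1
        if self.attempts_used > self.max_attempts:
            return None
        return self.delay_seconds


def simulate(failures: int, policy: FixedDelayRestart) -> str:
    """Feed `failures` consecutive failures into the policy; return the final job state."""
    for _ in range(failures):
        if policy.on_failure() is None:
            return "FAILED"
    return "RUNNING"


if __name__ == "__main__":
    print(simulate(5, FixedDelayRestart()))  # budget exactly used up, job still restarts
    print(simulate(6, FixedDelayRestart()))  # one failure too many, job fails
```

Under these assumptions, five failures are absorbed and the sixth moves the job to a failed state, which matches the pattern of several TaskManagers disconnecting one after another.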

Re: Documentation for deep diving into flink (data-streaming) job restart process

2021-09-09 Thread Robert Metzger
Hi Puneet, Can you provide us with the JobManager logs of this incident? Jobs should not disappear, they should restart on other Task Managers. On Wed, Sep 8, 2021 at 3:06 PM Puneet Duggal wrote: > Hi, > > So for past 2-3 days i have been looking for documentation which > elaborates how flink t

Documentation for deep diving into flink (data-streaming) job restart process

2021-09-08 Thread Puneet Duggal
Hi, For the past 2-3 days I have been looking for documentation that explains how Flink takes care of restarting a data streaming job. I know all the restart and failover strategies, but wanted to know how the different components (JobManager, TaskManager, etc.) play a role while restarting the