Hi Dinesh, Did updating to Flink 1.10 resolve the issue?
Thanks,

-- Ken

> Hi Andrey,
> Sure We will try to use Flink 1.10 to see if HA issues we are facing is fixed
> and update in this thread.
>
> Thanks,
> Dinesh
>
> On Thu, Apr 2, 2020 at 3:22 PM Andrey Zagrebin <azagre...@apache.org> wrote:
> Hi Dinesh,
>
> Thanks for sharing the logs. There were couple of HA fixes since 1.7, e.g.
> [1] and [2].
> I would suggest to try Flink 1.10.
> If the problem persists, could you also find the logs of the failed Job
> Manager before the failover?
>
> Best,
> Andrey
>
> [1] https://jira.apache.org/jira/browse/FLINK-14316
> [2] https://jira.apache.org/jira/browse/FLINK-11843
>
> On Tue, Mar 31, 2020 at 6:49 AM Dinesh J <dineshj...@gmail.com> wrote:
> Hi Yang,
> I am attaching one full jobmanager log for a job which I reran today. This a
> job that tries to read from savepoint.
> Same error message "leader election onging" is displayed. And this stays the
> same even after 30 minutes. If I leave the job without yarn kill, it stays
> the same forever.
> Based on your suggestions till now, I guess it might be some zookeeper
> problem. If that is the case, what can I lookout for in zookeeper to figure
> out the issue?
>
> Thanks,
> Dinesh

[snip]

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr