Hi Gary, Thanks for the help.
Gary Yao-3 wrote
> You are writing that it takes YARN 10 minutes to restart the application
> master (AM). However, in my experiments the AM container is restarted
> within a few seconds after killing the process. If in your setup YARN
> actually needs 10 minutes to restart the AM, then you could try increasing
> the number of retry attempts by the client [2].

I think that comes from the difference in how we tested. When I killed the JM process (using kill -9 pid), a new process was created within a few seconds. However, when I tested by crashing the server (using init 0), it took much longer. I found the yarn-site parameter controlling that timer: yarn.am.liveness-monitor.expiry-interval-ms, which defaults to 10 minutes [1].

I increased the REST client configuration as you suggested, and it did help.

Gary Yao-3 wrote
> The REST API that is queried by the Web UI returns the root cause from the
> ExecutionGraph [3]. All job status transitions should be logged together
> with the exception that caused the transition [4]. Check for INFO level log
> messages that start with "Job [...] switched from state" followed by a
> stacktrace. If you cannot find the exception, the problem might be rooted
> in your log4j or logback configuration.

Thanks, I got the point. I am using logback. I tried to configure rolling logs, but have not succeeded yet; I will need to experiment more.

Thanks and regards,
Averell

[1] https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml#yarn.am.liveness-monitor.expiry-interval-ms

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
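PS: for anyone else hitting the slow-failover case, the expiry interval can be lowered in yarn-site.xml so YARN declares the AM dead sooner after a whole-node crash. A sketch — the 2-minute value here is just an example, not a recommendation, and shortening it too much risks false positives on a loaded cluster:

```xml
<!-- yarn-site.xml: how long the RM waits without an AM heartbeat
     before considering the AM dead (default 600000 ms = 10 minutes) -->
<property>
  <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
  <value>120000</value> <!-- example: 2 minutes -->
</property>
```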
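The client-side retry settings Gary pointed to [2] can be raised in flink-conf.yaml. A sketch, assuming the rest.retry.* keys from Flink's RestOptions; the values below are illustrative only (the products of attempts x delay should comfortably exceed the AM recovery time):

```yaml
# flink-conf.yaml: keep the client retrying long enough
# for YARN to notice the dead AM and restart it
rest.retry.max-attempts: 40    # default is 20
rest.retry.delay: 5000         # ms between attempts, default 3000
```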
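On the rolling logs: a minimal logback sketch that swaps Flink's default FileAppender for a RollingFileAppender with time-based rotation. It assumes Flink's ${log.file} property is set (Flink passes it on the JVM command line); the daily rotation and 7-day retention are placeholder choices:

```xml
<!-- logback.xml: roll the Flink log daily, keep 7 days of history -->
<configuration>
  <appender name="file" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${log.file}</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>${log.file}.%d{yyyy-MM-dd}</fileNamePattern>
      <maxHistory>7</maxHistory>
    </rollingPolicy>
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss,SSS} %-5level %logger{60} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="file"/>
  </root>
</configuration>
```

One caveat: the YARN web UI / log aggregation may only show the file named exactly ${log.file}, so rolled-over files might not appear there even when rotation itself works.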