[ https://issues.apache.org/jira/browse/YARN-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813490#comment-15813490 ]
Weiwei Yang commented on YARN-5937:
-----------------------------------

Perfect, thank you [~Naganarasimha] :)

> stop-yarn.sh is not able to gracefully stop node managers
> ---------------------------------------------------------
>
>                 Key: YARN-5937
>                 URL: https://issues.apache.org/jira/browse/YARN-5937
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>              Labels: script
>         Attachments: YARN-5937.01.patch, nm_shutdown.log
>
>
> stop-yarn.sh always gives the following output:
> {code}
> ./sbin/stop-yarn.sh
> Stopping resourcemanager
> Stopping nodemanagers
> <NM_HOST>: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
> <NM_HOST>: ERROR: Unable to kill 18097
> {code}
> This happens because the resource manager is stopped before the node
> managers. When the shutdown hook manager tries to gracefully stop the NM
> services, the NM needs to unregister with the RM, and it times out because
> the NM cannot connect to the RM (already stopped). See the log (stop the RM,
> then run kill <nm_pid>):
> {code}
> 16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
> ...
> 16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException
> java.util.concurrent.TimeoutException
>       at java.util.concurrent.FutureTask.get(FutureTask.java:205)
>       at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
> ...
>       at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291)
> ...
> 16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown forcefully.
> {code}
> The shutdown hook has a default timeout of 10s, so if the RM is stopped
> before the NMs, the NMs always take more than 10s to stop (in the Java
> code). However, stop-yarn.sh only waits 5s, so the NM is always killed
> instead of stopped gracefully.
> It would make sense for this script to stop the NMs before the RM, in a
> graceful way.
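
The ordering and timeout reasoning in the issue description above can be sketched in a few lines of shell. This is an illustration only, not the contents of YARN-5937.01.patch: `stop_daemon`, `stop_yarn_in_order`, and `wait_covers_hook` are hypothetical helper names, and `stop_daemon` merely echoes instead of invoking any real Hadoop command. The 5s and 10s values are the ones quoted in the description.

```shell
#!/usr/bin/env bash
# Sketch only: stop_daemon is a hypothetical stand-in for whatever
# stop-yarn.sh actually runs to stop a daemon; here it just prints a line.
stop_daemon() {
  echo "Stopping $1"
}

# Proposed order: stop the node managers first, while the resource manager
# is still up and can accept their unregister calls, then stop the RM.
stop_yarn_in_order() {
  stop_daemon nodemanagers
  stop_daemon resourcemanager
}

# Succeeds (exit 0) only when the script's wait is at least as long as the
# JVM shutdown hook timeout. Both arguments are whole seconds.
wait_covers_hook() {
  local script_wait=$1 hook_timeout=$2
  [ "$script_wait" -ge "$hook_timeout" ]
}

stop_yarn_in_order
if ! wait_covers_hook 5 10; then
  echo "5s script wait is shorter than the 10s shutdown hook timeout"
fi
```

With the RM still running, the NM should be able to unregister well inside the script's wait; the second check only shows why the reversed order can never finish gracefully within 5s.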