[ 
https://issues.apache.org/jira/browse/YARN-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797609#comment-15797609
 ] 

Naganarasimha G R commented on YARN-5937:
-----------------------------------------

Sorry for the delayed reply. Actually, I had noticed that the NM was not
shutting down gracefully in the normal case either. I have not tested trunk
code of late. Let me test it; if the problem is there, we can fix both issues
together. The existing solution seems fine to me!


> stop-yarn.sh is not able to gracefully stop node managers
> ---------------------------------------------------------
>
>                 Key: YARN-5937
>                 URL: https://issues.apache.org/jira/browse/YARN-5937
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>              Labels: script
>         Attachments: YARN-5937.01.patch, nm_shutdown.log
>
>
> stop-yarn.sh always gives the following output:
> {code}
> ./sbin/stop-yarn.sh
> Stopping resourcemanager
> Stopping nodemanagers
> <NM_HOST>: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
> <NM_HOST>: ERROR: Unable to kill 18097
> {code}
> This is because the resource manager is stopped before the node managers.
> When the shutdown hook manager tries to gracefully stop the NM services, the
> NM needs to unregister with the RM, and it times out because it cannot
> connect to the RM (already stopped). See the log (stop the RM, then run
> kill <nm_pid>):
> {code}
> 16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
> ...
> 16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException
> java.util.concurrent.TimeoutException
>       at java.util.concurrent.FutureTask.get(FutureTask.java:205)
>       at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
> ...
>       at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291)
> ...
> 16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown forcefully.
> {code}
> The shutdown hook has a default timeout of 10s, so if the RM is stopped
> before the NMs, the NMs always take more than 10s to stop (in the Java code).
> However, stop-yarn.sh only allows a 5s timeout, so the NM is always killed
> instead of stopped.
> It would make sense for this script to stop the NMs before the RM, in a
> graceful way.
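
The stop-then-kill behaviour described above can be sketched roughly as
follows. This is only an illustration, not the code in stop-yarn.sh or
YARN-5937.01.patch: the function name stop_gracefully and its two arguments
(a pid and a timeout in seconds) are hypothetical. It sends SIGTERM, polls
until the process exits, and falls back to SIGKILL only once the timeout has
passed.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only; not taken from the Hadoop scripts.
# Usage: stop_gracefully <pid> <timeout_seconds>
# Returns 0 if the process exited after SIGTERM (or was already gone),
# 1 if it had to be killed with SIGKILL.
stop_gracefully() {
  local pid="$1" timeout="$2" waited=0

  # Ask the process to shut down; if the signal cannot be delivered,
  # treat the process as already stopped.
  kill -TERM "$pid" 2>/dev/null || return 0

  # Poll once per second until the process disappears or the timeout expires.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "WARNING: $pid did not stop gracefully after ${timeout} seconds: sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

With a helper like this, a script would call it for each NM pid first and for
the RM pid last, so that the NMs can still unregister with a running RM. The
timeout passed in would presumably need to be longer than the 10s shutdown
hook timeout mentioned above; the exact value is an assumption left to the
caller.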



