[jira] [Commented] (YARN-8451) Multiple NM heartbeat thread created when a slow NM resync with RM

Botong Huang (JIRA) Thu, 28 Jun 2018 11:21:12 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526640#comment-16526640
 ]


Botong Huang commented on YARN-8451:
------------------------------------

Hi [~jlowe], I am not actually changing this behavior (not blocking the 
dispatcher for resync); the existing code already creates a new thread for it. 
I think the reason is that resync involves draining the existing heartbeat 
thread and making a register call to RM, which can take a long time (say the 
network is slow or RM is down during a master-slave switch). We don't want to 
block the entire NM for this. It may be much more involved if we want to change 
this behavior. 
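
To make the trade-off concrete, here is a minimal sketch of a dispatcher-side 
handler that hands the slow resync work to a new thread and returns right away. 
It is not the actual NodeManager code: the class ResyncOffloadSketch, the 
methods handleResyncEvent and slowResync, and the sleep that stands in for 
"drain the old heartbeat thread and re-register with RM" are all hypothetical. 
It also shows the side effect under discussion: two queued RESYNC events start 
two threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only; names and structure are assumptions, not NM code.
class ResyncOffloadSketch {
    // Counts how many resync threads have been started so far.
    static final AtomicInteger resyncThreadsStarted = new AtomicInteger();

    // Stand-in for the slow part: draining the old heartbeat thread and
    // re-registering with RM. Here it just sleeps.
    static void slowResync(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs on the "dispatcher" thread. Starts a new thread per RESYNC event
    // and returns without waiting for the resync to finish.
    static Thread handleResyncEvent(long resyncMillis) {
        int n = resyncThreadsStarted.incrementAndGet();
        Thread t = new Thread(() -> slowResync(resyncMillis), "resync-" + n);
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        // Two RESYNC events queued back to back by a slow NM.
        Thread first = handleResyncEvent(200);
        Thread second = handleResyncEvent(200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("dispatcher free after ~" + elapsedMs + " ms");
        System.out.println("resync threads started: " + resyncThreadsStarted.get());
        first.join();
        second.join();
    }
}
```

The dispatcher stays responsive, but nothing in this sketch stops a second 
event from starting a second thread.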

> Multiple NM heartbeat thread created when a slow NM resync with RM
> ------------------------------------------------------------------
>
>                 Key: YARN-8451
>                 URL: https://issues.apache.org/jira/browse/YARN-8451
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Major
>         Attachments: YARN-8451.v1.patch
>
>
> During an NM resync with RM (say RM did a master-slave switch), if the NM is 
> running slowly, more than one RESYNC event may be put into the NM dispatcher 
> by the existing heartbeat thread before they are processed. As a result, 
> multiple new heartbeat threads are later created and start to heartbeat to 
> RM concurrently, each with its own responseId. If at some point one thread 
> falls more than one step behind the others, RM will send back a resync 
> signal in that heartbeat response, killing all containers on this NM. 
> See comments below for details on how this can happen. 
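
The responseId drift described above can be replayed with a small 
deterministic model. This is a simplified sketch, not the RM or NM code: 
FakeRM, HeartbeatThreadState, and the three-way check (same id is accepted, 
one step behind is treated as a duplicate, anything else triggers a resync) 
are assumptions made for illustration.

```java
// Simplified, hypothetical model of the responseId handshake; not RM code.
class ResponseIdDriftSketch {
    enum Action { NORMAL, DUPLICATE, RESYNC }

    static final class Response {
        final Action action;
        final int responseId;
        Response(Action action, int responseId) {
            this.action = action;
            this.responseId = responseId;
        }
    }

    // Assumed RM-side rule: tracks the last responseId it handed out.
    static final class FakeRM {
        private int lastResponseId = 0;

        Response heartbeat(int nmResponseId) {
            if (nmResponseId == lastResponseId) {
                lastResponseId++;                       // in step: advance
                return new Response(Action.NORMAL, lastResponseId);
            }
            if (nmResponseId + 1 == lastResponseId) {   // one step behind
                return new Response(Action.DUPLICATE, lastResponseId);
            }
            return new Response(Action.RESYNC, lastResponseId); // too far off
        }
    }

    // One heartbeat thread's private view of the responseId.
    static final class HeartbeatThreadState {
        int responseId = 0;

        Action beat(FakeRM rm) {
            Response r = rm.heartbeat(responseId);
            if (r.action != Action.RESYNC) {
                responseId = r.responseId;
            }
            return r.action;
        }
    }

    // Replays the interleaving A, B, A, A, B for two heartbeat threads.
    static Action[] replay() {
        FakeRM rm = new FakeRM();
        HeartbeatThreadState a = new HeartbeatThreadState();
        HeartbeatThreadState b = new HeartbeatThreadState();
        return new Action[] {
            a.beat(rm), b.beat(rm), a.beat(rm), a.beat(rm), b.beat(rm)
        };
    }

    public static void main(String[] args) {
        for (Action action : replay()) {
            System.out.println(action);
        }
    }
}
```

In this interleaving thread B survives once as a duplicate, but after thread A 
heartbeats twice in a row, B is two steps behind and the model answers RESYNC.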



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
