[ 
https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-7102:
-------------------------------
    Attachment: YARN-7102.v2.patch

Some explanation, since the v2 patch is much bigger. This change revealed more 
flaky tests involving MockNM heartbeats to the RM. Every heartbeat triggers 
events dispatched in the RM, which need draining in many cases. Furthermore, 
because this change enforces a stricter responseId check, we now need to drain 
the RM dispatcher events after every MockNM heartbeat. Otherwise, of two 
sequential MockNM heartbeats, the second will fail because the RM is still 
processing the event from the first. 
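
As a rough illustration of this failure mode, here is a toy model in which the 
recorded responseId only advances once a queued event has been processed. It 
is a sketch under simplifying assumptions, not the actual RM code: 
ToyTracker and its methods are hypothetical names. 

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only: ToyTracker is hypothetical and is not ResourceTrackerService.
final class ToyTracker {
  // Id of the last response the tracker has recorded. It is updated only
  // when the queued event is processed, mimicking an asynchronous dispatcher.
  private int recordedResponseId = 0;
  private final Queue<Integer> pendingEvents = new ArrayDeque<>();

  // Returns the next response id, or -1 if the strict id check fails.
  int heartbeat(int responseId) {
    if (responseId != recordedResponseId) {
      return -1;
    }
    int next = responseId + 1;
    pendingEvents.add(next); // recorded later, when the event is drained
    return next;
  }

  // Processes all queued events, recording the newest response id.
  void drainEvents() {
    while (!pendingEvents.isEmpty()) {
      recordedResponseId = pendingEvents.remove();
    }
  }

  public static void main(String[] args) {
    ToyTracker noDrain = new ToyTracker();
    int first = noDrain.heartbeat(0);
    // Second heartbeat is rejected: the first event is still pending.
    System.out.println(noDrain.heartbeat(first)); // -1

    ToyTracker withDrain = new ToyTracker();
    int id = withDrain.heartbeat(0);
    withDrain.drainEvents();
    System.out.println(withDrain.heartbeat(id)); // 2
  }
}
```

In this model the second heartbeat only passes the check once the pending 
event from the first has been drained. 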

Instead of going through all the places where {{nm.nodeHeartbeat}} is called and 
adding {{rm.drainEvent}} afterwards, I changed the MockNM API to call drain 
inside the heartbeat method. 
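
The shape of that change can be sketched roughly as follows. This is only an 
illustration of the idea of draining inside the heartbeat call: 
DrainingNodeSketch, its constructor arguments and its fields are hypothetical 
and do not reflect the real MockNM signatures. 

```java
import java.util.function.IntUnaryOperator;

// Hypothetical stand-in for a mock node; not the real MockNM API.
final class DrainingNodeSketch {
  // Stands in for the heartbeat call to the RM: takes the current
  // response id and returns the id carried by the response.
  private final IntUnaryOperator sendHeartbeat;
  // Stands in for draining the RM dispatcher events.
  private final Runnable drainEvents;
  private int responseId = 0;

  DrainingNodeSketch(IntUnaryOperator sendHeartbeat, Runnable drainEvents) {
    this.sendHeartbeat = sendHeartbeat;
    this.drainEvents = drainEvents;
  }

  // Sends one heartbeat and drains before returning, so that callers
  // never issue a second heartbeat while the first is still in flight.
  int nodeHeartbeat() {
    responseId = sendHeartbeat.applyAsInt(responseId);
    drainEvents.run();
    return responseId;
  }

  public static void main(String[] args) {
    int[] drains = {0};
    DrainingNodeSketch nm =
        new DrainingNodeSketch(id -> id + 1, () -> drains[0]++);
    nm.nodeHeartbeat();
    nm.nodeHeartbeat();
    System.out.println(drains[0]); // 2: one drain per heartbeat
  }
}
```

Putting the drain in one place means individual tests do not have to remember 
it at every call site. 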

For ease of review, the real changes are in these four files: 
{{ResourceTrackerService}}, {{MockNM}}, {{MockRM}} and 
{{TestResourceTrackerService}}. All other file changes are simply due to the 
API change in MockNM. 

> NM heartbeat stuck when responseId overflows MAX_INT
> ----------------------------------------------------
>
>                 Key: YARN-7102
>                 URL: https://issues.apache.org/jira/browse/YARN-7102
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Critical
>         Attachments: YARN-7102.v1.patch, YARN-7102.v2.patch
>
>
> ResponseId overflow problem in the NM-RM heartbeat. This is the same as the 
> AM-RM heartbeat problem in YARN-6640; please refer to YARN-6640 for details. 
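
For readers unfamiliar with the boundary in question, the following sketch 
shows what a plain Java int increment does at MAX_INT, next to one possible 
wrap-around that keeps the counter non-negative. The helper names are 
hypothetical, and the attached patches may handle the boundary differently. 

```java
// Illustrative helpers only; not taken from the YARN code base.
final class ResponseIdSketch {
  // Plain increment: silently overflows to Integer.MIN_VALUE.
  static int naiveNext(int id) {
    return id + 1;
  }

  // Clears the sign bit after incrementing, so Integer.MAX_VALUE wraps to 0.
  static int wrappingNext(int id) {
    return (id + 1) & Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    System.out.println(naiveNext(Integer.MAX_VALUE));    // -2147483648
    System.out.println(wrappingNext(Integer.MAX_VALUE)); // 0
    System.out.println(wrappingNext(5));                 // 6
  }
}
```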



