[ 
https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167000#comment-16167000
 ] 

Botong Huang commented on YARN-7102:
------------------------------------

After fighting through the unit tests, here is the status for the v6 patch: 
{{TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests}} is 
already failing in trunk; YARN-7199 has been opened for it.
{{TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable}} is 
being tracked under YARN-7044.
I need help with {{TestContainerManagerSecurity.testContainerManager}}: it 
seems to fail consistently in Yetus, but I cannot reproduce it locally at all. 

[~wangda] and [~jlowe], can you please take a look? Some quick notes in 
summary: 

With the stricter responseId check in the NM heartbeat, we need to drain the RM 
dispatcher events after every {{MockNM}} heartbeat. Otherwise, the second of two 
sequential {{MockNM}} heartbeats will fail, because the RM is still processing 
the first heartbeat event.

Instead of going through all the places where {{nm.nodeHeartbeat}} is called and 
adding {{rm.drainEvent}} afterwards (some already have it), I changed the 
{{MockNM}} API to drain the RM events inside the heartbeat method.
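
To illustrate the race and the fix, here is a minimal sketch using a toy model, not the actual Hadoop classes. {{ToyRM}}, {{ToyNM}}, the {{drain}} flag and the exception on mismatch are all hypothetical simplifications: the only assumptions are that the RM validates the responseId synchronously while its recorded id advances only when the queued heartbeat event is processed.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy stand-in for the RM (hypothetical, not ResourceTrackerService):
// heartbeats are validated synchronously, but the recorded responseId
// only advances once the queued event has been processed.
class ToyRM {
    private final Queue<Integer> pendingEvents = new ArrayDeque<>();
    private int lastResponseId = 0;

    // Strict check: the id sent by the node must equal the id the RM recorded.
    int heartbeat(int nodeResponseId) {
        if (nodeResponseId != lastResponseId) {
            throw new IllegalStateException("responseId mismatch: node sent "
                + nodeResponseId + ", RM has " + lastResponseId);
        }
        int next = (nodeResponseId + 1) & Integer.MAX_VALUE;
        pendingEvents.add(next); // handled later by the "dispatcher"
        return next;
    }

    // Process all queued events, analogous to draining the RM dispatcher.
    void drainEvents() {
        while (!pendingEvents.isEmpty()) {
            lastResponseId = pendingEvents.remove();
        }
    }
}

// Toy stand-in for MockNM (hypothetical): optionally drains the RM
// inside the heartbeat call itself.
class ToyNM {
    private final ToyRM rm;
    private int responseId = 0;

    ToyNM(ToyRM rm) {
        this.rm = rm;
    }

    int nodeHeartbeat(boolean drain) {
        responseId = rm.heartbeat(responseId);
        if (drain) {
            rm.drainEvents();
        }
        return responseId;
    }
}

class HeartbeatDrainDemo {
    public static void main(String[] args) {
        ToyNM draining = new ToyNM(new ToyRM());
        draining.nodeHeartbeat(true);
        draining.nodeHeartbeat(true); // succeeds: the RM state has caught up

        ToyNM racing = new ToyNM(new ToyRM());
        racing.nodeHeartbeat(false);
        try {
            racing.nodeHeartbeat(false); // the RM still holds the old id
        } catch (IllegalStateException e) {
            System.out.println("second heartbeat rejected: " + e.getMessage());
        }
    }
}
```

Draining inside the heartbeat keeps every call site unchanged in behavior while removing the need for each test to remember the drain step.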

To ease review: the real changes are in these four files, 
{{ResourceTrackerService, MockNM, TestResourceTrackerService, MiniYarnCluster}}, 
and in {{TestMiniYarnClusterNodeUtilization}} (I removed a test case because it 
is subsumed by/identical to the other one). All other file changes simply 
follow from the API change in {{MockNM}}. 

Thanks in advance!

> NM heartbeat stuck when responseId overflows MAX_INT
> ----------------------------------------------------
>
>                 Key: YARN-7102
>                 URL: https://issues.apache.org/jira/browse/YARN-7102
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Critical
>         Attachments: YARN-7102.v1.patch, YARN-7102.v2.patch, 
> YARN-7102.v3.patch, YARN-7102.v4.patch, YARN-7102.v5.patch, YARN-7102.v6.patch
>
>
> ResponseId overflow problem in the NM-RM heartbeat. This is the same as the 
> AM-RM heartbeat problem in YARN-6640; please refer to YARN-6640 for details. 
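
For readers unfamiliar with the overflow, a small sketch of the arithmetic involved. The class and method names are hypothetical, and the masked increment is only one way to keep the id non-negative; it is shown as an assumption about the general approach, not as the contents of the patch.

```java
// Hypothetical helper class illustrating int overflow of a response id.
class ResponseIdWrap {
    // Naive increment: after Integer.MAX_VALUE the id becomes negative.
    static int naiveNext(int responseId) {
        return responseId + 1;
    }

    // Wrapping increment: clearing the sign bit makes the id cycle
    // through 0..Integer.MAX_VALUE.
    static int wrappingNext(int responseId) {
        return (responseId + 1) & Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(naiveNext(Integer.MAX_VALUE));    // -2147483648
        System.out.println(wrappingNext(Integer.MAX_VALUE)); // 0
        System.out.println(wrappingNext(41));                // 42
    }
}
```

Any comparison that assumes the next id is larger than the previous one breaks at the naive overflow point, which is why both sides of the heartbeat need to agree on the same wrapping rule.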



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)