[ https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Botong Huang updated YARN-7102:
-------------------------------
    Attachment: YARN-7102.v2.patch

Some explanations, since the v2 patch is much bigger.

This change revealed more flaky tests involving MockNM heartbeats to the RM. Every heartbeat triggers events dispatched in the RM, which in many cases need to be drained. Furthermore, because this change enforces a stricter responseId check, we now need to drain the RM dispatcher events after every MockNM heartbeat. Otherwise, of two sequential MockNM heartbeats, the second will fail because the RM is still processing the event from the first.

Instead of going through every place where {{nm.nodeHeartbeat}} is called and adding {{rm.drainEvent}} afterwards, I changed the MockNM API to call drain inside the heartbeat method.

For easier review, the real changes are in these four files: {{ResourceTrackerService}}, {{MockNM}}, {{MockRM}} and {{TestResourceTrackerService}}. All other file changes are simply due to the API change in MockNM.

> NM heartbeat stuck when responseId overflows MAX_INT
> ----------------------------------------------------
>
>                 Key: YARN-7102
>                 URL: https://issues.apache.org/jira/browse/YARN-7102
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Critical
>         Attachments: YARN-7102.v1.patch, YARN-7102.v2.patch
>
>
> ResponseId overflow problem in the NM-RM heartbeat. This is the same as the AM-RM
> heartbeat problem in YARN-6640; please refer to YARN-6640 for details.
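
For readers who have not followed YARN-6640, the overflow named in the issue title can be illustrated with a small standalone sketch. It is not taken from either patch: the class {{ResponseIdSketch}}, its methods, and the {{Verdict}} enum are hypothetical names chosen for illustration, and the strict check shown is only an assumed shape of what an "echo the last id" comparison might look like. The arithmetic itself is plain Java: incrementing {{Integer.MAX_VALUE}} wraps to a negative number, while masking off the sign bit wraps to 0.

```java
public class ResponseIdSketch {

    /** Naive increment: after Integer.MAX_VALUE this overflows to Integer.MIN_VALUE. */
    public static int nextNaive(int responseId) {
        return responseId + 1;
    }

    /** Wrapping increment: masking with MAX_VALUE clears the sign bit, so MAX_VALUE is followed by 0. */
    public static int nextWrapped(int responseId) {
        return (responseId + 1) & Integer.MAX_VALUE;
    }

    /** Hypothetical outcome of comparing a reported id with the id the server last sent. */
    public enum Verdict { IN_SYNC, DUPLICATE, OUT_OF_SYNC }

    /**
     * Assumed strict check (illustration only): the node must echo the id the
     * server sent last. Echoing the id just before it is treated as a resend.
     */
    public static Verdict check(int lastSent, int reported) {
        if (reported == lastSent) {
            return Verdict.IN_SYNC;
        }
        if (nextWrapped(reported) == lastSent) {
            return Verdict.DUPLICATE;
        }
        return Verdict.OUT_OF_SYNC;
    }

    public static void main(String[] args) {
        int atMax = Integer.MAX_VALUE;
        System.out.println(nextNaive(atMax));   // prints -2147483648
        System.out.println(nextWrapped(atMax)); // prints 0

        // After wrapping, the server's last id is 0; a resend of the previous
        // heartbeat (id MAX_VALUE) is still recognised as a duplicate.
        System.out.println(check(nextWrapped(atMax), atMax)); // prints DUPLICATE
    }
}
```

With a wrapping increment on both sides, the comparison keeps working across the boundary; with the naive increment, one side can end up holding a negative id that the other never produces.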