[ https://issues.apache.org/jira/browse/YARN-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138598#comment-16138598 ]
Wangda Tan commented on YARN-6640:
----------------------------------

[~botong], I might have misread your comment. I will review the updated patch and let you know.

> AM heartbeat stuck when responseId overflows MAX_INT
> -----------------------------------------------------
>
>                 Key: YARN-6640
>                 URL: https://issues.apache.org/jira/browse/YARN-6640
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Blocker
>         Attachments: YARN-6640.v1.patch, YARN-6640.v2.patch
>
>
> The current code in {{ApplicationMasterService}}:
> if ((request.getResponseId() + 1) == lastResponse.getResponseId()) {/* old heartbeat */ return lastResponse;}
> else if (request.getResponseId() + 1 < lastResponse.getResponseId()) { throw ... }
> process the heartbeat...
> When a heartbeat comes in, in the usual case we expect
> request.getResponseId() == lastResponse.getResponseId(). The “if” handles a
> duplicate heartbeat that is one step old, the “else if” throws for
> heartbeats that are two or more steps old, and otherwise we accept the new
> heartbeat and process it.
> So the bug is: when lastResponse.getResponseId() == MAX_INT, the newest
> heartbeat comes in with responseId == MAX_INT. However, responseId + 1
> overflows to MIN_INT, which is less than MAX_INT, so we fall into the
> “else if” case and the RM throws. Every retry hits the same check, so we
> are stuck here…
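
The overflow described in the issue can be reproduced in isolation. The following is a minimal sketch, not the attached patch: the class name, the Outcome enum, and the nextId helper are hypothetical names introduced for illustration. originalCheck mirrors the comparison quoted from ApplicationMasterService, and wrapSafeCheck shows one possible way to avoid the overflow, assuming response ids are meant to wrap from MAX_INT back to 0.

```java
// Hypothetical, self-contained demo; none of these names come from Hadoop.
class ResponseIdOverflowDemo {
    enum Outcome { DUPLICATE, STALE, PROCESS }

    // Mirrors the check quoted in the issue description.
    // requestId + 1 silently overflows when requestId == Integer.MAX_VALUE.
    static Outcome originalCheck(int requestId, int lastResponseId) {
        if (requestId + 1 == lastResponseId) {
            return Outcome.DUPLICATE;   // one step old: resend last response
        } else if (requestId + 1 < lastResponseId) {
            return Outcome.STALE;       // the RM throws in this case
        }
        return Outcome.PROCESS;         // accept and process the heartbeat
    }

    // Assumption: ids stay non-negative and wrap from MAX_VALUE to 0.
    static int nextId(int id) {
        return (id + 1) & Integer.MAX_VALUE;
    }

    // One possible wrap-safe variant: test the expected case by equality
    // first, so the comparison never depends on an overflowed sum.
    static Outcome wrapSafeCheck(int requestId, int lastResponseId) {
        if (requestId == lastResponseId) {
            return Outcome.PROCESS;
        }
        if (nextId(requestId) == lastResponseId) {
            return Outcome.DUPLICATE;
        }
        return Outcome.STALE;
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        // The newest heartbeat at MAX_VALUE is misclassified by the original check.
        System.out.println("original:  " + originalCheck(max, max));
        System.out.println("wrap-safe: " + wrapSafeCheck(max, max));
    }
}
```

Note that wrapSafeCheck treats every id other than the current or previous one as stale, which is stricter than the original fall-through; whether that is acceptable depends on how the RM is expected to handle ids ahead of lastResponse.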