[ https://issues.apache.org/jira/browse/YARN-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weiwei Yang updated YARN-4621:
------------------------------
    Attachment: terasort_job_failed.log

> Job failed if time on one NM is out of sync even other nodes are sync'd
> -----------------------------------------------------------------------
>
>                 Key: YARN-4621
>                 URL: https://issues.apache.org/jira/browse/YARN-4621
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, resourcemanager
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>         Attachments: terasort_job_failed.log
>
>
> The RM tried to allocate more than 10 containers on an NM whose clock was out
> of sync. They all failed with a token-expired error, and the job eventually
> failed. We could add a new state to NodeState, e.g. UNSYNC: if a node is not
> in sync with the RM, the RM can then skip allocating containers on that node.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
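A minimal sketch of the proposed UNSYNC idea follows. It assumes the NM reports its local clock in each heartbeat and the RM compares that value with its own clock against a fixed tolerance. Every name here (`ClockSkewSketch`, `NodeReport`, `MAX_SKEW_MILLIS`, `classify`, `schedulableNodes`) is hypothetical. The `NodeState` enum is a simplified stand-in and is not the real YARN enum. A real implementation would also need to account for heartbeat latency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the UNSYNC proposal; none of these are real YARN APIs. */
public class ClockSkewSketch {

    // Simplified stand-in for YARN's NodeState, plus the proposed UNSYNC value.
    enum NodeState { RUNNING, UNHEALTHY, UNSYNC }

    // Hypothetical per-node record: node id and the NM clock from its last heartbeat.
    static final class NodeReport {
        final String nodeId;
        final long nmClockMillis;

        NodeReport(String nodeId, long nmClockMillis) {
            this.nodeId = nodeId;
            this.nmClockMillis = nmClockMillis;
        }
    }

    // Assumed tolerance; a real value would depend on container token lifetimes.
    static final long MAX_SKEW_MILLIS = 60_000L;

    // Mark a node UNSYNC when its clock differs from the RM clock by more than
    // the allowed skew, whether it is ahead or behind.
    static NodeState classify(NodeReport report, long rmClockMillis) {
        long skew = Math.abs(report.nmClockMillis - rmClockMillis);
        return skew > MAX_SKEW_MILLIS ? NodeState.UNSYNC : NodeState.RUNNING;
    }

    // The scheduler only considers RUNNING nodes, so UNSYNC nodes receive no containers.
    static List<String> schedulableNodes(List<NodeReport> reports, long rmClockMillis) {
        List<String> result = new ArrayList<>();
        for (NodeReport r : reports) {
            if (classify(r, rmClockMillis) == NodeState.RUNNING) {
                result.add(r.nodeId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long rmNow = 1_000_000_000L;
        List<NodeReport> reports = Arrays.asList(
                new NodeReport("nm1", rmNow + 500),
                new NodeReport("nm2", rmNow - 2 * 3_600_000L), // clock two hours behind
                new NodeReport("nm3", rmNow));
        System.out.println(schedulableNodes(reports, rmNow)); // prints [nm1, nm3]
    }
}
```

With this design, a single skewed NM drops out of scheduling instead of repeatedly receiving containers whose tokens it rejects as expired.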