[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346175#comment-14346175 ]
zhihai xu commented on YARN-2893:
---------------------------------

Hi [~vinodkv],

Sporadic job failures are due to the cascading sharing the credentials between jobs. Because the Credentials class is not thread-safe, if multiple jobs try to access the shared credentials, we have a race condition, which causes the sporadic job failures.

The shared credentials are introduced in the JobConf constructor: if we create a new job using the JobConf from an old job, the two jobs will share the same credentials.

{code}
public JobConf(Configuration conf) {
  super(conf);
  if (conf instanceof JobConf) {
    JobConf that = (JobConf)conf;
    credentials = that.credentials;
  }
  checkAndWarnDeprecation();
}
{code}

The credentials from JobConf are passed to YARNRunner#submitJob, which calls createApplicationSubmissionContext to configure the tokens in the ContainerLaunchContext:

{code}
DataOutputBuffer dob = new DataOutputBuffer();
ts.writeTokenStorageToStream(dob);
ByteBuffer securityTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
ContainerLaunchContext amContainer =
    ContainerLaunchContext.newInstance(localResources, environment,
        vargsFinal, null, securityTokens, acls);
{code}

It looks like we have two other potential issues in JobConf and Credentials. I created MAPREDUCE-6269 and HADOOP-11667 for separate discussion.

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> ------------------------------------------------------------------------------
>
>                 Key: YARN-2893
>                 URL: https://issues.apache.org/jira/browse/YARN-2893
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Gera Shegalov
>            Assignee: zhihai xu
>         Attachments: YARN-2893.000.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt
> tokens in the AM launch context.
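
The credential sharing described in the comment above can be illustrated with a small, self-contained sketch. It does not use Hadoop: CredentialsSketch and JobConfSketch are hypothetical stand-ins for Credentials and JobConf, and the shareCredentials flag exists only for this illustration. The sketch contrasts the reference assignment from the quoted constructor with a defensive copy. The copy is just one possible way to isolate the two jobs, and is not necessarily what YARN-2893.000.patch does.

```java
import java.util.HashMap;
import java.util.Map;

class SharedCredentialsSketch {

    // Hypothetical stand-in for Credentials: a mutable, unsynchronized token map.
    static class CredentialsSketch {
        private final Map<String, String> tokens = new HashMap<>();

        CredentialsSketch() {}

        // Copy constructor: snapshots the other instance's tokens.
        CredentialsSketch(CredentialsSketch other) {
            tokens.putAll(other.tokens);
        }

        void addToken(String alias, String token) {
            tokens.put(alias, token);
        }

        int numberOfTokens() {
            return tokens.size();
        }
    }

    // Hypothetical stand-in for JobConf.
    static class JobConfSketch {
        final CredentialsSketch credentials;

        JobConfSketch() {
            credentials = new CredentialsSketch();
        }

        // shareCredentials == true mirrors the quoted constructor
        // (credentials = that.credentials); false takes a defensive copy.
        JobConfSketch(JobConfSketch that, boolean shareCredentials) {
            credentials = shareCredentials
                    ? that.credentials
                    : new CredentialsSketch(that.credentials);
        }
    }

    public static void main(String[] args) {
        JobConfSketch first = new JobConfSketch();
        first.credentials.addToken("hdfs", "token-1");

        JobConfSketch shared = new JobConfSketch(first, true);
        JobConfSketch isolated = new JobConfSketch(first, false);

        // A write through the second job's conf is visible to the first job,
        // because both confs hold the same object.
        shared.credentials.addToken("extra", "token-2");

        System.out.println(first.credentials == shared.credentials);   // true
        System.out.println(first.credentials == isolated.credentials); // false
        System.out.println(first.credentials.numberOfTokens());        // 2
        System.out.println(isolated.credentials.numberOfTokens());     // 1
    }
}
```

With the shared reference, one job can modify the token map while another job is serializing it, which is the kind of unsynchronized concurrent access the comment describes. With the copy, each job works on its own map.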