[ https://issues.apache.org/jira/browse/YARN-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksandr Balitsky updated YARN-6019:
-------------------------------------
    Component/s: resourcemanager

> MR application fails with "No NMToken sent" exception after MRAppMaster recovery
> --------------------------------------------------------------------------------
>
>                 Key: YARN-6019
>                 URL: https://issues.apache.org/jira/browse/YARN-6019
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, yarn
>    Affects Versions: 2.7.0
>         Environment: Centos 7
>            Reporter: Aleksandr Balitsky
>            Priority: Critical
>         Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit an MR application (for example, the PI app with 50 containers)
> 2) Find the MRAppMaster process id for the application
> 3) Kill the MRAppMaster with kill -9
>
> *Expected:* ResourceManager launches a new MRAppMaster container and MRAppAttempt, and the application finishes correctly.
>
> *Actual:* After the new MRAppMaster and MRAppAttempt are launched, the application fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1482408247195_0002_02_000011 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for node1:43037
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:244)
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
>
> *Problem:*
> When RMCommunicator sends the "registerApplicationMaster" request to the RM, the RM generates NMTokens for the new RMAppAttempt. Those new NMTokens are transmitted to RMCommunicator in RegisterApplicationMasterResponse (getNMTokensFromPreviousAttempts method), but the RMCommunicator.register method does not handle them. The RM does not transmit these tokens again for subsequent allocation requests, so they never reach NMTokenCache. As a result, we get the "No NMToken sent for node" exception.
>
> I have found that this issue appears after the changes from
> https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed
>
> I tried the same scenario without that commit, and the application completed successfully after MRAppMaster recovery.
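
The failure mode described under *Problem* can be illustrated with a small self-contained simulation. This is only a sketch, not the attached patch and not Hadoop code: NodeToken, TokenCache, RegisterResponse, register and launch are hypothetical stand-ins for NMToken, NMTokenCache, RegisterApplicationMasterResponse, RMCommunicator.register and the container launcher. Only the getter name getNMTokensFromPreviousAttempts is taken from the report. The copyTokens flag is an assumption used to contrast the reported behaviour (tokens ignored at registration) with one possible way of handling them (copying them into the cache).

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NMTokenRecoverySketch {

    /** Hypothetical stand-in for an NMToken: a node address plus an opaque token value. */
    static final class NodeToken {
        final String nodeAddress;
        final String token;

        NodeToken(String nodeAddress, String token) {
            this.nodeAddress = nodeAddress;
            this.token = token;
        }
    }

    /** Hypothetical stand-in for the AM-side token cache, keyed by node address. */
    static final class TokenCache {
        private final Map<String, String> tokens = new ConcurrentHashMap<>();

        void put(String nodeAddress, String token) {
            tokens.put(nodeAddress, token);
        }

        boolean contains(String nodeAddress) {
            return tokens.containsKey(nodeAddress);
        }
    }

    /** Hypothetical stand-in for the register response carrying tokens for the new attempt. */
    static final class RegisterResponse {
        private final List<NodeToken> previousAttemptTokens;

        RegisterResponse(List<NodeToken> previousAttemptTokens) {
            this.previousAttemptTokens = previousAttemptTokens;
        }

        List<NodeToken> getNMTokensFromPreviousAttempts() {
            return previousAttemptTokens;
        }
    }

    /**
     * Simulated registration. With copyTokens == false the tokens in the
     * response are ignored, which mirrors the behaviour described in the report.
     * With copyTokens == true each token is stored in the cache.
     * Returns the number of tokens stored.
     */
    static int register(RegisterResponse response, TokenCache cache, boolean copyTokens) {
        int stored = 0;
        if (copyTokens) {
            for (NodeToken t : response.getNMTokensFromPreviousAttempts()) {
                cache.put(t.nodeAddress, t.token);
                stored++;
            }
        }
        return stored;
    }

    /** Simulated container launch: fails when no token is cached for the node. */
    static void launch(String nodeAddress, TokenCache cache) {
        if (!cache.contains(nodeAddress)) {
            throw new IllegalStateException("No NMToken sent for " + nodeAddress);
        }
    }

    public static void main(String[] args) {
        RegisterResponse response = new RegisterResponse(
                Collections.singletonList(new NodeToken("node1:43037", "opaque-token")));

        // Tokens ignored at registration: the later launch has nothing to use.
        TokenCache ignoring = new TokenCache();
        register(response, ignoring, false);
        try {
            launch("node1:43037", ignoring);
        } catch (IllegalStateException e) {
            System.out.println("without handling: " + e.getMessage());
        }

        // Tokens copied at registration: the launch finds a token for the node.
        TokenCache handling = new TokenCache();
        register(response, handling, true);
        launch("node1:43037", handling);
        System.out.println("with handling: launch succeeded");
    }
}
```

In this model the RM hands out the tokens exactly once, in the register response, so the cache is the only place a later launch can find them. Whether the actual fix in YARN-6019.001.patch takes this form is not stated in this message.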