[jira] [Updated] (YARN-6019) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2016-12-29 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-6019:
---
Assignee: Aleksandr Balitsky

> MR application fails with "No NMToken sent" exception after MRAppMaster 
> recovery
> 
>
> Key: YARN-6019
> URL: https://issues.apache.org/jira/browse/YARN-6019
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 2.7.0
> Environment: Centos 7
>Reporter: Aleksandr Balitsky
>Assignee: Aleksandr Balitsky
>Priority: Critical
> Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit MR application (for example PI app with 50 containers)
> 2) Find MRAppMaster process id for the application 
> 3) Kill MRAppMaster by kill -9 command
> *Expected:* ResourceManager launch new MRAppMaster container and MRAppAttempt 
> and application finish correctly
> *Actually:* After launching new MRAppMaster and MRAppAttempt the application 
> fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container 
> launch failed for container_1482408247195_0002_02_11 : 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
> for node1:43037
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:244)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends "registerApplicationMaster" request to RM, RM 
> generates NMTokens for new RMAppAttempt. Those new NMTokens are transmitted 
> to RMCommunicator in RegisterApplicationMasterResponse  
> (getNMTokensFromPreviousAttempts method). But we don't handle these tokens in 
> RMCommunicator.register method. RM don't transmit tese tokens again for other 
> allocated requests, but we don't have these tokens in NMTokenCache. 
> Accordingly we get "No NMToken sent for node" exception.
> I have found that this issue appears after changes from the 
> https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed
>  
> I tried to do the same scenario without the commit and application completed 
> successfully after RMAppMaster recovery



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6019) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2016-12-22 Thread Aleksandr Balitsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Balitsky updated YARN-6019:
-
Component/s: resourcemanager

> MR application fails with "No NMToken sent" exception after MRAppMaster 
> recovery
> 
>
> Key: YARN-6019
> URL: https://issues.apache.org/jira/browse/YARN-6019
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 2.7.0
> Environment: Centos 7
>Reporter: Aleksandr Balitsky
>Priority: Critical
> Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit MR application (for example PI app with 50 containers)
> 2) Find MRAppMaster process id for the application 
> 3) Kill MRAppMaster by kill -9 command
> *Expected:* ResourceManager launch new MRAppMaster container and MRAppAttempt 
> and application finish correctly
> *Actually:* After launching new MRAppMaster and MRAppAttempt the application 
> fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container 
> launch failed for container_1482408247195_0002_02_11 : 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
> for node1:43037
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:244)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends "registerApplicationMaster" request to RM, RM 
> generates NMTokens for new RMAppAttempt. Those new NMTokens are transmitted 
> to RMCommunicator in RegisterApplicationMasterResponse  
> (getNMTokensFromPreviousAttempts method). But we don't handle these tokens in 
> RMCommunicator.register method. RM don't transmit tese tokens again for other 
> allocated requests, but we don't have these tokens in NMTokenCache. 
> Accordingly we get "No NMToken sent for node" exception.
> I have found that this issue appears after changes from the 
> https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed
>  
> I tried to do the same scenario without the commit and application completed 
> successfully after RMAppMaster recovery



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6019) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2016-12-22 Thread Aleksandr Balitsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Balitsky updated YARN-6019:
-
Attachment: YARN-6019.001.patch

> MR application fails with "No NMToken sent" exception after MRAppMaster 
> recovery
> 
>
> Key: YARN-6019
> URL: https://issues.apache.org/jira/browse/YARN-6019
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
> Environment: Centos 7
>Reporter: Aleksandr Balitsky
>Priority: Critical
> Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit MR application (for example PI app with 50 containers)
> 2) Find MRAppMaster process id for the application 
> 3) Kill MRAppMaster by kill -9 command
> *Expected:* ResourceManager launch new MRAppMaster container and MRAppAttempt 
> and application finish correctly
> *Actually:* After launching new MRAppMaster and MRAppAttempt the application 
> fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container 
> launch failed for container_1482408247195_0002_02_11 : 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
> for node1:43037
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:244)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends "registerApplicationMaster" request to RM, RM 
> generates NMTokens for new RMAppAttempt. Those new NMTokens are transmitted 
> to RMCommunicator in RegisterApplicationMasterResponse  
> (getNMTokensFromPreviousAttempts method). But we don't handle these tokens in 
> RMCommunicator.register method. RM don't transmit tese tokens again for other 
> allocated requests, but we don't have these tokens in NMTokenCache. 
> Accordingly we get "No NMToken sent for node" exception.
> I have found that this issue appears after changes from the 
> https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed
>  
> I tried to do the same scenario without the commit and application completed 
> successfully after RMAppMaster recovery



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org