[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708266#comment-13708266 ]

Bikas Saha commented on YARN-521:
---------------------------------

Same error, and it does not look like a timeout. The assertion fails.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
    [ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708267#comment-13708267 ]

Bikas Saha commented on YARN-513:
---------------------------------

Why does YarnClient need to accept the rmaddress as a parameter? The whole point of this jira is to not require anyone to supply the RM connection parameters. The RM proxy is supposed to figure it out by itself from conf.

> Create common proxy client for communicating with RM
> -----------------------------------------------------
>
>                 Key: YARN-513
>                 URL: https://issues.apache.org/jira/browse/YARN-513
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch
>
> When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708278#comment-13708278 ]

Sandy Ryza commented on YARN-521:
---------------------------------

Yeah, I noticed. I'm pretty puzzled because I still can't reproduce it locally, but I'll look further into it tomorrow.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-917) Job can fail when RM restarts after staging dir is cleaned but before MR successfully unregister with RM
    [ https://issues.apache.org/jira/browse/YARN-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708537#comment-13708537 ]

Jason Lowe commented on YARN-917:
---------------------------------

I think one way to solve this is to move the removal of the staging directory to *after* we unregister from the RM. Now that there's a FINISHING state that gives the app a grace period to finish cleanly, we can leverage it to remove the staging directory after unregistering. This should also solve some other races related to removal of the staging directory and unregistering (e.g.: the AM crashes after removing the staging directory but before unregistering).

> Job can fail when RM restarts after staging dir is cleaned but before MR successfully unregister with RM
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-917
>                 URL: https://issues.apache.org/jira/browse/YARN-917
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
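[Editorial note: a minimal sketch of the ordering Jason proposes above. The method names here (unregisterFromRM, deleteStagingDir) are illustrative placeholders, not the actual MRAppMaster code.]

{code:java}
/** Sketch of the proposed shutdown ordering; names are placeholders. */
abstract class GracefulShutdownSketch {
  abstract void unregisterFromRM() throws Exception;  // FinishApplicationMaster RPC
  abstract void deleteStagingDir() throws Exception;  // e.g. FileSystem.delete(stagingDir, true)

  void shutDownJob() throws Exception {
    // Unregister first: the RM records the app as finished, and the FINISHING
    // state gives the AM a grace period to complete its cleanup.
    unregisterFromRM();
    // Delete the staging directory only after unregistering. If the RM restarts
    // (or the AM crashes) between the two steps, the job is already marked
    // finished, so nothing tries to rerun it from a now-missing staging dir.
    deleteStagingDir();
  }
}
{code}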
[jira] [Updated] (YARN-909) Disable TestLinuxContainerExecutorWithMocks on Windows
    [ https://issues.apache.org/jira/browse/YARN-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated YARN-909:
---------------------------

    Attachment: YARN-909.2.patch

Thanks for the suggestion, Chris! Attaching a new patch.

> Disable TestLinuxContainerExecutorWithMocks on Windows
> -------------------------------------------------------
>
>                 Key: YARN-909
>                 URL: https://issues.apache.org/jira/browse/YARN-909
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: YARN-909.2.patch, YARN-909.patch
>
> This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.
[jira] [Updated] (YARN-909) Disable TestLinuxContainerExecutorWithMocks on Windows
    [ https://issues.apache.org/jira/browse/YARN-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated YARN-909:
-------------------------------

    Attachment: YARN-909.3.patch

Actually, I was thinking we could put the {{assumeTrue}} call inside the {{Before}} method. This way, it's a very small patch. It also helps ease maintenance, because when people add new tests to this suite, they won't need to remember to call {{assumeTrue}}. I apologize if I'm not communicating this clearly. I'm attaching a patch that shows it. Sometimes code is easier. :-) Does this look good to you?

> Disable TestLinuxContainerExecutorWithMocks on Windows
> -------------------------------------------------------
>
>                 Key: YARN-909
>                 URL: https://issues.apache.org/jira/browse/YARN-909
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: YARN-909.2.patch, YARN-909.3.patch, YARN-909.patch
>
> This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.
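[Editorial note: to illustrate the pattern Chris describes (not the actual YARN-909 patch), a minimal JUnit 4 sketch that skips every test in a suite on Windows by calling {{assumeTrue}} once in the {{Before}} method. {{Shell.WINDOWS}} is the existing Hadoop platform flag; the test class and test body are placeholders.]

{code:java}
import static org.junit.Assume.assumeTrue;

import org.apache.hadoop.util.Shell;
import org.junit.Before;
import org.junit.Test;

public class TestLinuxOnlyFeatureSketch {

  @Before
  public void setup() {
    // A failed assumption marks a test as skipped rather than failed, so
    // every test in this suite is silently skipped when running on Windows.
    assumeTrue(!Shell.WINDOWS);
  }

  @Test
  public void testSomethingLinuxSpecific() {
    // Linux-specific assertions would go here; never reached on Windows.
  }
}
{code}

The advantage over per-test guards is exactly the maintenance point above: new tests added to the suite inherit the skip automatically.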
[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)
    [ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708636#comment-13708636 ]

Bikas Saha commented on YARN-149:
---------------------------------

Thanks. I looked at the draft. Will incorporate stuff from it or use it as the base directly. In general, it slightly mixes fail-over with HA. The way RM restart has been envisioned, with a good implementation, downtime due to restart should not be visible to users even with what is termed a cold restart.

Finally, I differ on the wrapper implementation because of:
1) an extra daemon to manage, since in fail-over scenarios each extra actor increases the combinatorics
2) the wrapper functionality seems to overlap the ZKFC and RM
3) the RM will need to be changed to interact with the wrapper, and the changes IMO should not be much different from those needed for direct ZKFC interaction
4) we will not be similar to HDFS patterns, and that makes the system harder to maintain and manage

In fact, what is being called a wrapper is something that probably does wrap around core RM functionality but remains inside the RM. From what I see, it will be an impl of the HAProtocol interface around the core RM startup functionality.

> ResourceManager (RM) High-Availability (HA)
> --------------------------------------------
>
>                 Key: YARN-149
>                 URL: https://issues.apache.org/jira/browse/YARN-149
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Harsh J
>            Assignee: Bikas Saha
>         Attachments: rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf
>
> This jira tracks work needed to be done to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to leader and client re-direction to new leader.
[jira] [Updated] (YARN-909) Disable TestLinuxContainerExecutorWithMocks on Windows
    [ https://issues.apache.org/jira/browse/YARN-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated YARN-909:
-------------------------------

          Component/s: nodemanager
     Target Version/s: 3.0.0, 2.1.0-beta
    Affects Version/s: 2.1.0-beta
                       3.0.0
         Hadoop Flags: Reviewed

Thanks, Chuan! I'll commit this after Jenkins gives +1 on the latest patch.

> Disable TestLinuxContainerExecutorWithMocks on Windows
> -------------------------------------------------------
>
>                 Key: YARN-909
>                 URL: https://issues.apache.org/jira/browse/YARN-909
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: YARN-909.2.patch, YARN-909.3.patch, YARN-909.patch
>
> This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.
[jira] [Commented] (YARN-909) Disable TestLinuxContainerExecutorWithMocks on Windows
    [ https://issues.apache.org/jira/browse/YARN-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708646#comment-13708646 ]

Hadoop QA commented on YARN-909:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592350/YARN-909.3.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1478//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1478//console

This message is automatically generated.

> Disable TestLinuxContainerExecutorWithMocks on Windows
> -------------------------------------------------------
>
>                 Key: YARN-909
>                 URL: https://issues.apache.org/jira/browse/YARN-909
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: YARN-909.2.patch, YARN-909.3.patch, YARN-909.patch
>
> This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708652#comment-13708652 ]

Sandy Ryza commented on YARN-521:
---------------------------------

Looks like Eclipse wasn't failing on the non-JUnit asserts. Uploading a new patch that fixes the issue.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708663#comment-13708663 ]

Bikas Saha commented on YARN-521:
---------------------------------

The diff is empty for the last 2 patches. Can you please check? Also, it probably needs a rebase after YARN-654, which just went in.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Created] (YARN-924) TestNMClient.testNMClientNoCleanupOnStop frequently failing due to timeout
Bikas Saha created YARN-924:
-------------------------------

             Summary: TestNMClient.testNMClientNoCleanupOnStop frequently failing due to timeout
                 Key: YARN-924
                 URL: https://issues.apache.org/jira/browse/YARN-924
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.1.0-beta
            Reporter: Bikas Saha
            Assignee: Zhijie Shen

Error Message:

test timed out after 18 milliseconds

Stacktrace:

java.lang.Exception: test timed out after 18 milliseconds
	at org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.println(ConsoleOutputCapture.java:87)
	at java.lang.Throwable.printStackTrace(Throwable.java:464)
	at java.lang.Throwable.printStackTrace(Throwable.java:451)
	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:349)
	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:317)
	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:182)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708676#comment-13708676 ]

Hadoop QA commented on YARN-521:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592356/YARN-521-5.patch
  against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1479//console

This message is automatically generated.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-654) AMRMClient: Perform sanity checks for parameters of public methods
    [ https://issues.apache.org/jira/browse/YARN-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708680#comment-13708680 ]

Hudson commented on YARN-654:
-----------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4082 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4082/])
YARN-654. AMRMClient: Perform sanity checks for parameters of public methods (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1503353)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java

> AMRMClient: Perform sanity checks for parameters of public methods
> -------------------------------------------------------------------
>
>                 Key: YARN-654
>                 URL: https://issues.apache.org/jira/browse/YARN-654
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>             Fix For: 2.1.0-beta
>         Attachments: YARN-654.1.patch, YARN-654.2.patch
[jira] [Updated] (YARN-661) NM fails to cleanup local directories for users
    [ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Omkar Vinit Joshi updated YARN-661:
-----------------------------------

    Attachment: YARN-661-20130715.1.patch

> NM fails to cleanup local directories for users
> -------------------------------------------------
>
>                 Key: YARN-661
>                 URL: https://issues.apache.org/jira/browse/YARN-661
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.1.0-beta, 0.23.8
>            Reporter: Jason Lowe
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch, YARN-661-20130710.1.patch, YARN-661-20130711.1.patch, YARN-661-20130712.1.patch, YARN-661-20130715.1.patch
>
> YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions.
[jira] [Commented] (YARN-909) Disable TestLinuxContainerExecutorWithMocks on Windows
    [ https://issues.apache.org/jira/browse/YARN-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708689#comment-13708689 ]

Hudson commented on YARN-909:
-----------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4083 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4083/])
YARN-909. Disable TestLinuxContainerExecutorWithMocks on Windows. Contributed by Chuan Liu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1503357)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java

> Disable TestLinuxContainerExecutorWithMocks on Windows
> -------------------------------------------------------
>
>                 Key: YARN-909
>                 URL: https://issues.apache.org/jira/browse/YARN-909
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: YARN-909.2.patch, YARN-909.3.patch, YARN-909.patch
>
> This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
    [ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708696#comment-13708696 ]

Jian He commented on YARN-513:
------------------------------

bq. Why does YarnClient need to accept the rmaddress as a parameter? The whole point of this jira is to not require anyone to supply the RM connection parameters. The RM proxy is supposed to figure it out by itself from conf.

The original code accepts rmaddress as a parameter; should we remove that?

> Create common proxy client for communicating with RM
> -----------------------------------------------------
>
>                 Key: YARN-513
>                 URL: https://issues.apache.org/jira/browse/YARN-513
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch
>
> When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
[jira] [Updated] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-521:
----------------------------

    Attachment: YARN-521-6.patch

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708707#comment-13708707 ]

Sandy Ryza commented on YARN-521:
---------------------------------

(The diff you were comparing was from before I uploaded the new patch. The latest diff includes the test fix and rebase.)

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)
    [ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708732#comment-13708732 ]

Karthik Kambatla commented on YARN-149:
---------------------------------------

Thanks Bikas.

bq. 1) an extra daemon to manage, since in fail-over scenarios each extra actor increases the combinatorics

The wrapper is not an extra daemon. There will be a single daemon for the wrapper/RM. In the cold-standby case, the wrapper starts the RM instance when it becomes active.

bq. 2) the wrapper functionality seems to overlap the ZKFC and RM

The wrapper *interacts* with the ZKFC and RM.

bq. 3) the RM will need to be changed to interact with the wrapper, and the changes IMO should not be much different from those needed for direct ZKFC interaction

Mostly agree with you here. I believe it boils down to the following: which state machine to incorporate the HA logic into. The wrapper approach essentially proposes two state machines, one for the core RM and one for the HA logic. Integrating the HA logic into the current RM means adding more states to it. There are (dis)advantages to both: the wrapper approach shouldn't affect non-HA instances, and might help with earlier adoption by major YARN users like Yahoo!

bq. In fact, what is being called a wrapper is something that probably does wrap around core RM functionality but remains inside the RM. From what I see, it will be an impl of the HAProtocol interface around the core RM startup functionality.

Looks like a promising approach. Let me take a closer look at the code and comment.

> ResourceManager (RM) High-Availability (HA)
> --------------------------------------------
>
>                 Key: YARN-149
>                 URL: https://issues.apache.org/jira/browse/YARN-149
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Harsh J
>            Assignee: Bikas Saha
>         Attachments: rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf
>
> This jira tracks work needed to be done to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to leader and client re-direction to new leader.
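[Editorial note: a sketch of what the last quote above suggests, an in-process "wrapper" implemented against Hadoop's standard {{org.apache.hadoop.ha.HAServiceProtocol}} interface (the one ZKFC drives for HDFS). The RM internals here are placeholders; this is illustrative only, not the eventual YARN-149 design.]

{code:java}
import java.io.IOException;

import org.apache.hadoop.ha.HAServiceProtocol;
import org.apache.hadoop.ha.HAServiceStatus;

/** Illustrative only: the HA "wrapper" as an in-process HAServiceProtocol impl. */
class RMHAServiceSketch implements HAServiceProtocol {
  private final Object coreRM;  // placeholder for the core RM services
  private HAServiceState state = HAServiceState.STANDBY;

  RMHAServiceSketch(Object coreRM) { this.coreRM = coreRM; }

  @Override
  public synchronized void transitionToActive(StateChangeRequestInfo req)
      throws IOException {
    // Start the core RM services (scheduler, ApplicationMasterService, ...)
    // and recover state from the RMStateStore before serving requests.
    state = HAServiceState.ACTIVE;
  }

  @Override
  public synchronized void transitionToStandby(StateChangeRequestInfo req)
      throws IOException {
    // Stop the core RM services; a cold standby keeps them stopped.
    state = HAServiceState.STANDBY;
  }

  @Override
  public synchronized void monitorHealth() throws IOException {
    // Throw HealthCheckFailedException if the core RM is unhealthy,
    // so the ZKFC can trigger a fail-over.
  }

  @Override
  public synchronized HAServiceStatus getServiceStatus() throws IOException {
    return new HAServiceStatus(state);
  }
}
{code}

This mirrors the HDFS pattern (addressing Bikas's point 4) while keeping everything in a single RM daemon (addressing point 1).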
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
    [ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708734#comment-13708734 ]

Jian He commented on YARN-353:
------------------------------

Thanks for the review, Karthik.

bq. YarnConfiguration: how about creating a common prefix for all of the zk-state-store related parameters?
bq. Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3.
bq. ZKRMStateStore#getNewZooKeeper need not be synchronized

Fixed.

bq. Might be cleaner to move zkDoWithRetries to ZkAction

We can implement no-retry functionality with ZkAction if we separate zkDoWithRetries out of ZkAction. Same reason for point 6.

The new patch also adds a test case for the ZK client disconnect and reconnect logic.

> Add Zookeeper-based store implementation for RMStateStore
> ----------------------------------------------------------
>
>                 Key: YARN-353
>                 URL: https://issues.apache.org/jira/browse/YARN-353
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch
>
> Add a store that writes RM state data to ZK
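[Editorial note: a minimal sketch of the retry pattern under discussion, keeping zkDoWithRetries separate from ZkAction as Jian describes; the names mirror the discussion, not the actual YARN-353 patch, and the retry count would come from configuration.]

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

/** Illustrative retry wrapper for ZooKeeper operations. */
class ZkRetrySketch {
  private final ZooKeeper zk;
  private final int numRetries;  // proposed to be configurable, default 3

  ZkRetrySketch(ZooKeeper zk, int numRetries) {
    this.zk = zk;
    this.numRetries = numRetries;
  }

  /** One ZooKeeper operation; callers can also run these without retries. */
  interface ZkAction<T> {
    T run() throws KeeperException, InterruptedException;
  }

  /** Retry only on connection loss; other KeeperExceptions propagate. */
  <T> T doWithRetries(ZkAction<T> action)
      throws KeeperException, InterruptedException {
    for (int retries = numRetries; ; retries--) {
      try {
        return action.run();
      } catch (KeeperException.ConnectionLossException e) {
        if (retries <= 0) {
          throw e;
        }
      }
    }
  }

  byte[] getDataWithRetries(final String path) throws Exception {
    return doWithRetries(new ZkAction<byte[]>() {
      @Override
      public byte[] run() throws KeeperException, InterruptedException {
        return zk.getData(path, false, null);
      }
    });
  }
}
{code}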
[jira] [Updated] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
    [ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-353:
-------------------------

    Attachment: YARN-353.6.patch

> Add Zookeeper-based store implementation for RMStateStore
> ----------------------------------------------------------
>
>                 Key: YARN-353
>                 URL: https://issues.apache.org/jira/browse/YARN-353
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch
>
> Add a store that writes RM state data to ZK
[jira] [Commented] (YARN-661) NM fails to cleanup local directories for users
    [ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708735#comment-13708735 ]

Hadoop QA commented on YARN-661:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592359/YARN-661-20130715.1.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1480//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1480//console

This message is automatically generated.

> NM fails to cleanup local directories for users
> -------------------------------------------------
>
>                 Key: YARN-661
>                 URL: https://issues.apache.org/jira/browse/YARN-661
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.1.0-beta, 0.23.8
>            Reporter: Jason Lowe
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch, YARN-661-20130710.1.patch, YARN-661-20130711.1.patch, YARN-661-20130712.1.patch, YARN-661-20130715.1.patch
>
> YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions.
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708750#comment-13708750 ]

Hadoop QA commented on YARN-521:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592362/YARN-521-6.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1481//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1481//console

This message is automatically generated.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successfull completion
    [ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708764#comment-13708764 ]

Omkar Vinit Joshi commented on YARN-903:
----------------------------------------

Planning to fix this. I am planning to remember completed containers (only the id) at the node manager for a predefined time (10 min). Does that time sound reasonable, or should we make it configurable? I don't really think adding a new configuration parameter would be a good idea, but I am open to a different approach / adding conf. Thoughts?

This will have a similar implementation to YARN-62, but the difference is that YARN-62 only tracks a container for a time after it starts, to avoid a duplicate launch, whereas this tries to avoid logging errors for valid stop attempts.

> DistributedShell throwing Errors in logs after successfull completion
> ----------------------------------------------------------------------
>
>                 Key: YARN-903
>                 URL: https://issues.apache.org/jira/browse/YARN-903
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications/distributed-shell
>    Affects Versions: 2.0.4-alpha
>         Environment: Ubuntu 11.10
>            Reporter: Abhishek Kapoor
>            Assignee: Omkar Vinit Joshi
>         Attachments: AppMaster.stderr, yarn-sunny-nodemanager-sunny-Inspiron.log
>
> I have tried running DistributedShell and also used the ApplicationMaster of the same for my test. The application runs successfully, though it logs some errors which would be useful to fix. Below are the logs from the NodeManager and ApplicationMaster.
>
> Log Snippet for NodeManager
> ===========================
> 2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1
> 2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586
> 2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570
> 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of memory:10240, vCores:8
> 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
> 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE)
> 2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_01 by user sunny
> 2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001
> 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_01
> 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING
> 2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_01 to application application_1373184544832_0001
> 2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from INITING to RUNNING
> 2013-07-07 13:39:35,528 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_01 transitioned from NEW to LOCALIZING
> 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from INIT to DOWNLOADING
> 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1373184544832_0001_01_01
> 2013-07-07 13:39:35,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file
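[Editorial note: a sketch of the remembering scheme Omkar floats above, assuming the fixed 10-minute window from the comment. The class name and structure are illustrative, not the eventual patch: the NM would record completed container IDs with a timestamp and treat stop requests for remembered IDs as benign no-ops instead of logging errors.]

{code:java}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative cache of recently completed container IDs. */
class CompletedContainerCacheSketch<ID> {
  private static final long RETENTION_MS = 10 * 60 * 1000L;  // proposed 10 min

  private final Map<ID, Long> completionTime = new ConcurrentHashMap<ID, Long>();

  void containerCompleted(ID containerId) {
    completionTime.put(containerId, System.currentTimeMillis());
  }

  /** True if the container finished recently: a stop request is a valid no-op. */
  boolean isRecentlyCompleted(ID containerId) {
    evictExpired();
    return completionTime.containsKey(containerId);
  }

  private void evictExpired() {
    long cutoff = System.currentTimeMillis() - RETENTION_MS;
    for (Iterator<Map.Entry<ID, Long>> it = completionTime.entrySet().iterator();
         it.hasNext(); ) {
      if (it.next().getValue() < cutoff) {
        it.remove();
      }
    }
  }
}
{code}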
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
    [ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708761#comment-13708761 ]

Zhijie Shen commented on YARN-744:
----------------------------------

The passed-in appAttemptId for an app currently seems to be the same object, such that it can be used for synchronized blocks, but I agree with the idea of a wrapper, because it is more predictable and stand-alone in ApplicationMasterService. BTW, is it convenient to write a test case for concurrent allocation? Like TestClientRMService#testConcurrentAppSubmit.

> Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-744
>                 URL: https://issues.apache.org/jira/browse/YARN-744
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Omkar Vinit Joshi
>         Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744.patch
>
> Looks like the lock taken in this is broken. It takes a lock on the lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object.
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
    [ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708796#comment-13708796 ]

Hadoop QA commented on YARN-353:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592363/YARN-353.6.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1482//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1482//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1482//console

This message is automatically generated.

> Add Zookeeper-based store implementation for RMStateStore
> ----------------------------------------------------------
>
>                 Key: YARN-353
>                 URL: https://issues.apache.org/jira/browse/YARN-353
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch
>
> Add a store that writes RM state data to ZK
[jira] [Resolved] (YARN-924) TestNMClient.testNMClientNoCleanupOnStop frequently failing due to timeout
    [ https://issues.apache.org/jira/browse/YARN-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-924.
------------------------------

    Resolution: Duplicate

It is a duplicate of YARN-906.

> TestNMClient.testNMClientNoCleanupOnStop frequently failing due to timeout
> ---------------------------------------------------------------------------
>
>                 Key: YARN-924
>                 URL: https://issues.apache.org/jira/browse/YARN-924
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Bikas Saha
>            Assignee: Zhijie Shen
>
> Error Message:
> test timed out after 18 milliseconds
>
> Stacktrace:
> java.lang.Exception: test timed out after 18 milliseconds
> 	at org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.println(ConsoleOutputCapture.java:87)
> 	at java.lang.Throwable.printStackTrace(Throwable.java:464)
> 	at java.lang.Throwable.printStackTrace(Throwable.java:451)
> 	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:349)
> 	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:317)
> 	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:182)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env
    [ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708997#comment-13708997 ]

Mayank Bansal commented on YARN-919:
------------------------------------

Thanks [~hitesh] for the comments.

The primary intention of this JIRA is to provide some handy way for users to set the parameters. I think you are right that bin/yarn already sets the defaults, which I can remove from this patch. However, we need something like this in yarn-env.sh, maybe commented out for now, so that users don't have to dig around the documentation to increase the memory sizes. Thoughts?

Thanks,
Mayank

> Setting default heap sizes in yarn env
> ----------------------------------------
>
>                 Key: YARN-919
>                 URL: https://issues.apache.org/jira/browse/YARN-919
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>            Priority: Minor
>         Attachments: YARN-919-trunk-1.patch
>
> Right now there are no defaults in the yarn env scripts for the resource manager and node manager, and if a user wants to override them, they have to go to the documentation, find the variables, and change the script. There is no straightforward way to change it in the script. Just updating the variables with defaults.
[jira] [Updated] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
    [ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-353:
-------------------------

    Attachment: YARN-353.7.patch

> Add Zookeeper-based store implementation for RMStateStore
> ----------------------------------------------------------
>
>                 Key: YARN-353
>                 URL: https://issues.apache.org/jira/browse/YARN-353
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch
>
> Add a store that writes RM state data to ZK
[jira] [Commented] (YARN-321) Generic application history service
    [ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709030#comment-13709030 ]

Mayank Bansal commented on YARN-321:
------------------------------------

Overall looks good. However, some points to consider:

bq. ResourceManager will push the data to HistoryStorage after an application finishes in a separate thread.

Is it per application, or only one thread in the RM? Wouldn't it be a good idea that as soon as an application starts we send the information to the AHS and let the AHS write all the data published by the RM for that application? In that case there would be much less overhead for the RM.

What about the cases where the RM restarts or crashes: does the RM have to republish all the running applications to the AHS, or just forget about the previously running apps?

Right now it's not clear what needs to be done for log aggregation.

> Generic application history service
> ------------------------------------
>
>                 Key: YARN-321
>                 URL: https://issues.apache.org/jira/browse/YARN-321
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Luke Lu
>            Assignee: Vinod Kumar Vavilapalli
>
> The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is the number of application types and V is the number of application versions) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics.
[jira] [Created] (YARN-925) Interface of HistoryStorage for AHS
Mayank Bansal created YARN-925:
----------------------------------

             Summary: Interface of HistoryStorage for AHS
                 Key: YARN-925
                 URL: https://issues.apache.org/jira/browse/YARN-925
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Mayank Bansal
            Assignee: Mayank Bansal
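[Editorial note: to make the YARN-321 discussion concrete, a hypothetical shape for the interface YARN-925 proposes. The method names here are guesses for illustration, not the agreed design: the RM would call the write side once an application finishes (per the quoted design statement), and the AHS would serve reads from the same store.]

{code:java}
import java.io.IOException;

/** Hypothetical sketch of a history store interface; not the YARN-925 design. */
interface HistoryStorageSketch<AppId, AppHistory> {
  /** Called by the RM (e.g. from a background thread) once an app finishes. */
  void applicationFinished(AppId appId, AppHistory data) throws IOException;

  /** Called by the application history server when serving UI/REST queries. */
  AppHistory getApplication(AppId appId) throws IOException;
}
{code}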
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
    [ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709060#comment-13709060 ]

Hadoop QA commented on YARN-353:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592413/YARN-353.7.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
        org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1484//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1484//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1484//console

This message is automatically generated.

> Add Zookeeper-based store implementation for RMStateStore
> ----------------------------------------------------------
>
>                 Key: YARN-353
>                 URL: https://issues.apache.org/jira/browse/YARN-353
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch
>
> Add a store that writes RM state data to ZK
[jira] [Updated] (YARN-513) Create common proxy client for communicating with RM
    [ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-513:
-------------------------

    Attachment: YARN-513.15.patch

> Create common proxy client for communicating with RM
> -----------------------------------------------------
>
>                 Key: YARN-513
>                 URL: https://issues.apache.org/jira/browse/YARN-513
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch
>
> When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
    [ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709063#comment-13709063 ]

Jian He commented on YARN-513:
------------------------------

The new patch removes rmaddress as a parameter from YarnClient.

> Create common proxy client for communicating with RM
> -----------------------------------------------------
>
>                 Key: YARN-513
>                 URL: https://issues.apache.org/jira/browse/YARN-513
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch
>
> When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
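[Editorial note: what the change means for callers, sketched against the YarnClient API (assuming the {{YarnClient.createYarnClient()}} factory available in 2.x releases). The client is configured purely from YarnConfiguration; the RM address comes from {{yarn.resourcemanager.address}} in conf, with no address argument.]

{code:java}
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClientFromConf {
  public static void main(String[] args) throws Exception {
    // The RM address is resolved internally from yarn.resourcemanager.address
    // in the configuration; callers no longer pass it explicitly.
    YarnConfiguration conf = new YarnConfiguration();
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    try {
      System.out.println("Applications: " + client.getApplications().size());
    } finally {
      client.stop();
    }
  }
}
{code}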
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
    [ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709064#comment-13709064 ]

Omkar Vinit Joshi commented on YARN-744:
----------------------------------------

bq. BTW, is it convenient to write a test case for concurrent allocation? Like TestClientRMService#testConcurrentAppSubmit.

Yeah, wrote one.

bq. The passed-in appAttemptId for an app currently seems to be the same object, such that it can be used for synchronized blocks, but I agree with the idea of a wrapper, because it is more predictable and stand-alone in ApplicationMasterService.

Locking on the appAttemptId in the case of the allocate / registerApplicationMaster calls won't work. They are coming from the client, so we can't guarantee that they are identical objects for the purpose of grabbing a lock. Thoughts?

> Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-744
>                 URL: https://issues.apache.org/jira/browse/YARN-744
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Omkar Vinit Joshi
>         Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744.patch
>
> Looks like the lock taken in this is broken. It takes a lock on the lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709067#comment-13709067 ] Aaron T. Myers commented on YARN-914: - Should we perhaps do an s/NN/NM/g in the description of this JIRA? Or does this have something to do with the Name Node and I'm completely missing it? Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NNs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NN is decommissioned, all running containers on the NN need to be rescheduled on other NNs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-744: --- Attachment: YARN-744-20130715.1.patch Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709072#comment-13709072 ] Zhijie Shen commented on YARN-744: -- bq. Locking on appAttemptId in the case of allocate / RegisterApplicationMaster calls won't work. They are coming from the client...can't guarantee that they are identical in terms of grabbing a lock... thoughts? I meant that AMRMClient uses the same appAttemptId, but the uniqueness is not guaranteed, so I agreed with the self-contained lock wrapper. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709088#comment-13709088 ] Hitesh Shah commented on YARN-919: -- [~mayank_bansal] Yes, that sounds good. Something like: export YARN_RESOURCEMANAGER_HEAPSIZE=${YARN_RESOURCEMANAGER_HEAPSIZE:-default/recommended value} export YARN_RESOURCEMANAGER_OPTS=${YARN_RESOURCEMANAGER_OPTS:-default/recommended settings} should be enough to account for docs. yarn-env.sh is expected to be overwritten by the user in any case as part of a deployment. Setting default heap sizes in yarn env -- Key: YARN-919 URL: https://issues.apache.org/jira/browse/YARN-919 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Mayank Bansal Assignee: Mayank Bansal Priority: Minor Attachments: YARN-919-trunk-1.patch Right now there are no defaults in yarn env scripts for resource manager and node manager, and if the user wants to override that, then the user has to go to documentation, find the variables and change the script. There is no straightforward way to change it in the script. Just updating the variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709091#comment-13709091 ] Bikas Saha commented on YARN-744: - Why do we need a wrapper? We should not be locking on the app attempt id. We should try to find some internal RM object that's unique for the app attempt and lock on that. Also avoid locking the RMAppAttemptImpl object itself since it will block the internal async dispatcher. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-744: Priority: Minor (was: Major) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709096#comment-13709096 ] Bikas Saha commented on YARN-744: - btw. it does not look like this is a practical problem. Until we start seeing a few instances of this happening we should probably lower the priority of this jira. I will do that now. Please change it if you think otherwise. A bug that does not manifest itself is not a bug :P Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709098#comment-13709098 ] Zhijie Shen commented on YARN-321: -- bq. Is it per application or only one thread in RM? I think it should be one thread in RM. bq. Isn't it a good idea that as soon as the application starts we send the information to AHS and let AHS write all the data published by the RM for that application? I'm afraid a number of metrics cannot be determined when an application has just been started, such as the finish time and the final status. Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709113#comment-13709113 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592424/YARN-513.15.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1486//console This message is automatically generated. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Lu updated YARN-914: - Description: When NNs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. was: When NNs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NN is decommissioned, all running containers on the NN need to be rescheduled on other NNs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NNs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
[ https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-918: - Attachment: YARN-918-20130715.txt Here's a patch that works. Needs to be applied on top of YARN-701. It simply removes the ApplicationAttemptId from the protocol request objects. ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701 - Key: YARN-918 URL: https://issues.apache.org/jira/browse/YARN-918 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-918-20130715.txt Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need ApplicationAttemptId in the RPC pay load. This is an API change, so doing it as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
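A hedged sketch of the resulting API shape (the newInstance signature and the variable values here are assumptions, not quotes from the patch): the attempt identity travels in the AMRMToken, so registration only needs the AM's coordinates.
{code:java}
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;

String trackingUrl = "http://am-host:8042/ui";   // illustrative value
// Before: the request also carried the ApplicationAttemptId.
// After: the RM derives the attempt from the AMRMToken on the connection.
RegisterApplicationMasterRequest req =
    RegisterApplicationMasterRequest.newInstance("am-host", 8042, trackingUrl);
{code}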
[jira] [Updated] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Lu updated YARN-914: - Description: When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. was: When NNs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-926) ContainerManagerProtcol APIs should take in requests for multiple containers
Vinod Kumar Vavilapalli created YARN-926: Summary: ContainerManagerProtcol APIs should take in requests for multiple containers Key: YARN-926 URL: https://issues.apache.org/jira/browse/YARN-926 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli AMs typically have to launch multiple containers on a node and the current single container APIs aren't helping. We should have all the APIs take in multiple requests and return multiple responses. The client libraries could expose both the single and multi-container requests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
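A rough sketch of what the batched shape could look like (the record names and the launchContext helper are assumptions for illustration; the committed API may differ):
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.StartContainerRequest;
import org.apache.hadoop.yarn.api.protocolrecords.StartContainersRequest;
import org.apache.hadoop.yarn.api.protocolrecords.StartContainersResponse;
import org.apache.hadoop.yarn.api.records.Container;

// One RPC starts several containers on the same node instead of one RPC each.
List<StartContainerRequest> requests = new ArrayList<StartContainerRequest>();
for (Container c : containersAllocatedOnThisNode) {      // hypothetical list
  requests.add(StartContainerRequest.newInstance(
      launchContext(c),                // hypothetical helper building the CLC
      c.getContainerToken()));
}
StartContainersResponse response =
    containerManager.startContainers(StartContainersRequest.newInstance(requests));
{code}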
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709147#comment-13709147 ] Hadoop QA commented on YARN-744: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592426/YARN-744-20130715.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1485//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1485//console This message is automatically generated. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709144#comment-13709144 ] Luke Lu commented on YARN-914: -- [~atm]: Nice catch! Of course :) Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-926) ContainerManagerProtcol APIs should take in requests for multiple containers
[ https://issues.apache.org/jira/browse/YARN-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-926: - Priority: Blocker (was: Major) Target Version/s: 2.1.0-beta Technically this isn't a blocker, and can be added as a new API in a compatible manner. But I'd like to avoid having multiple APIs as we still have a chance of getting this into 2.1.0. Thoughts? ContainerManagerProtcol APIs should take in requests for multiple containers Key: YARN-926 URL: https://issues.apache.org/jira/browse/YARN-926 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker AMs typically have to launch multiple containers on a node and the current single container APIs aren't helping. We should have all the APIs take in multiple requests and return multiple responses. The client libraries could expose both the single and multi-container requests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-912) Create exceptions package in common/api for yarn and move client facing exceptions to them
[ https://issues.apache.org/jira/browse/YARN-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709157#comment-13709157 ] Vinod Kumar Vavilapalli commented on YARN-912: -- Can you also take care of NMNotYetReadyException and InvalidContainerException too? Create exceptions package in common/api for yarn and move client facing exceptions to them -- Key: YARN-912 URL: https://issues.apache.org/jira/browse/YARN-912 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Mayank Bansal Attachments: YARN-912-trunk-1.patch, YARN-912-trunk-2.patch, YARN-912-trunk-3.patch Exceptions like InvalidResourceBlacklistRequestException, InvalidResourceRequestException, InvalidApplicationMasterRequestException etc are currently inside ResourceManager and not visible to clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709192#comment-13709192 ] Karthik Kambatla commented on YARN-321: --- A few other considerations: bq. Running as service: By default, ApplicationHistoryService will be embedded inside ResourceManager but will be independent enough to run as a separate service for scaling purposes. Is there a reason to embed this inside the RM? I don't know if there were reasons for the JHS to be separate, other than it being MR-specific. If there were, this would be against those. No? That said, I agree it will be easier for the user if AHS starts along with the RM. Maybe that should be configurable and turned on by default? bq. Hosting/serving per-framework data is out of scope for this JIRA. I understand and agree it makes sense not to complicate it. However, during the design, it would be nice to outline (at least at a high level) how the plugins can work. For the plugins to serve application-specific information, I suspect the RM should write this information in addition to generic YARN information about that application (e.g. MapReduce counters). On completion, can we leave a provision for the AM to write a json blob (maybe via the RM) to {{HistoryStorage}}? In the AHS, can we leave a provision for app-plugins to access/use this information to render application specifics? Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-906) TestNMClient.testNMClientNoCleanupOnStop fails occasionally
[ https://issues.apache.org/jira/browse/YARN-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709195#comment-13709195 ] Zhijie Shen commented on YARN-906: -- Did some investigation into this test failure. The test itself seems to have no problem. The test timed out because the container state stayed RUNNING after the container was stopped, which was not expected. Looked into the test log: after stopContainer was called, the container moved from LOCALIZED to KILLING, but didn't move any further. However, in my local test log of a successful run, the container moved from LOCALIZED to KILLING, and then from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL, during which the major work is cleaning the localized container resources (I observed the file deletions executing). The failed test log didn't show any file deletion. Therefore, I guess there's something blocking during container resource cleanup. Thoughts? More investigation is needed to further locate the problem. TestNMClient.testNMClientNoCleanupOnStop fails occasionally --- Key: YARN-906 URL: https://issues.apache.org/jira/browse/YARN-906 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen See https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClientNoCleanupOnStop/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
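For context, the kind of wait loop such a test relies on (a sketch with assumed variable names, not the actual test code): a stuck resource cleanup keeps the NM-internal state in KILLING and the client-visible state in RUNNING, so the loop spins until the harness times out.
{code:java}
import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

ContainerStatus status = nmClient.getContainerStatus(containerId, nodeId);
long deadline = System.currentTimeMillis() + 10000;
while (status.getState() == ContainerState.RUNNING
    && System.currentTimeMillis() < deadline) {
  Thread.sleep(100);
  status = nmClient.getContainerStatus(containerId, nodeId);
}
// On the failing runs described above, getState() never leaves RUNNING.
{code}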
[jira] [Assigned] (YARN-306) FIFO scheduler doesn't respect changing job priority
[ https://issues.apache.org/jira/browse/YARN-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-306: - Assignee: (was: Karthik Kambatla) FIFO scheduler doesn't respect changing job priority Key: YARN-306 URL: https://issues.apache.org/jira/browse/YARN-306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Nishan Shetty 1.Submit job 2.Change the job priority using setPriority() or CLI command ./mapred job-set-priority job-id priority Observe that Job priority is not changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709242#comment-13709242 ] Thomas Weise commented on YARN-896: --- We also identified the need for token renewal (app specific tokens). This should be a common need for long running services. Has it been discussed elsewhere? Roll up for long lived YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-654) AMRMClient: Perform sanity checks for parameters of public methods
[ https://issues.apache.org/jira/browse/YARN-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709243#comment-13709243 ] Bikas Saha commented on YARN-654: - Sorry, the minor change I made in the patch that changed >=0 to >0 is wrong. I did not follow the code correctly. Fixing it in the commit for YARN-521. AMRMClient: Perform sanity checks for parameters of public methods -- Key: YARN-654 URL: https://issues.apache.org/jira/browse/YARN-654 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Fix For: 2.1.0-beta Attachments: YARN-654.1.patch, YARN-654.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-521: Attachment: YARN-521.final.patch Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.final.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-513: - Attachment: YARN-513.16.patch Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-875: --- Attachment: YARN-875.1.patch Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
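The fix suggested in the description can be sketched as follows (handler and response names assumed; this is not the attached patch): the essential part is that the dispatch thread survives any Throwable thrown by user code.
{code:java}
// Inside the async client's callback-dispatch loop:
try {
  callbackHandler.onContainersAllocated(response.getAllocatedContainers());
} catch (Throwable t) {
  // Without this, the dispatch thread dies silently and the AM never
  // receives another callback - the hang described in this JIRA.
  callbackHandler.onError(new RuntimeException(t));
}
{code}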
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709255#comment-13709255 ] Aaron T. Myers commented on YARN-914: - Thanks, Luke. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709262#comment-13709262 ] Jason Lowe commented on YARN-321: - bq. Is there a reason to embed this inside the RM? I don't know if there were reasons for the JHS to be separate, other than it being MR-specific. IIRC the history server was embedded in the JT back in 1.x and was only split out as a separate daemon to keep the RM from having a dependency on MR. bq. That said, I agree it will be easier for the user if AHS starts along with the RM. Maybe that should be configurable and turned on by default? That'd be my preference, and the proxyserver is already done this way. One can run it either as part of the RM (default) or set up some configs and launch it separately via {{yarn proxyserver}}. Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha resolved YARN-521. - Resolution: Fixed Fix Version/s: 2.1.0-beta Hadoop Flags: Reviewed Committed to trunk, branch-2 and branch-2.1-beta. I included the fix for the trivial error from YARN-654 in the commit. Thanks Sandy! Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.final.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-523) Container localization failures aren't reported from NM to RM
[ https://issues.apache.org/jira/browse/YARN-523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709280#comment-13709280 ] Vinod Kumar Vavilapalli commented on YARN-523: -- Tx for the testing update Jian. The test changes are trivial and good. Checking this in. Container localization failures aren't reported from NM to RM - Key: YARN-523 URL: https://issues.apache.org/jira/browse/YARN-523 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-523.patch This is mainly a pain on crashing AMs, but once we fix this, containers also can benefit - same fix for both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709304#comment-13709304 ] Omkar Vinit Joshi commented on YARN-744: bq. We should not be locking on the app attempt id. I am not locking on appAttemptId... or AppAttemptImpl...didn't understand your question. bq. Why do we need a wrapper? We don't have any explicit lock for an application attempt...I am creating a wrapped object to avoid maintaining a separate per-application-attempt lock. Thereby, all responses for the same application attempt lock on that specific attempt's wrapper. I think this is important, as we may otherwise end up allocating more containers than what were requested... Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
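A minimal sketch of the wrapper idea under discussion (class and method names assumed, not taken from the attached patch): the lock object lives in the map and never changes, while the response it guards can be swapped freely.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

class AllocateResponseLock {
  private AllocateResponse lastResponse;
  AllocateResponseLock(AllocateResponse initial) { lastResponse = initial; }
  synchronized AllocateResponse getLastResponse() { return lastResponse; }
  synchronized void setLastResponse(AllocateResponse r) { lastResponse = r; }
}

ConcurrentMap<ApplicationAttemptId, AllocateResponseLock> responseMap =
    new ConcurrentHashMap<ApplicationAttemptId, AllocateResponseLock>();

// In allocate(): the monitor is the stable wrapper, not the replaceable response.
AllocateResponseLock lock = responseMap.get(appAttemptId);
synchronized (lock) {
  AllocateResponse last = lock.getLastResponse();
  if (request.getResponseId() == last.getResponseId()) {
    return last;                  // duplicate request: resend, don't reallocate
  }
  AllocateResponse fresh = doAllocate(request);   // hypothetical scheduler call
  lock.setLastResponse(fresh);
  return fresh;
}
{code}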
[jira] [Resolved] (YARN-430) Add HDFS based store for RM which manages the store using directories
[ https://issues.apache.org/jira/browse/YARN-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha resolved YARN-430. - Resolution: Not A Problem YARN-922 adds this to the FileSystemRMStateStore. HDFS retry behavior can be configured from config. No need for this anymore. Add HDFS based store for RM which manages the store using directories - Key: YARN-430 URL: https://issues.apache.org/jira/browse/YARN-430 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He There is a generic FileSystem store but it does not take advantage of HDFS features like directories, replication, DFSClient advanced settings for HA, retries etc. Writing a store that's optimized for HDFS would be good. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709311#comment-13709311 ] Jian He commented on YARN-922: -- Existing test cases cover this; no need to add more tests. Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
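The win from the directory layout is easiest to see as a sketch (paths assumed; the patch defines the actual names): removing an application becomes one recursive delete instead of one delete per record.
{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Assumed layout: <rmAppRoot>/<appId>/ holds the app record plus one file
// per attempt (appattempt_..._000001, appattempt_..._000002, ...).
Path rmAppRoot = new Path("/rmstore/RMAppRoot");       // illustrative path
Path appDir = new Path(rmAppRoot, applicationId.toString());
fs.delete(appDir, true);   // one call removes the app and all its attempts
{code}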
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709329#comment-13709329 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592454/YARN-513.16.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1487//console This message is automatically generated. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-513: - Attachment: YARN-513.17.patch rebased the patch Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.17.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-922: - Attachment: YARN-922.1.patch Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-922.1.patch, YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest
Bikas Saha created YARN-927: --- Summary: Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest Key: YARN-927 URL: https://issues.apache.org/jira/browse/YARN-927 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations, we anyway need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and make removeContainerRequest() easy to use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709355#comment-13709355 ] Omkar Vinit Joshi commented on YARN-245: Thanks, Mayank... I just took a look at your patch. Here are a few comments: bq. + private int lastHeartBeatId; do we need this? Can we remove it? bq. +// Checking if the response id is the same one we just processed bq. +// If yes then ignore the update. bq. +if (lastHeartBeatID == response.getResponseId()) { bq. + ++lastHeartBeatID; bq. + continue; bq. +} I remember we talked about it some time back... correct me if I am wrong. Can we replace this with something like accepting only one and rejecting all others? if (lastHeartbeatID != response.getResponseId() - 1) { continue; } thoughts?? * For the test case, we can probably avoid creating certain classes there: MyNodeManager6 - MockNM with the existing NodeStatusUpdater? MyResourceTracker6 - with an existing resource tracker (maybe from another test?), only overriding the nodeHeartbeat method... Also I think we need to check 2 things: 1) on the node manager side, are we getting only one APP_FINISH event on the dispatcher queue? 2) if application state == DONE, then send a node status response containing the current application to be finished? Maybe this will actually test the current problem? It should fail without the patch... thoughts? Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch
{code:xml}
2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
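The stricter check proposed above can be sketched like this (field and protocol names assumed): accept only the direct successor of the last processed heartbeat response, so a replayed response - and the APP_FINISH events it carries - is handled exactly once.
{code:java}
import org.apache.hadoop.yarn.server.api.protocolrecords.NodeHeartbeatResponse;

while (isRunning) {                        // NM status-updater loop (sketch)
  NodeHeartbeatResponse response = resourceTracker.nodeHeartbeat(request);
  if (response.getResponseId() != lastHeartBeatID + 1) {
    continue;                              // replayed/stale response: skip it
  }
  lastHeartBeatID = response.getResponseId();
  // process the response, including applications-to-clean, exactly once
}
{code}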
[jira] [Commented] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest
[ https://issues.apache.org/jira/browse/YARN-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709359#comment-13709359 ] Sandy Ryza commented on YARN-927: - Removing the ability to request multiple containers would be an annoying regression for a large class of applications that merely want a non-locality-constrained bunch of processes on the cluster. Have you considered allowing StoredContainerRequest to have multiple containers and including a decrementContainerRequest method? Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest Key: YARN-927 URL: https://issues.apache.org/jira/browse/YARN-927 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations we anyways need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make the getMatchingRequest() always available and easy to use removeContainerRequest(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
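For scale, the change would turn one counted request into a loop of single requests; a sketch against the YARN-521-era AMRMClient (constructor shape assumed):
{code:java}
// Today: one ContainerRequest with containerCount = n.
// After YARN-927: n requests, each with an implicit count of 1; null node
// and rack arrays mean no locality constraint.
for (int i = 0; i < n; i++) {
  amrmClient.addContainerRequest(
      new AMRMClient.ContainerRequest(capability, null, null, priority, 1));
}
{code}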
[jira] [Commented] (YARN-523) Container localization failures aren't reported from NM to RM
[ https://issues.apache.org/jira/browse/YARN-523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709366#comment-13709366 ] Hudson commented on YARN-523: - SUCCESS: Integrated in Hadoop-trunk-Commit #4086 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4086/]) YARN-523. Modified a test-case to validate container diagnostics on localization failures. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1503532) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java Container localization failures aren't reported from NM to RM - Key: YARN-523 URL: https://issues.apache.org/jira/browse/YARN-523 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Fix For: 2.1.1-beta Attachments: YARN-523.patch This is mainly a pain on crashing AMs, but once we fix this, containers also can benefit - same fix for both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709367#comment-13709367 ] Hudson commented on YARN-521: - SUCCESS: Integrated in Hadoop-trunk-Commit #4086 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4086/]) YARN-521. Augment AM - RM client module to be able to request containers only at specific locations (Sandy Ryza via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1503526) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/InvalidContainerRequestException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientContainerRequest.java Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.final.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709370#comment-13709370 ] Hadoop QA commented on YARN-875: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592456/YARN-875.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell org.apache.hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1489//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1489//console This message is automatically generated. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
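The fix the YARN-875 description suggests is easy to picture with a small sketch. The following is a hypothetical, simplified callback loop; CallbackHandler here is a stand-in interface, not the actual AMRMClientAsync.CallbackHandler. Without the catch of Throwable, an exception thrown by a handler kills the thread and no further callbacks are ever delivered.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified sketch of the suggested fix; not the actual AMRMClientAsync code.
public class CallbackThreadSketch {
    interface CallbackHandler {
        void onEvent(Object event);
        void onError(Throwable t); // surfaced instead of a silently dead thread
    }

    private final BlockingQueue<Object> events = new LinkedBlockingQueue<Object>();
    private volatile boolean stopped = false;

    void post(Object event) {
        events.add(event);
    }

    Thread startCallbackThread(final CallbackHandler handler) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                while (!stopped) {
                    Object event;
                    try {
                        event = events.take();
                    } catch (InterruptedException ie) {
                        return; // normal shutdown
                    }
                    try {
                        handler.onEvent(event);
                    } catch (Throwable th) {
                        // Without this catch the thread dies and the app hangs.
                        stopped = true;
                        handler.onError(th);
                    }
                }
            }
        }, "callback-thread");
        t.start();
        return t;
    }
}
{code}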
[jira] [Commented] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest
[ https://issues.apache.org/jira/browse/YARN-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709380#comment-13709380 ] Bikas Saha commented on YARN-927: - IMO calling addContainerRequest(new ContainerRequest(5)) is only a little bit less work than calling for(int i=0; i<5; ++i) {addContainerRequest(new ContainerRequest(1));} Other than that there should be no change in any of those applications. They should continue to work as is after that. Book-keeping for stored containers is next to impossible when add(pri1, 5), add(pri1, 4) is called followed by remove(pri1, 2). Internally, we don't know whether to remove from the first CR or the second. That makes a general getMatchingRequest API a non-starter. That is why getMatchingRequest is restricted to StoredContainerRequest. It makes the API confusing. Allowing users to get some CR and change its container count outside of the AMRMClient would lead to other correctness issues. The API is messy the way it is right now. I had always wanted to do this but lost track of it because of TEZ stabilization work. Reviewing YARN-521 recently reminded me of this when I noticed the duplication of stuff between CR and StoredCR and how easy it is to miss them. From what I see, the downside of this is very minimal and the upside is a much cleaner API. Hence I want to get this in before beta. I held back on it since YARN-521 was close and didn't want to cause unnecessary massive merge conflicts because of this simple refactor. Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest Key: YARN-927 URL: https://issues.apache.org/jira/browse/YARN-927 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations we anyway need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and removeContainerRequest() easy to use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
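The ambiguity described above can be seen in a few lines. This toy sketch (the Request class is a stand-in for ContainerRequest; none of this is the real AMRMClient bookkeeping) shows why a partial remove against counted requests has no well-defined target, while single-count requests make removal trivial.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy model (not the real AMRMClient types) of the bookkeeping problem above:
// with per-request counts, a partial remove cannot tell which stored request
// to decrement; with the count fixed at 1, removal is unambiguous.
public class ContainerRequestSketch {
    static class Request {
        final int priority;
        int count;
        Request(int priority, int count) { this.priority = priority; this.count = count; }
    }

    public static void main(String[] args) {
        // Counted requests: add(pri1, 5) followed by add(pri1, 4).
        List<Request> counted = new ArrayList<Request>();
        counted.add(new Request(1, 5));
        counted.add(new Request(1, 4));
        // remove(pri1, 2): decrement the first request? the second? split it?
        // Any choice silently changes which stored request a later
        // getMatchingRequest() would return.

        // Single-count requests: removing two is just dropping two entries.
        List<Request> singles = new ArrayList<Request>();
        for (int i = 0; i < 5; ++i) {
            singles.add(new Request(1, 1));
        }
        singles.remove(0);
        singles.remove(0);
        System.out.println("remaining requests: " + singles.size()); // prints 3
    }
}
{code}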
[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709382#comment-13709382 ] Hadoop QA commented on YARN-922: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592465/YARN-922.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1488//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1488//console This message is automatically generated. Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-922.1.patch, YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-62) AM should not be able to abuse container tokens for repetitive container launches
[ https://issues.apache.org/jira/browse/YARN-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709388#comment-13709388 ] Omkar Vinit Joshi commented on YARN-62: --- Thanks Vinod. bq. Though it works in most cases, it isn't logically correct to expire old token only if a new container comes in or succeeds. We should perform the expiry in a thread. I thought about a thread earlier, but starting an additional thread just to maintain this seems like overhead. Thoughts? bq. Can you also write a specific test which launches a container that very quickly exits, turns around and launches another container with same ID and token and gets rejected? bq. Also, please write a test which makes sure that old tokens are expired after 10 mins. Yeah, will add one... AM should not be able to abuse container tokens for repetitive container launches - Key: YARN-62 URL: https://issues.apache.org/jira/browse/YARN-62 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-62-20130621.1.patch, YARN-62-20130621.patch, YARN-62-20130628.patch Clone of YARN-51. ApplicationMaster should not be able to store container tokens and use the same set of tokens for repetitive container launches. The possibility of such abuse exists in the current code for a duration of 1d+10mins; we need to fix this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
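For what the lazy-expiry alternative being debated might look like, here is a hypothetical sketch; the class and method names are invented for illustration and this is not the NM's actual token bookkeeping. Each container launch first sweeps entries older than the expiry window, so no dedicated thread is needed, at the cost of stale tokens lingering until the next launch triggers a sweep.
{code:java}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of lazy token expiry; not the NM's real implementation.
public class TokenExpirySketch {
    private final Map<String, Long> recentlyUsedTokens =
        new ConcurrentHashMap<String, Long>();
    private static final long EXPIRY_MS = 10L * 60 * 1000; // 10-minute window

    /** Called on every container launch; rejects a replayed container token. */
    boolean registerLaunch(String containerId, long now) {
        sweepExpired(now); // lazy expiry instead of a background thread
        return recentlyUsedTokens.putIfAbsent(containerId, now) == null;
    }

    private void sweepExpired(long now) {
        Iterator<Map.Entry<String, Long>> it =
            recentlyUsedTokens.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() > EXPIRY_MS) {
                it.remove();
            }
        }
    }
}
{code}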
[jira] [Updated] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-875: Attachment: YARN-875.1.patch Kicking test again. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-744: --- Description: Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. was: Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. As a part of this ticket also fixing the problem which is present in Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-744: --- Description: Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. As a part of this ticket also fixing the problem which is present in was:Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. As a part of this ticket also fixing the problem which is present in -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
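The broken locking pattern in the YARN-744 description is worth spelling out. This toy model (not the actual ApplicationMasterService code; the map and method names are stand-ins) shows how locking the lastResponse object while replacing it in the map lets a second thread acquire the new object's monitor and enter the critical section concurrently, alongside the stable per-key lock along the lines the description suggests.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the race described above; not the real ApplicationMasterService.
public class AllocateLockSketch {
    private final Map<String, Object> lastResponse =
        new ConcurrentHashMap<String, Object>();

    void register(String attemptId) {
        lastResponse.put(attemptId, new Object());
    }

    // Broken: the monitor guards an object that is swapped out underneath it,
    // so the next caller locks the *new* object and races right past us.
    Object allocateBroken(String attemptId, Object newResponse) {
        Object last = lastResponse.get(attemptId); // assumes attempt registered
        synchronized (last) {
            lastResponse.put(attemptId, newResponse);
            return newResponse;
        }
    }

    // Along the lines the description suggests: serialize on a stable
    // per-attempt lock (the interned key here; a dedicated lock object per
    // ApplicationAttemptId works just as well).
    Object allocateFixed(String attemptId, Object newResponse) {
        synchronized (attemptId.intern()) {
            lastResponse.put(attemptId, newResponse);
            return newResponse;
        }
    }
}
{code}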
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709398#comment-13709398 ] Hudson commented on YARN-521: - SUCCESS: Integrated in Hadoop-trunk-Commit #4087 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4087/]) Trivial fix for minor refactor error for YARN-521 (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1503543) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.final.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709408#comment-13709408 ] Hadoop QA commented on YARN-875: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592471/YARN-875.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1491//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1491//console This message is automatically generated. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest
[ https://issues.apache.org/jira/browse/YARN-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-927: Attachment: YARN-927.1.patch Attaching the refactoring patch. The code change in DistributedShell reflects the trivial changes needed for apps that would have used a >1 container count in a single ContainerRequest. Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest Key: YARN-927 URL: https://issues.apache.org/jira/browse/YARN-927 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: YARN-927.1.patch The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations we anyway need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and removeContainerRequest() easy to use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-451) Add more metrics to RM page
[ https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709420#comment-13709420 ] Joep Rottinghuis commented on YARN-451: --- It would certainly be very useful to be able to see application size/weight (and order by this) when many applications run. If it were added, various YARN applications would each need their own specific implementation. At the moment only memory is tracked, so #slot Gigabytes would be a possible number that would be more generic than simply #mappers+#reducers. Either would be more useful than having no data at all. Being able to see the size of applications is really helpful to understand what is going on in one view. Is somebody running many small applications, a few large ones, many large ones, etc.? Add more metrics to RM page --- Key: YARN-451 URL: https://issues.apache.org/jira/browse/YARN-451 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu Priority: Minor The ResourceManager web UI shows the list of RUNNING applications, but it does not tell which applications are requesting more resources than others. With a cluster running hundreds of applications at once, it would be useful to have some kind of metric to show high-resource-usage applications vs low-resource-usage ones. At the minimum, showing the number of containers is a good option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
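As a rough illustration of the "#slot Gigabytes" idea floated above (the method is hypothetical; the RM does not expose such a metric today), the number reduces to containers times per-container memory:
{code:java}
// Back-of-envelope sketch of the "#slot Gigabytes" metric discussed above.
public class AppSizeMetricSketch {
    /** Only memory is tracked, so app size = containers * GB per container. */
    static double slotGigabytes(int numContainers, int containerMemoryMb) {
        return numContainers * (containerMemoryMb / 1024.0);
    }

    public static void main(String[] args) {
        // e.g. 200 containers of 1536 MB each -> 300.0 slot-GB
        System.out.println(slotGigabytes(200, 1536));
    }
}
{code}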
[jira] [Updated] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-513: Attachment: YARN-513.17.patch Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.17.patch, YARN-513.17.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest
[ https://issues.apache.org/jira/browse/YARN-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709433#comment-13709433 ] Hadoop QA commented on YARN-927: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592478/YARN-927.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1492//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1492//console This message is automatically generated. Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest Key: YARN-927 URL: https://issues.apache.org/jira/browse/YARN-927 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: YARN-927.1.patch The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations we anyway need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and removeContainerRequest() easy to use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709443#comment-13709443 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592464/YARN-513.17.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1490//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1490//console This message is automatically generated. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.17.patch, YARN-513.17.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-922: - Attachment: YARN-922.2.patch fixed test failure. Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
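The directory layout YARN-922 moves to is easy to picture. Here is a sketch using plain java.io (not the Hadoop FileSystem API the real store uses; all paths are illustrative): the app and every attempt record live under one application directory, so deleting that directory removes the whole application state in one logical operation.
{code:java}
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Sketch of the layout change using plain java.io; not the real
// FileSystemRMStateStore code, and the paths are illustrative.
public class RMStateDirSketch {
    static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) {
                deleteRecursively(c);
            }
        }
        f.delete();
    }

    public static void main(String[] args) throws IOException {
        File appDir = new File("rmstate/application_0001");
        appDir.mkdirs();
        // The app and each of its attempts stored under the same directory.
        new FileWriter(new File(appDir, "appattempt_0001_000001")).close();
        new FileWriter(new File(appDir, "appattempt_0001_000002")).close();

        // Removing application state is one operation: drop the directory.
        deleteRecursively(appDir);
        System.out.println("app dir removed: " + !appDir.exists());
    }
}
{code}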
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709457#comment-13709457 ] Jian He commented on YARN-513: -- The above test failures are not related to this patch: TestMROldApiJobs is also failing on trunk, and the other test failures are related to YARN-521, which should already be fixed by now. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.17.patch, YARN-513.17.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709458#comment-13709458 ] Hadoop QA commented on YARN-922: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592484/YARN-922.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1494//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1494//console This message is automatically generated. Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709461#comment-13709461 ] Xuan Gong commented on YARN-875: Fixed the -1 on javadoc. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-875: --- Attachment: YARN-875.2.patch Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709470#comment-13709470 ] Hadoop QA commented on YARN-875: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592488/YARN-875.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1495//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1495//console This message is automatically generated. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-928) While killing attempt for a task which got succeeded, task transition happens from SUCCEEDED to SCHEDULED and InvalidStateTransitonException is thrown
[ https://issues.apache.org/jira/browse/YARN-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina moved MAPREDUCE-5389 to YARN-928: Component/s: (was: task) applications Affects Version/s: (was: 2.0.5-alpha) 2.0.5-alpha Key: YARN-928 (was: MAPREDUCE-5389) Project: Hadoop YARN (was: Hadoop Map/Reduce) While killing attempt for a task which got succeeded, task transition happens from SUCCEEDED to SCHEDULED and InvalidStateTransitonException is thrown Key: YARN-928 URL: https://issues.apache.org/jira/browse/YARN-928 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.0.5-alpha Reporter: J.Andreina Priority: Minor Step 1: Install a cluster with HDFS and MR. Step 2: Execute a job. Step 3: Issue a kill for a task attempt whose task has already completed. Rex@HOST-10-18-91-55:~/NodeAgentTmpDir/installations/hadoop-2.0.5.tar/hadoop-2.0.5/bin ./mapred job -kill-task attempt_1373875322959_0032_m_00_0 No GC_PROFILE is given. Defaults to medium. 13/07/15 14:46:32 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/07/15 14:46:32 INFO proxy.ResourceManagerProxies: HA Proxy Creation with xface : interface org.apache.hadoop.yarn.api.ClientRMProtocol 13/07/15 14:46:33 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Killed task attempt_1373875322959_0032_m_00_0 Observation: === 1. The task state transitioned from SUCCEEDED to SCHEDULED. 2. For a succeeded attempt, when the client issues a kill, the client is notified that the succeeded attempt was killed. 3. A second task attempt was launched, which succeeded and was then killed later on client request. 4. Even after the job state transitioned from SUCCEEDED to ERROR, the UI still shows the state as succeeded. Issue : = 1. The client has been notified that the attempt was killed, but the attempt actually succeeded, and that is what the JHS UI displays. 2. At the App Master an InvalidStateTransitonException is thrown. 3. At the client side and the JHS, the job exited with state Finished/Succeeded; at the RM side the state is Finished/Failed.
AM Logs: 2013-07-15 14:46:25,461 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1373875322959_0032_m_00_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:25,468 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1373875322959_0032_m_00_0 2013-07-15 14:46:25,470 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:33,810 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from SUCCEEDED to SCHEDULED 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1373875322959_0032_m_00_1 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:37,345 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:866) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:128) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1095) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1091) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
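The InvalidStateTransitonException in the log above falls out of the state-machine style used here. The following is a minimal sketch in which the states, events, and transition table are simplified stand-ins for Hadoop's StateMachineFactory: a terminal state like SUCCEEDED registers no transition for a late kill/completion event, so dispatching one can only throw; the underlying bug in this report is that the kill was allowed to move a SUCCEEDED task back to SCHEDULED in the first place.
{code:java}
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Simplified stand-in for Hadoop's StateMachineFactory behavior.
public class TaskStateSketch {
    enum State { SCHEDULED, RUNNING, SUCCEEDED }
    enum Event { T_LAUNCH, T_COMPLETED, T_KILL }

    private static final Map<State, EnumSet<Event>> LEGAL =
        new EnumMap<State, EnumSet<Event>>(State.class);
    static {
        LEGAL.put(State.SCHEDULED, EnumSet.of(Event.T_LAUNCH, Event.T_KILL));
        LEGAL.put(State.RUNNING, EnumSet.of(Event.T_COMPLETED, Event.T_KILL));
        // Terminal state: no events registered, like SUCCEEDED in the log.
        LEGAL.put(State.SUCCEEDED, EnumSet.noneOf(Event.class));
    }

    static void dispatch(State current, Event event) {
        if (!LEGAL.get(current).contains(event)) {
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        // ... perform the registered transition (elided in this sketch)
    }

    public static void main(String[] args) {
        dispatch(State.SUCCEEDED, Event.T_KILL); // throws, as in the AM log
    }
}
{code}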
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709480#comment-13709480 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592481/YARN-513.17.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.mapreduce.v2.TestMROldApiJobs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1493//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1493//console This message is automatically generated. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.17.patch, YARN-513.17.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira