[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-1775:

Attachment: YARN-1775-v3.patch

getSmapBasedCumulativeRssmem() should be private
- Fixed.

When converting #pages to bytes, use PAGE_SIZE instead of hard-coding 1024.
- smaps reports sizes in kB, which need to be converted to bytes. PAGE_SIZE will mostly be 4096, which would give a wrong value in getSmapBasedCumulativeRssmem.

Move the constant PROCFS_SMAPS_ENABLED to YarnConfiguration
- Fixed.

Suggestions for renames: PROCFS_SMAPS_ENABLED -> PROCFS_USE_SMAPS_BASED_RSS, yarn.nodemanager.container-monitor.process-tree.smaps.enabled -> yarn.nodemanager.container-monitor.procfs-based-proces-tree.smaps-based-rss.enabled. (Did I just say that? )
- Fixed (yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled). Still long, I believe.

ProcessMemInfo -> ProcessTreeSmapMemInfo?, MemoryMappingInfo -> ProcessSmapMemoryInfo, moduleMemList -> memoryInfoList, processSMAPTree should be cleared in every iteration of updating the process-tree
- Fixed.

isSmapEnabled() should be private
- Removed this method completely. smapEnabled is now computed as part of the setConf() call.

MemoryMappingInfo.updateModuleMemInfo: We should skip everything else when we run into integer parsing issue of the value. Right now you are logging, ignoring and continuing.
- Fixed.

Rename MEM_INFO to MemInfo to go with other enums in the source?
- Fixed.

We should probably switch the following two ifs?
- Fixed.

Javadoc error
- Fixed. Reformatted the testcase as well.

While enforcing memory constraints, I wonder if people would want to use any other definitions of RSS to be more conservative or aggressive. Do you think it would make sense to provide these options separately, and have what you have as the default? We can punt this to a different JIRA, just wanted to bring it up.
- This option can be provided as an advanced/expert configuration. We can track it in a separate JIRA. Please feel free to open a new JIRA.

Create SMAPBasedProcessTree to get PSS information

Key: YARN-1775
URL: https://issues.apache.org/jira/browse/YARN-1775
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, YARN-1775-v3.patch, YARN-1775-v4.patch, yarn-1775-2.4.0.patch

Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.

--
This message was sent by Atlassian JIRA (v6.2#6252)
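The unit question in the review above (smaps values are in kB, so the multiplier is 1024 rather than the page size) can be illustrated with a minimal sketch. This is not the code from the YARN-1775 patch: the class name `SmapsPssSketch` and the method `sumPssBytes` are hypothetical, and the input is assumed to be lines in the `Pss:   <n> kB` form used by /proc/<pid>/smaps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the YARN-1775 patch.
class SmapsPssSketch {
  // smaps reports sizes in kB, so convert with 1024, not with the page size.
  private static final long KB_TO_BYTES = 1024L;
  // Matches only "Pss:" lines; other fields such as "Rss:" are ignored.
  private static final Pattern PSS_LINE =
      Pattern.compile("^Pss:\\s+(\\d+)\\s+kB$");

  /** Sums the Pss entries of smaps-formatted lines and returns bytes. */
  static long sumPssBytes(List<String> smapsLines) {
    long totalKb = 0;
    for (String line : smapsLines) {
      Matcher m = PSS_LINE.matcher(line.trim());
      if (m.matches()) {
        totalKb += Long.parseLong(m.group(1));
      }
    }
    return totalKb * KB_TO_BYTES;
  }

  public static void main(String[] args) {
    List<String> sample = Arrays.asList(
        "Size:                 84 kB",
        "Rss:                  20 kB",
        "Pss:                  10 kB",
        "Pss:                   6 kB");
    // (10 + 6) kB * 1024
    System.out.println(sumPssBytes(sample));
  }
}
```

Multiplying the same 16 kB by a 4096-byte page size would report four times the real usage, which is the error the reply points out.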
[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated YARN-1775:

Attachment: YARN-1775-v4.patch

Renaming the patch as v4.

Create SMAPBasedProcessTree to get PSS information

Key: YARN-1775
URL: https://issues.apache.org/jira/browse/YARN-1775
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, YARN-1775-v3.patch, YARN-1775-v4.patch, yarn-1775-2.4.0.patch

Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942872#comment-13942872 ]

Hadoop QA commented on YARN-1775:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12635971/YARN-1775-v4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3420//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3420//console

This message is automatically generated.

Create SMAPBasedProcessTree to get PSS information

Key: YARN-1775
URL: https://issues.apache.org/jira/browse/YARN-1775
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, YARN-1775-v3.patch, YARN-1775-v4.patch, yarn-1775-2.4.0.patch

Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith updated YARN-1854:

Attachment: YARN-1854.1.patch

Attaching patch. Please review.
I changed verifyClusterMetrics to retry up to 5 times, waiting 1 sec between attempts. I verified the behaviour by adding a breakpoint in CapacityScheduler, so that the check is retried 2 times and the queue metric references are updated later.

TestRMHA#testStartAndTransitions Fails

Key: YARN-1854
URL: https://issues.apache.org/jira/browse/YARN-1854
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
Attachments: YARN-1854.1.patch, YARN-1854.patch

{noformat}
testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE!
java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)

Results :

Failed tests:
  TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
{noformat}

--
This message was sent by Atlassian JIRA (v6.2#6252)
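The retry approach described in the comment above (check a metric up to 5 times, 1 sec apart) can be sketched roughly as follows. This is an illustration only, not the contents of YARN-1854.1.patch: `RetryCheckSketch` and `waitFor` are hypothetical names, and the sketch assumes Java 8 for `BooleanSupplier`.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a bounded retry loop; not the YARN-1854 patch.
class RetryCheckSketch {
  /**
   * Evaluates the condition up to maxAttempts times, sleeping sleepMillis
   * between attempts. Returns true as soon as the condition holds.
   */
  static boolean waitFor(BooleanSupplier condition, int maxAttempts,
      long sleepMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
          // Restore the interrupt flag and give up early.
          Thread.currentThread().interrupt();
          return false;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    final long start = System.currentTimeMillis();
    // Stand-in for a metric that becomes correct only after a short delay.
    boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 20, 5, 10);
    System.out.println("condition met: " + ok);
  }
}
```

A test would then assert on the returned boolean once, instead of failing on the first stale metric read.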
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942974#comment-13942974 ]

Hudson commented on YARN-1811:

FAILURE: Integrated in Hadoop-Yarn-trunk #516 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/516/])
YARN-1811. Fixed AMFilters in YARN to correctly accept requests from either web-app proxy or the RMs when HA is enabled. Contributed by Robert Kanter. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579877)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RMHAUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilterInitializer.java

RM HA: AM link broken if the AM is on nodes other than RM

Key: YARN-1811
URL: https://issues.apache.org/jira/browse/YARN-1811
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Fix For: 2.4.0
Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch

When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500:

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1570) Formatting the lines within 80 chars in YarnCommands.apt.vm
[ https://issues.apache.org/jira/browse/YARN-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942970#comment-13942970 ]

Hudson commented on YARN-1570:

FAILURE: Integrated in Hadoop-Yarn-trunk #516 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/516/])
YARN-1570. Fixed formatting of the lines in YarnCommands.apt.vm docs source. Contributed by Akira Ajisaka. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579797)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm

Formatting the lines within 80 chars in YarnCommands.apt.vm

Key: YARN-1570
URL: https://issues.apache.org/jira/browse/YARN-1570
Project: Hadoop YARN
Issue Type: Improvement
Components: documentation
Affects Versions: 2.2.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
Labels: newbie
Fix For: 2.4.0
Attachments: YARN-1570.patch

In YarnCommands.apt.vm, there are some lines longer than 80 characters.
For example:
{code}
Yarn commands are invoked by the bin/yarn script. Running the yarn script without any arguments prints the description for all commands.
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1859) WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM
[ https://issues.apache.org/jira/browse/YARN-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942972#comment-13942972 ]

Hudson commented on YARN-1859:

FAILURE: Integrated in Hadoop-Yarn-trunk #516 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/516/])
YARN-1859. Fixed WebAppProxyServlet to correctly handle applications absent on the ResourceManager. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579866)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java

WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM

Key: YARN-1859
URL: https://issues.apache.org/jira/browse/YARN-1859
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.4.0
Attachments: YARN-1859.1.patch

WebAppProxyServlet checks for null to determine whether the application was found.
{code}
ApplicationReport applicationReport = getApplicationReport(id);
if(applicationReport == null) {
  LOG.warn(req.getRemoteUser()+" Attempting to access "+id+" that was not found");
{code}
However, WebAppProxyServlet calls AppReportFetcher, which in turn calls ClientRMService. When the application is not found, ClientRMService throws ApplicationNotFoundException. Therefore, in WebAppProxyServlet, the following logic to create the tracking url for a non-cached app will no longer be in use.

--
This message was sent by Atlassian JIRA (v6.2#6252)
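The mismatch described in YARN-1859 (the caller tests for null, but the callee signals "not found" with an exception) can be shown with a small stand-alone sketch. It does not use the Hadoop classes and is not the committed fix: `AppNotFoundException`, `ReportFetcher` and `fetchOrNull` are hypothetical stand-ins, and translating the exception back to null is only one possible way to keep a null-based branch reachable.

```java
// Hypothetical stand-ins; not the Hadoop classes or the YARN-1859 patch.
class NotFoundToNullSketch {
  /** Stand-in for an exception such as ApplicationNotFoundException. */
  static class AppNotFoundException extends Exception {
    AppNotFoundException(String appId) {
      super("Application " + appId + " not found");
    }
  }

  /** Stand-in for a fetcher that throws instead of returning null. */
  interface ReportFetcher {
    String fetch(String appId) throws AppNotFoundException;
  }

  /**
   * Translates the "not found" exception into null so that an existing
   * null check in the caller is still exercised.
   */
  static String fetchOrNull(ReportFetcher fetcher, String appId) {
    try {
      return fetcher.fetch(appId);
    } catch (AppNotFoundException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    ReportFetcher alwaysMissing = id -> {
      throw new AppNotFoundException(id);
    };
    String report = fetchOrNull(alwaysMissing, "app_1");
    if (report == null) {
      System.out.println("app_1 was not found; fall back to a default tracking URL");
    }
  }
}
```

Without the translation step, the exception would propagate past the null check and the fallback branch would never run, which is the behaviour the issue reports.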
[jira] [Commented] (YARN-1855) TestRMFailover#testRMWebAppRedirect fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942971#comment-13942971 ]

Hudson commented on YARN-1855:

FAILURE: Integrated in Hadoop-Yarn-trunk #516 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/516/])
YARN-1855. Made Application-history server to be optional in MiniYARNCluster and thus avoid the failure of TestRMFailover#testRMWebAppRedirect. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579838)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java

TestRMFailover#testRMWebAppRedirect fails in trunk

Key: YARN-1855
URL: https://issues.apache.org/jira/browse/YARN-1855
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Zhijie Shen
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1855.1.patch, YARN-1855.1.patch, YARN-1855.2.patch

From https://builds.apache.org/job/Hadoop-Yarn-trunk/514/console :
{code}
testRMWebAppRedirect(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.39 sec ERROR!
java.lang.NullPointerException: null
    at org.apache.hadoop.yarn.client.TestRMFailover.testRMWebAppRedirect(TestRMFailover.java:269)
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1570) Formatting the lines within 80 chars in YarnCommands.apt.vm
[ https://issues.apache.org/jira/browse/YARN-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943080#comment-13943080 ]

Hudson commented on YARN-1570:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1708 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1708/])
YARN-1570. Fixed formatting of the lines in YarnCommands.apt.vm docs source. Contributed by Akira Ajisaka. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579797)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm

Formatting the lines within 80 chars in YarnCommands.apt.vm

Key: YARN-1570
URL: https://issues.apache.org/jira/browse/YARN-1570
Project: Hadoop YARN
Issue Type: Improvement
Components: documentation
Affects Versions: 2.2.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
Labels: newbie
Fix For: 2.4.0
Attachments: YARN-1570.patch

In YarnCommands.apt.vm, there are some lines longer than 80 characters.
For example:
{code}
Yarn commands are invoked by the bin/yarn script. Running the yarn script without any arguments prints the description for all commands.
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1859) WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM
[ https://issues.apache.org/jira/browse/YARN-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943082#comment-13943082 ]

Hudson commented on YARN-1859:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1708 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1708/])
YARN-1859. Fixed WebAppProxyServlet to correctly handle applications absent on the ResourceManager. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579866)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java

WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM

Key: YARN-1859
URL: https://issues.apache.org/jira/browse/YARN-1859
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.4.0
Attachments: YARN-1859.1.patch

WebAppProxyServlet checks for null to determine whether the application was found.
{code}
ApplicationReport applicationReport = getApplicationReport(id);
if(applicationReport == null) {
  LOG.warn(req.getRemoteUser()+" Attempting to access "+id+" that was not found");
{code}
However, WebAppProxyServlet calls AppReportFetcher, which in turn calls ClientRMService. When the application is not found, ClientRMService throws ApplicationNotFoundException. Therefore, in WebAppProxyServlet, the following logic to create the tracking url for a non-cached app will no longer be in use.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943084#comment-13943084 ]

Hudson commented on YARN-1811:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1708 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1708/])
YARN-1811. Fixed AMFilters in YARN to correctly accept requests from either web-app proxy or the RMs when HA is enabled. Contributed by Robert Kanter. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579877)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RMHAUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilterInitializer.java

RM HA: AM link broken if the AM is on nodes other than RM

Key: YARN-1811
URL: https://issues.apache.org/jira/browse/YARN-1811
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Fix For: 2.4.0
Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch

When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500:

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1855) TestRMFailover#testRMWebAppRedirect fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943081#comment-13943081 ]

Hudson commented on YARN-1855:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1708 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1708/])
YARN-1855. Made Application-history server to be optional in MiniYARNCluster and thus avoid the failure of TestRMFailover#testRMWebAppRedirect. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579838)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java

TestRMFailover#testRMWebAppRedirect fails in trunk

Key: YARN-1855
URL: https://issues.apache.org/jira/browse/YARN-1855
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Zhijie Shen
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1855.1.patch, YARN-1855.1.patch, YARN-1855.2.patch

From https://builds.apache.org/job/Hadoop-Yarn-trunk/514/console :
{code}
testRMWebAppRedirect(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.39 sec ERROR!
java.lang.NullPointerException: null
    at org.apache.hadoop.yarn.client.TestRMFailover.testRMWebAppRedirect(TestRMFailover.java:269)
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943134#comment-13943134 ]

Hudson commented on YARN-1811:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1733 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1733/])
YARN-1811. Fixed AMFilters in YARN to correctly accept requests from either web-app proxy or the RMs when HA is enabled. Contributed by Robert Kanter. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579877)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RMHAUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilterInitializer.java

RM HA: AM link broken if the AM is on nodes other than RM

Key: YARN-1811
URL: https://issues.apache.org/jira/browse/YARN-1811
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Fix For: 2.4.0
Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch

When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500:

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1855) TestRMFailover#testRMWebAppRedirect fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943131#comment-13943131 ]

Hudson commented on YARN-1855:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1733 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1733/])
YARN-1855. Made Application-history server to be optional in MiniYARNCluster and thus avoid the failure of TestRMFailover#testRMWebAppRedirect. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579838)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java

TestRMFailover#testRMWebAppRedirect fails in trunk

Key: YARN-1855
URL: https://issues.apache.org/jira/browse/YARN-1855
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu
Assignee: Zhijie Shen
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1855.1.patch, YARN-1855.1.patch, YARN-1855.2.patch

From https://builds.apache.org/job/Hadoop-Yarn-trunk/514/console :
{code}
testRMWebAppRedirect(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.39 sec ERROR!
java.lang.NullPointerException: null
    at org.apache.hadoop.yarn.client.TestRMFailover.testRMWebAppRedirect(TestRMFailover.java:269)
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1570) Formatting the lines within 80 chars in YarnCommands.apt.vm
[ https://issues.apache.org/jira/browse/YARN-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943130#comment-13943130 ]

Hudson commented on YARN-1570:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1733 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1733/])
YARN-1570. Fixed formatting of the lines in YarnCommands.apt.vm docs source. Contributed by Akira Ajisaka. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579797)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm

Formatting the lines within 80 chars in YarnCommands.apt.vm

Key: YARN-1570
URL: https://issues.apache.org/jira/browse/YARN-1570
Project: Hadoop YARN
Issue Type: Improvement
Components: documentation
Affects Versions: 2.2.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
Labels: newbie
Fix For: 2.4.0
Attachments: YARN-1570.patch

In YarnCommands.apt.vm, there are some lines longer than 80 characters.
For example:
{code}
Yarn commands are invoked by the bin/yarn script. Running the yarn script without any arguments prints the description for all commands.
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1859) WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM
[ https://issues.apache.org/jira/browse/YARN-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943132#comment-13943132 ]

Hudson commented on YARN-1859:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1733 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1733/])
YARN-1859. Fixed WebAppProxyServlet to correctly handle applications absent on the ResourceManager. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579866)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java

WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM

Key: YARN-1859
URL: https://issues.apache.org/jira/browse/YARN-1859
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.4.0
Attachments: YARN-1859.1.patch

WebAppProxyServlet checks for null to determine whether the application was found.
{code}
ApplicationReport applicationReport = getApplicationReport(id);
if(applicationReport == null) {
  LOG.warn(req.getRemoteUser()+" Attempting to access "+id+" that was not found");
{code}
However, WebAppProxyServlet calls AppReportFetcher, which in turn calls ClientRMService. When the application is not found, ClientRMService throws ApplicationNotFoundException. Therefore, in WebAppProxyServlet, the following logic to create the tracking url for a non-cached app will no longer be in use.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1670:
Attachment: YARN-1670-v3-b23.patch

aggregated log writer can write more log data then it says is the log length
Key: YARN-1670
URL: https://issues.apache.org/jira/browse/YARN-1670
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670.patch, YARN-1670.patch

We have seen exceptions when using 'yarn logs' to read log files.

at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)

We traced it down to the reader trying to read the file type of the next file while its position was still inside log data from the previous file. The log length had been written as a certain size, but the log data was actually longer than that.

Inside the write() routine in LogValue, it first writes the log file length, but when it then writes the log itself it simply reads to the end of the file. There is a race condition here: if someone is still writing to the file when it is aggregated, the length written could be too small.

We should have the write() routine stop once it has written the number of bytes it recorded as the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this.

We also noticed a bug in readAContainerLogsForALogType, where it is using an int for curRead whereas it should be using a long.

while (len != -1 && curRead < fileLength) {

This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits.

-- This message was sent by Atlassian JIRA (v6.2#6252)
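The fix proposed in the description (stop writing once the recorded length has been reached) can be sketched roughly as follows. This is only an illustration, not the code in the attached patches: the class BoundedLogCopy, the method copyBounded and the 64 KB buffer size are hypothetical names and values chosen for the example. The running counter is a long, in line with the remark about curRead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: copies a log stream but never
// writes more bytes than the length that was recorded up front.
final class BoundedLogCopy {

    /** Copies at most declaredLength bytes from in to out; returns the bytes copied. */
    static long copyBounded(InputStream in, OutputStream out, long declaredLength)
            throws IOException {
        byte[] buf = new byte[64 * 1024];
        long copied = 0; // long rather than int, since a log can exceed 2 GB
        while (copied < declaredLength) {
            // Never ask for more than what is left of the declared length.
            int toRead = (int) Math.min(buf.length, declaredLength - copied);
            int len = in.read(buf, 0, toRead);
            if (len == -1) {
                break; // the file turned out shorter than declared
            }
            out.write(buf, 0, len);
            copied += len;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // The file grew after its length (10) was recorded.
        byte[] data = "0123456789-appended-later".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = copyBounded(new ByteArrayInputStream(data), out, 10);
        System.out.println(n + " bytes: "
            + new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

With a bound like this the bytes appended after the length was recorded are simply left out, so the reader finds the next file type where it expects it. Telling the user about the truncation remains an open question, as noted above.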
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v3.patch Thanks [~vinodkv] for the feedback. 1- I changed the formatting. 2- I have modified the patch to use up less memory. It should work now. I have also tested the new patch on my Eclipse IDE with HeapSize=1GB and the test pass every time I run it. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670.patch, YARN-1670.patch
[jira] [Created] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
Ted Yu created YARN-1863:
Summary: TestRMFailover fails with 'AssertionError: null'
Key: YARN-1863
URL: https://issues.apache.org/jira/browse/YARN-1863
Project: Hadoop YARN
Issue Type: Test
Reporter: Ted Yu

This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:

{code}
testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.834 sec FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)

testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.341 sec FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943218#comment-13943218 ] Hadoop QA commented on YARN-1670: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636040/YARN-1670-v3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3422//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3422//console This message is automatically generated. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670.patch, YARN-1670.patch
[jira] [Assigned] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1863: --- Assignee: Xuan Gong TestRMFailover fails with 'AssertionError: null' Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong
[jira] [Updated] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1863: Attachment: YARN-1863.1.patch TestRMFailover fails with 'AssertionError: null' Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong Attachments: YARN-1863.1.patch
[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943253#comment-13943253 ] Xuan Gong commented on YARN-1863: - After https://issues.apache.org/jira/browse/YARN-1859, if we send an httpRequest with a fake application id, it will no longer throw ApplicationNotFoundException. Instead, it will send an httpResponse with a Not Found message, which causes the test case failures. TestRMFailover fails with 'AssertionError: null' Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong Attachments: YARN-1863.1.patch
[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943260#comment-13943260 ] Xuan Gong commented on YARN-1863: - Modify the testcases to verify we can receive httpResponse with Not Found message if we send a httpRequest with fakeApplicationId TestRMFailover fails with 'AssertionError: null' Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong Attachments: YARN-1863.1.patch
[jira] [Assigned] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-1863: - Assignee: Zhijie Shen (was: Xuan Gong) TestRMFailover fails with 'AssertionError: null' Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Zhijie Shen Attachments: YARN-1863.1.patch
[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943267#comment-13943267 ] Zhijie Shen commented on YARN-1863: --- In YARN-1859, I catch ApplicationNotFoundException and move on, because the client still has the chance to create the tracking url when the application is not found in the RM cache. Therefore, if the tracking url is still not available in the end, a Not Found http response will be returned. I'll handle the test failure. TestRMFailover fails with 'AssertionError: null' Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Zhijie Shen Attachments: YARN-1863.1.patch
[jira] [Updated] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1863: -- Assignee: Xuan Gong (was: Zhijie Shen) TestRMFailover fails with 'AssertionError: null' Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong Attachments: YARN-1863.1.patch
[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943283#comment-13943283 ] Zhijie Shen commented on YARN-1863: --- I saw that Xuan has already posted the patch, so I am reassigning it to Xuan. One comment on the patch: please assert the response code == 404 as well. TestRMFailover fails with 'AssertionError: null' Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong Attachments: YARN-1863.1.patch
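The check requested in this comment (assert that the response code is 404 for a fake application id) could look roughly like the sketch below. It is not the content of YARN-1863.2.patch. It uses the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for the web app proxy; the class NotFoundCheck, the helper names, the /proxy path and the application id are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

// Hypothetical stand-alone illustration; the real test runs against a MiniYARNCluster.
final class NotFoundCheck {

    /** Starts a local stub that answers every request under /proxy with 404 and no body. */
    static HttpServer startStubServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/proxy", exchange -> {
            exchange.sendResponseHeaders(HttpURLConnection.HTTP_NOT_FOUND, -1);
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Issues a GET and returns the HTTP status code without reading the body. */
    static int responseCode(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startStubServer();
        try {
            int port = server.getAddress().getPort();
            // Made-up application id, standing in for the fake id used by the test.
            int code = responseCode("http://127.0.0.1:" + port + "/proxy/application_0000_0000");
            if (code != HttpURLConnection.HTTP_NOT_FOUND) {
                throw new AssertionError("expected 404 but got " + code);
            }
            System.out.println("response code: " + code);
        } finally {
            server.stop(0);
        }
    }
}
```

Checking the status code in addition to the message text keeps the test from passing on an unrelated page that merely contains the words "Not Found".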
[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943296#comment-13943296 ] Xuan Gong commented on YARN-1863: - bq. One comment on the patch: please assert the response code == 404 as well. DONE TestRMFailover fails with 'AssertionError: null' Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong Attachments: YARN-1863.1.patch, YARN-1863.2.patch
[jira] [Updated] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1863: Attachment: YARN-1863.2.patch TestRMFailover fails with 'AssertionError: null' Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong Attachments: YARN-1863.1.patch, YARN-1863.2.patch
[jira] [Updated] (YARN-1294) Log4j settings in container-log4j.properties cannot be overridden
[ https://issues.apache.org/jira/browse/YARN-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1294: Attachment: apache-yarn-1294.1.patch Updated patch to fix order of assignment so that we can set map and reduce specific environment variables and override HADOOP_CLIENT_OPTS and HADOOP_ROOT_LOGGER. Log4j settings in container-log4j.properties cannot be overridden -- Key: YARN-1294 URL: https://issues.apache.org/jira/browse/YARN-1294 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Eugene Koifman Assignee: Varun Vasudev Attachments: apache-yarn-1294.0.patch, apache-yarn-1294.1.patch setting HADOOP_ROOT_LOGGER, -Dhadoop.root.logger has no effect -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943356#comment-13943356 ] Vinod Kumar Vavilapalli commented on YARN-1670: --- Looks good, checking this in. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670.patch, YARN-1670.patch
[jira] [Commented] (YARN-1294) Log4j settings in container-log4j.properties cannot be overridden
[ https://issues.apache.org/jira/browse/YARN-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943365#comment-13943365 ] Vinod Kumar Vavilapalli commented on YARN-1294: --- This belongs to MapReduce, moving it. Log4j settings in container-log4j.properties cannot be overridden -- Key: YARN-1294 URL: https://issues.apache.org/jira/browse/YARN-1294 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Eugene Koifman Assignee: Varun Vasudev Attachments: apache-yarn-1294.0.patch, apache-yarn-1294.1.patch setting HADOOP_ROOT_LOGGER, -Dhadoop.root.logger has no effect -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1336) Work-preserving nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943381#comment-13943381 ] Karthik Kambatla commented on YARN-1336: Thanks for the update, Jason. I just tried it on a pseudo-dist cluster - on-going containers continue to make progress across an NM restart. It looks very neat! I also barely skimmed over the rollup patch, things look promising. Work-preserving nodemanager restart --- Key: YARN-1336 URL: https://issues.apache.org/jira/browse/YARN-1336 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1336-rollup.patch This serves as an umbrella ticket for tasks related to work-preserving nodemanager restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943389#comment-13943389 ] Hadoop QA commented on YARN-1838: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635874/YARN-1838.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3423//console This message is automatically generated. Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi Attachments: YARN-1838.1.patch, YARN-1838.2.patch To support pagination, we need ability to get entities from a certain ID by providing a new param called {{fromid}}. For example on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11] When user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11] and continue on for further _Next_ clicks On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11] {{fromid}} should be inclusive of the id given. -- This message was sent by Atlassian JIRA (v6.2#6252)
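The pagination scheme in the YARN-1838 description (request one entity more than the page shows, then use the extra one as the inclusive fromid of the next page) can be sketched as follows. This is only an illustration under stated assumptions: the class FromIdPaging and its methods are hypothetical, the server is simulated by an in-memory list of ids, and the parameter name fromid is taken from the description rather than from any released API. The sketch also does not URL-encode the id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical client-side illustration; the timeline server is simulated by a list.
final class FromIdPaging {

    /** Builds the request URL, asking for one entity more than the page displays. */
    static String buildUrl(String base, String fromId, int pageSize) {
        StringBuilder sb = new StringBuilder(base);
        if (fromId != null) {
            sb.append("&fromid=").append(fromId);
        }
        return sb.append("&limit=").append(pageSize + 1).toString();
    }

    /** Stand-in for the server: up to limit ids starting at fromId, inclusive. */
    static List<String> fetch(List<String> allIds, String fromId, int limit) {
        int start = (fromId == null) ? 0 : allIds.indexOf(fromId);
        if (start < 0) {
            return Collections.emptyList(); // unknown id
        }
        int end = Math.min(start + limit, allIds.size());
        return new ArrayList<>(allIds.subList(start, end));
    }

    /** The extra (pageSize + 1)-th id, if present, becomes the fromid of the next page. */
    static String nextFromId(List<String> fetched, int pageSize) {
        return fetched.size() > pageSize ? fetched.get(pageSize) : null;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            ids.add("JID" + i);
        }
        String base = "http://server:8188/ws/v1/timeline/HIVE_QUERY_ID"
            + "?fields=events,primaryfilters,otherinfo";
        String fromId = null;
        while (true) {
            List<String> fetched = fetch(ids, fromId, 11);
            int shown = Math.min(fetched.size(), 10);
            System.out.println(buildUrl(base, fromId, 10) + " -> " + fetched.subList(0, shown));
            fromId = nextFromId(fetched, 10);
            if (fromId == null) {
                break; // no extra entity came back, so this was the last page
            }
        }
    }
}
```

Fetching limit=11 for a page of 10 is what makes the inclusive fromid work: the eleventh entity is never displayed, it only tells the client whether a next page exists and where it starts.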
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943395#comment-13943395 ] Hudson commented on YARN-1670: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5371 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5371/]) YARN-1670. Fixed a bug in log-aggregation that can cause the writer to write more log-data than the log-length that it records. Contributed by Mit Desai. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580005)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java
aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Fix For: 2.4.0 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670.patch, YARN-1670.patch
We have seen exceptions when using 'yarn logs' to read log files.
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
We traced it down to the reader trying to read the file type of the next file while the position it reads from is still log data from the previous file. The log length was written as a certain size, but the log data was actually longer than that. Inside the write() routine in LogValue, it first writes the log file length, but when it goes on to write the log itself it just reads to the end of the file. There is a race condition here: if someone is still writing to the file when it is aggregated, the length written could be too small. We should have the write() routine stop once it has written whatever it said was the length. It would be nice if we could somehow tell the user the log might be truncated, but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType, where it is using an int for curRead whereas it should be using a long.
while (len != -1 && curRead < fileLength) {
This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
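The failure mode described above can be reproduced in miniature. The sketch below uses a toy record layout (a type line, a length line, then the data) that is an assumption made for illustration and is not the real aggregated log format; the class and method names are hypothetical as well. When a record declares a length shorter than its data, the reader lands in the middle of log text and tries to parse it as the next length.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Toy length-prefixed format; NOT the layout used by AggregatedLogFormat. */
final class LengthPrefixedLogs {
    /** Appends one record, declaring declaredLength regardless of how long data really is. */
    static void append(StringBuilder out, String type, long declaredLength, String data) {
        out.append(type).append('\n').append(declaredLength).append('\n').append(data);
    }

    /** Reads back the record types, trusting every declared length. */
    static List<String> readTypes(String aggregated) {
        List<String> types = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(aggregated))) {
            String type;
            while ((type = in.readLine()) != null) {
                // Throws NumberFormatException if this line is really log text.
                long fileLength = Long.parseLong(in.readLine());
                long skipped = 0;
                while (skipped < fileLength) {
                    long n = in.skip(fileLength - skipped);
                    if (n <= 0) {
                        break; // input ended before the declared length
                    }
                    skipped += n;
                }
                types.add(type);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return types;
    }

    public static void main(String[] args) {
        StringBuilder racy = new StringBuilder();
        // Declared 5 characters, but the file had grown to 10 by the time it was copied.
        append(racy, "stdout", 5, "aaaa\nbbbb\n");
        append(racy, "stderr", 5, "cccc\n");
        try {
            readTypes(racy.toString());
        } catch (NumberFormatException e) {
            System.out.println("reader failed: " + e.getMessage());
        }
    }
}
```

In the racy case the reader skips only the first five characters, takes the leftover log line as the next file type, and then fails in Long.parseLong on what is actually the type of the second record.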
[jira] [Commented] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead
[ https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943396#comment-13943396 ] Hadoop QA commented on YARN-1536: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635403/yarn-1536.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 14 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3424//console This message is automatically generated. Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead - Key: YARN-1536 URL: https://issues.apache.org/jira/browse/YARN-1536 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Anubhav Dhoot Priority: Minor Labels: newbie Attachments: yarn-1536.002.patch, yarn-1536.patch Both ResourceManager and RMContext have methods to access the secret managers, and it should be safe (cleaner) to get rid of the ResourceManager methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943413#comment-13943413 ] Hadoop QA commented on YARN-1863: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636061/YARN-1863.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3425//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3425//console This message is automatically generated. TestRMFailover fails with 'AssertionError: null' Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong Attachments: YARN-1863.1.patch, YARN-1863.2.patch This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:
{code}
testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.834 sec <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)
testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.341 sec <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1368) RM should populate running container allocation information from NM resync
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-1368: --- Assignee: Anubhav Dhoot RM should populate running container allocation information from NM resync -- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Anubhav Dhoot YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943421#comment-13943421 ] Jonathan Eagles commented on YARN-1670: --- Mit, I'm worried that we are still going to have this issue except in the opposite way. On the last read that puts us over the initial filelength, we are not going to write the last part of the data that still fits into the original filelength. In this case our Aggregate File Log length will be smaller than the filelength written to the data structure. jeagles
[jira] [Reopened] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles reopened YARN-1670: ---
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943422#comment-13943422 ] Jonathan Eagles commented on YARN-1670: --- I've reopened this ticket to verify the correctness of the patch that went into branch-2 and branch-2.4.
[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943455#comment-13943455 ] Vinod Kumar Vavilapalli commented on YARN-1863: --- Zhijie/Xuan, can we please run all the YARN tests before committing this patch? Tests have been in a broken state for a while now.
[jira] [Updated] (YARN-1863) Several test failures on trunk
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1863: -- Summary: Several test failures on trunk (was: TestRMFailover fails with 'AssertionError: null')
[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943465#comment-13943465 ] Zhijie Shen commented on YARN-1863: --- I've observed more test failures on trunk:
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.288 sec <<< FAILURE! - in org.apache.hadoop.yarn.util.TestFSDownload
testDownloadPublicWithStatCache(org.apache.hadoop.yarn.util.TestFSDownload) Time elapsed: 0.384 sec <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadPublicWithStatCache(TestFSDownload.java:363)
Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 33.744 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
testAMContainerAllocationWhenDNSUnavailable(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation) Time elapsed: 5.077 sec <<< FAILURE!
java.lang.AssertionError: expected:<SCHEDULED> but was:<ALLOCATED>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:147)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable(TestContainerAllocation.java:240)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943464#comment-13943464 ] Thomas Graves commented on YARN-1670: - Good catch, Jon. Yep, I think you are correct here. We can actually still write more than we should. It should check that curRead plus the len just read does not exceed fileLength before writing, and if it does, write only the difference.
[jira] [Commented] (YARN-1863) Several test failures on trunk
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943469#comment-13943469 ] Xuan Gong commented on YARN-1863: - I will fix all of them with this patch
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943477#comment-13943477 ] Vinod Kumar Vavilapalli commented on YARN-1670: --- It'll be useful to write a test for this cast too, though..
[jira] [Updated] (YARN-1863) Several YARN test failures on trunk
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1863: Summary: Several YARN test failures on trunk (was: Several test failures on trunk)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943476#comment-13943476 ] Vinod Kumar Vavilapalli commented on YARN-1670: --- Good catch, Jon! Just checking {{curRead + len}} against {{fileLength}} will also not work, no? We have to explicitly write only {{fileLength - curRead}} bytes if {{curRead + len > fileLength}}. Right?
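One way to read the suggestion above is sketched below. It is a hypothetical stand-in for the copy loop inside LogValue.write(), not the code from any of the attached patches; the class name, method names, and buffer handling are all assumptions. A chunk that would cross the recorded length is clamped to the remaining {{fileLength - curRead}} bytes, so the writer neither overruns the recorded length nor drops the part of the last chunk that still fits.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical bounded copy; not the Hadoop implementation. */
final class BoundedLogCopy {
    /**
     * Copies at most fileLength bytes from in to out and returns the number
     * of bytes written. bufSize must be positive.
     */
    static long copyUpTo(InputStream in, OutputStream out, long fileLength, int bufSize)
            throws IOException {
        byte[] buf = new byte[bufSize];
        long curRead = 0; // long rather than int, so large logs do not overflow the counter
        int len;
        while (curRead < fileLength && (len = in.read(buf)) != -1) {
            long remaining = fileLength - curRead;
            // Clamp the final chunk instead of writing it whole or skipping it.
            int toWrite = (int) Math.min(len, remaining);
            out.write(buf, 0, toWrite);
            curRead += toWrite;
        }
        return curRead;
    }

    /** In-memory convenience wrapper, mainly for experiments and tests. */
    static byte[] copyBytes(byte[] src, long fileLength, int bufSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copyUpTo(new ByteArrayInputStream(src), out, fileLength, bufSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The file grew from 20 to 25 bytes after its length was recorded.
        byte[] copied = copyBytes(new byte[25], 20, 8);
        System.out.println(copied.length); // prints 20
    }
}
```

With an 8-byte buffer and a recorded length of 20, the third read returns 8 bytes but only 4 of them are written. If the source turns out shorter than the recorded length, the loop simply ends at end of stream, a case this sketch does not try to repair.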
[jira] [Commented] (YARN-1863) Several test failures on trunk
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943471#comment-13943471 ] Xuan Gong commented on YARN-1863: - I mean related test case failures Several test failures on trunk -- Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong Attachments: YARN-1863.1.patch, YARN-1863.2.patch This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced: {code} testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.834 sec <<< FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250) at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216) testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.341 sec <<< FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250) at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943477#comment-13943477 ] Vinod Kumar Vavilapalli edited comment on YARN-1670 at 3/21/14 8:00 PM: It'll be useful to write a test for this case too, though.. was (Author: vinodkv): It'll be useful to write a test for this cast too, though.. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Fix For: 2.4.0 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670.patch, YARN-1670.patch
[jira] [Updated] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead
[ https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1536: Attachment: yarn-1536.003.patch Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead - Key: YARN-1536 URL: https://issues.apache.org/jira/browse/YARN-1536 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Anubhav Dhoot Priority: Minor Labels: newbie Attachments: yarn-1536.002.patch, yarn-1536.003.patch, yarn-1536.patch Both ResourceManager and RMContext have methods to access the secret managers, and it should be safe (cleaner) to get rid of the ResourceManager methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943497#comment-13943497 ] Mit Desai commented on YARN-1670: - That's correct, Vinod. In the last iteration, where the buf length is greater than the remaining portion of the file, we will have to write the {{fileLength-curRead}} bytes. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Fix For: 2.4.0 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670.patch, YARN-1670.patch
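The last-iteration case discussed in this thread (write only fileLength - curRead bytes when the buffer holds more than what remains) could look roughly like the sketch below. It is an assumption-laden illustration, not the code from any attached patch: BoundedLogWriter, writeAtMost and the bufSize parameter are hypothetical names, and IOExceptions are wrapped only to keep the example short.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper; not the actual LogValue.write() code. */
final class BoundedLogWriter {
  private BoundedLogWriter() {}

  /**
   * Copies at most fileLength bytes from in to out and returns the number of
   * bytes written, even if the source has grown since fileLength was recorded.
   */
  static long writeAtMost(InputStream in, OutputStream out,
                          long fileLength, int bufSize) {
    if (bufSize <= 0) {
      throw new IllegalArgumentException("bufSize must be positive");
    }
    byte[] buf = new byte[bufSize];
    long curRead = 0; // long, so large logs do not overflow the counter
    try {
      while (curRead < fileLength) {
        int len = in.read(buf);
        if (len == -1) {
          break; // source ended before the declared length
        }
        // Last iteration: the buffer may hold more than what is left to
        // write, so cap the write at fileLength - curRead bytes.
        int toWrite = (int) Math.min((long) len, fileLength - curRead);
        out.write(buf, 0, toWrite);
        curRead += toWrite;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return curRead;
  }
}
```

Capping each write, rather than only checking curRead + len > fileLength, is what keeps the data section from running past the length that was already written to the aggregated file.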
[jira] [Assigned] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned YARN-304: -- Assignee: Mayank Bansal (was: Zhijie Shen) RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Mayank Bansal This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks the History for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links that persist so that users are not frustrated by broken links. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943545#comment-13943545 ] Mayank Bansal commented on YARN-304: Taking it over RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Mayank Bansal
[jira] [Commented] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead
[ https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943542#comment-13943542 ] Hadoop QA commented on YARN-1536: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636094/yarn-1536.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3426//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3426//console This message is automatically generated. 
Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead - Key: YARN-1536 URL: https://issues.apache.org/jira/browse/YARN-1536 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Anubhav Dhoot Priority: Minor Labels: newbie Attachments: yarn-1536.002.patch, yarn-1536.003.patch, yarn-1536.patch
[jira] [Commented] (YARN-1863) Several YARN test failures on trunk
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943572#comment-13943572 ] Vinod Kumar Vavilapalli commented on YARN-1863: --- I can help fix these tests. Let's cover everything that is either related or not. Several YARN test failures on trunk --- Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong Attachments: YARN-1863.1.patch, YARN-1863.2.patch
[jira] [Commented] (YARN-1776) renewDelegationToken should survive RM failover
[ https://issues.apache.org/jira/browse/YARN-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943585#comment-13943585 ] Jian He commented on YARN-1776: --- Some comments on the patch: - completeRenewRecords -> checkAndResumeUpdateOperation? - updateRMDelegationTokenAndSequenceNumberState -> updateRMDelegationTokenAndSequenceNumberInternal? - For ZK, we may just use setData instead of removing and re-creating the znode for updates. - Test for FSRMStateStore: we need a test to verify that on recovery, if we encounter a .new file, we resume the update operation. Essentially, completeRenewRecords needs a test. renewDelegationToken should survive RM failover --- Key: YARN-1776 URL: https://issues.apache.org/jira/browse/YARN-1776 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1776.1.patch, YARN-1776.2.patch, YARN-1776.3.patch When a delegation token is renewed, two RMStateStore operations happen: 1) removing the old DT, and 2) storing the new DT. If the RM fails in between, there would be a problem. -- This message was sent by Atlassian JIRA (v6.2#6252)
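The recovery behaviour asked for in the review comment (on finding a .new file, resume the update) can be sketched with plain java.nio.file calls. This is only an illustration under assumptions: it is not FSRMStateStore code, the class, the method names and the file name RMDelegationToken_1 are hypothetical, it uses the local file system rather than HDFS, and it assumes the .new file was fully written before the crash.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical two-step update with a ".new" staging file. */
final class TwoPhaseFileUpdate {
  static final String NEW_SUFFIX = ".new";

  private TwoPhaseFileUpdate() {}

  /** Location of the staged copy that sits next to the real record. */
  static Path pending(Path target) {
    return target.resolveSibling(target.getFileName().toString() + NEW_SUFFIX);
  }

  /** Step 1: stage the new record next to the old one. */
  static void stage(Path target, String record) throws IOException {
    Files.write(pending(target), record.getBytes(StandardCharsets.UTF_8));
  }

  /** Step 2: replace the old record with the staged one. */
  static void commit(Path target) throws IOException {
    Files.move(pending(target), target, StandardCopyOption.REPLACE_EXISTING);
  }

  /** On recovery: if a staged file is found, resume the interrupted update. */
  static boolean resumeIfPending(Path target) throws IOException {
    if (Files.exists(pending(target))) {
      commit(target);
      return true;
    }
    return false;
  }

  /** Simulates a crash between the two steps, followed by recovery. */
  static String crashAndRecoverDemo() {
    try {
      Path dir = Files.createTempDirectory("state-store-demo");
      Path token = dir.resolve("RMDelegationToken_1"); // hypothetical name
      Files.write(token, "old".getBytes(StandardCharsets.UTF_8));
      stage(token, "renewed"); // the "crash" happens here, before commit
      boolean resumed = resumeIfPending(token);
      String content = new String(Files.readAllBytes(token), StandardCharsets.UTF_8);
      boolean leftover = Files.exists(pending(token));
      Files.delete(token);
      Files.delete(dir);
      return resumed + ":" + content + ":" + leftover;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test along the lines requested in the comment would follow the same shape as crashAndRecoverDemo(): stage, skip the commit, run recovery, then check that the renewed record is the one that survives.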
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943589#comment-13943589 ] Jian He commented on YARN-1849: --- LGTM, thanks Karthik! NPE in ResourceTrackerService#registerNodeManager for UAM - Key: YARN-1849 URL: https://issues.apache.org/jira/browse/YARN-1849 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1849-1.patch, yarn-1849-2.patch, yarn-1849-2.patch, yarn-1849-3.patch, yarn-1849-4.patch, yarn-1849-5.patch, yarn-1849-6.patch While running an UnmanagedAM on secure cluster, ran into an NPE on failover/restart. This is similar to YARN-1821. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1838: - Attachment: YARN-1838.3.patch New patch hopefully fixing compilation issue and fixing bug in how insert timestamp is determined. Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch To support pagination, we need the ability to get entities from a certain ID by providing a new param called {{fromid}}. For example, on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11] When the user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11] and continue on for further _Next_ clicks. On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11] {{fromid}} should be inclusive of the id given. -- This message was sent by Atlassian JIRA (v6.2#6252)
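One way to read the pagination scheme in this description: the client asks for one entity more than it displays, and the id of that extra entity becomes the inclusive {{fromid}} of the next page. The sketch below follows that reading. The class and method names are hypothetical, it is client-side helper logic only (no call to the timeline service is made), and it assumes ids need no URL encoding.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side helper for fromid-based paging. */
final class FromIdPaging {
  private FromIdPaging() {}

  /** Builds the request URL; fromId may be null for the first page. */
  static String buildUrl(String base, String fromId, int pageSize) {
    StringBuilder sb = new StringBuilder(base)
        .append("?fields=events,primaryfilters,otherinfo");
    if (fromId != null) {
      sb.append("&fromid=").append(fromId); // assumes a URL-safe id
    }
    // Ask for one extra entity to learn where the next page starts.
    sb.append("&limit=").append(pageSize + 1);
    return sb.toString();
  }

  /** Entities to display: at most pageSize of the returned ids. */
  static List<String> visible(List<String> returnedIds, int pageSize) {
    int end = Math.min(pageSize, returnedIds.size());
    return new ArrayList<>(returnedIds.subList(0, end));
  }

  /** Id that starts the next page, or null when there is no next page. */
  static String nextFromId(List<String> returnedIds, int pageSize) {
    return returnedIds.size() > pageSize ? returnedIds.get(pageSize) : null;
  }
}
```

For a page of 10, a response holding JID1 through JID11 would display JID1 to JID10 and use JID11 as the next fromid, which lines up with the example URLs in the description.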
[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1775: -- Attachment: YARN-1775-v5.patch Same patch with a few more renames. Will check this in if Jenkins says okay. Create SMAPBasedProcessTree to get PSS information -- Key: YARN-1775 URL: https://issues.apache.org/jira/browse/YARN-1775 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, YARN-1775-v3.patch, YARN-1775-v4.patch, YARN-1775-v5.patch, yarn-1775-2.4.0.patch Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943658#comment-13943658 ] Hadoop QA commented on YARN-1838: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636118/YARN-1838.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1494 javac compiler warnings (more than the trunk's current 1491 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3427//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3427//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3427//console This message is automatically generated. 
Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch
[jira] [Updated] (YARN-1863) Several YARN test failures on trunk
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1863: -- Priority: Blocker (was: Major) Target Version/s: 2.4.0 Several YARN test failures on trunk --- Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-1863.1.patch, YARN-1863.2.patch
[jira] [Assigned] (YARN-1863) Several YARN test failures on trunk
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-1863: - Assignee: Vinod Kumar Vavilapalli (was: Xuan Gong) Several YARN test failures on trunk --- Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1863.1.patch, YARN-1863.2.patch
[jira] [Updated] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1838: - Attachment: YARN-1838.4.patch Fixed javac warning. Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, YARN-1838.4.patch
[jira] [Updated] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1577: -- Attachment: YARN-1577.1.patch Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Naren Koneru Priority: Blocker Attachments: YARN-1577.1.patch Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943671#comment-13943671 ] Jian He commented on YARN-1577: --- Uploaded a patch: Changed UMA launcher to wait until attempt reaches Launched state to launch the AM. Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Naren Koneru Priority: Blocker Attachments: YARN-1577.1.patch
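The change described in this comment (wait for the attempt to reach the Launched state before starting the unmanaged AM) amounts to a bounded polling loop. The sketch below shows the general shape only: it does not use the YARN client API, the state supplier stands in for whatever call reports the attempt state, and the class name, method name and state strings are placeholders.

```java
import java.util.function.Supplier;

/** Hypothetical polling helper; the supplier stands in for a state query. */
final class AttemptStatePoller {
  private AttemptStatePoller() {}

  /**
   * Polls state up to maxPolls times, sleeping sleepMillis between polls,
   * and returns true as soon as the wanted state is observed.
   */
  static boolean waitFor(Supplier<String> state, String wanted,
                         int maxPolls, long sleepMillis) {
    for (int i = 0; i < maxPolls; i++) {
      if (wanted.equals(state.get())) {
        return true;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return false;
      }
    }
    return false; // gave up; the caller decides whether to fail or retry
  }
}
```

A launcher built this way would start the AM only when waitFor returns true, and would treat false as a timeout rather than launching anyway.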
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943679#comment-13943679 ] Vinod Kumar Vavilapalli commented on YARN-1854: --- This looks fine, running all the tests before commit so as to take care of YARN-1863 also. TestRMHA#testStartAndTransitions Fails -- Key: YARN-1854 URL: https://issues.apache.org/jira/browse/YARN-1854 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Rohith Priority: Blocker Attachments: YARN-1854.1.patch, YARN-1854.patch {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec <<< FAILURE! java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096> {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1577: -- Attachment: YARN-1577.2.patch Fixed a typo Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Naren Koneru Priority: Blocker Attachments: YARN-1577.1.patch, YARN-1577.2.patch
[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943690#comment-13943690 ] Karthik Kambatla commented on YARN-1775: The latest patch looks good to me too. Create SMAPBasedProcessTree to get PSS information -- Key: YARN-1775 URL: https://issues.apache.org/jira/browse/YARN-1775 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, YARN-1775-v3.patch, YARN-1775-v4.patch, YARN-1775-v5.patch, yarn-1775-2.4.0.patch
[jira] [Created] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
Ashwin Shankar created YARN-1864: Summary: Fair Scheduler Dynamic Hierarchical User Queues Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Fix For: 2.4.0 In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For example, say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will be run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler, which creates user queues under the root queue. But we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met, so there is fairness among users. 2. Allocation to user queues: we want all the user queries (ad hoc) to consume only a fraction of resources in the shared cluster. With this feature, we could do that by giving a fair share to the parent user queue, which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
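The naming rule described in YARN-1864 can be illustrated with a small sketch. This is not Fair Scheduler code: the class `DynamicUserQueuesSketch` and its methods are hypothetical, and it models queues as plain strings purely to show that the per-user queue is created on first use under the parent and reused for later jobs.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of the proposal; queues are just names here. */
final class DynamicUserQueuesSketch {
    private final Set<String> queues = new LinkedHashSet<>();

    DynamicUserQueuesSketch(String... configuredParentQueues) {
        for (String parent : configuredParentQueues) {
            queues.add(parent);
        }
    }

    /** Returns the queue the job runs in, creating parent.user on first use. */
    String placeJob(String submittedQueue, String user) {
        if (!queues.contains(submittedQueue)) {
            throw new IllegalArgumentException("unknown parent queue: " + submittedQueue);
        }
        String userQueue = submittedQueue + "." + user;
        queues.add(userQueue); // no-op if the user queue already exists
        return userQueue;
    }

    int queueCount() {
        return queues.size();
    }
}
```

With a parent of `root.allUserQueues`, two submissions by `user1` both land in `root.allUserQueues.user1`, and only one new queue is created.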
[jira] [Updated] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-1864: - Attachment: YARN-1864-v1.txt Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Labels: scheduler Fix For: 2.4.0 Attachments: YARN-1864-v1.txt In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For example, say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will be run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler, which creates user queues under the root queue. But we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met, so there is fairness among users. 2. Allocation to user queues: we want all the user queries (ad hoc) to consume only a fraction of resources in the shared cluster. With this feature, we could do that by giving a fair share to the parent user queue, which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1776) renewDelegationToken should survive RM failover
[ https://issues.apache.org/jira/browse/YARN-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1776: -- Attachment: YARN-1776.4.patch Uploaded a new patch, which addresses Jian's comments. renewDelegationToken should survive RM failover --- Key: YARN-1776 URL: https://issues.apache.org/jira/browse/YARN-1776 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1776.1.patch, YARN-1776.2.patch, YARN-1776.3.patch, YARN-1776.4.patch When a delegation token is renewed, two RMStateStore operations happen: 1) removing the old DT, and 2) storing the new DT. If the RM fails in between, there would be a problem. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943700#comment-13943700 ] Hadoop QA commented on YARN-1775: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636125/YARN-1775-v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3428//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3428//console This message is automatically generated. 
Create SMAPBasedProcessTree to get PSS information -- Key: YARN-1775 URL: https://issues.apache.org/jira/browse/YARN-1775 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, YARN-1775-v3.patch, YARN-1775-v4.patch, YARN-1775-v5.patch, yarn-1775-2.4.0.patch Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943706#comment-13943706 ] Hudson commented on YARN-1849: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5375 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5375/]) YARN-1849. Fixed NPE in ResourceTrackerService#registerNodeManager for UAM. Contributed by Karthik Kambatla (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580077) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java NPE in ResourceTrackerService#registerNodeManager for UAM - Key: YARN-1849 URL: https://issues.apache.org/jira/browse/YARN-1849 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1849-1.patch, yarn-1849-2.patch, yarn-1849-2.patch, yarn-1849-3.patch, yarn-1849-4.patch, yarn-1849-5.patch, yarn-1849-6.patch While running an UnmanagedAM on secure cluster, ran into an NPE on failover/restart. This is similar to YARN-1821. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943710#comment-13943710 ] Hadoop QA commented on YARN-1838: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636127/YARN-1838.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3429//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3429//console This message is automatically generated. 
Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, YARN-1838.4.patch To support pagination, we need the ability to get entities from a certain ID by providing a new param called {{fromid}}. For example, on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11] When the user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11] and continue on for further _Next_ clicks. On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11] {{fromid}} should be inclusive of the id given. -- This message was sent by Atlassian JIRA (v6.2#6252)
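The pagination scheme in YARN-1838 (request one entity more than the page size, then use the extra entity's id as the inclusive {{fromid}} of the next page) can be sketched as follows. This is a client-side illustration only: the class `TimelinePagingSketch` and both methods are hypothetical, the host and entity type are taken from the example URLs in the issue, and it assumes the query parameters are joined with `&` and that the parameter is spelled `fromid` as in the description.

```java
import java.util.List;

/** Hypothetical client-side helper; not part of the timeline service API. */
final class TimelinePagingSketch {
    static final String BASE = "http://server:8188/ws/v1/timeline/HIVE_QUERY_ID";
    static final String FIELDS = "fields=events,primaryfilters,otherinfo";

    /** Builds the URL for one page; fromId == null means the first page. */
    static String pageUrl(String fromId, int pageSize) {
        StringBuilder url = new StringBuilder(BASE).append('?').append(FIELDS);
        if (fromId != null) {
            url.append("&fromid=").append(fromId);
        }
        // Ask for pageSize + 1 entities so the extra one tells us where the next page starts.
        return url.append("&limit=").append(pageSize + 1).toString();
    }

    /** Returns the id that starts the next page, or null if this was the last page. */
    static String nextFromId(List<String> returnedIds, int pageSize) {
        return returnedIds.size() > pageSize ? returnedIds.get(pageSize) : null;
    }
}
```

Because {{fromid}} is inclusive, the extra entity fetched on one page becomes the first entity displayed on the next, so nothing is skipped or shown twice.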
[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943743#comment-13943743 ] Hadoop QA commented on YARN-1577: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636130/YARN-1577.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3430//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3430//console This message is automatically generated. Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Naren Koneru Priority: Blocker Attachments: YARN-1577.1.patch, YARN-1577.2.patch Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. 
This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1838: -- Attachment: YARN-1838.5.patch Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, YARN-1838.4.patch, YARN-1838.5.patch To support pagination, we need the ability to get entities from a certain ID by providing a new param called {{fromid}}. For example, on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11] When the user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11] and continue on for further _Next_ clicks. On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11] {{fromid}} should be inclusive of the id given. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943745#comment-13943745 ] Zhijie Shen commented on YARN-1838: --- Billie, thanks for the new patch. It looks good to me. Based on your patch, I just made some minor touch-ups: removed an unnecessary suppresswarning, formatted a piece of javadoc, and enhanced the test testGetEntitiesWithFromTs. Vinod, do you want to have a look as well? Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, YARN-1838.4.patch, YARN-1838.5.patch To support pagination, we need the ability to get entities from a certain ID by providing a new param called {{fromid}}. For example, on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11] When the user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11] and continue on for further _Next_ clicks. On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11] {{fromid}} should be inclusive of the id given. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943759#comment-13943759 ] Billie Rinaldi commented on YARN-1838: -- Thanks, [~zjshen]. Your updates in the latest patch look good to me. Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, YARN-1838.4.patch, YARN-1838.5.patch To support pagination, we need the ability to get entities from a certain ID by providing a new param called {{fromid}}. For example, on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11] When the user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11] and continue on for further _Next_ clicks. On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11] {{fromid}} should be inclusive of the id given. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943775#comment-13943775 ] Hadoop QA commented on YARN-1864: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636137/YARN-1864-v1.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3432//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3432//console This message is automatically generated. Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Labels: scheduler Fix For: 2.4.0 Attachments: YARN-1864-v1.txt In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For eg. 
Say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will be run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler, which creates user queues under the root queue. But we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met, so there is fairness among users. 2. Allocation to user queues: we want all the user queries (ad hoc) to consume only a fraction of resources in the shared cluster. With this feature, we could do that by giving a fair share to the parent user queue, which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1776) renewDelegationToken should survive RM failover
[ https://issues.apache.org/jira/browse/YARN-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943773#comment-13943773 ] Hadoop QA commented on YARN-1776: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636139/YARN-1776.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3431//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3431//console This message is automatically generated. renewDelegationToken should survive RM failover --- Key: YARN-1776 URL: https://issues.apache.org/jira/browse/YARN-1776 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1776.1.patch, YARN-1776.2.patch, YARN-1776.3.patch, YARN-1776.4.patch When a delegation token is renewed, two RMStateStore operations happen: 1) removing the old DT, and 2) storing the new DT. If the RM fails in between, there would be a problem. 
[jira] [Updated] (YARN-1863) TestRMFailover fails with 'AssertionError: null'
[ https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1863: -- Assignee: Xuan Gong (was: Vinod Kumar Vavilapalli) Summary: TestRMFailover fails with 'AssertionError: null' (was: Several YARN test failures on trunk) Okay, I cannot reproduce any more test failures on linux and Mac. Re-editing the title and assigning back to Xuan. Checking this in for now. TestRMFailover fails with 'AssertionError: null' - Key: YARN-1863 URL: https://issues.apache.org/jira/browse/YARN-1863 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1863.1.patch, YARN-1863.2.patch This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced: {code} testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.834 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250) at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216) testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.341 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250) at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-1372: --- Assignee: Anubhav Dhoot Ensure all completed containers are reported to the AMs across RM restart - Key: YARN-1372 URL: https://issues.apache.org/jira/browse/YARN-1372 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes on that completed container information to the AM and the AM pulls this data. If the RM dies before the AM pulls this data then the AM may not be able to get this information again. To fix this, NM should maintain a separate list of such completed container notifications sent to the RM. After the AM has pulled the containers from the RM then the RM will inform the NM about it and the NM can remove the completed container from the new list. Upon re-register with the RM (after RM restart) the NM should send the entire list of completed containers to the RM along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AM's about all completed containers. Some container completions may be reported more than once since the AM may have pulled the container but the RM may die before notifying the NM about the pull. -- This message was sent by Atlassian JIRA (v6.2#6252)
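The bookkeeping proposed in YARN-1372 can be sketched roughly as below. This is an illustration under assumptions, not NodeManager code: the class `CompletedContainerTrackerSketch`, its method names, and the use of plain string ids are all hypothetical. It shows the separate list of completions already sent to the RM, its cleanup once the RM reports that the AM has pulled them, and the full resend on re-registration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical NM-side tracker; container ids are plain strings here. */
final class CompletedContainerTrackerSketch {
    // Completed, but not yet sent to the RM.
    private final Set<String> unreported = new LinkedHashSet<>();
    // Sent to the RM, but the RM has not yet confirmed that the AM pulled them.
    private final Set<String> reportedAwaitingAck = new LinkedHashSet<>();

    void containerCompleted(String containerId) {
        unreported.add(containerId);
    }

    /** Regular heartbeat: send new completions and remember them until acknowledged. */
    List<String> heartbeat() {
        List<String> toSend = new ArrayList<>(unreported);
        reportedAwaitingAck.addAll(unreported);
        unreported.clear();
        return toSend;
    }

    /** The RM tells the NM that the AM has pulled these completions. */
    void amPulled(List<String> containerIds) {
        reportedAwaitingAck.removeAll(containerIds);
    }

    /** Re-register after RM restart: resend everything not yet acknowledged. */
    List<String> reRegister() {
        List<String> toSend = new ArrayList<>(reportedAwaitingAck);
        toSend.addAll(unreported);
        reportedAwaitingAck.addAll(unreported);
        unreported.clear();
        return toSend;
    }
}
```

As the issue notes, a completion can be delivered more than once under this scheme (for example when the RM dies after the AM pulled it but before the NM was told), so the receiving side has to tolerate duplicates.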
[jira] [Assigned] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-1365: --- Assignee: Anubhav Dhoot ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1373) Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps
[ https://issues.apache.org/jira/browse/YARN-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-1373: --- Assignee: Anubhav Dhoot Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps --- Key: YARN-1373 URL: https://issues.apache.org/jira/browse/YARN-1373 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Currently the RM moves recovered app attempts to a terminal recovered state and starts a new attempt. Instead, it will have to transition the last attempt to a running state such that it can proceed as normal once the running attempt has resynced with the ApplicationMasterService (YARN-1365 and YARN-1366). If the RM had started the application container before dying then the AM would be up and trying to contact the RM. The RM may have died before launching the container. For this case, the RM should wait for the AM liveliness period and issue a kill container for the stored master container. It should transition this attempt to some RECOVER_ERROR state and proceed to start a new attempt. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1823) Recover Unmanaged AMs
[ https://issues.apache.org/jira/browse/YARN-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-1823: --- Assignee: Anubhav Dhoot Recover Unmanaged AMs - Key: YARN-1823 URL: https://issues.apache.org/jira/browse/YARN-1823 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Anubhav Dhoot YARN-1815 does not recover unmanaged AMs after RM restart. This JIRA is a place holder to remove that and make any other necessary changes to ensure Unmanaged AMs continue to proceed after restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1369) Capacity scheduler to re-populate container allocation state
[ https://issues.apache.org/jira/browse/YARN-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-1369: --- Assignee: Anubhav Dhoot Capacity scheduler to re-populate container allocation state Key: YARN-1369 URL: https://issues.apache.org/jira/browse/YARN-1369 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running containers and the RM will pass this information to the schedulers along with the node information. The schedulers are currently already informed about previously running apps when the app data is recovered from the store. The scheduler is expected to be able to repopulate its allocation state from the above 2 sources of information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1371) FIFO scheduler to re-populate container allocation state
[ https://issues.apache.org/jira/browse/YARN-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-1371: --- Assignee: Anubhav Dhoot FIFO scheduler to re-populate container allocation state Key: YARN-1371 URL: https://issues.apache.org/jira/browse/YARN-1371 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running containers and the RM will pass this information to the schedulers along with the node information. The schedulers are currently already informed about previously running apps when the app data is recovered from the store. The scheduler is expected to be able to repopulate its allocation state from the above 2 sources of information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943784#comment-13943784 ] Vinod Kumar Vavilapalli commented on YARN-1854: --- +1, YARN-1863 is in. Checking this in. TestRMHA#testStartAndTransitions Fails -- Key: YARN-1854 URL: https://issues.apache.org/jira/browse/YARN-1854 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Rohith Priority: Blocker Attachments: YARN-1854.1.patch, YARN-1854.patch {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE! java.lang.AssertionError: Incorrect value for metric availableMB expected:2048 but was:4096 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396 Incorrect value for metric availableMB expected:2048 but was:4096 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1776) renewDelegationToken should survive RM failover
[ https://issues.apache.org/jira/browse/YARN-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1776: -- Attachment: YARN-1776.5.patch Uploaded a new patch to clean up the code path. renewDelegationToken should survive RM failover --- Key: YARN-1776 URL: https://issues.apache.org/jira/browse/YARN-1776 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1776.1.patch, YARN-1776.2.patch, YARN-1776.3.patch, YARN-1776.4.patch, YARN-1776.5.patch When a delegation token is renewed, two RMStateStore operations happen: 1) removing the old DT, and 2) storing the new DT. If the RM fails in between the two, there would be a problem. -- This message was sent by Atlassian JIRA (v6.2#6252)
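The failure window described in YARN-1776 can be illustrated with a small in-memory sketch. Everything here is hypothetical: `TokenStoreSketch` and its methods are illustrative names, not the RMStateStore API, and the single-step variant is only one possible way to close the window, not necessarily what the attached patches do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a token store; NOT the RMStateStore API.
final class TokenStoreSketch {
    private final Map<String, Long> renewDateByToken = new HashMap<>();

    void store(String tokenId, long renewDate) {
        renewDateByToken.put(tokenId, renewDate);
    }

    void remove(String tokenId) {
        renewDateByToken.remove(tokenId);
    }

    boolean contains(String tokenId) {
        return renewDateByToken.containsKey(tokenId);
    }

    // Returns null when the token is not stored.
    Long renewDateOf(String tokenId) {
        return renewDateByToken.get(tokenId);
    }

    // Renewal as two separate operations. If the process fails after the
    // remove but before the store, the token is gone from the store.
    void renewInTwoSteps(String tokenId, long newRenewDate, boolean failInBetween) {
        remove(tokenId);
        if (failInBetween) {
            throw new IllegalStateException("simulated failure between remove and store");
        }
        store(tokenId, newRenewDate);
    }

    // Renewal as a single replace, so there is no intermediate state in
    // which the token is missing.
    void renewInOneStep(String tokenId, long newRenewDate) {
        renewDateByToken.put(tokenId, newRenewDate);
    }
}
```

In a real persistent store the single step would have to be an atomic or transactional update; the map `put` above only stands in for that idea.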
[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943789#comment-13943789 ] Karthik Kambatla commented on YARN-556: --- Thanks for posting the design doc, [~bikassaha]. [~adhoot] and I have been working on this for the past few days towards an initial prototype, so we get a handle on all the items required. In terms of actual work-items (JIRAs), I wonder if it makes sense to work in a branch. Making the AM, NM resync changes without the scheduler changes would break things. We can work on the scheduler changes first, so there is no caller and add resync later, but I suppose that would make it hard to test outside of unit tests. Thoughts? RM Restart phase 2 - Work preserving restart Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943793#comment-13943793 ] Jian He commented on YARN-1521: --- - getNewApplicationId, getDelegationToken: since each call returns a new ID/token, I am not sure these match idempotency. - For the register protocols, for example registerNodeManager: if the previous call succeeded and the RM did not crash, but registerNodeManager is retried because of some network problem, then when the next call comes in the node is deemed a reconnected node instead of a new node. Probably AtMostOnce? Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong After YARN-1028, we added automatic failover to RMProxy. This JIRA is to identify whether we need to add the idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-1577: - Assignee: Jian He (was: Naren Koneru) Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Jian He Priority: Blocker Attachments: YARN-1577.1.patch, YARN-1577.2.patch Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-1864: - Description: In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For example, say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler, which creates user queues under the root queue, but we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met, so there is fairness among users. User queues can also preempt other non-user leaf queues if below fair share. 2. Allocation to user queues: we want all the ad hoc user queries to consume only a fraction of the resources in the shared cluster. With this feature, we could do that by giving a fair share to the parent user queue, which is then redistributed to all the dynamically created user queues. was: In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For example, say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler, which creates user queues under the root queue, but we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met, so there is fairness among users. 2. Allocation to user queues: we want all the ad hoc user queries to consume only a fraction of the resources in the shared cluster. With this feature, we could do that by giving a fair share to the parent user queue, which is then redistributed to all the dynamically created user queues. Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Labels: scheduler Fix For: 2.4.0 Attachments: YARN-1864-v1.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
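The queue placement requested in YARN-1864 can be sketched as follows. This is only an illustration of the naming rule in the description (parent queue plus user name); `UserQueueSketch` and its methods are hypothetical and do not correspond to Fair Scheduler classes, and preemption and fair-share calculation are left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-user child queues; NOT Fair Scheduler code.
final class UserQueueSketch {
    // Jobs grouped by the fully qualified name of the queue they run in.
    private final Map<String, List<String>> jobsByQueue = new LinkedHashMap<>();

    // Derives the per-user queue name under an arbitrary parent queue,
    // e.g. root.allUserQueues + user1 gives root.allUserQueues.user1.
    static String userQueueName(String parentQueue, String user) {
        return parentQueue + "." + user;
    }

    // Creates the user's queue on first submission and reuses it afterwards.
    String submit(String parentQueue, String user, String jobId) {
        String queue = userQueueName(parentQueue, user);
        jobsByQueue.computeIfAbsent(queue, q -> new ArrayList<>()).add(jobId);
        return queue;
    }

    int queueCount() {
        return jobsByQueue.size();
    }

    List<String> jobsIn(String queue) {
        return jobsByQueue.getOrDefault(queue, Collections.emptyList());
    }
}
```

Submitting two jobs as user1 to root.allUserQueues creates root.allUserQueues.user1 once and places both jobs in it, which is the behaviour the description asks for.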
[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943800#comment-13943800 ] Vinod Kumar Vavilapalli commented on YARN-1577: --- Quickly scanned the patch. It looks like an existing bug, but it looks like even if the app fails immediately for some reason, client will still be stuck for 10min wait timeout. Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Jian He Priority: Blocker Attachments: YARN-1577.1.patch, YARN-1577.2.patch Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943800#comment-13943800 ] Vinod Kumar Vavilapalli edited comment on YARN-1577 at 3/22/14 12:52 AM: - Quickly scanned the patch. It looks like an existing bug, but this patch may worsen it - even if the app fails immediately for some reason, client will still be stuck for the 10min wait timeout. was (Author: vinodkv): Quickly scanned the patch. It looks like an existing bug, but it looks like even if the app fails immediately for some reason, client will still be stuck for 10min wait timeout. Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Jian He Priority: Blocker Attachments: YARN-1577.1.patch, YARN-1577.2.patch Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943806#comment-13943806 ] Hudson commented on YARN-1854: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5377 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5377/]) YARN-1854. Fixed test failure in TestRMHA#testStartAndTransitions. Contributed by Rohith Sharma KS. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1580097)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
TestRMHA#testStartAndTransitions Fails -- Key: YARN-1854 URL: https://issues.apache.org/jira/browse/YARN-1854 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Rohith Priority: Blocker Fix For: 2.4.0 Attachments: YARN-1854.1.patch, YARN-1854.patch
{noformat}
testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE!
java.lang.AssertionError: Incorrect value for metric availableMB expected:2048 but was:4096
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)

Results :

Failed tests:
  TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396 Incorrect value for metric availableMB expected:2048 but was:4096
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1859) WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM
[ https://issues.apache.org/jira/browse/YARN-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943803#comment-13943803 ] Hudson commented on YARN-1859: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5377 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5377/]) YARN-1863. Fixed test failure in TestRMFailover after YARN-1859. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1580094) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM --- Key: YARN-1859 URL: https://issues.apache.org/jira/browse/YARN-1859 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.4.0 Attachments: YARN-1859.1.patch WebAppProxyServlet checks for null to determine whether the application was found.
{code}
ApplicationReport applicationReport = getApplicationReport(id);
if (applicationReport == null) {
  LOG.warn(req.getRemoteUser() + " Attempting to access " + id + " that was not found");
{code}
However, WebAppProxyServlet calls AppReportFetcher, which in turn calls ClientRMService. When the application is not found, ClientRMService throws ApplicationNotFoundException rather than returning null. Therefore the logic in WebAppProxyServlet that creates the tracking URL for a non-cached app is no longer reached. -- This message was sent by Atlassian JIRA (v6.2#6252)
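The mismatch described in YARN-1859, a caller that tests for null while the callee signals "not found" with an exception, can be reproduced in a few lines. The sketch below is self-contained and hypothetical: `ReportLookupSketch`, `AppNotFoundException`, and the string "reports" are stand-ins, not the YARN classes, and translating the exception into an empty result is just one possible remedy, not necessarily what YARN-1859.1.patch does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for the servlet/fetcher interaction; NOT YARN code.
final class ReportLookupSketch {
    // Plays the role of a "not found" exception thrown by the callee.
    static final class AppNotFoundException extends Exception {
        AppNotFoundException(String appId) {
            super("Application " + appId + " not found");
        }
    }

    private final Map<String, String> reportsByAppId = new HashMap<>();

    void register(String appId, String report) {
        reportsByAppId.put(appId, report);
    }

    // Callee: never returns null; an unknown application raises an exception.
    String fetchReport(String appId) throws AppNotFoundException {
        String report = reportsByAppId.get(appId);
        if (report == null) {
            throw new AppNotFoundException(appId);
        }
        return report;
    }

    // Caller-side wrapper: maps the exception to an empty result, so that a
    // "not found" branch in the caller stays reachable.
    Optional<String> findReport(String appId) {
        try {
            return Optional.of(fetchReport(appId));
        } catch (AppNotFoundException e) {
            return Optional.empty();
        }
    }
}
```

With such a wrapper, a caller can branch on `findReport(id).isPresent()` where it previously compared the result with null.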