[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935704#comment-13935704 ]

Jonathan Eagles commented on YARN-1833:
---------------------------------------

+1. YARN-1830 causes the TestRMRestart error.

TestRMAdminService Fails in trunk and branch-2
----------------------------------------------

Key: YARN-1833
URL: https://issues.apache.org/jira/browse/YARN-1833
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
Labels: Test
Attachments: YARN-1833-v2.patch, YARN-1833.patch

In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed:

{code}
Assert.assertTrue(groupWithInit.size() != groupBefore.size());
{code}

Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails when the sizes of groupWithInit and groupBefore are the same. I do not think we need this assert here. Moreover, we also check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
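For illustration, a minimal sketch of the check that remains once the size assert is dropped (a fragment of the test, with variable names as above; the exact assertion style is an assumption):

{code}
import org.junit.Assert;

// Remaining check: after the refresh, the initial group list must not
// contain any of the groups that were present before the refresh.
for (String group : groupBefore) {
  Assert.assertFalse(groupWithInit.contains(group));
}
{code}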
[jira] [Updated] (YARN-1136) Replace junit.framework.Assert with org.junit.Assert
[ https://issues.apache.org/jira/browse/YARN-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-1136:
----------------------------------

Attachment: yarn1136-v1.patch

Kicking the build with an updated patch.

Replace junit.framework.Assert with org.junit.Assert
----------------------------------------------------

Key: YARN-1136
URL: https://issues.apache.org/jira/browse/YARN-1136
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Chen He
Labels: newbie, test
Attachments: yarn1136-v1.patch, yarn1136.patch

There are several places where we are using junit.framework.Assert instead of org.junit.Assert:

{code}
grep -rn junit.framework.Assert hadoop-yarn-project/ --include=*.java
{code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
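The change itself is mechanical; a sketch of what each affected file needs (illustrative, not taken from the attached patch):

{code}
// Before: JUnit 3's deprecated assert class.
// import junit.framework.Assert;

// After: the JUnit 4 class. The assert methods keep the same names, so
// call sites such as Assert.assertEquals(...) compile unchanged.
import org.junit.Assert;
{code}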
[jira] [Moved] (YARN-1845) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/YARN-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles moved MAPREDUCE-5797 to YARN-1845:
--------------------------------------------------

Component/s: (was: webapps)
             (was: jobhistoryserver)
Target Version/s: 3.0.0, 2.5.0 (was: 0.23.11, 2.4.0)
Affects Version/s: (was: 0.23.9)
                   0.23.9
Issue Type: Improvement (was: Bug)
Key: YARN-1845 (was: MAPREDUCE-5797)
Project: Hadoop YARN (was: Hadoop Map/Reduce)

Elapsed time for failed tasks that never started is wrong
----------------------------------------------------------

Key: YARN-1845
URL: https://issues.apache.org/jira/browse/YARN-1845
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.9
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
Attachments: MAPREDUCE-5797-v3.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch

The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e., start time = -1), but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
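To make the arithmetic concrete, a hedged sketch of the guard the description implies (the method name and -1 convention are illustrative, not the patch):

{code}
// With startTime = -1 (the epoch sentinel) and finishTime = "now", the naive
// finishTime - startTime comes out to decades' worth of milliseconds.
// Guarding on "never started" lets the caller render N/A instead.
public static long elapsedMillis(long startTime, long finishTime) {
  if (startTime <= 0 || finishTime < startTime) {
    return -1;  // sentinel: no meaningful elapsed time
  }
  return finishTime - startTime;
}
{code}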
[jira] [Commented] (YARN-1845) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/YARN-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938017#comment-13938017 ]

Jonathan Eagles commented on YARN-1845:
---------------------------------------

+1. lgtm. Thanks for the patch, Rushabh. Committing this to branch-2 and trunk.

Elapsed time for failed tasks that never started is wrong
----------------------------------------------------------

Key: YARN-1845
URL: https://issues.apache.org/jira/browse/YARN-1845
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.9
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
Attachments: MAPREDUCE-5797-v3.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch

The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e., start time = -1), but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1845) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/YARN-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938016#comment-13938016 ]

Jonathan Eagles commented on YARN-1845:
---------------------------------------

Moved this to YARN to better reflect where the changes are taking place.

Elapsed time for failed tasks that never started is wrong
----------------------------------------------------------

Key: YARN-1845
URL: https://issues.apache.org/jira/browse/YARN-1845
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.9
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
Attachments: MAPREDUCE-5797-v3.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch

The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e., start time = -1), but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938352#comment-13938352 ]

Jonathan Eagles commented on YARN-1769:
---------------------------------------

The TestResourceTrackerService test issue is caused by YARN-1591.

CapacityScheduler: Improve reservations
---------------------------------------

Key: YARN-1769
URL: https://issues.apache.org/jira/browse/YARN-1769
Project: Hadoop YARN
Issue Type: Improvement
Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch

Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact that there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required, and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request.

The other place for improvement is that reservations currently count against your queue capacity. If you have reservations, you could hit the various limits, which would then stop you from looking further at that node.

The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of them by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-1833:
----------------------------------

Fix Version/s: 2.4.0

TestRMAdminService Fails in trunk and branch-2
----------------------------------------------

Key: YARN-1833
URL: https://issues.apache.org/jira/browse/YARN-1833
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
Labels: Test
Fix For: 3.0.0, 2.4.0, 2.5.0
Attachments: YARN-1833-v2.patch, YARN-1833.patch

In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed:

{code}
Assert.assertTrue(groupWithInit.size() != groupBefore.size());
{code}

Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails when the sizes of groupWithInit and groupBefore are the same. I do not think we need this assert here. Moreover, we also check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940543#comment-13940543 ]

Jonathan Eagles commented on YARN-1833:
---------------------------------------

Added this test-only fix to the 2.4.0 release, since it is really hindering my testing efforts on that line.

TestRMAdminService Fails in trunk and branch-2
----------------------------------------------

Key: YARN-1833
URL: https://issues.apache.org/jira/browse/YARN-1833
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
Labels: Test
Fix For: 3.0.0, 2.4.0, 2.5.0
Attachments: YARN-1833-v2.patch, YARN-1833.patch

In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed:

{code}
Assert.assertTrue(groupWithInit.size() != groupBefore.size());
{code}

Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails when the sizes of groupWithInit and groupBefore are the same. I do not think we need this assert here. Moreover, we also check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943421#comment-13943421 ]

Jonathan Eagles commented on YARN-1670:
---------------------------------------

Mit, I'm worried that we are still going to have this issue, except in the opposite way. On the last read that puts us over the initial file length, we are not going to write the last part of the data that still fits within the original file length. In that case our aggregated file log length will be smaller than the fileLength written to the data structure.

jeagles

aggregated log writer can write more log data than it says is the log length
-----------------------------------------------------------------------------

Key: YARN-1670
URL: https://issues.apache.org/jira/browse/YARN-1670
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670.patch, YARN-1670.patch

We have seen exceptions when using 'yarn logs' to read log files:

at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)

We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was that the Log Length was written as a certain size, but the log data was actually longer than that.

Inside the write() routine in LogValue, it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop once it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated, but I'm not sure of a good way to do this.

We also noticed a bug in readAContainerLogsForALogType, where it is using an int for curRead whereas it should be using a long:

while (len != -1 && curRead < fileLength) {

This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Reopened] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles reopened YARN-1670:
-----------------------------------

aggregated log writer can write more log data than it says is the log length
-----------------------------------------------------------------------------

Key: YARN-1670
URL: https://issues.apache.org/jira/browse/YARN-1670
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670.patch, YARN-1670.patch

We have seen exceptions when using 'yarn logs' to read log files:

at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)

We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was that the Log Length was written as a certain size, but the log data was actually longer than that.

Inside the write() routine in LogValue, it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop once it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated, but I'm not sure of a good way to do this.

We also noticed a bug in readAContainerLogsForALogType, where it is using an int for curRead whereas it should be using a long:

while (len != -1 && curRead < fileLength) {

This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943422#comment-13943422 ]

Jonathan Eagles commented on YARN-1670:
---------------------------------------

I've reopened this ticket to verify the correctness of the patch that went into branch-2 and branch-2.4.

aggregated log writer can write more log data than it says is the log length
-----------------------------------------------------------------------------

Key: YARN-1670
URL: https://issues.apache.org/jira/browse/YARN-1670
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670.patch, YARN-1670.patch

We have seen exceptions when using 'yarn logs' to read log files:

at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)

We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was that the Log Length was written as a certain size, but the log data was actually longer than that.

Inside the write() routine in LogValue, it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop once it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated, but I'm not sure of a good way to do this.

We also noticed a bug in readAContainerLogsForALogType, where it is using an int for curRead whereas it should be using a long:

while (len != -1 && curRead < fileLength) {

This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944720#comment-13944720 ]

Jonathan Eagles commented on YARN-1670:
---------------------------------------

Thanks, [~mdesai]. The above logic seems correct now. Two minor things:

- If we move from a count-up byte counter to a count-down byte counter, does this seem easier to understand?

{code}
long bytesLeft = file.length();
while ((len = in.read(buf)) != -1) {
  // If the buffer contents fit within fileLength, write them all
  if (len < bytesLeft) {
    out.write(buf, 0, len);
    bytesLeft -= len;
  }
  // else write only the contents that are within fileLength, then exit early
  else {
    out.write(buf, 0, (int) bytesLeft);
    break;
  }
}
{code}

- I see a buffer size of 65535 being used (I know, not your code). I wonder if this is really intended to be block aligned (64K = 65536), since that would give theoretically optimal read performance.

aggregated log writer can write more log data than it says is the log length
-----------------------------------------------------------------------------

Key: YARN-1670
URL: https://issues.apache.org/jira/browse/YARN-1670
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch

We have seen exceptions when using 'yarn logs' to read log files:

at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)

We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was that the Log Length was written as a certain size, but the log data was actually longer than that.

Inside the write() routine in LogValue, it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop once it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated, but I'm not sure of a good way to do this.

We also noticed a bug in readAContainerLogsForALogType, where it is using an int for curRead whereas it should be using a long:

while (len != -1 && curRead < fileLength) {

This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
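For reference, a self-contained version of the count-down loop above (names and the 64K buffer are illustrative; this is a sketch of the idea, not the attached patch):

{code}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CappedCopy {
  // Copy at most `cap` bytes of src to out, so the bytes written can never
  // exceed the length that was recorded up front.
  static void copyCapped(File src, OutputStream out, long cap) throws IOException {
    byte[] buf = new byte[65536];
    long bytesLeft = cap;
    try (InputStream in = new FileInputStream(src)) {
      int len;
      while ((len = in.read(buf)) != -1) {
        if (len < bytesLeft) {
          out.write(buf, 0, len);             // whole buffer fits under the cap
          bytesLeft -= len;
        } else {
          out.write(buf, 0, (int) bytesLeft); // write only what fits, then stop
          break;
        }
      }
    }
  }
}
{code}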
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945436#comment-13945436 ]

Jonathan Eagles commented on YARN-1670:
---------------------------------------

+1 on this change. Committing to trunk, branch-2.4, branch-2, branch-0.23.

aggregated log writer can write more log data than it says is the log length
-----------------------------------------------------------------------------

Key: YARN-1670
URL: https://issues.apache.org/jira/browse/YARN-1670
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670-v4-b23.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch

We have seen exceptions when using 'yarn logs' to read log files:

at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)

We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was that the Log Length was written as a certain size, but the log data was actually longer than that.

Inside the write() routine in LogValue, it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop once it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated, but I'm not sure of a good way to do this.

We also noticed a bug in readAContainerLogsForALogType, where it is using an int for curRead whereas it should be using a long:

while (len != -1 && curRead < fileLength) {

This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-1426) YARN Components need to unregister their beans upon shutdown
[ https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-1426:
----------------------------------

Attachment: YARN-1426.patch

YARN Components need to unregister their beans upon shutdown
-------------------------------------------------------------

Key: YARN-1426
URL: https://issues.apache.org/jira/browse/YARN-1426
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 3.0.0, 2.3.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Attachments: YARN-1426.patch, YARN-1426.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-1106) The RM should point the tracking url to the RM app page if it's empty
[ https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-1106:
----------------------------------

Attachment: YARN-1106.patch

[~tgraves], I tried the latest patch against trunk, but the tests now fail since the originalTrackingUrl is set to N/A and not null or empty. If we still want this behavior, we will need to add this condition as well.

The RM should point the tracking url to the RM app page if it's empty
----------------------------------------------------------------------

Key: YARN-1106
URL: https://issues.apache.org/jira/browse/YARN-1106
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9
Reporter: Thomas Graves
Assignee: Thomas Graves
Attachments: YARN-1106.patch, YARN-1106.patch

It would be nice if the ResourceManager set the tracking url to the RM app page if the application master doesn't pass one or passes the empty string.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1883) TestRMAdminService fails due to inconsistent entries in UserGroups
[ https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951270#comment-13951270 ]

Jonathan Eagles commented on YARN-1883:
---------------------------------------

+1. Thanks for cleaning this test up. The double-brace initialization that was there before is considered a hack, since it creates an anonymous subclass with an instance initializer. Committing to trunk and branch-2.

jeagles

TestRMAdminService fails due to inconsistent entries in UserGroups
-------------------------------------------------------------------

Key: YARN-1883
URL: https://issues.apache.org/jira/browse/YARN-1883
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
Labels: java7
Attachments: YARN-1883.patch, YARN-1883.patch

testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails with the following error:

{noformat}
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:92)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.junit.Assert.assertTrue(Assert.java:54)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104)
{noformat}

Line numbers will be inconsistent, as I was testing by running it in a particular order. But the line on which the failure occurs is:

{code}
Assert.assertTrue(groupBefore.contains("test_group_A")
    && groupBefore.contains("test_group_B")
    && groupBefore.contains("test_group_C")
    && groupBefore.size() == 3);
{code}

testRMInitialsWithFileSystemBasedConfigurationProvider() and testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() both call the function {{MockUnixGroupsMapping.updateGroups()}}, which changes the list of userGroups. testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() tries to verify the groups before changing them, and fails if testRMInitialsWithFileSystemBasedConfigurationProvider() already ran and made the changes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
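For context, a minimal illustration of why double-brace initialization is considered a hack (a generic example, not the test code itself):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DoubleBraceDemo {
  // Double-brace "initialization": the outer braces declare an anonymous
  // subclass of ArrayList; the inner braces are its instance initializer.
  // Every such expression mints an extra class, and in an inner-class
  // context it also pins a reference to the enclosing instance.
  List<String> hack = new ArrayList<String>() {{
    add("test_group_A");
    add("test_group_B");
  }};

  // Plain initialization: same contents, no anonymous subclass.
  List<String> plain = new ArrayList<>(Arrays.asList("test_group_A", "test_group_B"));
}
{code}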
[jira] [Commented] (YARN-1906) TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2
[ https://issues.apache.org/jira/browse/YARN-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962175#comment-13962175 ]

Jonathan Eagles commented on YARN-1906:
---------------------------------------

Mit, you might consider using waitForState instead of a raw sleep. This will protect us in the case of a missed race condition, though it will perhaps result in more sleep time overall.

TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2
------------------------------------------------------------------------------------

Key: YARN-1906
URL: https://issues.apache.org/jira/browse/YARN-1906
Project: Hadoop YARN
Issue Type: Bug
Reporter: Mit Desai
Assignee: Mit Desai
Fix For: 3.0.0, 2.5.0
Attachments: YARN-1906.patch

Here is the output of the failure:

{noformat}
testQueueMetricsOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)  Time elapsed: 9.757 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<1>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.junit.Assert.assertEquals(Assert.java:456)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1735)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1706)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
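A hedged sketch of the suggested pattern — polling for the expected value with a bounded deadline instead of a fixed sleep (the helper below is illustrative and assumes QueueMetrics#getAppsSubmitted; it is not the MockRM API):

{code}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
import org.junit.Assert;

public class WaitForMetric {
  // Poll until the metric reaches the expected value or the deadline passes,
  // rather than sleeping a fixed amount and hoping the race has resolved.
  static void waitForAppsSubmitted(QueueMetrics metrics, int expected,
      long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (metrics.getAppsSubmitted() != expected
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);  // short poll; total wait is bounded by timeoutMs
    }
    Assert.assertEquals(expected, metrics.getAppsSubmitted());
  }
}
{code}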
[jira] [Commented] (YARN-1981) Nodemanager version is not updated when a node reconnects
[ https://issues.apache.org/jira/browse/YARN-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996825#comment-13996825 ]

Jonathan Eagles commented on YARN-1981:
---------------------------------------

+1. lgtm. Committing to branch-2 and trunk. Thanks, [~jlowe].

Nodemanager version is not updated when a node reconnects
----------------------------------------------------------

Key: YARN-1981
URL: https://issues.apache.org/jira/browse/YARN-1981
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Attachments: YARN-1981.patch

When a nodemanager is quickly restarted and happens to change versions during the restart (e.g.: rolling upgrade scenario), the NM version as reported by the RM is not updated.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027938#comment-14027938 ]

Jonathan Eagles commented on YARN-1198:
---------------------------------------

Since the headroom calculation is used in reducer preemption, I have seen these bugs cause queue deadlock, where a multi-job queue is full of reducers that can't finish because the mappers can't run, the reducers having higher task priority. Preemption doesn't kill reducers, since the headroom falsely shows there is plenty of room in the queue for mappers to run.

Capacity Scheduler headroom calculation does not work as expected
------------------------------------------------------------------

Key: YARN-1198
URL: https://issues.apache.org/jira/browse/YARN-1198
Project: Hadoop YARN
Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi

Today the headroom calculation (for the app) takes place only when:
* a new node is added to / removed from the cluster
* a new container is getting assigned to the application.

However, there are potentially a lot of situations which are not considered for this calculation:
* If a container finishes, then the headroom for that application will change and should be communicated to the AM accordingly.
* If a single user has submitted multiple applications (app1 and app2) to the same queue, then:
** if app1's container finishes, then not only app1's but also app2's AM should be notified about the change in headroom;
** similarly, if a container is assigned to either application app1/app2, then both AMs should be notified about their headroom;
** to simplify the whole communication process, it is ideal to keep headroom per user per LeafQueue so that everyone gets the same picture (apps belonging to the same user and submitted in the same queue).
* If a new user submits an application to the queue, then all applications submitted by all users in that queue should be notified of the headroom change.
* Also, today headroom is an absolute number (I think it should be normalized, but then this is not going to be backward compatible).
* Also, when an admin user refreshes the queue, the headroom has to be updated.

These are all potential bugs in the headroom calculations.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-1857:
----------------------------------

Priority: Critical (was: Major)

CapacityScheduler headroom doesn't account for other AM's running
------------------------------------------------------------------

Key: YARN-1857
URL: https://issues.apache.org/jira/browse/YARN-1857
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
Attachments: YARN-1857.patch, YARN-1857.patch

It's possible to get an application to hang forever (or a long time) in a cluster with multiple users. The reason is that the headroom sent to the application is based on the user limit, but it doesn't account for other application masters using space in that queue. So the headroom (user limit - user consumed) can be > 0 even though the cluster is 100% full, because the other space is being used by application masters from other users.

For instance, take a cluster with 1 queue, a user limit of 100%, and multiple users submitting applications. One very large application by user 1 starts up, runs most of its maps, and starts running reducers. Other users try to start applications and get their application masters started, but no tasks. The very large application then gets to the point where it has consumed the rest of the cluster resources with all reduces, but it still needs to finish a few maps. The headroom being sent to this application is based only on the user limit (which is 100% of the cluster capacity): it's using, say, 95% of the cluster for reduces, and the other 5% is being used by other users' application masters. The MRAppMaster thinks it still has 5%, so it doesn't know that it should kill a reduce in order to run a map.

This can happen in other scenarios also. Generally in a large cluster with multiple queues this shouldn't cause a hang forever, but it could cause the application to take much longer.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-1857:
----------------------------------

Target Version/s: 2.4.1 (was: 2.4.0)

CapacityScheduler headroom doesn't account for other AM's running
------------------------------------------------------------------

Key: YARN-1857
URL: https://issues.apache.org/jira/browse/YARN-1857
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Attachments: YARN-1857.patch, YARN-1857.patch

It's possible to get an application to hang forever (or a long time) in a cluster with multiple users. The reason is that the headroom sent to the application is based on the user limit, but it doesn't account for other application masters using space in that queue. So the headroom (user limit - user consumed) can be > 0 even though the cluster is 100% full, because the other space is being used by application masters from other users.

For instance, take a cluster with 1 queue, a user limit of 100%, and multiple users submitting applications. One very large application by user 1 starts up, runs most of its maps, and starts running reducers. Other users try to start applications and get their application masters started, but no tasks. The very large application then gets to the point where it has consumed the rest of the cluster resources with all reduces, but it still needs to finish a few maps. The headroom being sent to this application is based only on the user limit (which is 100% of the cluster capacity): it's using, say, 95% of the cluster for reduces, and the other 5% is being used by other users' application masters. The MRAppMaster thinks it still has 5%, so it doesn't know that it should kill a reduce in order to run a map.

This can happen in other scenarios also. Generally in a large cluster with multiple queues this shouldn't cause a hang forever, but it could cause the application to take much longer.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027943#comment-14027943 ]

Jonathan Eagles commented on YARN-1857:
---------------------------------------

Bumping the priority, since reducer preemption is broken in many cases without this fix.

CapacityScheduler headroom doesn't account for other AM's running
------------------------------------------------------------------

Key: YARN-1857
URL: https://issues.apache.org/jira/browse/YARN-1857
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
Attachments: YARN-1857.patch, YARN-1857.patch

It's possible to get an application to hang forever (or a long time) in a cluster with multiple users. The reason is that the headroom sent to the application is based on the user limit, but it doesn't account for other application masters using space in that queue. So the headroom (user limit - user consumed) can be > 0 even though the cluster is 100% full, because the other space is being used by application masters from other users.

For instance, take a cluster with 1 queue, a user limit of 100%, and multiple users submitting applications. One very large application by user 1 starts up, runs most of its maps, and starts running reducers. Other users try to start applications and get their application masters started, but no tasks. The very large application then gets to the point where it has consumed the rest of the cluster resources with all reduces, but it still needs to finish a few maps. The headroom being sent to this application is based only on the user limit (which is 100% of the cluster capacity): it's using, say, 95% of the cluster for reduces, and the other 5% is being used by other users' application masters. The MRAppMaster thinks it still has 5%, so it doesn't know that it should kill a reduce in order to run a map.

This can happen in other scenarios also. Generally in a large cluster with multiple queues this shouldn't cause a hang forever, but it could cause the application to take much longer.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
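To make the scenario concrete, a worked example with the numbers from the description (illustrative arithmetic only):

{code}
// Cluster capacity = 100 units; one queue; user limit = 100%.
int userLimit    = 100; // what this user may consume
int userConsumed = 95;  // the large app's reducers
int otherAMs     = 5;   // application masters of other users
// Headroom as currently computed: userLimit - userConsumed = 5 units,
// so the MRAppMaster believes a map could still start.
// Actually free in the cluster: 100 - userConsumed - otherAMs = 0 units,
// so nothing can start until a reducer is preempted -- which never happens,
// because the reported headroom is nonzero.
{code}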
[jira] [Commented] (YARN-2184) ResourceManager may fail due to name node in safe mode
[ https://issues.apache.org/jira/browse/YARN-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038912#comment-14038912 ]

Jonathan Eagles commented on YARN-2184:
---------------------------------------

Jeff, this issue has already been reported by me under YARN-2035, and there is a patch available. Let me know if this solves your issue and we can close this ticket out.

ResourceManager may fail due to name node in safe mode
-------------------------------------------------------

Key: YARN-2184
URL: https://issues.apache.org/jira/browse/YARN-2184
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang

If the history service is enabled in the ResourceManager, it will try to mkdir when the service is inited, and at that time the name node may still be in safe mode, which may cause the history service to fail and then the ResourceManager to fail. This is quite likely when the cluster is restarted, since the name node can stay in safe mode for a long time. Here are the error logs:

{code}
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /Users/jzhang/Java/lib/hadoop-2.4.0/logs/yarn/system/history/ApplicationHistoryDataRoot. Name node is in safe mode.
The reported blocks 85 has reached the threshold 0.9990 of total blocks 85. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 19 seconds.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1195)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3564)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3540)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
	at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:500)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2553)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2524)
	at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827)
	at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:823)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:823)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:816)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815)
	at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.serviceInit(FileSystemApplicationHistoryStore.java:120)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	... 10 more
2014-06-20 11:06:25,220 INFO
{code}
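A hedged sketch of one possible mitigation — retrying the mkdir while the NameNode reports safe mode (an assumption for illustration; not the YARN-2035 patch):

{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.ipc.RemoteException;

public class SafeModeRetry {
  // Retry mkdirs while the NameNode is in safe mode instead of letting the
  // first SafeModeException abort service init.
  static void mkdirsWithRetry(FileSystem fs, Path dir, int maxRetries)
      throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        fs.mkdirs(dir);
        return;
      } catch (RemoteException e) {
        boolean safeMode = e.getClassName().endsWith("SafeModeException");
        if (!safeMode || attempt >= maxRetries) {
          throw e;
        }
        Thread.sleep(10000L); // give safe mode a chance to lift, then retry
      }
    }
  }
}
{code}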
[jira] [Created] (YARN-2277) Add JSONP support to the ATS REST API
Jonathan Eagles created YARN-2277:
-------------------------------------

Summary: Add JSONP support to the ATS REST API
Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Jonathan Eagles

As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-2277) Add JSONP support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-2277:
----------------------------------

Attachment: YARN-2277.patch

Starter patch, as a conversation starter.

Add JSONP support to the ATS REST API
-------------------------------------

Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Jonathan Eagles
Attachments: YARN-2277.patch

As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-2277) Add JSONP support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058366#comment-14058366 ]

Jonathan Eagles commented on YARN-2277:
---------------------------------------

A brief discussion of the options: http://jvaneyck.wordpress.com/2014/01/07/cross-domain-requests-in-javascript/. The JSONP method is already being used as part of the jmx queries, so I felt this was most consistent with the current system. I'm in no way married to this approach.

Add JSONP support to the ATS REST API
-------------------------------------

Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Jonathan Eagles
Attachments: YARN-2277.patch

As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
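For comparison, a generic illustration of the CORS alternative as a servlet filter (a minimal sketch; not the attached patch):

{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

public class SimpleCorsFilter implements Filter {
  @Override public void init(FilterConfig conf) {}
  @Override public void destroy() {}

  // Adds the Access-Control-Allow-* headers so a remote UI's XMLHttpRequest
  // is allowed by the browser to read the REST responses.
  @Override
  public void doFilter(ServletRequest req, ServletResponse res,
      FilterChain chain) throws IOException, ServletException {
    HttpServletResponse http = (HttpServletResponse) res;
    http.setHeader("Access-Control-Allow-Origin", "*"); // tighten for production
    http.setHeader("Access-Control-Allow-Methods", "GET, OPTIONS");
    chain.doFilter(req, res);
  }
}
{code}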
[jira] [Updated] (YARN-2277) Add JSONP support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-2277:
----------------------------------

Attachment: YARN-2277-CORS.patch

Add JSONP support to the ATS REST API
-------------------------------------

Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Jonathan Eagles
Attachments: YARN-2277-CORS.patch, YARN-2277.patch

As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-2277:
----------------------------------

Attachment: (was: YARN-2277-CORS.patch)

Add Cross-Origin support to the ATS REST API
--------------------------------------------

Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Jonathan Eagles

As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP or CORS REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-2277:
----------------------------------

Attachment: YARN-2277-JSONP.patch

Add Cross-Origin support to the ATS REST API
--------------------------------------------

Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Jonathan Eagles
Attachments: YARN-2277-JSONP.patch

As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP or CORS REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-2277:
----------------------------------

Description: As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP or CORS REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

was: As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

Summary: Add Cross-Origin support to the ATS REST API (was: Add JSONP support to the ATS REST API)

Add Cross-Origin support to the ATS REST API
--------------------------------------------

Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Jonathan Eagles

As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP or CORS REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-2277:
----------------------------------

Attachment: (was: YARN-2277.patch)

Add Cross-Origin support to the ATS REST API
--------------------------------------------

Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Jonathan Eagles

As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP or CORS REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-2277:
----------------------------------

Attachment: YARN-2277-CORS.patch

Add Cross-Origin support to the ATS REST API
--------------------------------------------

Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Jonathan Eagles
Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch

As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP or CORS REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059205#comment-14059205 ]

Jonathan Eagles commented on YARN-2277:
---------------------------------------

[~vinodkv] and [~zjshen], do you guys have any thoughts on the approach taken?

Add Cross-Origin support to the ATS REST API
--------------------------------------------

Key: YARN-2277
URL: https://issues.apache.org/jira/browse/YARN-2277
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Jonathan Eagles
Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch

As the Application Timeline Server is provided with a built-in UI, it may make sense to enable JSONP or CORS REST API capabilities to allow a remote UI to access the data directly via JavaScript without cross-site browser blocks coming into play. An example client might be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-819) ResourceManager and NodeManager should check for a minimum allowed version
[ https://issues.apache.org/jira/browse/YARN-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779097#comment-13779097 ]

Jonathan Eagles commented on YARN-819:
--------------------------------------

+1. Great fix, Rob.

ResourceManager and NodeManager should check for a minimum allowed version
---------------------------------------------------------------------------

Key: YARN-819
URL: https://issues.apache.org/jira/browse/YARN-819
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Robert Parker
Assignee: Robert Parker
Attachments: YARN-819-1.patch, YARN-819-2.patch, YARN-819-3.patch

Our use case is that during an upgrade on a large cluster, several NodeManagers may not restart with the new version. Once the RM comes back up, the NodeManager will re-register to the RM without issue. The NM should report its version to the RM. The RM should have a configuration to disallow the check (default), require the version equal to the RM's (to prevent a config change for each release), equal to or greater than the RM's (to allow NM upgrades), and finally an explicit version or version range. The RM should also have a configuration for how to treat a mismatch: REJECT, or REBOOT the NM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1199) Make NM/RM Versions Available
[ https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779169#comment-13779169 ]

Jonathan Eagles commented on YARN-1199:
---------------------------------------

I have submitted this patch now that YARN-819 is in. Will check in pending a +1 from Hadoop QA.

Make NM/RM Versions Available
-----------------------------

Key: YARN-1199
URL: https://issues.apache.org/jira/browse/YARN-1199
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch

Now that we have the NM and RM versions available, we can display the YARN version of nodes running in the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1243) ResourceManager: Error in handling event type NODE_UPDATE to the scheduler - NPE at SchedulerApp.java:411
[ https://issues.apache.org/jira/browse/YARN-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779187#comment-13779187 ]

Jonathan Eagles commented on YARN-1243:
---------------------------------------

+1. Verified the backport to branch-0.23 and ran the tests.

ResourceManager: Error in handling event type NODE_UPDATE to the scheduler - NPE at SchedulerApp.java:411
----------------------------------------------------------------------------------------------------------

Key: YARN-1243
URL: https://issues.apache.org/jira/browse/YARN-1243
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 0.23.8
Environment: RHEL - 6.4, Hadoop 0.23.8
Reporter: Sanjay Upadhyay
Assignee: Jason Lowe
Attachments: YARN-1243.branch-0.23.patch

2013-09-26 03:25:02,262 [ResourceManager Event Processor] FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.unreserve(SchedulerApp.java:411)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1333)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1261)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1137)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1092)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:887)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:788)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:594)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:656)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
	at java.lang.Thread.run(Thread.java:722)

The YARN ResourceManager exits on this NPE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-677: - Summary: Increase coverage to FairScheduler (was: Add test methods in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784086#comment-13784086 ] Jonathan Eagles commented on YARN-677: -- +1. Thanks for the coverage addition for this component. Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784149#comment-13784149 ] Jonathan Eagles commented on YARN-465: -- I haven't looked too closely at this, but I see a setAccessible call. This is the same technique that PowerMock uses to access fields, which has been a disallowed testing technique in the Hadoop stack, the reason being that it usually points to an improvement that should be made to the class under test. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run:
mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
-- This message was sent by Atlassian JIRA (v6.1#6144)
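For context, this is the kind of reflective access pattern the comment above objects to; the class and field names below are hypothetical, a minimal sketch rather than code from the YARN-465 patch:
{code}
import java.lang.reflect.Field;

class WebProxyUnderTest {
  private int retryCount = 3; // hypothetical private state a test wants to read
}

class ReflectionInTest {
  static int readPrivateField(WebProxyUnderTest proxy) throws Exception {
    Field f = WebProxyUnderTest.class.getDeclaredField("retryCount");
    f.setAccessible(true); // the call the review comment flags
    // Forcing access like this usually signals that the class under test
    // needs a getter or a package-private hook instead.
    return (Integer) f.get(proxy);
  }
}
{code}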
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785276#comment-13785276 ] Jonathan Eagles commented on YARN-677: -- Thanks, Sandy. Let me take a look at the coverage numbers from before this patch went in. In the meantime I will revert until I can prove we need this coverage patch. Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Fix For: 3.0.0, 2.3.0 Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-677: - Fix Version/s: (was: 2.3.0) (was: 3.0.0) Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1199) Make NM/RM Versions Available
[ https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785526#comment-13785526 ] Jonathan Eagles commented on YARN-1199: --- +1. Thanks, Mit. Make NM/RM Versions Available - Key: YARN-1199 URL: https://issues.apache.org/jira/browse/YARN-1199 Project: Hadoop YARN Issue Type: Improvement Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch, YARN-1199.patch Now as we have the NM and RM Versions available, we can display the YARN version of nodes running in the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801883#comment-13801883 ] Jonathan Eagles commented on YARN-1183: --- Great work, everybody. Looks like this patch is ready for checkin. I am assuming this is targeted for trunk and branch-2. Also, can you post a maven command for manual testing? I would be happy to put this in. MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Assignee: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, YARN-1183--n4.patch, YARN-1183.patch As described in MAPREDUCE-5501, sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes the RM stops before an app master sends its last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802170#comment-13802170 ] Jonathan Eagles commented on YARN-1183: --- Can you post an updated patch so I can check in? The current one doesn't apply after YARN-1182. MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Assignee: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, YARN-1183--n4.patch, YARN-1183.patch As described in MAPREDUCE-5501, sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes the RM stops before an app master sends its last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802282#comment-13802282 ] Jonathan Eagles commented on YARN-1183: --- I'm +1 on YARN-1183--n5.patch. Thanks Andrey and Karthik for getting this patch ready! MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Assignee: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, YARN-1183--n4.patch, YARN-1183--n5.patch, YARN-1183.patch As described in MAPREDUCE-5501, sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes the RM stops before an app master sends its last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications
[ https://issues.apache.org/jira/browse/YARN-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803666#comment-13803666 ] Jonathan Eagles commented on YARN-473: -- I haven't seen any updates on this, so I am assigning it to another contributor. Feel free to chime in if you still want this. I'd like to get this committed in the next week or so. Capacity Scheduler webpage and REST API not showing correct number of pending applications -- Key: YARN-473 URL: https://issues.apache.org/jira/browse/YARN-473 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Timothy Chen Labels: usability The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is also showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications
[ https://issues.apache.org/jira/browse/YARN-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-473: - Assignee: Mit Desai (was: Timothy Chen) Capacity Scheduler webpage and REST API not showing correct number of pending applications -- Key: YARN-473 URL: https://issues.apache.org/jira/browse/YARN-473 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Mit Desai Labels: usability The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is also showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23
[ https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809307#comment-13809307 ] Jonathan Eagles commented on YARN-1031: --- +1. Verified Jason's changes. Blocked access to ajax.googleapis.com via /etc/hosts before and after the change to visually inspect. Programmatically scanned network activity via Firebug to verify the new jquery-ui.css and icons are downloaded locally with no GETs to ajax.googleapis.com. JQuery UI components reference external css in branch-23 Key: YARN-1031 URL: https://issues.apache.org/jira/browse/YARN-1031 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.9 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1031-2-branch-0.23.patch, YARN-1031-3-branch-0.23.patch, YARN-1031-branch-0.23.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1386) NodeManager mistakenly loses resources and relocalizes them
[ https://issues.apache.org/jira/browse/YARN-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820682#comment-13820682 ] Jonathan Eagles commented on YARN-1386: --- +1. Great fix, Jason. NodeManager mistakenly loses resources and relocalizes them --- Key: YARN-1386 URL: https://issues.apache.org/jira/browse/YARN-1386 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Attachments: YARN-1386.patch, YARN-1386.patch When a local resource that should already be present is requested again, the nodemanager checks to see if it is still present. However, the method it uses to check for presence is File.exists(), run as the user of the nodemanager process. If the resource was a private resource localized for another user, it will have been localized to a location that is not accessible by the nodemanager user. Therefore File.exists() returns false, the nodemanager mistakenly believes the resource is no longer available, and it proceeds to localize it over and over. -- This message was sent by Atlassian JIRA (v6.1#6144)
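A minimal sketch of the failure mode described above, with an illustrative path; this is not the actual NodeManager code:
{code}
import java.io.File;

class LocalResourcePresence {
  // File.exists() runs as the NodeManager user, so a resource localized under
  // another user's private directory can look absent even though it is still
  // on disk, e.g. /grid/yarn/usercache/alice/filecache/10/job.jar when the NM
  // user lacks execute permission on alice's private dirs. The false return
  // then triggers a needless re-localization.
  static boolean looksPresentToNodeManager(String localizedPath) {
    return new File(localizedPath).exists();
  }
}
{code}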
[jira] [Updated] (YARN-1386) NodeManager mistakenly loses resources and relocalizes them
[ https://issues.apache.org/jira/browse/YARN-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-1386: -- Fix Version/s: 2.2.1 NodeManager mistakenly loses resources and relocalizes them --- Key: YARN-1386 URL: https://issues.apache.org/jira/browse/YARN-1386 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Fix For: 3.0.0, 2.3.0, 0.23.10, 2.2.1 Attachments: YARN-1386.patch, YARN-1386.patch When a local resource that should already be present is requested again, the nodemanager checks to see if it is still present. However, the method it uses to check for presence is File.exists(), run as the user of the nodemanager process. If the resource was a private resource localized for another user, it will have been localized to a location that is not accessible by the nodemanager user. Therefore File.exists() returns false, the nodemanager mistakenly believes the resource is no longer available, and it proceeds to localize it over and over. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Moved] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7
[ https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles moved MAPREDUCE-5630 to YARN-1419: -- Component/s: (was: scheduler) scheduler Target Version/s: 3.0.0, 2.3.0, 0.23.10 (was: 3.0.0, 2.3.0, 0.23.10) Affects Version/s: (was: 0.23.10) (was: 2.3.0) (was: 3.0.0) 0.23.10 2.3.0 3.0.0 Key: YARN-1419 (was: MAPREDUCE-5630) Project: Hadoop YARN (was: Hadoop Map/Reduce) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 Key: YARN-1419 URL: https://issues.apache.org/jira/browse/YARN-1419 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.3.0, 0.23.10 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Minor Labels: java7 QueueMetrics holds its data in a static variable, causing metrics to bleed over from test to test. clearQueueMetrics is to be called by tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, and in this case it makes the metrics unreliable. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7
[ https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-1419: -- Attachment: YARN-1419.patch TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 Key: YARN-1419 URL: https://issues.apache.org/jira/browse/YARN-1419 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.3.0, 0.23.10 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Minor Labels: java7 Attachments: YARN-1419.patch QueueMetrics holds its data in a static variable, causing metrics to bleed over from test to test. clearQueueMetrics is to be called by tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, and in this case it makes the metrics unreliable. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7
[ https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-1419: -- Attachment: YARN-1419.patch Instead of heavily changing the QueueMetrics class, with its use of static class variables and its failure to unregister the beans, I've chosen to take a simpler approach of just measuring the delta in apps submitted. TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 Key: YARN-1419 URL: https://issues.apache.org/jira/browse/YARN-1419 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.3.0, 0.23.10 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Minor Labels: java7 Attachments: YARN-1419.patch, YARN-1419.patch QueueMetrics holds its data in a static variable, causing metrics to bleed over from test to test. clearQueueMetrics is to be called by tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, and in this case it makes the metrics unreliable. -- This message was sent by Atlassian JIRA (v6.1#6144)
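A minimal sketch of the delta-measuring approach described above, assuming a QueueMetrics-like singleton whose counters persist across tests; FakeQueueMetrics and its methods are stand-ins, not the real patch:
{code}
import org.junit.Assert;
import org.junit.Test;

public class TestAppAttemptMetricsDelta {
  @Test
  public void testAppsSubmittedDelta() {
    int before = FakeQueueMetrics.get().getAppsSubmitted(); // may be nonzero
    FakeQueueMetrics.get().submitApp();
    int after = FakeQueueMetrics.get().getAppsSubmitted();
    // Assert on the delta, not the absolute value, so earlier tests that
    // touched the shared static metrics cannot cause a false failure under
    // jdk7's unordered test execution.
    Assert.assertEquals(1, after - before);
  }

  // Hypothetical stand-in for the shared, static-backed metrics object.
  static class FakeQueueMetrics {
    private static final FakeQueueMetrics INSTANCE = new FakeQueueMetrics();
    private int appsSubmitted;
    static FakeQueueMetrics get() { return INSTANCE; }
    void submitApp() { appsSubmitted++; }
    int getAppsSubmitted() { return appsSubmitted; }
  }
}
{code}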
[jira] [Created] (YARN-1426) YARN Components need to unregister their beans upon shutdown
Jonathan Eagles created YARN-1426: - Summary: YARN Components need to unregister their beans upon shutdown Key: YARN-1426 URL: https://issues.apache.org/jira/browse/YARN-1426 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 2.3.0 Reporter: Jonathan Eagles -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1420) TestRMContainerAllocator#testUpdatedNodes fails
[ https://issues.apache.org/jira/browse/YARN-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13827378#comment-13827378 ] Jonathan Eagles commented on YARN-1420: --- I ran git bisect on my mac using jdk 1.6 to detect when this test failure was introduced. YARN-1343 is the likely culprit. I haven't run this test on linux with jdk 1.6, but I suspect there are in fact two issues. TestRMContainerAllocator#testUpdatedNodes fails --- Key: YARN-1420 URL: https://issues.apache.org/jira/browse/YARN-1420 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu From https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1607/console :
{code}
Running org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 65.78 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
testUpdatedNodes(org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator) Time elapsed: 3.125 sec FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:48)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertTrue(Assert.java:27)
at org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator.testUpdatedNodes(TestRMContainerAllocator.java:779)
{code}
This assertion fails:
{code}
Assert.assertTrue(allocator.getJobUpdatedNodeEvents().isEmpty());
{code}
The List returned by allocator.getJobUpdatedNodeEvents() is: [EventType: JOB_UPDATED_NODES] -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1420) TestRMContainerAllocator#testUpdatedNodes fails
[ https://issues.apache.org/jira/browse/YARN-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles reassigned YARN-1420: - Assignee: Jonathan Eagles TestRMContainerAllocator#testUpdatedNodes fails --- Key: YARN-1420 URL: https://issues.apache.org/jira/browse/YARN-1420 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Jonathan Eagles From https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1607/console :
{code}
Running org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 65.78 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
testUpdatedNodes(org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator) Time elapsed: 3.125 sec FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:48)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertTrue(Assert.java:27)
at org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator.testUpdatedNodes(TestRMContainerAllocator.java:779)
{code}
This assertion fails:
{code}
Assert.assertTrue(allocator.getJobUpdatedNodeEvents().isEmpty());
{code}
The List returned by allocator.getJobUpdatedNodeEvents() is: [EventType: JOB_UPDATED_NODES] -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1420) TestRMContainerAllocator#testUpdatedNodes fails
[ https://issues.apache.org/jira/browse/YARN-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-1420: -- Attachment: YARN-1420.patch TestRMContainerAllocator#testUpdatedNodes fails --- Key: YARN-1420 URL: https://issues.apache.org/jira/browse/YARN-1420 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Jonathan Eagles Attachments: YARN-1420.patch From https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1607/console :
{code}
Running org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 65.78 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
testUpdatedNodes(org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator) Time elapsed: 3.125 sec FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:48)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertTrue(Assert.java:27)
at org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator.testUpdatedNodes(TestRMContainerAllocator.java:779)
{code}
This assertion fails:
{code}
Assert.assertTrue(allocator.getJobUpdatedNodeEvents().isEmpty());
{code}
The List returned by allocator.getJobUpdatedNodeEvents() is: [EventType: JOB_UPDATED_NODES] -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828270#comment-13828270 ] Jonathan Eagles commented on YARN-1343: --- This change introduced a test failure in TestRMContainerAllocator#testUpdatedNodes (MAPREDUCE-5632), since it is counting the jobUpdatedNodeEvents. Can someone ([~tucu00] or [~bikassaha]) verify the patch and make sure that the test reflects the new proper behavior and that I'm not masking a real error in the code? NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1426) YARN Components need to unregister their beans upon shutdown
[ https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles reassigned YARN-1426: - Assignee: Jonathan Eagles YARN Components need to unregister their beans upon shutdown Key: YARN-1426 URL: https://issues.apache.org/jira/browse/YARN-1426 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 2.3.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1426) YARN Components need to unregister their beans upon shutdown
[ https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-1426: -- Attachment: YARN-1426.patch YARN Components need to unregister their beans upon shutdown Key: YARN-1426 URL: https://issues.apache.org/jira/browse/YARN-1426 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 2.3.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1426.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1426) YARN Components need to unregister their beans upon shutdown
[ https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829381#comment-13829381 ] Jonathan Eagles commented on YARN-1426: --- Test failures:
- TestJobCleanup is from MAPREDUCE-5552.
-- Ran this test with and without my patch and both succeed on my desktop.
YARN Components need to unregister their beans upon shutdown Key: YARN-1426 URL: https://issues.apache.org/jira/browse/YARN-1426 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 2.3.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1426.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
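A hedged sketch of the register/unregister pairing the YARN-1426 title calls for, written against plain JMX; the actual patch targets Hadoop's own metrics helpers, and the class and bean name below are illustrative:
{code}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class BeanLifecycle {
  private ObjectName beanName;

  void serviceStart(Object mbean) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    beanName = new ObjectName("Hadoop:service=ResourceManager,name=Example");
    server.registerMBean(mbean, beanName); // mbean must be JMX-compliant
  }

  void serviceStop() throws Exception {
    // Without the unregister, a component restarted in-process (for example
    // inside MiniYARNCluster tests) collides with the stale registration.
    if (beanName != null) {
      ManagementFactory.getPlatformMBeanServer().unregisterMBean(beanName);
      beanName = null;
    }
  }
}
{code}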
[jira] [Updated] (YARN-1136) Replace junit.framework.Assert with org.junit.Assert
[ https://issues.apache.org/jira/browse/YARN-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-1136: -- Assignee: Chen He Replace junit.framework.Assert with org.junit.Assert Key: YARN-1136 URL: https://issues.apache.org/jira/browse/YARN-1136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Chen He Labels: newbie, test There are several places where we are using junit.framework.Assert instead of org.junit.Assert. {code}grep -rn junit.framework.Assert hadoop-yarn-project/ --include=*.java{code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (YARN-1491) Upgrade JUnit3 TestCase to JUnit 4
Jonathan Eagles created YARN-1491: - Summary: Upgrade JUnit3 TestCase to JUnit 4 Key: YARN-1491 URL: https://issues.apache.org/jira/browse/YARN-1491 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jonathan Eagles Assignee: Chen He There are still four references to test classes that extend from junit.framework.TestCase:
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsResourceCalculatorPlugin.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
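A before/after sketch of the migration YARN-1491 asks for, using one of the listed class names as an example; the test body itself is hypothetical:
{code}
// JUnit 3 style: extends TestCase, method names must start with "test".
//
//   public class TestYarnVersionInfo extends junit.framework.TestCase {
//     public void testVersionNotNull() { assertNotNull(getVersion()); }
//   }
//
// JUnit 4 style: no superclass, annotation-driven, org.junit.Assert.
import static org.junit.Assert.assertNotNull;
import org.junit.Test;

public class TestYarnVersionInfo {
  @Test
  public void testVersionNotNull() {
    assertNotNull(getVersion());
  }

  private String getVersion() { return "placeholder"; } // stand-in for the real call
}
{code}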
[jira] [Updated] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-1496: -- Assignee: (was: Jonathan Eagles) Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Assigned] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles reassigned YARN-1496: - Assignee: Jonathan Eagles (was: Sandy Ryza) Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Jonathan Eagles -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1180) Update capacity scheduler docs to include types on the configs
[ https://issues.apache.org/jira/browse/YARN-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-1180: -- Fix Version/s: (was: 2.4.0) Update capacity scheduler docs to include types on the configs -- Key: YARN-1180 URL: https://issues.apache.org/jira/browse/YARN-1180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Chen He Labels: documentation, newbie Attachments: Yarn-1180.patch The capacity scheduler docs (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) don't include types for all the configs. For instance, the minimum-user-limit-percent doesn't say it's an Int. It is also the only setting for the Resource Allocation configs that is an Int rather than a float. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1180) Update capacity scheduler docs to include types on the configs
[ https://issues.apache.org/jira/browse/YARN-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853422#comment-13853422 ] Jonathan Eagles commented on YARN-1180: --- Thanks for the patch, Chen. I have taken a look at this patch and noticed that you have added the types on the configs. Everything looks good there. One thing I did notice is that user-metrics.enable, resource-calculator, node-locality-delay, and possibly others have been left undocumented for some time. I'm okay with doing that work as part of another JIRA or with expanding the scope of this JIRA to do that work. Jon Update capacity scheduler docs to include types on the configs -- Key: YARN-1180 URL: https://issues.apache.org/jira/browse/YARN-1180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Chen He Labels: documentation, newbie Attachments: Yarn-1180.patch The capacity scheduler docs (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) don't include types for all the configs. For instance, the minimum-user-limit-percent doesn't say it's an Int. It is also the only setting for the Resource Allocation configs that is an Int rather than a float. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881180#comment-13881180 ] Jonathan Eagles commented on YARN-1479: --- Thanks, Chen. A couple of minor things and a question for you.
* There are a couple of unnecessary imports in TestApplicationMasterService. Let's get those cleaned up before this patch goes in.
* progressCheck - the function will be better off package-private, since the intention is not to advertise new functionality.
* progressCheck - this function should be renamed, since check suggests a question rather than an indication that something is being modified. Perhaps progressFilter, or hopefully you can think of something better.
Invalid NaN values in Hadoop REST API JSON response --- Key: YARN-1479 URL: https://issues.apache.org/jira/browse/YARN-1479 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 0.23.6, 2.0.4-alpha Reporter: Kendall Thrapp Assignee: Chen He Fix For: 2.4.0 Attachments: Yarn-1479.patch I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON. For example: progress:NaN, NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string NaN. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
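A minimal sketch of the renamed, package-private filter suggested above; the exact clamping behavior in the committed patch may differ:
{code}
class ProgressUtils {
  // Map invalid float progress values to something JSON-safe before
  // serialization; NaN and Infinity are both illegal JSON tokens.
  static float progressFilter(float progress) {
    if (Float.isNaN(progress) || Float.isInfinite(progress)) {
      return 0.0f;
    }
    return Math.min(1.0f, Math.max(0.0f, progress));
  }
}
{code}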
[jira] [Commented] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
[ https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13884421#comment-13884421 ] Jonathan Eagles commented on YARN-1632: --- Thanks for the patch Chen. Looks like the patch has added a temp file by mistake. TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package --- Key: YARN-1632 URL: https://issues.apache.org/jira/browse/YARN-1632 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.9, 2.2.0 Reporter: Chen He Assignee: Chen He Priority: Minor Attachments: yarn-1632.patch, yarn-1632v2.patch ApplicationMasterService is under org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test file TestApplicationMasterService is placed under org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice package which only contains one file (TestApplicationMasterService). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
[ https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885506#comment-13885506 ] Jonathan Eagles commented on YARN-1632: --- +1. Simple fix. Thanks, Chen. TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package --- Key: YARN-1632 URL: https://issues.apache.org/jira/browse/YARN-1632 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.9, 2.2.0 Reporter: Chen He Assignee: Chen He Priority: Minor Attachments: yarn-1632v2.patch ApplicationMasterService is under org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test file TestApplicationMasterService is placed under org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice package which only contains one file (TestApplicationMasterService). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
[ https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-1632: -- Fix Version/s: 3.0.0 2.4.0 TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package --- Key: YARN-1632 URL: https://issues.apache.org/jira/browse/YARN-1632 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.9, 2.2.0 Reporter: Chen He Assignee: Chen He Priority: Minor Fix For: 3.0.0, 2.4.0 Attachments: yarn-1632v2.patch ApplicationMasterService is under org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test file TestApplicationMasterService is placed under org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice package which only contains one file (TestApplicationMasterService). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905740#comment-13905740 ] Jonathan Eagles commented on YARN-1479: --- +1. Making a minor tweak to the sleep time since it was causing the test to take 1 minute longer than needed on my box. Invalid NaN values in Hadoop REST API JSON response --- Key: YARN-1479 URL: https://issues.apache.org/jira/browse/YARN-1479 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 0.23.6, 2.0.4-alpha Reporter: Kendall Thrapp Assignee: Chen He Attachments: Yarn-1479.patch, Yarn-1479v2.patch I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON. For example: progress:NaN, NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string NaN. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode
Jonathan Eagles created YARN-2830: - Summary: Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode Key: YARN-2830 URL: https://issues.apache.org/jira/browse/YARN-2830 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles YARN-2229 modified the private unstable api for constructing. Tez uses this api (shouldn't, but does) for use with Tez Local Mode. This causes a NoSuchMethod error when using Tez compiled against pre-2.6. Instead I propose we add the backwards compatible api since overflow is not a problem in tez local mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202511#comment-14202511 ] Jonathan Eagles commented on YARN-2229: --- FYI: Filed YARN-2830 to help Tez deal with this internal api change in YARN. ContainerId can overflow with RM restart Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, YARN-2229.10.patch, YARN-2229.11.patch, YARN-2229.12.patch, YARN-2229.13.patch, YARN-2229.14.patch, YARN-2229.15.patch, YARN-2229.16.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch On YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch, the lower 22 bits for the sequence number of ids. This is for preserving the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM restarts 1024 times. To avoid the problem, it's better to make containerId a long. We need to define the new format of container Id while preserving backward compatibility on this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
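A worked example of the 32-bit layout described above (upper 10 bits epoch, lower 22 bits sequence); this is illustrative code, not YARN's. It also shows why the epoch wraps after 2^10 = 1024 RM restarts, which is the overflow motivating the move to a long id:
{code}
class ContainerIdPacking {
  static final int SEQ_BITS = 22;
  static final int SEQ_MASK = (1 << SEQ_BITS) - 1; // 0x3FFFFF

  // With only 32 - 22 = 10 bits for the epoch, any epoch >= 1024 loses its
  // high bits here; widening the id to long removes that ceiling.
  static int pack(int epoch, int sequence) {
    return (epoch << SEQ_BITS) | (sequence & SEQ_MASK);
  }

  static int epochOf(int id)    { return id >>> SEQ_BITS; }
  static int sequenceOf(int id) { return id & SEQ_MASK; }
}
{code}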
[jira] [Commented] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode
[ https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202516#comment-14202516 ] Jonathan Eagles commented on YARN-2830: --- Working on validating this patch. Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode -- Key: YARN-2830 URL: https://issues.apache.org/jira/browse/YARN-2830 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Blocker Attachments: YARN-2830-v1.patch YARN-2229 modified the private unstable api for constructing. Tez uses this api (shouldn't, but does) for use with Tez Local Mode. This causes a NoSuchMethod error when using Tez compiled against pre-2.6. Instead I propose we add the backwards compatible api since overflow is not a problem in tez local mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode
[ https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2830: -- Attachment: YARN-2830-v1.patch Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode -- Key: YARN-2830 URL: https://issues.apache.org/jira/browse/YARN-2830 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Blocker Attachments: YARN-2830-v1.patch YARN-2229 modified the private unstable api for constructing. Tez uses this api (shouldn't, but does) for use with Tez Local Mode. This causes a NoSuchMethod error when using Tez compiled against pre-2.6. Instead I propose we add the backwards compatible api since overflow is not a problem in tez local mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode
[ https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202593#comment-14202593 ] Jonathan Eagles commented on YARN-2830: --- [~ozawa], I understand your fix, and that is the correct fix in Tez. But right now I am looking at clusters that are running Tez 0.5.1 on Hadoop 2.5.1. Those clusters can't be upgraded to Hadoop 2.6.0 without breaking Tez. This is purely to maintain backwards compatibility in Hadoop. Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode -- Key: YARN-2830 URL: https://issues.apache.org/jira/browse/YARN-2830 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Blocker Attachments: YARN-2830-v1.patch YARN-2229 modified the private unstable api for constructing. Tez uses this api (shouldn't, but does) for use with Tez Local Mode. This causes a NoSuchMethod error when using Tez compiled against pre-2.6. Instead I propose we add the backwards compatible api since overflow is not a problem in tez local mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode
[ https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2830: -- Attachment: YARN-2830-v2.patch The v2 patch is validated to work with the existing Tez 0.5.1 release, compiled against Hadoop 2.5.1, running on Hadoop 2.6.0. Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode -- Key: YARN-2830 URL: https://issues.apache.org/jira/browse/YARN-2830 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Blocker Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch YARN-2229 modified the private unstable api for constructing. Tez uses this api (shouldn't, but does) for use with Tez Local Mode. This causes a NoSuchMethod error when using Tez compiled against pre-2.6. Instead I propose we add the backwards compatible api since overflow is not a problem in tez local mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode
[ https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202624#comment-14202624 ] Jonathan Eagles commented on YARN-2830: --- [~ozawa], I've validated this patch and added the deprecated flag. Filed TEZ-1755 to stop using this deprecated API. Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode -- Key: YARN-2830 URL: https://issues.apache.org/jira/browse/YARN-2830 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Blocker Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch YARN-2229 modified the private unstable api for constructing. Tez uses this api (shouldn't, but does) for use with Tez Local Mode. This causes a NoSuchMethod error when using Tez compiled against pre-2.6. Instead I propose we add the backwards compatible api since overflow is not a problem in tez local mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode
[ https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202644#comment-14202644 ] Jonathan Eagles commented on YARN-2830: --- [~ozawa], Cross-posting [~hitesh]'s comment from TEZ-1755: When can we expect to mark ContainerId.newInstance as public stable to avoid this type of incompatibility in the future? Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode -- Key: YARN-2830 URL: https://issues.apache.org/jira/browse/YARN-2830 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Blocker Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch YARN-2229 modified the private unstable api for constructing. Tez uses this api (shouldn't, but does) for use with Tez Local Mode. This causes a NoSuchMethod error when using Tez compiled against pre-2.6. Instead I propose we add the backwards compatible api since overflow is not a problem in tez local mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode
[ https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2830: -- Attachment: YARN-2830-v3.patch [~sseth], [~ozawa], the new patch renames newInstance to newContainerId and re-adds the old newInstance. I'm open to other names for the new API. Please review when you get a chance. Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode -- Key: YARN-2830 URL: https://issues.apache.org/jira/browse/YARN-2830 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Blocker Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch, YARN-2830-v3.patch YARN-2229 modified the private unstable api for constructing. Tez uses this api (shouldn't, but does) for use with Tez Local Mode. This causes a NoSuchMethod error when using Tez compiled against pre-2.6. Instead I propose we add the backwards compatible api since overflow is not a problem in tez local mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
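A hedged sketch of the shape the v3 patch describes: the long-based factory under its new name, plus the re-added and deprecated int-based signature so pre-2.6 callers (such as Tez 0.5.1 local mode) keep linking. Signatures are approximate, and the attempt-id type here is a stand-in rather than the real ApplicationAttemptId:
{code}
public abstract class ContainerIdSketch {
  /** Stand-in for org.apache.hadoop.yarn.api.records.ApplicationAttemptId. */
  public interface AppAttemptIdStandIn {}

  /** The widened factory introduced by YARN-2229, under its new name. */
  public static ContainerIdSketch newContainerId(AppAttemptIdStandIn attempt, long id) {
    return create(attempt, id);
  }

  /** Re-added old signature, deprecated; safe while ids fit in an int. */
  @Deprecated
  public static ContainerIdSketch newInstance(AppAttemptIdStandIn attempt, int id) {
    return create(attempt, id);
  }

  private static ContainerIdSketch create(AppAttemptIdStandIn attempt, long id) {
    return null; // record-builder plumbing elided in this sketch
  }
}
{code}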
[jira] [Updated] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode
[ https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2830: -- Attachment: YARN-2830-v4.patch Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode -- Key: YARN-2830 URL: https://issues.apache.org/jira/browse/YARN-2830 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Blocker Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch, YARN-2830-v3.patch, YARN-2830-v4.patch YARN-2229 modified the private unstable api for constructing. Tez uses this api (shouldn't, but does) for use with Tez Local Mode. This causes a NoSuchMethod error when using Tez compiled against pre-2.6. Instead I propose we add the backwards compatible api since overflow is not a problem in tez local mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2513: -- Attachment: YARN-2513-v2.patch Refreshing the patch. Host framework UIs in YARN for use with the ATS --- Key: YARN-2513 URL: https://issues.apache.org/jira/browse/YARN-2513 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch Allow for pluggable UIs as described by TEZ-8. YARN can provide the infrastructure to host JavaScript and possibly Java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2375: -- Description: This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. An example where this fails: while running a secure timeline server with the ats flag set to disabled on the resource manager, the timeline delegation token renewer throws an NPE. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. An example where this fails: while running a secure timeline server with the ats flag set to disabled on the resource manager, the timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212964#comment-14212964 ] Jonathan Eagles commented on YARN-2375: --- [~zjshen], you misunderstand my request. I am proposing to retain the flag. However, the responsibility of checking whether the ats is enabled needs to be outside of the TimelineClientImpl. In fact, the code in YARN assumes the design I am proposing. In YarnClient it checks the value of ats.enabled, then creates the TimelineClientImpl, which then re-checks ats.enabled. This is the preferred object design. The issue lies in the fact that the timeline delegation token renewer creates a TimelineClient because it holds a timeline server delegation token. This is proof enough that a TimelineClient needs to be created. This goes back to my original design constraint that ats.enabled must be able to be turned off globally, and enabled at the per job/framework level. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. An example where this fails: while running a secure timeline server with the ats flag set to disabled on the resource manager, the timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
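A minimal sketch of the division of responsibility argued for above: the caller consults the enabled flag, and the client itself never re-checks it. The types and method names are simplified stand-ins for the YARN classes:
{code}
class TimelineWiring {
  interface TimelineClient { void putEntities(Object... entities); }

  // Per-framework override: a job may enable the client even when the
  // cluster-wide default is off. The delegation-token renewer, which holds a
  // timeline token, is exactly such a caller and must be able to build one.
  static TimelineClient maybeCreateClient(boolean timelineEnabledForThisJob) {
    return timelineEnabledForThisJob ? new TimelineClientImplSketch() : null;
  }

  static class TimelineClientImplSketch implements TimelineClient {
    public void putEntities(Object... entities) { /* post to the ATS */ }
  }
}
{code}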
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218507#comment-14218507 ] Jonathan Eagles commented on YARN-2375: --- This code looks good to me. [~zjshen], can you give a final review? Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. An example where this fails: while running a secure timeline server with the ats flag set to disabled on the resource manager, the timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219930#comment-14219930 ] Jonathan Eagles commented on YARN-2375: --- I think creating a separate ticket for enabling the timeline server in the mini MR cluster is a good idea. Changes look good to me. [~zjshen], any additional feedback before this goes in? Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. An example where this fails: while running a secure timeline server with the ats flag set to disabled on the resource manager, the timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE
Jonathan Eagles created YARN-2900: - Summary: Application Not Found in AHS throws Internal Server Error with NPE Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223552#comment-14223552 ] Jonathan Eagles commented on YARN-2900: --- An application not found in the history store should be a normal case, not an exceptional one, in the REST API, since the application id is user-provided information. Application Not Found in AHS throws Internal Server Error with NPE -- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai
Caused by: java.lang.NullPointerException
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
... 59 more
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2900: -- Description: Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218) ... 59 more Application Not Found in AHS throws Internal Server Error with NPE -- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218) ... 59 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
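Reading the trace, the failure mode appears to be roughly the following (a hedged reconstruction from the line numbers above, not the literal source; store and convertToApplicationReport stand in for the private members named in the trace):
{code}
// Hedged reconstruction of the failure mode from the stack trace above.
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.server.applicationhistoryservice.records.ApplicationHistoryData;

public ApplicationReport getApplication(ApplicationId appId) throws IOException {
  // The store returns null when it has no record of appId ...
  ApplicationHistoryData data = store.getApplication(appId);
  // ... and the converter dereferences it without a null check, producing
  // the NPE at ApplicationHistoryManagerImpl.java:128 seen above.
  return convertToApplicationReport(data);
}
{code}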
[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223621#comment-14223621 ] Jonathan Eagles commented on YARN-2900: --- [~zjshen], please don't jump to any conclusions. This is my setup, which I believe is a supported configuration for 2.6.0. {quote} yarn.timeline-service.generic-application-history.enabled=false yarn.timeline-service.generic-application-history.store-class=org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore {quote} The Tez UI makes applicationhistory REST API calls to gather fine details for those who have it enabled. In my case, where generic history is disabled, this causes massive flooding of the log files. As for not finding the duplicate JIRA, I was unable to find this issue in the search. Try to include details that are searchable (stack trace, logs, class/file names) so that users are able to find the appropriate issue. Application Not Found in AHS throws Internal Server Error with NPE -- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2900.patch Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218) ... 59 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223671#comment-14223671 ] Jonathan Eagles commented on YARN-2900: --- Now that I am looking at the code, I do see something suspicious in the log file: 2014-11-24 22:12:42,107 [main] WARN applicationhistoryservice.ApplicationHistoryServer: The filesystem based application history store is deprecated. Looking into this. Application Not Found in AHS throws Internal Server Error with NPE -- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2900.patch Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218) ... 59 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223676#comment-14223676 ] Jonathan Eagles commented on YARN-2900: --- The issue is whitespace in the config file. Here is the updated stack trace. {quote} 2014-11-24 22:34:53,900 [17694135@qtp-11347161-6] WARN webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR javax.ws.rs.WebApplicationException: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The entity for application application_1416586084624_0011 doesn't exist in the timeline store at org.apache.hadoop.yarn.server.webapp.WebServices.rewrapAndThrowException(WebServices.java:452) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:227) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSWebServices.getApp(AHSWebServices.java:95) Caused by: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The entity for application application_1416586084624_0011 doesn't exist in the timeline store at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:542) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:94) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218) ... 59 more {quote} Application Not Found in AHS throws Internal Server Error with NPE -- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2900.patch Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218) ... 59 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223694#comment-14223694 ] Jonathan Eagles commented on YARN-2900: --- FYI: Here is the config that was causing the original failure. Notice the newline as part of the value.
{code}
<property>
  <description>Store class name for history store, defaulting to file system store</description>
  <name>yarn.timeline-service.generic-application-history.store-class</name>
  <value>org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore
  </value>
</property>
{code}
The Internal Server Error still happens with ApplicationHistoryManagerOnTimelineStore, which this issue now tracks. Application Not Found in AHS throws Internal Server Error with NPE -- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2900.patch Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218) ... 59 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
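A short sketch of why the stray newline bites, assuming Configuration's usual get()/getTrimmed() behavior (the demo class name is illustrative):
{code}
// Hedged demo: get() preserves the whitespace carried inside the XML
// <value> element, so a class lookup on the raw string fails, while
// getTrimmed() strips it.
import org.apache.hadoop.conf.Configuration;

public class TrimDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.addResource("yarn-site.xml"); // classpath resource with the property above
    String key = "yarn.timeline-service.generic-application-history.store-class";
    System.out.println("[" + conf.get(key) + "]");        // trailing newline/indent kept
    System.out.println("[" + conf.getTrimmed(key) + "]"); // surrounding whitespace stripped
  }
}
{code}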
[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234967#comment-14234967 ] Jonathan Eagles commented on YARN-2900: --- +1. [~zjshen], any last comments before this goes in? Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500) --- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218) ... 59 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2971) RM uses conf instead of service to renew timeline delegation tokens
Jonathan Eagles created YARN-2971: - Summary: RM uses conf instead of service to renew timeline delegation tokens Key: YARN-2971 URL: https://issues.apache.org/jira/browse/YARN-2971 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles The TimelineClientImpl renewDelegationToken uses the incorrect web address to renew Timeline DelegationTokens. It should read the service address out of the token to renew the delegation token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
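A minimal sketch of the intended behavior (not the attached patch; the wrapper class and method are illustrative): resolve the renewal target from the token's service field rather than from the renewer's local configuration.
{code}
// Hedged sketch: derive the renewal target from the token's service field
// instead of resolving yarn.timeline-service.webapp.address from the
// renewer's own configuration, which may point at a different server.
import java.net.InetSocketAddress;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.token.Token;

public class TimelineRenewTarget {
  public static InetSocketAddress of(Token<?> timelineToken) {
    // The service field encodes the host:port of the timeline server that
    // issued the token, which is the server that should renew it.
    return SecurityUtil.getTokenServiceAddr(timelineToken);
  }
}
{code}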
[jira] [Updated] (YARN-2971) RM uses conf instead of service address to renew timeline delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2971: -- Summary: RM uses conf instead of service address to renew timeline delegation tokens (was: RM uses conf instead of service to renew timeline delegation tokens) RM uses conf instead of service address to renew timeline delegation tokens --- Key: YARN-2971 URL: https://issues.apache.org/jira/browse/YARN-2971 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles The TimelineClientImpl renewDelegationToken uses the incorrect web address to renew Timeline DelegationTokens. It should read the service address out of the token to renew the delegation token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2971) RM uses conf instead of token service address to renew timeline delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2971: -- Summary: RM uses conf instead of token service address to renew timeline delegation tokens (was: RM uses conf instead of service address to renew timeline delegation tokens) RM uses conf instead of token service address to renew timeline delegation tokens - Key: YARN-2971 URL: https://issues.apache.org/jira/browse/YARN-2971 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles The TimelineClientImpl renewDelegationToken uses the incorrect web address to renew Timeline DelegationTokens. It should read the service address out of the token to renew the delegation token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2971) RM uses conf instead of token service address to renew timeline delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2971: -- Attachment: YARN-2971-v1.patch RM uses conf instead of token service address to renew timeline delegation tokens - Key: YARN-2971 URL: https://issues.apache.org/jira/browse/YARN-2971 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2971-v1.patch The TimelineClientImpl renewDelegationToken uses the incorrect web address to renew Timeline DelegationTokens. It should read the service address out of the token to renew the delegation token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2971) RM uses conf instead of token service address to renew timeline delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249461#comment-14249461 ] Jonathan Eagles commented on YARN-2971: --- The findbugs warnings are unrelated to this patch. RM uses conf instead of token service address to renew timeline delegation tokens - Key: YARN-2971 URL: https://issues.apache.org/jira/browse/YARN-2971 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2971-v1.patch The TimelineClientImpl renewDelegationToken uses the incorrect web address to renew Timeline DelegationTokens. It should read the service address out of the token to renew the delegation token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)