[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815946#comment-13815946 ] Steve Loughran commented on YARN-941: - Thinking some more: if the AM just terminated itself with a special error code when tokens stopped working, then when it exits the RM could restart it and not file this as a failure, which would push most of the token-expiry logic back into the AM; the RM wouldn't need to see when tokens had expired on running containers and restart them, just wait for container-stopped events as usual and react slightly differently to the exit codes. This strategy would also give the AM the opportunity to do cleaner shutdowns, send out notifications, etc. If there were a way to determine the expiry time in advance, it could even display its remaining life to management tools/web UIs. RM Should have a way to update the tokens it has for a running application -- Key: YARN-941 URL: https://issues.apache.org/jira/browse/YARN-941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans When an application is submitted to the RM, it includes a set of tokens that the RM will renew on behalf of the application, that will be passed to the AM when the application is launched, and that will be used when launching the application to access HDFS and download files on its behalf. For long-lived applications/services these tokens can expire; the tokens the AM holds then become invalid, and the tokens the RM had will no longer work to launch a new AM. We need to provide an API that allows the RM to replace the current tokens for an application with a new set. To avoid any real race issues, I think this API should be something that the AM calls, so that the client can connect to the AM with a new set of tokens it obtained using Kerberos; the AM can then inform the RM of the new set of tokens and quickly update its own tokens to use the new ones. -- This message was sent by Atlassian JIRA (v6.1#6144)
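A minimal sketch of the exit-code strategy described above, assuming a made-up EXIT_TOKENS_EXPIRED code and handler names; none of this is from a real YARN API. The point is only that the AM converts token expiry into a distinguishable exit that the RM could treat as restart-without-failure.
{code}
import org.apache.hadoop.security.token.SecretManager.InvalidToken;

public class TokenAwareAppMaster {

  // Assumed convention the RM would be taught to recognize; not a real
  // YARN constant.
  private static final int EXIT_TOKENS_EXPIRED = 77;

  void onRpcFailure(Throwable cause) {
    if (cause instanceof InvalidToken) {
      shutDownCleanly();  // notifications, state flush, etc.
      System.exit(EXIT_TOKENS_EXPIRED);
    }
  }

  private void shutDownCleanly() {
    // hypothetical cleanup hook: send notifications, persist state
  }
}
{code}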
[jira] [Commented] (YARN-80) Support delay scheduling for node locality in MR2's capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815950#comment-13815950 ] Mck SembWever commented on YARN-80: --- How can one debug this process? It was easy before with just `grep Choosing hadoop-xxx-jobtracker.log`. I can't find any similar information in the YARN log files. Background: I just upgraded to YARN (hadoop-2.2.0), and despite setting yarn.scheduler.capacity.node-locality-delay=3 in capacity-scheduler.xml, data locality is poor. (It was 100% with hadoop-0.22 and the fair scheduler.) Support delay scheduling for node locality in MR2's capacity scheduler -- Key: YARN-80 URL: https://issues.apache.org/jira/browse/YARN-80 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Todd Lipcon Assignee: Arun C Murthy Fix For: 2.0.2-alpha, 0.23.6 Attachments: YARN-80.patch, YARN-80.patch The capacity scheduler in MR2 doesn't support delay scheduling for achieving node-level locality, so jobs exhibit poor data locality even if they have good rack locality. Especially on clusters where disk throughput is much better than network capacity, this hurts overall job performance. We should optionally support node-level delay scheduling heuristics similar to what the fair scheduler implements in MR1. -- This message was sent by Atlassian JIRA (v6.1#6144)
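For readers unfamiliar with the heuristic this patch adds, here is a sketch of the delay-scheduling rule with illustrative names (not the patch's code): the scheduler holds out for a node-local slot until the application has passed up yarn.scheduler.capacity.node-locality-delay scheduling opportunities, then relaxes to rack or off-switch placement.
{code}
// Sketch only: missedOpportunities counts heartbeats where the app could
// have been scheduled but was not node-local.
boolean allowRackOrOffSwitch(int missedOpportunities, int nodeLocalityDelay) {
  // a non-positive delay disables delay scheduling entirely
  return nodeLocalityDelay <= 0 || missedOpportunities >= nodeLocalityDelay;
}
{code}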
[jira] [Updated] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated YARN-647: - Assignee: shenhong historyServer can't show container's log when aggregation is not enabled Key: YARN-647 URL: https://issues.apache.org/jira/browse/YARN-647 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 0.23.7, 2.0.4-alpha Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669" Reporter: shenhong Assignee: shenhong Fix For: 2.2.1 Attachments: yarn-647.patch When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container logs, because aggregation puts pressure on the NameNode, but sometimes we still want to take a look at a container's log. Should the HistoryServer show the container's log even if yarn.log-aggregation-enable is set to false? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated YARN-647: - Fix Version/s: 2.2.1 historyServer can't show container's log when aggregation is not enabled Key: YARN-647 URL: https://issues.apache.org/jira/browse/YARN-647 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 0.23.7, 2.0.4-alpha Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669" Reporter: shenhong Fix For: 2.2.1 Attachments: yarn-647.patch When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container logs, because aggregation puts pressure on the NameNode, but sometimes we still want to take a look at a container's log. Should the HistoryServer show the container's log even if yarn.log-aggregation-enable is set to false? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated YARN-647: - Affects Version/s: 2.2.0 historyServer can't show container's log when aggregation is not enabled Key: YARN-647 URL: https://issues.apache.org/jira/browse/YARN-647 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0 Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669" Reporter: shenhong Assignee: shenhong Attachments: yarn-647.patch When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container logs, because aggregation puts pressure on the NameNode, but sometimes we still want to take a look at a container's log. Should the HistoryServer show the container's log even if yarn.log-aggregation-enable is set to false? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated YARN-647: - Component/s: (was: documentation) historyServer can't show container's log when aggregation is not enabled Key: YARN-647 URL: https://issues.apache.org/jira/browse/YARN-647 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0 Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669" Reporter: shenhong Assignee: shenhong Attachments: yarn-647.patch When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container logs, because aggregation puts pressure on the NameNode, but sometimes we still want to take a look at a container's log. Should the HistoryServer show the container's log even if yarn.log-aggregation-enable is set to false? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816005#comment-13816005 ] Hadoop QA commented on YARN-647: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581885/yarn-647.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2390//console This message is automatically generated. historyServer can't show container's log when aggregation is not enabled Key: YARN-647 URL: https://issues.apache.org/jira/browse/YARN-647 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0 Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669" Reporter: shenhong Assignee: shenhong Attachments: yarn-647.patch When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container logs, because aggregation puts pressure on the NameNode, but sometimes we still want to take a look at a container's log. Should the HistoryServer show the container's log even if yarn.log-aggregation-enable is set to false? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816216#comment-13816216 ] Zhijie Shen commented on YARN-647: --
{code}
+if (!aggregation) {
+  logsLink = join(HttpConfig.getSchemePrefix(), nodeHttpAddr,
+      "/node", "/containerlogs/",
+      containerIdString, "/", app.getJob().getUserName());
{code}
I'm afraid this link will not work either. When a container is stopped, the NM-local logs are deleted and are no longer accessible via the web UI or any service. However, I'm afraid "Try the nodemanager at ..." is a bit misleading as well: it makes users think the logs are available via the NM web UI. The fact is that, for debugging purposes, we can configure yarn.nodemanager.delete.debug-delay-sec to delay the deletion of NM-local logs, but that's not user-oriented, and in any case they're not accessible via the web. Maybe we want to remove the misleading words here? historyServer can't show container's log when aggregation is not enabled Key: YARN-647 URL: https://issues.apache.org/jira/browse/YARN-647 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0 Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669" Reporter: shenhong Assignee: shenhong Attachments: yarn-647.patch When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container logs, because aggregation puts pressure on the NameNode, but sometimes we still want to take a look at a container's log. Should the HistoryServer show the container's log even if yarn.log-aggregation-enable is set to false? -- This message was sent by Atlassian JIRA (v6.1#6144)
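As an aside on the debug knob mentioned above, this is roughly how the deletion delay is set (a sketch; the constant resolves to yarn.nodemanager.delete.debug-delay-sec, and as Zhijie notes the retained files live only on the NM's local disk and are still not served through the web UI):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DebugDelayExample {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // keep NM-local container logs for an hour after the container exits
    conf.setLong(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600);
  }
}
{code}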
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816232#comment-13816232 ] Ravi Prakash commented on YARN-90: -- Thanks for updating the patch, Song! With almost the same changes as Nigel's, I was able to get the originally invalid directories to be used again. So the src/main code looks good to me. The one nit I had was that
{code}
} catch (IOException e2) {
  Assert.fail("should not throw an exception");
  Shell.execCommand(Shell.getSetPermissionCommand("755", false,
      testDir.getAbsolutePath()));
  throw e2;
}
{code}
,
{code}
catch (InterruptedException e1) {
}
{code}
,
{code}
} catch (IOException e2) {
  Assert.fail("should not throw an exception");
  throw e2;
}
{code}
and
{code}
} catch (IOException e) {
  Assert.fail("Service should have thrown an exception while closing");
  throw e;
}
{code}
can simply be removed. Other than that, the patch looks good to me. +1. Thanks a lot Nigel and Song! NodeManager should identify failed disks becoming good back again - Key: YARN-90 URL: https://issues.apache.org/jira/browse/YARN-90 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Ravi Gummadi Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch MAPREDUCE-3121 makes the NodeManager identify disk failures, but once a disk goes down it is marked as failed forever; to reuse that disk (after it becomes good again), the NodeManager needs a restart. This JIRA is to improve the NodeManager to reuse good disks (which may have been bad some time back). -- This message was sent by Atlassian JIRA (v6.1#6144)
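To spell out why those blocks are removable: Assert.fail always throws AssertionError, so the statements after it are dead code and the rethrow only masks the failure; simply letting the exception escape the test method gives JUnit the real stack trace. A sketch, with doWork standing in for the code under test:
{code}
import java.io.IOException;
import org.junit.Test;

public class ExampleTest {
  @Test
  public void testDoesNotThrow() throws IOException {
    doWork();  // any IOException fails the test with its own cause intact
  }

  private void doWork() throws IOException {
    // stand-in for the code under test
  }
}
{code}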
[jira] [Updated] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshit Daga updated YARN-584: -- Attachment: YARN-584-branch-2.2.0.patch A patch to fix this issue. In fair scheduler web UI, queues unexpand on refresh Key: YARN-584 URL: https://issues.apache.org/jira/browse/YARN-584 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Labels: newbie Attachments: YARN-584-branch-2.2.0.patch In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816444#comment-13816444 ] Sandy Ryza commented on YARN-584: - Thanks for picking this up, Harshit. It seems like it would be difficult to write unit tests for this, so I'm OK with not including them. Can you mention the steps you've taken to manually test the patch? As the added code is very similar for both schedulers, can we share it in a central location? We could create a SchedulerPageUtil. Can we change the names of addRemoveQueueToQuery and the other functions to make it clear that they're related to expand/unexpand? A few stylistic comments:
* Lines should be broken up so that they fit within close to 80 characters.
* There should be a space between if and for and the corresponding open parenthesis.
* There should be spaces between the operators and operands in conditions, e.g. (a == b), not (a==b).
* There should be spaces after the semicolons in for loops.
* There should be spaces between else and the surrounding curly braces.
* Closing curly braces should line up vertically with the beginning of the statement, like so:
{code}
if (condition) {
  dostuff;
}
not
if (condition) {
  dostuff;
  }
{code}
In fair scheduler web UI, queues unexpand on refresh Key: YARN-584 URL: https://issues.apache.org/jira/browse/YARN-584 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Labels: newbie Attachments: YARN-584-branch-2.2.0.patch In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time. -- This message was sent by Atlassian JIRA (v6.1#6144)
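On the SchedulerPageUtil suggestion, a sketch of the kind of shared helper meant (the class name comes from the comment above; the method and its behavior are illustrative only, no such class exists at this point in the thread):
{code}
public final class SchedulerPageUtil {
  private SchedulerPageUtil() {}

  // Remember an expanded queue in the page's query string so that a
  // refresh can re-expand it from the URL.
  public static String addQueueToQuery(String query, String queueName) {
    return (query == null || query.isEmpty())
        ? queueName : query + "," + queueName;
  }
}
{code}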
[jira] [Updated] (YARN-1387) RMWebServices should use ClientRMService for filtering applications
[ https://issues.apache.org/jira/browse/YARN-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1387: --- Description: YARN's REST API allows filtering applications; this should be moved to ClientRMService so that the Java API can also support the same functionality. (was: YARN's REST API allows filtering applications, but the Java API doesn't. Ideally, the Java API should implement this and the REST implementation should reuse it.) RMWebServices should use ClientRMService for filtering applications --- Key: YARN-1387 URL: https://issues.apache.org/jira/browse/YARN-1387 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla YARN's REST API allows filtering applications; this should be moved to ClientRMService so that the Java API can also support the same functionality. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1387) RMWebServices should use ClientRMService for filtering applications
[ https://issues.apache.org/jira/browse/YARN-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1387: --- Summary: RMWebServices should use ClientRMService for filtering applications (was: Add Java API to filter RM applications) RMWebServices should use ClientRMService for filtering applications --- Key: YARN-1387 URL: https://issues.apache.org/jira/browse/YARN-1387 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla YARN's REST API allows filtering applications, but the Java API doesn't. Ideally, the Java API should implement this and the REST implementation should reuse it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1387) RMWebServices should use ClientRMService for filtering applications
[ https://issues.apache.org/jira/browse/YARN-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1387: --- Attachment: yarn-1387-1.patch Here is a first-cut implementation: ClientRMService implements all the filtering, and RMWebServices uses it directly. TestRMWebServices and TestRMWebServicesApps pass, so I think this doesn't introduce any regressions as such. Any early feedback on the overall approach would help greatly. TODO: manual testing on a pseudo-distributed cluster. RMWebServices should use ClientRMService for filtering applications --- Key: YARN-1387 URL: https://issues.apache.org/jira/browse/YARN-1387 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1387-1.patch YARN's REST API allows filtering applications; this should be moved to ClientRMService so that the Java API can also support the same functionality. -- This message was sent by Atlassian JIRA (v6.1#6144)
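For context, this is roughly what the change enables from the Java side: the filters ride on GetApplicationsRequest and ClientRMService applies them, so REST and Java clients share one code path. A sketch; exactly which setters exist on the request depends on the release at hand.
{code}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsResponse;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

public class FilteredAppsExample {
  // rmClient is an ApplicationClientProtocol proxy to the RM
  static GetApplicationsResponse runningApps(ApplicationClientProtocol rmClient)
      throws Exception {
    GetApplicationsRequest request = GetApplicationsRequest.newInstance();
    request.setApplicationStates(EnumSet.of(YarnApplicationState.RUNNING));
    return rmClient.getApplications(request);
  }
}
{code}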
[jira] [Commented] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816757#comment-13816757 ] Hadoop QA commented on YARN-584: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612691/YARN-584-branch-2.2.0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2392//console This message is automatically generated. In fair scheduler web UI, queues unexpand on refresh Key: YARN-584 URL: https://issues.apache.org/jira/browse/YARN-584 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Labels: newbie Attachments: YARN-584-branch-2.2.0.patch In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-691) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp reassigned YARN-691: Assignee: Daryn Sharp Invalid NaN values in Hadoop REST API JSON response --- Key: YARN-691 URL: https://issues.apache.org/jira/browse/YARN-691 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.6, 2.0.4-alpha Reporter: Kendall Thrapp Assignee: Daryn Sharp I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON. For example: "progress":NaN, NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN". -- This message was sent by Atlassian JIRA (v6.1#6144)
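A one-line server-side guard illustrates the obvious fix: never hand a bare NaN to the JSON layer. Whether to clamp to 0 or emit a quoted "NaN" string is exactly the open question in this ticket; this sketch clamps.
{code}
public class JsonSafe {
  static float jsonSafeProgress(float progress) {
    // Float.isNaN catches the 0.0f/0.0f-style results that otherwise
    // serialize as the bare token NaN
    return Float.isNaN(progress) ? 0.0f : progress;
  }
}
{code}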
[jira] [Commented] (YARN-1387) RMWebServices should use ClientRMService for filtering applications
[ https://issues.apache.org/jira/browse/YARN-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816789#comment-13816789 ] Hadoop QA commented on YARN-1387: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612726/yarn-1387-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2391//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2391//console This message is automatically generated. RMWebServices should use ClientRMService for filtering applications --- Key: YARN-1387 URL: https://issues.apache.org/jira/browse/YARN-1387 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1387-1.patch YARN's REST API allows filtering applications; this should be moved to ClientRMService so that the Java API can also support the same functionality. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-691) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated YARN-691: - Assignee: (was: Daryn Sharp) Invalid NaN values in Hadoop REST API JSON response --- Key: YARN-691 URL: https://issues.apache.org/jira/browse/YARN-691 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.6, 2.0.4-alpha Reporter: Kendall Thrapp I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON. For example: "progress":NaN, NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN". -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-691) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816792#comment-13816792 ] Chen He commented on YARN-691: -- Please assign it to me! Invalid NaN values in Hadoop REST API JSON response --- Key: YARN-691 URL: https://issues.apache.org/jira/browse/YARN-691 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.6, 2.0.4-alpha Reporter: Kendall Thrapp I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON. For example: "progress":NaN, NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN". -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-691) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816799#comment-13816799 ] Chen He commented on YARN-691: -- I think JIRA has a bug for this issue. I can assign MAPREDUCE-5052 to myself but cannot assign this task to myself. I also asked some committers to help; still the same problem. Looks like this is a bug in JIRA! Invalid NaN values in Hadoop REST API JSON response --- Key: YARN-691 URL: https://issues.apache.org/jira/browse/YARN-691 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.6, 2.0.4-alpha Reporter: Kendall Thrapp I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON. For example: "progress":NaN, NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN". -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1376: Attachment: YARN-1376.2.patch NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1376.1.patch, YARN-1376.2.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1242) AHS start as independent process
[ https://issues.apache.org/jira/browse/YARN-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816816#comment-13816816 ] Zhijie Shen commented on YARN-1242: --- 1. Please fix yarn.cmd as well, which is for Windows. 2. As we've decided the AHS is to be independent, these params should be those of the RM, right?
{code}
+ YARN_OPTS="$YARN_OPTS $YARN_RESOURCEMANAGER_OPTS"
+ if [ "$YARN_RESOURCEMANAGER_HEAPSIZE" != "" ]; then
+   JAVA_HEAP_MAX="-Xmx""$YARN_RESOURCEMANAGER_HEAPSIZE""m"
+ fi
{code}
AHS start as independent process Key: YARN-1242 URL: https://issues.apache.org/jira/browse/YARN-1242 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Attachments: YARN-1242-1.patch Maybe we should include AHS classes as well (for developer usage) in yarn and yarn.cmd -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1242) AHS start as independent process
[ https://issues.apache.org/jira/browse/YARN-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1242: -- Description: Add the command in yarn and yarn.cmd to start and stop AHS (was: Maybe we should include AHS classes as well (for developer usage) in yarn and yarn.cmd) AHS start as independent process Key: YARN-1242 URL: https://issues.apache.org/jira/browse/YARN-1242 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Attachments: YARN-1242-1.patch Add the command in yarn and yarn.cmd to start and stop AHS -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816833#comment-13816833 ] Xuan Gong commented on YARN-1279: -
bq. Does it make sense to send an event to RMApp to process the app log status, instead of explicitly creating an update API of RMApp?
We can do that. Added a new RMAppEvent (RMAppLogAggregationStatusUpdateEvent), handled in the FINISHED and KILLED states.
bq. Why is it possible for a single node to first get log aggregation Succeeded and then Failed?
Actually, I spent more time thinking about this question. I made some changes on the NM side. The ApplicationLogAggregationStatus is now set into the NMContext when the ApplicationImpl receives the APPLICATION_LOG_HANDLING_FINISHED event. That way, we can make sure the NM only sends the logAggregationStatus out if the RMApp is finished/killed, and also that we will not receive two different logAggregationStatus values from the same NM.
bq. I think we can have separate maps, one for completed succeeded node aggregation, the other for failed node aggregation. Then we don't need two more counters for counting succeeded or failed nodes and those increment/decrement logic.
Makes sense. Changed.
bq. It's good to append failed log aggregation node info and also the diagnostics coming with ApplicationLogStatus to the diagnostics of the app.
Added.
bq. Do we need a separate Timeout state? Is it good to append the timeout diagnostics and return the state as Failed? This code logic can be simplified to say: if it exceeds the timeout period, return FAILED or Timeout; otherwise return In_Progress. And then we can remove the logAggregationTimeOutDisabled boolean.
About this, I still prefer to keep a status like TIME_OUT, because saying that log aggregation timed out is different from saying it failed. In my understanding, the log aggregation process itself should be quick (on the NM side), but notifying the RMApp will definitely add a lot of delay. So TIME_OUT means we have been waiting for a long time and do not know the current state right now; that may be better than simply saying the log aggregation failed.
bq. ApplicationLogStatus is better to be named ApplicationLogAggregationStatus
Renamed.
bq. containerLogAggregationFail doesn't need to be an AtomicBoolean
Changed.
Expose a client API to allow clients to figure if log aggregation is complete - Key: YARN-1279 URL: https://issues.apache.org/jira/browse/YARN-1279 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Xuan Gong Attachments: YARN-1279.1.patch, YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, YARN-1279.4.patch, YARN-1279.4.patch, YARN-1279.5.patch, YARN-1279.6.patch, YARN-1279.7.patch, YARN-1279.8.patch Expose a client API to allow clients to figure if log aggregation is complete -- This message was sent by Atlassian JIRA (v6.1#6144)
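A sketch of the two-map bookkeeping agreed on above, with stand-in value types (the real patch carries its own status record): the succeeded/failed node counts fall out of the map sizes, so the separate counters and their increment/decrement logic disappear.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.NodeId;

class LogAggregationTracker {
  private final Map<NodeId, String> succeededNodes =
      new ConcurrentHashMap<NodeId, String>();
  private final Map<NodeId, String> failedNodes =
      new ConcurrentHashMap<NodeId, String>();

  void record(NodeId node, boolean succeeded, String diagnostics) {
    (succeeded ? succeededNodes : failedNodes).put(node, diagnostics);
    // per the review, failed-node diagnostics would also be appended to
    // the app's diagnostics string
  }

  int succeededCount() { return succeededNodes.size(); }
  int failedCount() { return failedNodes.size(); }
}
{code}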
[jira] [Updated] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1279: Attachment: YARN-1279.8.patch Expose a client API to allow clients to figure if log aggregation is complete - Key: YARN-1279 URL: https://issues.apache.org/jira/browse/YARN-1279 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Xuan Gong Attachments: YARN-1279.1.patch, YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, YARN-1279.4.patch, YARN-1279.4.patch, YARN-1279.5.patch, YARN-1279.6.patch, YARN-1279.7.patch, YARN-1279.8.patch Expose a client API to allow clients to figure if log aggregation is complete -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1376: Attachment: YARN-1376.2.patch NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-691) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-691: --- Assignee: Chen He The problem is that you were not listed in the contributors list for the YARN project. I added you, so you will now be able to assign YARN JIRAs to yourself. Invalid NaN values in Hadoop REST API JSON response --- Key: YARN-691 URL: https://issues.apache.org/jira/browse/YARN-691 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.6, 2.0.4-alpha Reporter: Kendall Thrapp Assignee: Chen He I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON. For example: "progress":NaN, NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN". -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816841#comment-13816841 ] Hadoop QA commented on YARN-1279: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612740/YARN-1279.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2393//console This message is automatically generated. Expose a client API to allow clients to figure if log aggregation is complete - Key: YARN-1279 URL: https://issues.apache.org/jira/browse/YARN-1279 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Xuan Gong Attachments: YARN-1279.1.patch, YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, YARN-1279.4.patch, YARN-1279.4.patch, YARN-1279.5.patch, YARN-1279.6.patch, YARN-1279.7.patch, YARN-1279.8.patch Expose a client API to allow clients to figure if log aggregation is complete -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816846#comment-13816846 ] Hadoop QA commented on YARN-1279: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612745/YARN-1279.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2394//console This message is automatically generated. Expose a client API to allow clients to figure if log aggregation is complete - Key: YARN-1279 URL: https://issues.apache.org/jira/browse/YARN-1279 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Xuan Gong Attachments: YARN-1279.1.patch, YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, YARN-1279.4.patch, YARN-1279.4.patch, YARN-1279.5.patch, YARN-1279.6.patch, YARN-1279.7.patch, YARN-1279.8.patch, YARN-1279.8.patch Expose a client API to allow clients to figure if log aggregation is complete -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816872#comment-13816872 ] Zhijie Shen commented on YARN-955: -- 1. It's not necessary, and shouldn't be public.
{code}
+public ApplicationAttemptReport getApplicationAttempt(
+    ApplicationAttemptId appAttemptId) throws IOException {
+  return history.getApplicationAttempt(appAttemptId);
+}
{code}
2. This is not the recommended way to construct PB instances. Please use .newInstance() for all the responses here.
{code}
+  GetApplicationAttemptReportResponse response = Records
+      .newRecord(GetApplicationAttemptReportResponse.class);
{code}
3. Why is a reference to the implementation used here? Should we use ApplicationHistoryManager, the interface, instead?
{code}
+  ApplicationHistoryManagerImpl historyService;
{code}
4. In TestApplicationHistoryClientService, please write multiple instances, and test the get methods as well.
5. Add @VisibleForTesting.
{code}
+  @Private
+  public ApplicationHistoryClientService getClientService() {
+    return this.ahsClientService;
+  }
{code}
6. Unwrap the method and put its body directly in ApplicationHistoryServer.main.
{code}
+  static ApplicationHistoryServer launchAppHistoryServer(String[] args) {
{code}
[YARN-321] Implementation of ApplicationHistoryProtocol --- Key: YARN-955 URL: https://issues.apache.org/jira/browse/YARN-955 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-955-1.patch, YARN-955-2.patch, YARN-955-3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
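Point 2 illustrated, assuming the response class gains a newInstance factory matching the pattern of existing YARN protocol records (report is an ApplicationAttemptReport built elsewhere):
{code}
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationAttemptReportResponse;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptReport;

public class NewInstanceExample {
  static GetApplicationAttemptReportResponse build(ApplicationAttemptReport report) {
    // preferred: the record's own static factory
    return GetApplicationAttemptReportResponse.newInstance(report);
    // discouraged: Records.newRecord(GetApplicationAttemptReportResponse.class)
    // followed by a setter call
  }
}
{code}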
[jira] [Resolved] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-1289. -- Resolution: Invalid Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle. -- Key: YARN-1289 URL: https://issues.apache.org/jira/browse/YARN-1289 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: wenwupeng Attachments: YARN-1289.patch Failed to run a benchmark when the yarn.nodemanager.aux-services value was not configured in yarn-site.xml; it would be better to ship a default value.
13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED
Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
-- This message was sent by Atlassian JIRA (v6.1#6144)
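For anyone hitting the same InvalidAuxServiceException: the two properties below are what the reporter's cluster was missing. They belong in yarn-site.xml on every NodeManager; the Configuration API is used here only for illustration.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ShuffleConfigExample {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");
    conf.set("yarn.nodemanager.aux-services.mapreduce_shuffle.class",
        "org.apache.hadoop.mapred.ShuffleHandler");
  }
}
{code}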
[jira] [Updated] (YARN-974) RMContainer should collect more useful information to be recorded in Application-History
[ https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-974: - Attachment: YARN-974.3.patch Updated the patch:
1. Rebased against the latest ContainerHistoryData.
2. Simplified the way the information is collected; no need to change the Event object any more.
3. Updated the test cases.
Please note that the logURL should be updated to point to the AHS web page, which shows the aggregated logs. This can be done whenever YARN-954 is finished. RMContainer should collect more useful information to be recorded in Application-History Key: YARN-974 URL: https://issues.apache.org/jira/browse/YARN-974 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-974.1.patch, YARN-974.2.patch, YARN-974.3.patch To record the history of a container, users may also be interested in the following information:
1. Start Time
2. Stop Time
3. Diagnostic Information
4. URL to the Log File
5. Actually Allocated Resource
6. Actually Assigned Node
These should be remembered during the RMContainer's life cycle. -- This message was sent by Atlassian JIRA (v6.1#6144)
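A sketch of the per-container history record those six items feed into; the field names are illustrative of ContainerHistoryData rather than copied from the patch.
{code}
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.Resource;

class ContainerHistorySketch {
  long startTime;              // 1
  long finishTime;             // 2
  String diagnosticsInfo;      // 3
  String logUrl;               // 4: to be repointed at the AHS page (YARN-954)
  Resource allocatedResource;  // 5
  NodeId assignedNode;         // 6
}
{code}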