[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784848#comment-13784848 ]

Hitesh Shah commented on YARN-1131:
-----------------------------------

Sounds good. Would you mind opening JIRAs for the open comments? Also, the test failure needs to be addressed.

> $ yarn logs should return a message log aggregation is during progress if YARN application is running
> ------------------------------------------------------------------------------------------------------

Key: YARN-1131
URL: https://issues.apache.org/jira/browse/YARN-1131
Project: Hadoop YARN
Issue Type: Sub-task
Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
Attachments: YARN-1131.1.txt

In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command returns no message and drops the user back to the shell. It would be nice to tell the user that log aggregation is in progress.

{code}
-bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
-bash-4.1$
{code}

At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throw a NoSuchElementException.

{code}
$ /usr/bin/yarn logs -applicationId application_0
Exception in thread "main" java.util.NoSuchElementException
        at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
        at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
        at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
{code}

--
This message was sent by Atlassian JIRA (v6.1#6144)
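The NoSuchElementException above escapes from ConverterUtils.toApplicationId when the ID string has too few "_"-separated fields. The following is a minimal, self-contained sketch of the kind of defensive parsing the report asks for — hypothetical Java, not the attached YARN-1131.x patch; the class and method names are illustrative only:

{code}
// Hypothetical sketch, not the attached patch: validate the application ID
// shape before parsing, so the CLI can print a readable error instead of a
// raw NoSuchElementException.
public final class AppIdValidator {

    /** Returns {clusterTimestamp, sequence}, or null after printing an error. */
    public static long[] parse(String appId) {
        String[] parts = appId.split("_");
        if (parts.length != 3 || !"application".equals(parts[0])) {
            System.err.println("Invalid ApplicationId: " + appId
                + " (expected application_<clusterTimestamp>_<sequenceNumber>)");
            return null;
        }
        try {
            return new long[] {Long.parseLong(parts[1]), Long.parseLong(parts[2])};
        } catch (NumberFormatException e) {
            System.err.println("Invalid ApplicationId: " + appId);
            return null;
        }
    }

    public static void main(String[] args) {
        parse("application_0");  // prints the friendly error, no stack trace
        long[] id = parse("application_1377900193583_0002");
        if (id != null) {
            System.out.println("timestamp=" + id[0] + ", sequence=" + id[1]);
        }
    }
}
{code}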
[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784850#comment-13784850 ]

Siddharth Seth commented on YARN-1131:
---------------------------------------

Will open the follow-up JIRAs. Running this through Jenkins again; I haven't seen the specific test fail or time out in my local runs.

> $yarn logs command should return an appropriate error message if YARN application is still running
> ---------------------------------------------------------------------------------------------------

Key: YARN-1131
URL: https://issues.apache.org/jira/browse/YARN-1131
Project: Hadoop YARN
Issue Type: Sub-task
Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
Attachments: YARN-1131.1.txt, YARN-1131.2.txt

In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command returns no message and drops the user back to the shell. It would be nice to tell the user that log aggregation is in progress.

{code}
-bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
-bash-4.1$
{code}

At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throw a NoSuchElementException.

{code}
$ /usr/bin/yarn logs -applicationId application_0
Exception in thread "main" java.util.NoSuchElementException
        at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
        at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
        at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
{code}

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated YARN-1131:
------------------------------

Summary: $yarn logs command should return an appropriate error message if YARN application is still running (was: $ yarn logs should return a message log aggregation is during progress if YARN application is running)

> $yarn logs command should return an appropriate error message if YARN application is still running
> ---------------------------------------------------------------------------------------------------

Key: YARN-1131
URL: https://issues.apache.org/jira/browse/YARN-1131
Project: Hadoop YARN
Issue Type: Sub-task
Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
Attachments: YARN-1131.1.txt, YARN-1131.2.txt

In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command returns no message and drops the user back to the shell. It would be nice to tell the user that log aggregation is in progress.

{code}
-bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
-bash-4.1$
{code}

At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throw a NoSuchElementException.

{code}
$ /usr/bin/yarn logs -applicationId application_0
Exception in thread "main" java.util.NoSuchElementException
        at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
        at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
        at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
{code}

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784864#comment-13784864 ]

Siddharth Seth commented on YARN-890:
--------------------------------------

+1. Resources should not be rounded up. Is there a similar round-up in the actual allocation code, which may cause additional containers to be allocated to a queue? Should the CS be allowing nodes to register if the nm memory-mb is not a multiple of minimum-allocation-mb, or should it just be rounding down at registration?

> The roundup for memory values on resource manager UI is misleading
> -------------------------------------------------------------------

Key: YARN-890
URL: https://issues.apache.org/jira/browse/YARN-890
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Trupti Dhavle
Assignee: Xuan Gong
Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch

From the yarn-site.xml, I see the following values:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>

However, the ResourceManager UI shows the total memory as 5MB.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784878#comment-13784878 ]

Zhijie Shen commented on YARN-890:
-----------------------------------

bq. Is there a similar round up in the actual allocation code, which may cause additional containers to be allocated to a queue?

Checked this before. It seems it's only for the web UI.

bq. or should it just be rounding down at registration?

That sounds like it makes sense. Allocated memory will always be a multiple of minimum-allocation-mb; therefore, the available memory will be as well. In this sense, minimum-allocation-mb is effectively treated as 1 unit of memory: we allocate n units of memory to a container, the cluster retains m units, and so on. Probably we can simplify the internal memory accounting. Just thinking out loud.

> The roundup for memory values on resource manager UI is misleading
> -------------------------------------------------------------------

Key: YARN-890
URL: https://issues.apache.org/jira/browse/YARN-890
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Trupti Dhavle
Assignee: Xuan Gong
Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch

From the yarn-site.xml, I see the following values:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>

However, the ResourceManager UI shows the total memory as 5MB.

--
This message was sent by Atlassian JIRA (v6.1#6144)
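The two policies under discussion differ only in the direction of the rounding. A small self-contained illustration, using the 4192 MB / 1024 MB values from this issue — a hypothetical helper, not ResourceManager code:

{code}
// Hypothetical illustration, not RM code: with a 4192 MB node and a 1024 MB
// minimum allocation, rounding up reports 5120 MB (the misleading "5" on the
// UI), while rounding down at registration reports 4096 MB.
public final class MemoryRounding {
    static long roundUp(long memMb, long minAllocMb) {
        return ((memMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    }

    static long roundDown(long memMb, long minAllocMb) {
        return (memMb / minAllocMb) * minAllocMb;
    }

    public static void main(String[] args) {
        System.out.println("round up:   " + roundUp(4192, 1024) + " MB");   // 5120
        System.out.println("round down: " + roundDown(4192, 1024) + " MB"); // 4096
    }
}
{code}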
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784887#comment-13784887 ]

Bikas Saha commented on YARN-1197:
-----------------------------------

I don't think Sandy meant that the AM first tells the NM to decrease the size and then the NM informs the RM. He meant the AM asks the RM. The RM decreases/increases the size, and then the AM informs the NM about the change. RM-NM communication happens via heartbeat, which may occur some time later.

For decreasing resources, if the RM is to consider the freed resource available only after the AM informs the NM and the NM heartbeats with the RM, then this change may become more complicated, since the current schedulers don't expect any lag in their allocations. This will also delay the allocation of the freed space to others. Also, this delay is determined by when the AM syncs with the NM. That's not a good property. We should probably assume the decrease to be effective immediately, and the RM-NM sync should enforce that. The downside is that for the duration of the heartbeat interval the node may get overbooked, but that should not be a problem in practice, since the container would already be using a lower value of resources before the AM asked for its capacity to be decreased. The same problem does not hold for increasing resources.

> Support changing resources of an allocated container
> -----------------------------------------------------

Key: YARN-1197
URL: https://issues.apache.org/jira/browse/YARN-1197
Project: Hadoop YARN
Issue Type: Task
Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
Attachments: yarn-1197.pdf

Currently, YARN cannot merge several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784894#comment-13784894 ]

Hadoop QA commented on YARN-1131:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12606532/YARN-1131.2.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

org.apache.hadoop.yarn.client.cli.TestLogsCLI

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2076//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2076//console

This message is automatically generated.

> $yarn logs command should return an appropriate error message if YARN application is still running
> ---------------------------------------------------------------------------------------------------

Key: YARN-1131
URL: https://issues.apache.org/jira/browse/YARN-1131
Project: Hadoop YARN
Issue Type: Sub-task
Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
Attachments: YARN-1131.1.txt, YARN-1131.2.txt

In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command returns no message and drops the user back to the shell. It would be nice to tell the user that log aggregation is in progress.

{code}
-bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
-bash-4.1$
{code}

At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throw a NoSuchElementException.

{code}
$ /usr/bin/yarn logs -applicationId application_0
Exception in thread "main" java.util.NoSuchElementException
        at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
        at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
        at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
{code}

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784906#comment-13784906 ]

Wangda Tan commented on YARN-1197:
-----------------------------------

{quote}
For decreasing resources, if the RM is to consider the freed resource available only after the AM informs the NM and the NM heartbeats with the RM, then this change may become more complicated, since the current schedulers don't expect any lag in their allocations. This will also delay the allocation of the freed space to others. Also, this delay is determined by when the AM syncs with the NM. That's not a good property. We should probably assume the decrease to be effective immediately, and the RM-NM sync should enforce that. The downside is that for the duration of the heartbeat interval the node may get overbooked, but that should not be a problem in practice, since the container would already be using a lower value of resources before the AM asked for its capacity to be decreased.
{quote}

I think it makes sense: if the AM tells the NM first, the RM cannot leverage the freed resources, which is not good for a heavily loaded cluster. I'll update the document as per our discussion and start breaking down tasks. Please let me know if you have any other comments.

> Support changing resources of an allocated container
> -----------------------------------------------------

Key: YARN-1197
URL: https://issues.apache.org/jira/browse/YARN-1197
Project: Hadoop YARN
Issue Type: Task
Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
Attachments: yarn-1197.pdf

Currently, YARN cannot merge several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784914#comment-13784914 ]

Sandy Ryza commented on YARN-1197:
-----------------------------------

bq. I don't think Sandy meant that the AM first tells the NM to decrease the size and then the NM informs the RM.

You're right about what I meant. Though thinking about this more, is there any reason a container shrinking needs to get permission from the RM? Should we not treat giving up part of a container the same way we treat giving up an entire container, i.e. the app unilaterally decides when to do it? If we need to respect properties like yarn.scheduler.minimum-allocation-mb, the NodeManagers could pick these up and enforce them by rejecting shrinkings.

bq. The downside is that for the duration of the heartbeat interval, the node may get overbooked but that should not be a problem in practice since the container would already be using a lower value of resources before the AM asked its capacity to be decreased.

Accepting overbooking in this context seems to me like it would open up a bunch of race conditions and compromise a bunch of useful assumptions an administrator can make about what's running on a node at a given time. Do the uses of container shrinking require such low latency? (Which we would also achieve by avoiding the round trip to the RM.)

> Support changing resources of an allocated container
> -----------------------------------------------------

Key: YARN-1197
URL: https://issues.apache.org/jira/browse/YARN-1197
Project: Hadoop YARN
Issue Type: Task
Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
Attachments: yarn-1197.pdf

Currently, YARN cannot merge several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784924#comment-13784924 ]

Bikas Saha commented on YARN-1197:
-----------------------------------

So the suggestion is that increase goes AM (request) -> RM (allocation) -> AM (increase) -> NM, and decrease goes AM (decrease) -> NM (inform) -> RM (consider free) -> AM (confirmation from the RM, similar to completedContainerStatus)?

> Support changing resources of an allocated container
> -----------------------------------------------------

Key: YARN-1197
URL: https://issues.apache.org/jira/browse/YARN-1197
Project: Hadoop YARN
Issue Type: Task
Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
Attachments: yarn-1197.pdf

Currently, YARN cannot merge several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments.

--
This message was sent by Atlassian JIRA (v6.1#6144)
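Read as a toy state model, the asymmetry between the two flows looks like the sketch below — hypothetical Java for illustration only, not anything from the yarn-1197.pdf design: a decrease frees capacity at the RM immediately, while an increase must be granted by the RM before the NM will apply it.

{code}
// Toy model of the proposed asymmetry (hypothetical, illustration only).
public final class ResizeModel {
    long nodeCapacityMb = 8192;
    long allocatedMb = 6144;

    /** AM -> RM decrease: the freed space is schedulable immediately; the NM
     *  enforces the smaller limit on its next heartbeat, so the node can be
     *  briefly overbooked, as discussed above. */
    void decrease(long deltaMb) {
        allocatedMb -= deltaMb;
        System.out.println("free now: " + (nodeCapacityMb - allocatedMb) + " MB");
    }

    /** AM -> RM increase: must be granted first; the AM then presents the
     *  grant to the NM. */
    boolean requestIncrease(long deltaMb) {
        if (allocatedMb + deltaMb > nodeCapacityMb) {
            return false; // RM rejects; nothing changes on the node
        }
        allocatedMb += deltaMb;
        return true;
    }

    public static void main(String[] args) {
        ResizeModel m = new ResizeModel();
        m.decrease(1024);                            // free now: 3072 MB
        System.out.println(m.requestIncrease(4096)); // false: would overbook
    }
}
{code}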
[jira] [Commented] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784985#comment-13784985 ]

Hudson commented on YARN-425:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #351 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/351/])
YARN-425. coverage fix for yarn api (Aleksey Gorshkov via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528641)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceManagerAdministrationProtocolPBClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestYarnApiClasses.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources/config-with-security.xml

> coverage fix for yarn api
> --------------------------

Key: YARN-425
URL: https://issues.apache.org/jira/browse/YARN-425
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
Fix For: 3.0.0, 2.3.0
Attachments: YARN-425-branch-0.23-d.patch, YARN-425-branch-0.23.patch, YARN-425-branch-0.23-v1.patch, YARN-425-branch-2-b.patch, YARN-425-branch-2-c.patch, YARN-425-branch-2.patch, YARN-425-branch-2-v1.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, YARN-425-trunk-c.patch, YARN-425-trunk-d.patch, YARN-425-trunk.patch, YARN-425-trunk-v1.patch, YARN-425-trunk-v2.patch

coverage fix for yarn api
patch YARN-425-trunk-a.patch for trunk
patch YARN-425-branch-2.patch for branch-2
patch YARN-425-branch-0.23.patch for branch-0.23

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1141) Updating resource requests should be decoupled with updating blacklist
[ https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784987#comment-13784987 ]

Hudson commented on YARN-1141:
-------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #351 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/351/])
YARN-1141. Updating resource requests should be decoupled with updating blacklist (Zhijie Shen via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528632)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java

> Updating resource requests should be decoupled with updating blacklist
> -----------------------------------------------------------------------

Key: YARN-1141
URL: https://issues.apache.org/jira/browse/YARN-1141
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.1.2-beta
Attachments: YARN-1141.1.patch, YARN-1141.2.patch, YARN-1141.3.patch

Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty.

--
This message was sent by Atlassian JIRA (v6.1#6144)
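The description boils down to a conditional that gates the blacklist update on the resource-request list. A minimal sketch of the decoupling — a hypothetical scheduler fragment, not the YARN-1141 patch:

{code}
// Hypothetical fragment, not the actual patch: apply blacklist changes
// unconditionally instead of inside the "are there resource requests?" branch.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class AllocateHandler {
    private final Set<String> blacklist = new HashSet<>();

    void updateResourceRequests(List<String> asks,
                                List<String> blacklistAdditions,
                                List<String> blacklistRemovals) {
        // Before: these two lines sat inside "if (!asks.isEmpty()) { ... }",
        // so a heartbeat carrying only blacklist changes was silently dropped.
        blacklist.addAll(blacklistAdditions);
        blacklist.removeAll(blacklistRemovals);

        if (!asks.isEmpty()) {
            // ... update the pending resource requests as before ...
        }
    }
}
{code}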
[jira] [Commented] (YARN-876) Node resource is added twice when node comes back from unhealthy to healthy
[ https://issues.apache.org/jira/browse/YARN-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784983#comment-13784983 ]

Hudson commented on YARN-876:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #351 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/351/])
YARN-876. Node resource is added twice when node comes back from unhealthy. (Peng Zhang via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528660)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java

> Node resource is added twice when node comes back from unhealthy to healthy
> ----------------------------------------------------------------------------

Key: YARN-876
URL: https://issues.apache.org/jira/browse/YARN-876
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: PengZhang
Assignee: PengZhang
Fix For: 2.1.2-beta
Attachments: YARN-876.patch

When an unhealthy node restarts, its resource may be added twice in the scheduler. The first time is at the node's reconnection, while the node's final state is still UNHEALTHY. The second time is at the node's update, when the node's state changes from UNHEALTHY to HEALTHY.

--
This message was sent by Atlassian JIRA (v6.1#6144)
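A toy guard showing the double-count the description identifies — hypothetical Java, not RMNodeImpl: capacity must be added on exactly one of the two paths (reconnection vs. the UNHEALTHY -> HEALTHY transition).

{code}
// Hypothetical illustration, not RMNodeImpl: count a node's resource into the
// cluster total exactly once across reconnection and health transitions.
public final class NodeCapacityTracker {
    enum Health { HEALTHY, UNHEALTHY }

    private long clusterMb;
    private boolean nodeCounted;

    /** Node reconnects; while still UNHEALTHY its capacity must NOT be added. */
    void onReconnect(Health finalState, long nodeMb) {
        if (finalState == Health.HEALTHY) {
            addOnce(nodeMb);
        }
    }

    /** Status update flips the node from UNHEALTHY back to HEALTHY. */
    void onStatusUpdate(Health from, Health to, long nodeMb) {
        if (from == Health.UNHEALTHY && to == Health.HEALTHY) {
            addOnce(nodeMb);
        }
    }

    private void addOnce(long mb) {
        if (!nodeCounted) { // guard against the double-add described above
            clusterMb += mb;
            nodeCounted = true;
        }
    }
}
{code}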
[jira] [Commented] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784984#comment-13784984 ]

Hudson commented on YARN-1213:
-------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #351 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/351/])
YARN-1213. Restore config to ban submitting to undeclared pools in the Fair Scheduler. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528696)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java

> Restore config to ban submitting to undeclared pools in the Fair Scheduler
> ---------------------------------------------------------------------------

Key: YARN-1213
URL: https://issues.apache.org/jira/browse/YARN-1213
Project: Hadoop YARN
Issue Type: Improvement
Components: scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Fix For: 2.1.2-beta
Attachments: YARN-1213.patch

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784988#comment-13784988 ]

Hudson commented on YARN-677:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #351 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/351/])
YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528524)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java

> Increase coverage to FairScheduler
> -----------------------------------

Key: YARN-677
URL: https://issues.apache.org/jira/browse/YARN-677
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
Fix For: 3.0.0, 2.3.0
Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-876) Node resource is added twice when node comes back from unhealthy to healthy
[ https://issues.apache.org/jira/browse/YARN-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785080#comment-13785080 ]

Hudson commented on YARN-876:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1541 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1541/])
YARN-876. Node resource is added twice when node comes back from unhealthy. (Peng Zhang via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528660)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java

> Node resource is added twice when node comes back from unhealthy to healthy
> ----------------------------------------------------------------------------

Key: YARN-876
URL: https://issues.apache.org/jira/browse/YARN-876
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: PengZhang
Assignee: PengZhang
Fix For: 2.1.2-beta
Attachments: YARN-876.patch

When an unhealthy node restarts, its resource may be added twice in the scheduler. The first time is at the node's reconnection, while the node's final state is still UNHEALTHY. The second time is at the node's update, when the node's state changes from UNHEALTHY to HEALTHY.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785082#comment-13785082 ]

Hudson commented on YARN-425:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1541 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1541/])
YARN-425. coverage fix for yarn api (Aleksey Gorshkov via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528641)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceManagerAdministrationProtocolPBClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestYarnApiClasses.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources/config-with-security.xml

> coverage fix for yarn api
> --------------------------

Key: YARN-425
URL: https://issues.apache.org/jira/browse/YARN-425
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
Fix For: 3.0.0, 2.3.0
Attachments: YARN-425-branch-0.23-d.patch, YARN-425-branch-0.23.patch, YARN-425-branch-0.23-v1.patch, YARN-425-branch-2-b.patch, YARN-425-branch-2-c.patch, YARN-425-branch-2.patch, YARN-425-branch-2-v1.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, YARN-425-trunk-c.patch, YARN-425-trunk-d.patch, YARN-425-trunk.patch, YARN-425-trunk-v1.patch, YARN-425-trunk-v2.patch

coverage fix for yarn api
patch YARN-425-trunk-a.patch for trunk
patch YARN-425-branch-2.patch for branch-2
patch YARN-425-branch-0.23.patch for branch-0.23

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785081#comment-13785081 ]

Hudson commented on YARN-1213:
-------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1541 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1541/])
YARN-1213. Restore config to ban submitting to undeclared pools in the Fair Scheduler. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528696)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java

> Restore config to ban submitting to undeclared pools in the Fair Scheduler
> ---------------------------------------------------------------------------

Key: YARN-1213
URL: https://issues.apache.org/jira/browse/YARN-1213
Project: Hadoop YARN
Issue Type: Improvement
Components: scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Fix For: 2.1.2-beta
Attachments: YARN-1213.patch

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785085#comment-13785085 ]

Hudson commented on YARN-677:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1541 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1541/])
YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528524)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java

> Increase coverage to FairScheduler
> -----------------------------------

Key: YARN-677
URL: https://issues.apache.org/jira/browse/YARN-677
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
Fix For: 3.0.0, 2.3.0
Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785204#comment-13785204 ]

Hudson commented on YARN-1213:
-------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1567 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1567/])
YARN-1213. Restore config to ban submitting to undeclared pools in the Fair Scheduler. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528696)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java

> Restore config to ban submitting to undeclared pools in the Fair Scheduler
> ---------------------------------------------------------------------------

Key: YARN-1213
URL: https://issues.apache.org/jira/browse/YARN-1213
Project: Hadoop YARN
Issue Type: Improvement
Components: scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Fix For: 2.1.2-beta
Attachments: YARN-1213.patch

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785205#comment-13785205 ]

Hudson commented on YARN-425:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1567 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1567/])
YARN-425. coverage fix for yarn api (Aleksey Gorshkov via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528641)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceManagerAdministrationProtocolPBClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestYarnApiClasses.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources/config-with-security.xml

> coverage fix for yarn api
> --------------------------

Key: YARN-425
URL: https://issues.apache.org/jira/browse/YARN-425
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
Fix For: 3.0.0, 2.3.0
Attachments: YARN-425-branch-0.23-d.patch, YARN-425-branch-0.23.patch, YARN-425-branch-0.23-v1.patch, YARN-425-branch-2-b.patch, YARN-425-branch-2-c.patch, YARN-425-branch-2.patch, YARN-425-branch-2-v1.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, YARN-425-trunk-c.patch, YARN-425-trunk-d.patch, YARN-425-trunk.patch, YARN-425-trunk-v1.patch, YARN-425-trunk-v2.patch

coverage fix for yarn api
patch YARN-425-trunk-a.patch for trunk
patch YARN-425-branch-2.patch for branch-2
patch YARN-425-branch-0.23.patch for branch-0.23

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-876) Node resource is added twice when node comes back from unhealthy to healthy
[ https://issues.apache.org/jira/browse/YARN-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785203#comment-13785203 ]

Hudson commented on YARN-876:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1567 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1567/])
YARN-876. Node resource is added twice when node comes back from unhealthy. (Peng Zhang via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528660)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java

> Node resource is added twice when node comes back from unhealthy to healthy
> ----------------------------------------------------------------------------

Key: YARN-876
URL: https://issues.apache.org/jira/browse/YARN-876
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: PengZhang
Assignee: PengZhang
Fix For: 2.1.2-beta
Attachments: YARN-876.patch

When an unhealthy node restarts, its resource may be added twice in the scheduler. The first time is at the node's reconnection, while the node's final state is still UNHEALTHY. The second time is at the node's update, when the node's state changes from UNHEALTHY to HEALTHY.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785208#comment-13785208 ]

Hudson commented on YARN-677:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1567 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1567/])
YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528524)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java

> Increase coverage to FairScheduler
> -----------------------------------

Key: YARN-677
URL: https://issues.apache.org/jira/browse/YARN-677
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
Fix For: 3.0.0, 2.3.0
Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1141) Updating resource requests should be decoupled with updating blacklist
[ https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785207#comment-13785207 ]

Hudson commented on YARN-1141:
-------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1567 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1567/])
YARN-1141. Updating resource requests should be decoupled with updating blacklist (Zhijie Shen via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528632)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java

> Updating resource requests should be decoupled with updating blacklist
> -----------------------------------------------------------------------

Key: YARN-1141
URL: https://issues.apache.org/jira/browse/YARN-1141
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.1.2-beta
Attachments: YARN-1141.1.patch, YARN-1141.2.patch, YARN-1141.3.patch

Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1267) Refactor cgroup logic out of LCE into a standalone binary
[ https://issues.apache.org/jira/browse/YARN-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-1267:
------------------------------

Target Version/s: 2.3.0
Fix Version/s: (was: 2.3.0)

> Refactor cgroup logic out of LCE into a standalone binary
> -----------------------------------------------------------

Key: YARN-1267
URL: https://issues.apache.org/jira/browse/YARN-1267
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.1.2-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik

As discussed in YARN-1253, we should consider decoupling cgroups handling from the LCE. YARN-3 initially had a proposal on how this could be done; we should see if any of that makes sense in the current state of things.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785276#comment-13785276 ]

Jonathan Eagles commented on YARN-677:
---------------------------------------

Thanks, Sandy. Let me take a look at the coverage numbers from before this patch went in. In the meantime, I will revert until I can prove we need this coverage patch.

> Increase coverage to FairScheduler
> -----------------------------------

Key: YARN-677
URL: https://issues.apache.org/jira/browse/YARN-677
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
Fix For: 3.0.0, 2.3.0
Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785293#comment-13785293 ]

Alejandro Abdelnur edited comment on YARN-1197 at 10/3/13 3:47 PM:
--------------------------------------------------------------------

[~gp.leftnoteasy], thanks for your previous answer; it makes sense. We've been thinking about this for a while in the context of Llama for Impala-YARN integration. Along the lines of what Sandy suggested, just a couple of extra comments.

For decreasing, the AM can request the correction, effective immediately, to the NM. The NM reports the container correction and the new free space to the RM in the next heartbeat. Regarding enforcing the minimum: configuration properties are scheduler specific, so the minimum would have to come to the NM from the RM as part of the registration response.

For increasing, the AM must go to the RM first to avoid the race conditions already mentioned. To reduce the changes in the RM to a minimum, I was thinking of the following approach:

* AM does a regular new allocation request with the desired delta capability increases, with relaxedLocality=false (no changes on the AM-RM protocol/logic).
* AM waits for the delta container allocation from the RM.
* When the AM receives the delta container allocation, using a new AM-NM API, it updates the original container with the delta container.
* The NM makes the necessary corrections locally to the original container, adding the capabilities of the delta container.
* The NM notifies the RM to merge the original container with the delta container.
* The RM updates the original container and drops the delta container.

The complete list of changes for this approach would be:

* AM-NM API
** decreaseContainer(ContainerId original, Resources)
** increaseContainer(ContainerId original, ContainerId delta)
* NM-RM API
** decreaseContainer(ContainerId original, Resources)
** registration() -> + minimum container size
** mergeContainers(ContainerId originalKeep, ContainerId deltaDiscard)
* NM logic
** needs to correct capabilities enforcement for the +/- delta
* RM logic
** needs to update container resources when receiving an NM's decreaseContainer() call
** needs to update the original container resources and delete the delta container resources when receiving an NM's mergeContainers() call
* RM scheduler API
** it should expose methods for the decreaseContainer() and mergeContainers() functionality

was (Author: tucu00):

[~gp.leftnoteasy], thanks for your previous answer; it makes sense. We've been thinking about this for a while in the context of Llama for Impala-YARN integration. Along the lines of what Sandy suggested, just a couple of extra comments.

For decreasing, the AM can request the correction, effective immediately, to the NM. The NM reports the container correction and the new free space to the RM in the next heartbeat. Regarding enforcing the minimum: configuration properties are scheduler specific, so the minimum would have to come to the NM from the RM as part of the registration response.

For increasing, the AM must go to the RM first to avoid the race conditions already mentioned. To reduce the changes in the RM to a minimum, I was thinking of the following approach:

* AM does a regular new allocation request with the desired delta capability increases, with relaxedLocality=false (no changes on the AM-RM protocol/logic).
* AM waits for the delta container allocation from the RM.
* When the AM receives the delta container allocation, using a new AM-NM API, it updates the original container with the delta container.
* The NM makes the necessary corrections locally to the original container, adding the capabilities of the delta container.
* The NM notifies the RM to merge the original container with the delta container.
* The RM updates the original container and drops the delta container.

The complete list of changes for this approach would be:

* AM-NM API
** decreaseContainer(ContainerId original, Resources)
** increaseContainer(ContainerId original, ContainerId delta)
* NM-RM API
** decreaseContainer(ContainerId original, Resources)
** registration() -> + minimum container size
** mergeContainers(ContainerId originalKeep, ContainerId deltaDiscard)
* NM logic
* needs to correct capabilities enforcement for the +/- delta
* RM logic
** needs to update container resources when receiving an NM's decreaseContainer() call
** needs to update the original container resources and delete the delta container resources when receiving an NM's mergeContainers() call
* RM scheduler API
** it should expose methods for the decreaseContainer() and mergeContainers() functionality

> Support changing resources of an allocated container
> -----------------------------------------------------

Key: YARN-1197
URL: https://issues.apache.org/jira/browse/YARN-1197
Project: Hadoop YARN
Issue Type: Task
Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
Attachments: yarn-1197.pdf

Currently, YARN cannot merge several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785293#comment-13785293 ]

Alejandro Abdelnur commented on YARN-1197:
-------------------------------------------

[~gp.leftnoteasy], thanks for your previous answer; it makes sense. We've been thinking about this for a while in the context of Llama for Impala-YARN integration. Along the lines of what Sandy suggested, just a couple of extra comments.

For decreasing, the AM can request the correction, effective immediately, to the NM. The NM reports the container correction and the new free space to the RM in the next heartbeat. Regarding enforcing the minimum: configuration properties are scheduler specific, so the minimum would have to come to the NM from the RM as part of the registration response.

For increasing, the AM must go to the RM first to avoid the race conditions already mentioned. To reduce the changes in the RM to a minimum, I was thinking of the following approach:

* AM does a regular new allocation request with the desired delta capability increases, with relaxedLocality=false (no changes on the AM-RM protocol/logic).
* AM waits for the delta container allocation from the RM.
* When the AM receives the delta container allocation, using a new AM-NM API, it updates the original container with the delta container.
* The NM makes the necessary corrections locally to the original container, adding the capabilities of the delta container.
* The NM notifies the RM to merge the original container with the delta container.
* The RM updates the original container and drops the delta container.

The complete list of changes for this approach would be (see the sketch after this message):

* AM-NM API
* decreaseContainer(ContainerId original, Resources)
* increaseContainer(ContainerId original, ContainerId delta)
* NM-RM API
* decreaseContainer(ContainerId original, Resources)
* registration() -> + minimum container size
* mergeContainers(ContainerId originalKeep, ContainerId deltaDiscard)
* NM logic
* needs to correct capabilities enforcement for the +/- delta
* RM logic
* needs to update container resources when receiving an NM's decreaseContainer() call
* needs to update the original container resources and delete the delta container resources when receiving an NM's mergeContainers() call
* RM scheduler API
* it should expose methods for the decreaseContainer() and mergeContainers() functionality

> Support changing resources of an allocated container
> -----------------------------------------------------

Key: YARN-1197
URL: https://issues.apache.org/jira/browse/YARN-1197
Project: Hadoop YARN
Issue Type: Task
Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
Attachments: yarn-1197.pdf

Currently, YARN cannot merge several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments.

--
This message was sent by Atlassian JIRA (v6.1#6144)
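Spelled out as code, the API surface proposed in that comment might look like the following — hypothetical Java interfaces for illustration only; ContainerId and Resources are stand-ins for the real YARN records, and none of these signatures exist in YARN as of this discussion:

{code}
// Hypothetical sketch of the proposed API surface; illustration only.
interface ContainerId {}
interface Resources {}

/** AM -> NM */
interface AmNmResize {
    void decreaseContainer(ContainerId original, Resources newCapability);
    void increaseContainer(ContainerId original, ContainerId delta);
}

/** NM -> RM */
interface NmRmResize {
    void decreaseContainer(ContainerId original, Resources newCapability);
    /** Registration response would carry the scheduler's minimum container size. */
    Resources register();
    void mergeContainers(ContainerId originalKeep, ContainerId deltaDiscard);
}

/** RM scheduler additions mirroring the NM -> RM calls. */
interface SchedulerResize {
    void decreaseContainer(ContainerId original, Resources newCapability);
    void mergeContainers(ContainerId originalKeep, ContainerId deltaDiscard);
}
{code}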
[jira] [Updated] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-677:
----------------------------------

Fix Version/s: (was: 2.3.0)
               (was: 3.0.0)

> Increase coverage to FairScheduler
> -----------------------------------

Key: YARN-677
URL: https://issues.apache.org/jira/browse/YARN-677
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785305#comment-13785305 ]

Hudson commented on YARN-677:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4525 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4525/])
Revert YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528914)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java

> Increase coverage to FairScheduler
> -----------------------------------

Key: YARN-677
URL: https://issues.apache.org/jira/browse/YARN-677
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785306#comment-13785306 ] Hitesh Shah commented on YARN-867: -- [~xgong] [~bikassaha] [~vinodkv] It seems like this fix is getting quite complex, and the introduction of container failure on service event handling could introduce a lot of different race conditions. I propose the following: - Add code to catch Throwable whenever an aux service is invoked to handle the container-related events (app init, container start, container stop, app cleanup), and do not fail the container if an exception is thrown. - A simpler check could be done to match the service metadata from the ContainerLaunchContext and ensure that the service is configured on the NM in question. Using the above, at the very least, we can catch issues related to mis-configured NMs where the shuffle service is not configured. This is much simpler as it can be done as a simple synchronous check when handling the startContainers RPC call. This could be targeted to 2.1.2/2.2.0. As for the failing containers, I propose that we target fixing the feedback of failed containers back to the AM on service handling errors in 2.3.0. For the 2.3.0 targeted jira, I would prefer to increase the scope of this to design for differentiating critical vs non-critical services, so as to have the framework in place to understand which service's errors result in failed containers. Comments? Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-867: --- Attachment: YARN-867.6.patch Simply log and take no action: catch Throwable whenever an aux service is invoked to handle the container-related events (app init, container start, container stop, app cleanup). Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785354#comment-13785354 ] Sandy Ryza commented on YARN-677: - Thanks, Jonathan. Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785364#comment-13785364 ] Alejandro Abdelnur commented on YARN-867: - The try/catch should be around each aux service method invocation, so a failure of a given service does not affect delivery to other services. Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
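[Editor's note] For illustration, a minimal sketch of the per-invocation isolation being discussed. AuxService, AuxServiceEvent, and the handle() method are hypothetical stand-ins for the NM's real aux-service dispatch types, and LOG is assumed to be the enclosing class's logger; this is a shape sketch, not the actual patch.
{code}
import java.util.Map;

void dispatchToAuxServices(Map<String, AuxService> auxServices, AuxServiceEvent event) {
  for (AuxService service : auxServices.values()) {
    try {
      // app init / container start / container stop / app cleanup
      service.handle(event);
    } catch (Throwable t) { // Throwable, not just Exception, per the proposal
      // Log only: one bad service must not stop delivery to the other services,
      // kill the NM's async dispatcher, or fail the container.
      LOG.warn("Aux service failed to handle " + event, t);
    }
  }
}
{code}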
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785373#comment-13785373 ] Bikas Saha commented on YARN-867: - bq. Using the above, at the very least, we can catch issues related to mis-configured NMs where the shuffle service is not configured. This is much simpler as it can be done as a simple synchronous check when handling the startContainers RPC call. This could be targeted to 2.1.2/2.2.0. @hitesh, I agree. In that case shall we re-target this jira to 2.3 and use YARN-1256 to fix the misconfigured service and exception logging? Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785371#comment-13785371 ] Hadoop QA commented on YARN-867: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606599/YARN-867.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2077//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2077//console This message is automatically generated. Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785380#comment-13785380 ] Hitesh Shah commented on YARN-867: -- +1 to Bikas's suggestion. Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1219) FSDownload changes file suffix making FileUtil.unTar() throw exception
[ https://issues.apache.org/jira/browse/YARN-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-1219: Hadoop Flags: Reviewed +1 for the patch. I verified on both Mac and Windows. I plan to commit this later today. FSDownload changes file suffix making FileUtil.unTar() throw exception -- Key: YARN-1219 URL: https://issues.apache.org/jira/browse/YARN-1219 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.1.1-beta, 2.1.2-beta Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 2.1.2-beta Attachments: YARN-1219.patch While running a Hive join operation on Yarn, I saw the exception described below. This is caused by FSDownload copying the files into a temp file and changing the suffix to .tmp before unpacking. In unpack(), it uses FileUtil.unTar(), which determines whether the file is gzipped by looking at the file suffix: {code} boolean gzipped = inFile.toString().endsWith("gz"); {code} To fix this problem, we can remove the .tmp from the temp file name. Here is the detailed exception: org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240) at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676) at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625) at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) -- This message was sent by Atlassian JIRA (v6.1#6144)
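[Editor's note] To make the failure mode concrete, a minimal illustration of the suffix check described above (the file name is hypothetical):
{code}
// FileUtil.unTar() keys gzip handling off the file suffix, so the ".tmp"
// rename defeats the check: the archive is untarred raw and the read throws.
boolean gzipped = "job.tar.gz.tmp".endsWith("gz");  // false -> unTar fails
// With the ".tmp" suffix removed, the check behaves as intended:
boolean fixed = "job.tar.gz".endsWith("gz");        // true -> gunzip, then untar
{code}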
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785381#comment-13785381 ] Andrey Klochkov commented on YARN-465: -- The robot failed when testing the branch-2 patch against trunk; this is expected. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2--n3.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk--n3.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run the commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov reassigned YARN-677: Assignee: Andrey Klochkov Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Assignee: Andrey Klochkov Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1256: --- Assignee: Xuan Gong NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1256: Attachment: YARN-1256.1.patch Simply log and take no action. NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785403#comment-13785403 ] Alejandro Abdelnur commented on YARN-1197: -- Bikas, makes sense, thanks for summarizing. On decreasing: given that we also do a round trip AM-NM-RM-AM, why not make it a bit more symmetric? * AM asks the RM to decrease a container. * RM notifies the NM on the next heartbeat about the container decrease. With this approach the RM can enforce the MIN on an AM decrease and reject it if below MIN. Also, there is no need to notify the AM of the decrease taking place, as the AM requested it. And as it is a decrease, the AM can instruct the container to shrink even if the RM has not told the NM yet. Furthermore, I would expect an AM to instruct a container to shrink before asking YARN, to avoid a race condition that could kill the container for using more resources than it should. Also, by doing this there would be no difference in the free-resource bookkeeping between the RM and the NMs, which may be handy to avoid complicating things for YARN-311. Thoughts? Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot merge several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
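[Editor's note] A minimal sketch of the RM-side check in this symmetric decrease flow. All names and types here are hypothetical, used purely for illustration; this is not an existing YARN API.
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.Resource;

// 1. The AM shrinks its own process first, so it cannot be killed for overuse.
// 2. The AM asks the RM to decrease the container; the RM enforces the
//    scheduler minimum and rejects the request if it falls below it.
// 3. The RM tells the NM about the decrease on the next heartbeat, keeping
//    the free-resource bookkeeping in the RM and the NM consistent.
void decreaseContainer(ContainerId id, Resource requested, Resource schedulerMinimum) {
  if (requested.getMemory() < schedulerMinimum.getMemory()) {
    throw new IllegalArgumentException("Decrease below scheduler minimum: " + requested);
  }
  recordDecrease(id, requested); // hypothetical: picked up by the NM's next heartbeat response
}
{code}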
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785407#comment-13785407 ] Andrey Klochkov commented on YARN-677: -- I looked at the difference in coverage before and after the patch. There are 2 test methods added: 1. testSchedulerHandleFailWithExternalEvents checks that FairScheduler.handle() throws RuntimeException when supplied with a wrong event type. The actual check is missing, so it seems the test will pass in any case. This is a very minor addition to the coverage. If we want to keep it, I can add the check and update the patch. 2. testAggregateCapacityTrackingWithPreemptionEnabled -- not sure about the intention. I see that it adds coverage to the FairScheduler.preemptTasksIfNecessary() method, but basically it just sleeps so that the method is invoked; preemption never happens and the test does not make any checks. I think we can skip this one. Should we keep #1? Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Assignee: Andrey Klochkov Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov reassigned YARN-465: Assignee: Andrey Klochkov (was: Aleksey Gorshkov) fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Andrey Klochkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2--n3.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk--n3.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run the commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785412#comment-13785412 ] Vinod Kumar Vavilapalli commented on YARN-867: -- bq. @hitesh, I agree. In that case shall we re-target this jira to 2.3 and use YARN-1256 to fix the misconfigured service and exception logging? +1. +1 also to the earlier suggestion - too late to put more state machine changes into 2.1.2. Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785428#comment-13785428 ] Bikas Saha commented on YARN-1232: -- Sorry. My bad. I got confused. Patch looks good overall. This should either be RM_ID = RM_PREFIX + "id" or RM_HA_ID = RM_HA_PREFIX + "id". Let's be consistent. {code} + public static final String RM_ID = RM_HA_PREFIX + "id"; {code} Wherever possible, can we simply always call HAUtil methods and let HAUtil handle the if/else? This would help reduce a bunch of if-else blocks scattered in the code. {code} +if (HAUtil.isHAEnabled(this)) { + address = HAUtil.getConfValueForRMId(name, defaultAddress, this); +} else { + address = get(name, defaultAddress); +} {code} Let's make a mental note that when new AlwaysOn services (say RPC) are added, they need to use the updated conf. {code} + void setConf(Configuration configuration) { +conf = configuration; + } {code} Minor nits: getRMServiceIds()/getRMId() - would help if they both had service or skipped service in the name. getConfValueForRMId(String prefix)/setConfValue(String prefix) - String rmId instead of prefix? Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785438#comment-13785438 ] Karthik Kambatla commented on YARN-1232: bq. getConfValueForRMId(String prefix)/setConfValue(String prefix) - String rmId instead of prefix? Didn't quite understand this comment. Other comments make sense, will try and accommodate them. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785441#comment-13785441 ] Bikas Saha commented on YARN-1232: -- I meant let's use String rmId instead of String prefix in the arguments to those methods, to clarify that we expect the rm-id to be sent as an argument and not some arbitrary prefix. Isn't that the case? Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785444#comment-13785444 ] Bikas Saha commented on YARN-1197: -- Sandy had some arguments on why this has race conditions wrt when the RM can start allocating the freed-up resources. Can you please look at the comments above to check if it's the same thing or not. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot merge several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Moved] (YARN-1269) QueueACLs doesn't work as root allows *
[ https://issues.apache.org/jira/browse/YARN-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen moved MAPREDUCE-5557 to YARN-1269: -- Key: YARN-1269 (was: MAPREDUCE-5557) Project: Hadoop YARN (was: Hadoop Map/Reduce) QueueACLs doesn't work as root allows * --- Key: YARN-1269 URL: https://issues.apache.org/jira/browse/YARN-1269 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Even if we specify an ACL for the default queue, say user1, user2 can still submit and kill applications on the default queue, because when the queue check finds that user2 doesn't have access to it, it then checks whether user2 has access to its parent recursively, and finally finds that user2 has access to root. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1269) QueueACLs doesn't work as root allows *
[ https://issues.apache.org/jira/browse/YARN-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785452#comment-13785452 ] Zhijie Shen commented on YARN-1269: --- We need to configure root not to accept *. However, the following case will have some problems. {code}
<property>
  <name>yarn.scheduler.capacity.root.queue1.acl_submit_applications</name>
  <value>user1</value>
  <description>The ACL of who can submit jobs to queue1.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.queue2.acl_submit_applications</name>
  <value>user2</value>
  <description>The ACL of who can submit jobs to queue2.</description>
</property>
{code} If we have the two queues, we definitely don't want to set the users of the root to be the union of the users of both queues. Otherwise, user1 and user2 have the access to both queues. Maybe we should not check the parent queue access if the parent queue is root? QueueACLs doesn't work as root allows * --- Key: YARN-1269 URL: https://issues.apache.org/jira/browse/YARN-1269 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Even if we specify an ACL for the default queue, say user1, user2 can still submit and kill applications on the default queue, because when the queue check finds that user2 doesn't have access to it, it then checks whether user2 has access to its parent recursively, and finally finds that user2 has access to root. -- This message was sent by Atlassian JIRA (v6.1#6144)
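[Editor's note] A minimal sketch of the recursive ACL check and the tweak proposed above, with illustrative names (the real CapacityScheduler code differs):
{code}
import org.apache.hadoop.security.UserGroupInformation;

boolean hasAccess(Queue queue, QueueACL acl, UserGroupInformation user) {
  if (queue.getAcl(acl).isUserAllowed(user)) {
    return true;                        // explicitly allowed on this queue
  }
  Queue parent = queue.getParent();
  if (parent == null || parent.getParent() == null) {
    return false;                       // queue is root, or parent is root:
  }                                     // do not let root's default "*" grant access
  return hasAccess(parent, acl, user);  // otherwise keep walking up the hierarchy
}
{code}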
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785455#comment-13785455 ] Karthik Kambatla commented on YARN-1232: Oh. The prefix is a config key - e.g. yarn.resourcemanager.address. By getConfValueForRMId, we mean: get the value of this key for the specific ID mentioned in the Configuration. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
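[Editor's note] In other words, something along these lines, where the rm-id suffixing scheme and the ha.id key name are assumptions made for illustration, not confirmed details of the patch:
{code}
import org.apache.hadoop.conf.Configuration;

static String getConfValueForRMId(String prefix, String defaultValue, Configuration conf) {
  String rmId = conf.get("yarn.resourcemanager.ha.id");  // e.g. "rm1" (key name assumed)
  // "yarn.resourcemanager.address" + "." + "rm1" -> "yarn.resourcemanager.address.rm1"
  return conf.get(prefix + "." + rmId, defaultValue);
}
{code}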
[jira] [Updated] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1256: Attachment: YARN-1256.2.patch Add logic to fail container start if the aux service cannot be found. Remove the (null == service) check from AuxService#handle, since we have already checked it in startContainer. NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785520#comment-13785520 ] Hadoop QA commented on YARN-1256: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606633/YARN-1256.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2078//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2078//console This message is automatically generated. NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1199) Make NM/RM Versions Available
[ https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785526#comment-13785526 ] Jonathan Eagles commented on YARN-1199: --- +1. Thanks, Mit. Make NM/RM Versions Available - Key: YARN-1199 URL: https://issues.apache.org/jira/browse/YARN-1199 Project: Hadoop YARN Issue Type: Improvement Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch, YARN-1199.patch Now as we have the NM and RM Versions available, we can display the YARN version of nodes running in the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1270) TestSLSRunner test is failing
[ https://issues.apache.org/jira/browse/YARN-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1270: Summary: TestSLSRunner test is failing (was: TestSLSRunner is failing) TestSLSRunner test is failing - Key: YARN-1270 URL: https://issues.apache.org/jira/browse/YARN-1270 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai The test TestSLSRunner, added in the YARN-1021 patch, is now failing. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785547#comment-13785547 ] Mit Desai commented on YARN-1021: - Hey Wei, FYI the test TestSLSRunner is failing. I have created a new JIRA for that: YARN-1270. Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workload. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time and cost consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm works for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real time metrics while executing, including: * Resource usage for the whole cluster and each queue, which can be utilized to configure the cluster and each queue's capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantee, etc). * Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code hot spots and scalability limits. The simulator will provide real time charts showing the behavior of the scheduler and its performance. A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785550#comment-13785550 ] Sandy Ryza commented on YARN-1197: -- To summarize what I wrote above: YARN is already asymmetrical wrt acquiring and releasing resources. I don't think the minimum allocation logic is enough to justify a round trip to the RM. It would require adding more new states that will make the whole thing more confusing and bug-prone. We can either push this logic down into the NodeManager or just handle it on the RM side, i.e. refuse to free any resources in the scheduler for a container that decreases from 1024 to 1023 mb. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot merge several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
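[Editor's note] For instance, handling it on the RM side could be as simple as rounding the decreased size back up to the scheduler minimum before releasing anything, so a 1024 MB to 1023 MB decrease frees nothing. Plain arithmetic, illustrative only:
{code}
int minMB = 1024, decreasedToMB = 1023;
int accountedMB = ((decreasedToMB + minMB - 1) / minMB) * minMB; // 1024: rounded back up
int freedMB = 1024 - accountedMB;                                // 0 MB released
{code}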
[jira] [Commented] (YARN-1199) Make NM/RM Versions Available
[ https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785551#comment-13785551 ] Hudson commented on YARN-1199: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4526 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4526/]) YARN-1199. Make NM/RM Versions Available (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529003) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestRMNMInfo.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMNMInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java Make NM/RM Versions Available - Key: YARN-1199 URL: https://issues.apache.org/jira/browse/YARN-1199 Project: Hadoop YARN Issue Type: Improvement Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch, YARN-1199.patch Now as we have the NM and RM Versions available, we can display the YARN version of nodes running in the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785556#comment-13785556 ] Robert Joseph Evans commented on YARN-624: -- [~curino] Sorry about the late reply. I have not really tested this much with Storm on YARN. In most of our experiments, the amount of time it takes to get nodes is negligible. But we have not really done anything serious with it, and adding new nodes right now is a manual operation. Support gang scheduling in the AM RM protocol - Key: YARN-624 URL: https://issues.apache.org/jira/browse/YARN-624 Project: Hadoop YARN Issue Type: Sub-task Components: api, scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Per discussion on YARN-392 and elsewhere, gang scheduling, in which a scheduler runs a set of tasks when they can all be run at the same time, would be a useful feature for YARN schedulers to support. Currently, AMs can approximate this by holding on to containers until they get all the ones they need. However, this lends itself to deadlocks when different AMs are waiting on the same containers. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785570#comment-13785570 ] Carlo Curino commented on YARN-624: --- Got it. Thanks anyway; please keep us posted if you get to run Storm or Giraph and obtain some concrete numbers... Support gang scheduling in the AM RM protocol - Key: YARN-624 URL: https://issues.apache.org/jira/browse/YARN-624 Project: Hadoop YARN Issue Type: Sub-task Components: api, scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Per discussion on YARN-392 and elsewhere, gang scheduling, in which a scheduler runs a set of tasks when they can all be run at the same time, would be a useful feature for YARN schedulers to support. Currently, AMs can approximate this by holding on to containers until they get all the ones they need. However, this lends itself to deadlocks when different AMs are waiting on the same containers. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785571#comment-13785571 ] Bikas Saha commented on YARN-1256: -- Can we do a null check for serviceData? And can we please create a new exception similar to InvalidContainerException, e.g. InvalidAuxServiceException. {code}
+Map<String, ByteBuffer> serviceData = getAuxServiceMetaData();
+for (Map.Entry<String, ByteBuffer> meta : launchContext.getServiceData()
+    .entrySet()) {
+  if (null == serviceData.get(meta.getKey())) {
+    throw new YarnException("The auxService: " + meta.getKey()
+        + " does not exist");
+  }
+}
{code} NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
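[Editor's note] A sketch of what the two review asks could look like together; the next update reports that YARN-1256.3.patch adds both, but this exact code is an illustration, not the patch:
{code}
import java.nio.ByteBuffer;
import java.util.Map;

Map<String, ByteBuffer> serviceData = getAuxServiceMetaData();
if (launchContext.getServiceData() != null) {            // null check for serviceData
  for (Map.Entry<String, ByteBuffer> meta : launchContext.getServiceData().entrySet()) {
    if (null == serviceData.get(meta.getKey())) {
      // dedicated exception type instead of a bare YarnException
      throw new InvalidAuxServiceException("The auxService: " + meta.getKey()
          + " does not exist");
    }
  }
}
{code}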
[jira] [Assigned] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Parker reassigned YARN-658: -- Assignee: Robert Parker Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Assignee: Robert Parker Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Parker resolved YARN-658. Resolution: Duplicate Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Assignee: Robert Parker Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785589#comment-13785589 ] Hudson commented on YARN-890: - SUCCESS: Integrated in Hadoop-trunk-Commit #4528 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4528/]) YARN-890. Ensure CapacityScheduler doesn't round-up metric for available resources. Contributed by Xuan Gong & Hitesh Shah. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529015) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch From the yarn-site.xml, I see the following values: <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>4192</value> </property> <property> <name>yarn.scheduler.maximum-allocation-mb</name> <value>4192</value> </property> <property> <name>yarn.scheduler.minimum-allocation-mb</name> <value>1024</value> </property> However the resourcemanager UI shows total memory as 5MB -- This message was sent by Atlassian JIRA (v6.1#6144)
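[Editor's note] The arithmetic behind the misleading number, as far as the description allows reconstruction (the UI presumably renders the rounded-up 5120 MB as "5" in GB units):
{code}
int configuredMB = 4192;  // yarn.nodemanager.resource.memory-mb
int minAllocMB = 1024;    // yarn.scheduler.minimum-allocation-mb
int roundedMB = ((configuredMB + minAllocMB - 1) / minAllocMB) * minAllocMB; // 5120
int displayedGB = roundedMB / 1024;   // 5 -> shown instead of the configured 4192 MB
{code}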
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1149: Attachment: YARN-1149.7.patch Removed some transitions on the ApplicationImpl state machine from the previous patch, since those transitions cannot happen. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl
(ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1256: Attachment: YARN-1256.3.patch Add a null check for the AuxService, and create a new InvalidAuxServiceException for a missing AuxService. NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785609#comment-13785609 ] Hadoop QA commented on YARN-1149: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606658/YARN-1149.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2079//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2079//console This message is automatically generated. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785622#comment-13785622 ] Hitesh Shah commented on YARN-1149: ---
{code}
+    /**
+     * Application is killed by ResourceManager
+     */
+    ON_SHUTDOWN,
+
+    /**
+     * Application is killed as NodeManager is shut down
+     */
+    BY_RESOURCEMANAGER
{code}
- descriptions reversed
{code}
+      default:
+        LOG.warn("Invalid eventType: " + eventType);
+    }
{code}
- earlier comment on invalid event type not addressed?
{code}
+  @SuppressWarnings("unchecked")
+  static class NonTransition implements
+      SingleArcTransition<ApplicationImpl, ApplicationEvent> {
+    @Override
+    public void transition(ApplicationImpl app, ApplicationEvent event) {
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("The event: " + event.getType()
+            + " is invalid in current state : " + app.getApplicationState());
+      }
+    }
+  }
{code}
- may be better to not have a non-transition. Current message reads as if this is an error and is being ignored with no reason as to why it is ignored. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25
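To make the first review point above concrete: with the descriptions un-reversed, the enum javadoc would presumably read as below (a sketch of the corrected comments only, not the committed patch):
{code}
/**
 * Application is killed as NodeManager is shut down
 */
ON_SHUTDOWN,

/**
 * Application is killed by ResourceManager
 */
BY_RESOURCEMANAGER
{code}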
[jira] [Updated] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1256: Attachment: YARN-1256.4.patch Add the null check for the AuxService data requested from the CLC NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, YARN-1256.4.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1131: - Attachment: YARN-1131.3.txt Updated the patch to get the tests working; also added one more test for when an app is not known by the RM. $yarn logs command should return an appropriate error message if YARN application is still running -- Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command will return no message and return the user back to the shell. It would be nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throwing a NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785627#comment-13785627 ] Hadoop QA commented on YARN-1256: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606665/YARN-1256.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2080//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2080//console This message is automatically generated. NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, YARN-1256.4.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785628#comment-13785628 ] Alejandro Abdelnur commented on YARN-1197: -- Bikas, yep, there is a race condition in the AM-RM-NM path for decreasing. It shows up at least in the FS due to continuous scheduling (YARN-1010), because the RM could allocate the freed space to an AM before the NM heartbeats and gets the info. This does not happen if allocations are tied to the corresponding NM heartbeating. Thanks. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot merge several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1236) FairScheduler setting queue name in RMApp is not working
[ https://issues.apache.org/jira/browse/YARN-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785638#comment-13785638 ] Alejandro Abdelnur commented on YARN-1236: -- +1 FairScheduler setting queue name in RMApp is not working - Key: YARN-1236 URL: https://issues.apache.org/jira/browse/YARN-1236 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1236.patch The fair scheduler sometimes picks a different queue than the one an application was submitted to, such as when user-as-default-queue is turned on. It needs to update the queue name in the RMApp so that this choice will be reflected in the UI. This isn't working because the scheduler is looking up the RMApp by application attempt id instead of app id and failing to find it. -- This message was sent by Atlassian JIRA (v6.1#6144)
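The root cause described above is a keying mismatch; a minimal sketch of the corrected lookup follows (accessor names are assumptions for illustration, not the literal patch):
{code}
// The RM tracks RMApps in a map keyed by ApplicationId, so a lookup keyed by
// the attempt id never finds the app. Deriving the app id fixes the lookup.
RMApp app = rmContext.getRMApps().get(attemptId.getApplicationId());
if (app != null) {
  app.setQueue(queueName); // record the queue the scheduler actually chose
}
{code}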
[jira] [Updated] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1265: - Attachment: YARN-1265-1.patch Fair Scheduler chokes on unhealthy node reconnect - Key: YARN-1265 URL: https://issues.apache.org/jira/browse/YARN-1265 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1265-1.patch, YARN-1265.patch Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's not in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785669#comment-13785669 ] Sandy Ryza commented on YARN-1265: -- Uploaded a patch that, instead of the above, changes the Fair Scheduler's behavior to mimic the Capacity Scheduler. Fair Scheduler chokes on unhealthy node reconnect - Key: YARN-1265 URL: https://issues.apache.org/jira/browse/YARN-1265 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1265-1.patch, YARN-1265.patch Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's not in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
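The guard proposed in the description would be a check along these lines (a sketch only; per the comment above, the uploaded patch takes a different route and mimics the Capacity Scheduler instead):
{code}
// Only RUNNING nodes were ever handed to the scheduler, so only ask the
// scheduler to remove a reconnecting node if it is actually in that state.
if (rmNode.getState() == NodeState.RUNNING) {
  rmContext.getDispatcher().getEventHandler().handle(
      new NodeRemovedSchedulerEvent(rmNode));
}
{code}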
[jira] [Commented] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785670#comment-13785670 ] Vinod Kumar Vavilapalli commented on YARN-621: -- Tx for the clarification. The patch now makes sense to me. +1, checking this in. RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785673#comment-13785673 ] Hudson commented on YARN-621: - SUCCESS: Integrated in Hadoop-trunk-Commit #4529 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4529/]) YARN-621. Changed YARN web app to not add paths that can cause duplicate additions of authenticated filters, thereby causing kerberos replay errors. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529030) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApps.java RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785683#comment-13785683 ] Hadoop QA commented on YARN-1265: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606676/YARN-1265-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2083//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2083//console This message is automatically generated. Fair Scheduler chokes on unhealthy node reconnect - Key: YARN-1265 URL: https://issues.apache.org/jira/browse/YARN-1265 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1265-1.patch, YARN-1265.patch Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's not in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1271) Text file busy errors launching containers again
Sandy Ryza created YARN-1271: Summary: Text file busy errors launching containers again Key: YARN-1271 URL: https://issues.apache.org/jira/browse/YARN-1271 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing -c when running the container launch script. It looks like the -c got brought back during the windows branch merge, so we should remove it again. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1271) Text file busy errors launching containers again
[ https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785688#comment-13785688 ] Sandy Ryza commented on YARN-1271: -- {code} Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: bash: /data/5/yarn/nm/usercache/jenkins/appcache/application_1380783835333_0011/container_1380783835333_0011_01_000476/default_container_executor.sh: /bin/bash: bad interpreter: Text file busy at org.apache.hadoop.util.Shell.runCommand(Shell.java:458) at org.apache.hadoop.util.Shell.run(Shell.java:373) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:258) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:74) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {code} Text file busy errors launching containers again -- Key: YARN-1271 URL: https://issues.apache.org/jira/browse/YARN-1271 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing -c when running the container launch script. It looks like the -c got brought back during the windows branch merge, so we should remove it again. -- This message was sent by Atlassian JIRA (v6.1#6144)
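The reason -c matters: with bash -c, the script path is treated as a command string and the script file itself gets exec'd (via its shebang), so the kernel can return ETXTBSY ("Text file busy") if the NodeManager still has the just-written file open; without -c, bash reads and interprets the script itself and never execs the file. A minimal sketch of the change (illustrative names, not the literal patch):
{code}
String script = "/path/to/default_container_executor.sh"; // illustrative path

// Before (reintroduced by the windows branch merge): bash execs the script
// file, which can fail with ETXTBSY if the writer still has it open.
String[] withDashC = { "bash", "-c", script };

// After (restoring the MAPREDUCE-2374 behavior): bash reads and interprets
// the script, so the freshly written file is never exec'd directly.
String[] withoutDashC = { "bash", script };
{code}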
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1149: Attachment: YARN-1149.8.patch Address all the comments NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, YARN-1149.8.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1149: Attachment: YARN-1149.9.patch NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, YARN-1149.8.patch, YARN-1149.9.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1271) Text file busy errors launching containers again
[ https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1271: - Attachment: YARN-1271.patch Text file busy errors launching containers again -- Key: YARN-1271 URL: https://issues.apache.org/jira/browse/YARN-1271 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1271.patch The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing -c when running the container launch script. It looks like the -c got brought back during the windows branch merge, so we should remove it again. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1256: Attachment: YARN-1256.5.patch NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, YARN-1256.4.patch, YARN-1256.5.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785714#comment-13785714 ] Hitesh Shah commented on YARN-1149: --- +1. Latest patch looks good. Will commit after jenkins blesses it. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, YARN-1149.8.patch, YARN-1149.9.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1167: Attachment: YARN-1167.3.patch Remove rpcPort change Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
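For orientation, the appMasterHost field in an application report comes from whatever the AM passes when it registers with the RM, so an empty value points at registration. A minimal sketch of the relevant call, assuming the standard AMRMClient API (the exact fix in the patch may differ; host, rpcPort, and trackingUrl here are illustrative):
{code}
// The host registered here is what later surfaces as appMasterHost in the
// ApplicationReport; registering an empty string produces the symptom above.
String host = "node1.example.com"; // hypothetical AM host
amRMClient.registerApplicationMaster(host, rpcPort, trackingUrl);
{code}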
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785717#comment-13785717 ] Hadoop QA commented on YARN-1149: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606681/YARN-1149.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2084//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2084//console This message is automatically generated. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, YARN-1149.8.patch, YARN-1149.9.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at
[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785718#comment-13785718 ] Siddharth Seth commented on YARN-1131: -- If another state does get added to YarnApplicationState, we don't know whether it is a final state or not. I'd prefer falling back to trying to find the logs on disk, which is what happens right now. $yarn logs command should return an appropriate error message if YARN application is still running -- Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command will return no message and return the user back to the shell. It would be nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throwing a NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
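Concretely, the behavior discussed above would look something like this in the log CLI (the state names are real YarnApplicationState values; the helper and control flow are assumptions for illustration):
{code}
switch (report.getYarnApplicationState()) {
  case NEW:
  case NEW_SAVING:
  case SUBMITTED:
  case ACCEPTED:
  case RUNNING:
    // Known non-final states: tell the user instead of printing nothing.
    System.err.println("Application has not completed. Logs are only"
        + " available after an application completes.");
    return -1;
  default:
    // FINISHED/FAILED/KILLED, or any state added later: fall back to
    // looking for aggregated logs on disk, as preferred above.
    return dumpAggregatedContainerLogs(appId); // hypothetical helper
}
{code}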
[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1167: Attachment: YARN-1167.4.patch Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785725#comment-13785725 ] Hitesh Shah commented on YARN-1131: --- +1. Will commit once Jenkins OKs the patch. $yarn logs command should return an appropriate error message if YARN application is still running -- Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command will return no message and return the user back to the shell. It would be nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throwing a NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1271) Text file busy errors launching containers again
[ https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785733#comment-13785733 ] Hadoop QA commented on YARN-1271: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606685/YARN-1271.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2086//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2086//console This message is automatically generated. Text file busy errors launching containers again -- Key: YARN-1271 URL: https://issues.apache.org/jira/browse/YARN-1271 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1271.patch The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing -c when running the container launch script. It looks like the -c got brought back during the windows branch merge, so we should remove it again. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785736#comment-13785736 ] Omkar Vinit Joshi commented on YARN-1167: - +1 lgtm. Thanks [~xgong] Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785741#comment-13785741 ] Hadoop QA commented on YARN-1256: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606687/YARN-1256.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2085//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2085//console This message is automatically generated. NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, YARN-1256.4.patch, YARN-1256.5.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785744#comment-13785744 ] Hadoop QA commented on YARN-1167: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606697/YARN-1167.4.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2088//console This message is automatically generated. Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785745#comment-13785745 ] Hadoop QA commented on YARN-1149: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606684/YARN-1149.9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2087//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2087//console This message is automatically generated. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, YARN-1149.8.patch, YARN-1149.9.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at
{noformat}
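For context on the exception above: YARN's state machines throw whenever an event arrives in a state that has no registered transition for it. The toy Java state machine below only models the shape of the problem (it is not the NM's real ApplicationImpl table): the shutdown race delivers APPLICATION_LOG_HANDLING_FINISHED while the application is still RUNNING, and registering a transition for that pair (here as a self-loop) is what makes the racy event harmless.
{code}
import java.util.EnumMap;
import java.util.Map;

public class AppStateSketch {
  enum State { RUNNING, FINISHED }
  enum Event { FINISH_APPLICATION, APPLICATION_LOG_HANDLING_FINISHED }

  private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
  private State current = State.RUNNING;

  AppStateSketch() {
    table.put(State.RUNNING, new EnumMap<>(Event.class));
    table.put(State.FINISHED, new EnumMap<>(Event.class));
    table.get(State.RUNNING).put(Event.FINISH_APPLICATION, State.FINISHED);
    // Without the next line, handle(APPLICATION_LOG_HANDLING_FINISHED) at
    // RUNNING throws -- the same shape as the NM bug. Registering the
    // (state, event) pair, here as a self-loop, absorbs the racy event.
    table.get(State.RUNNING)
        .put(Event.APPLICATION_LOG_HANDLING_FINISHED, State.RUNNING);
  }

  void handle(Event event) {
    State next = table.get(current).get(event);
    if (next == null) {
      // Mirrors the "Invalid event: X at Y" failure mode in the log above.
      throw new IllegalStateException(
          "Invalid event: " + event + " at " + current);
    }
    current = next;
  }
}
{code}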
[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785746#comment-13785746 ] Hudson commented on YARN-1256: -- FAILURE: Integrated in Hadoop-trunk-Commit #4531 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4531/]) YARN-1256. NM silently ignores non-existent service in StartContainerRequest (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529039) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AuxiliaryServiceHelper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerManagerWithLCE.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, YARN-1256.4.patch, YARN-1256.5.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
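A minimal Java sketch of the fail-fast check the issue asks for, with hypothetical names (configuredServices, serviceData) standing in for the NM's real AuxServices bookkeeping; this shows the idea, not the committed patch.
{code}
import java.util.Map;
import java.util.Set;

public class AuxServiceCheckSketch {
  // If a StartContainerRequest carries service data for an aux service the
  // NM never configured, reject the request instead of dropping the entry.
  static void validateServiceData(Set<String> configuredServices,
      Map<String, byte[]> serviceData) {
    for (String service : serviceData.keySet()) {
      if (!configuredServices.contains(service)) {
        // Surfacing the error here makes the *first* container fail loudly,
        // instead of a later container failing mysteriously over a missing
        // token.
        throw new IllegalArgumentException(
            "The auxService: " + service + " does not exist");
      }
    }
  }
}
{code}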
[jira] [Commented] (YARN-1271) Text file busy errors launching containers again
[ https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785752#comment-13785752 ] Aaron T. Myers commented on YARN-1271: -- +1, looks good to me. This is the exact same fix as the one we used in MAPREDUCE-2374. Thanks, Sandy. Text file busy errors launching containers again -- Key: YARN-1271 URL: https://issues.apache.org/jira/browse/YARN-1271 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1271.patch The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing -c when running the container launch script. It looks like the -c got brought back during the Windows branch merge, so we should remove it again. -- This message was sent by Atlassian JIRA (v6.1#6144)
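For readers unfamiliar with MAPREDUCE-2374: with -c the shell exec()s the script file itself, and exec of a file that a writer still holds open can fail with ETXTBSY ("Text file busy"); without -c the shell merely opens and interprets the file, so the race disappears. A hedged Java illustration of the two command shapes follows; the class name and script path are hypothetical, and only the flag difference mirrors the fix.
{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class LaunchCommandSketch {
  static List<String> buggyCommand(String script) {
    // "bash -c <script>" makes bash exec() the script file, which can race
    // with the writer that just produced it and fail with ETXTBSY.
    return Arrays.asList("bash", "-c", script);
  }

  static List<String> fixedCommand(String script) {
    // "bash <script>" makes bash read and interpret the file; no exec of
    // the script file happens, so "Text file busy" cannot occur.
    return Arrays.asList("bash", script);
  }

  public static void main(String[] args)
      throws IOException, InterruptedException {
    String script = "/tmp/launch_container.sh"; // hypothetical path
    new ProcessBuilder(fixedCommand(script)).inheritIO().start().waitFor();
  }
}
{code}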
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785755#comment-13785755 ] Omkar Vinit Joshi commented on YARN-1167: - Thanks [~vinodkv] for pointing it out. The test case is wrong; it is not actually testing the distributed shell code. Check TestDistributedShell.java. Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
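A sketch of the sort of assertion the test arguably needs, using the public YarnClient API; the class and method below are hypothetical and are not the actual TestDistributedShell change.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppMasterHostCheck {
  // Once the submitted application reports RUNNING, the report's host field
  // should be a real hostname, never "".
  static void assertHostIsSet(ApplicationId appId) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      ApplicationReport report = yarnClient.getApplicationReport(appId);
      if (report.getYarnApplicationState() == YarnApplicationState.RUNNING
          && (report.getHost() == null || report.getHost().isEmpty())) {
        throw new AssertionError("appMasterHost is empty for " + appId);
      }
    } finally {
      yarnClient.stop();
    }
  }
}
{code}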