[jira] [Commented] (YARN-1426) YARN Components need to unregister their beans upon shutdown
[ https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828584#comment-13828584 ] Hadoop QA commented on YARN-1426: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615071/YARN-1426.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestJobCleanup {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2505//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2505//console This message is automatically generated. YARN Components need to unregister their beans upon shutdown Key: YARN-1426 URL: https://issues.apache.org/jira/browse/YARN-1426 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 2.3.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1426.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
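The bean leak YARN-1426 describes can be sketched with plain JMX (a minimal illustration of the register/unregister pattern, not the actual patch; the `Demo` bean name is made up):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class BeanLifecycle {
  // Standard-MBean naming: interface name = implementation name + "MBean".
  public interface DemoMBean { int getValue(); }
  public static class Demo implements DemoMBean {
    public int getValue() { return 42; }
  }

  public static void main(String[] args) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    ObjectName name = new ObjectName("Hadoop:service=Demo,name=Demo");

    // A component registers its bean on start...
    server.registerMBean(new Demo(), name);
    System.out.println(server.isRegistered(name));

    // ...and must mirror that with an unregister on shutdown, otherwise
    // repeated start/stop cycles (e.g. in tests) leave stale beans behind.
    server.unregisterMBean(name);
    System.out.println(server.isRegistered(name));
  }
}
```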
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828809#comment-13828809 ] Mayank Bansal commented on YARN-967: Thanks [~zjshen] for the review. bq. 1. Change yarn.cmd accordingly. Done bq. 2. Not necessary, no log is written in AHSClientImpl. Done bq. 3. Where're the following configurations? Defined in other patch? YARN-955 bq. 4. Should AHSClientImpl use YarnClient configurations? I think we should use the same to maintain consistency between the AHS client and the YARN CLI in terms of polling interval. I think keeping lots of confs doesn't make sense. bq. 5. Is the following condition correct? Done bq. 6. One important issue here is that the command change is incompatible. The users' old shell scripts will break given the change here. It's good to make the command compatible. For example, by default, it's going to the info of the application(s). Or at least, we need to document the new behavior of the command. Vinod Kumar Vavilapalli, how do you say? As discussed, it's backward compatible. bq. 7. Rename it to appAttemptReportStr? Also the javadoc. Done bq. 8. Fix the above issue for printContainerReport as well. Done bq. 9. Does AHS RPC protocol throw not found exception as well? If not, I think it's good to do that to keep consistent. Maybe do the same for getApplicationAttemptReport and getContainerReport This is on purpose: we first want to make a call to the RM, and if the app is not there then call the AHS; if it is not there either, send an exception to the client. For attempt and container it only looks into the AHS and, if not found, sends an exception back to the client. That's the older behavior. bq. 10. Check getApplications as well. Make getApplicationAttempts and getContainers behave similarly. This and the one above are the server-side changes. Probably you'd like to coordinate your other patches. bq. 11.
For listApplications, if the users want the applications in FINISHED/FAILED/KILLED states, why not going to historyClient as well? For listApplications we decided not to get info from the AHS; we shall do it once we have filters added. We are leaving it for now. bq. 12. AHSProxy is using a bunch of RM configurations instead of AHS ones. By the way, it seems AHSProxy is almost the same as RMProxy. Is it possible to reuse the code instead of duplicating it? Done bq. 13. In YarnCLI, should we make a getter for historyClient as well, like client? Done bq. 14. The mock doesn't need to be defined in get and invoked every time get is called. Define it once, it will behave the same in the following. As discussed, ignoring it bq. 15. It's better to mock multiple attempts/containers to test gets. Done bq. 16. The modified part of ApplicationCLI needs to be tested as well. Done [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-2.patch, YARN-967-3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-967: --- Attachment: YARN-967-4.patch Attaching latest patch Thanks, Mayank [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1425) TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument
[ https://issues.apache.org/jira/browse/YARN-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828825#comment-13828825 ] Hudson commented on YARN-1425: -- FAILURE: Integrated in Hadoop-Yarn-trunk #398 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/398/]) YARN-1425. TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument (Omkar Vinit Joshi via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543952) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument - Key: YARN-1425 URL: https://issues.apache.org/jira/browse/YARN-1425 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.3.0 Attachments: YARN-1425.1.patch, error.log TestRMRestart is failing on trunk. Fixing it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1303) Allow multiple commands separating with ; in distributed-shell
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828832#comment-13828832 ] Hudson commented on YARN-1303: -- FAILURE: Integrated in Hadoop-Yarn-trunk #398 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/398/]) YARN-1303. Reverted the wrong patch committed earlier and committing the correct patch now. In one go. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1544029) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java YARN-1303. Fixed DistributedShell to not fail with multiple commands separated by a semi-colon as shell-command. Contributed by Xuan Gong. 
(vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1544023) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java Allow multiple commands separating with ; in distributed-shell Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.3.0 Attachments: YARN-1303.1.patch, YARN-1303.2.patch, YARN-1303.3.patch, YARN-1303.3.patch, YARN-1303.4.patch, YARN-1303.4.patch, YARN-1303.5.patch, YARN-1303.6.patch, YARN-1303.7.patch, YARN-1303.8.1.patch, YARN-1303.8.2.patch, YARN-1303.8.patch, YARN-1303.9.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
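The core of the problem can be reproduced outside YARN: splitting a `;`-separated command string and exec'ing the pieces directly loses shell semantics, while handing the whole string to `bash -c` as one argument preserves them. (Illustrative sketch only; this is not the actual DistributedShell code.)

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class MultiCommand {
  public static void main(String[] args) throws Exception {
    String shellCommand = "echo one; echo two";

    // Passing the entire string as a single argument to the shell keeps
    // the ';' separator meaningful, so both commands run.
    Process p = new ProcessBuilder("bash", "-c", shellCommand).start();
    BufferedReader r =
        new BufferedReader(new InputStreamReader(p.getInputStream()));
    String line;
    while ((line = r.readLine()) != null) {
      System.out.println(line);
    }
    p.waitFor();
  }
}
```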
[jira] [Commented] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
[ https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828827#comment-13828827 ] Hudson commented on YARN-1053: -- FAILURE: Integrated in Hadoop-Yarn-trunk #398 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/398/]) YARN-1053. Diagnostic message from ContainerExitEvent is ignored in ContainerImpl (Omkar Vinit Joshi via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543973) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java Diagnostic message from ContainerExitEvent is ignored in ContainerImpl -- Key: YARN-1053 URL: https://issues.apache.org/jira/browse/YARN-1053 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0, 2.2.1 Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Labels: newbie Fix For: 2.3.0 Attachments: YARN-1053.1.patch, YARN-1053.20130809.patch If the container launch fails then we send ContainerExitEvent. This event contains exitCode and diagnostic message. Today we are ignoring diagnostic message while handling this event inside ContainerImpl. Fixing it as it is useful in diagnosing the failure. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828839#comment-13828839 ] Hadoop QA commented on YARN-967: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615103/YARN-967-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2507//console This message is automatically generated. [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1430) InvalidStateTransition exceptions are ignored in state machines
[ https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829400#comment-13829400 ] Omkar Vinit Joshi commented on YARN-1430: - I think for now we should add assert statements so that in a test environment it will always fail, making sure we are not missing some invalid transitions. YARN-1416 is one such example. I agree with [~vinodkv] and [~jlowe]. Probably we should be consistent everywhere and should surface these system-critical errors somewhere without actually crashing daemons. InvalidStateTransition exceptions are ignored in state machines --- Key: YARN-1430 URL: https://issues.apache.org/jira/browse/YARN-1430 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi We have all state machines ignoring InvalidStateTransitions. These exceptions will get logged but will not crash the RM / NM. We definitely should crash it as they move the system into some invalid / unacceptable state. * Places where we hide this exception :- ** JobImpl ** TaskAttemptImpl ** TaskImpl ** NMClientAsyncImpl ** ApplicationImpl ** ContainerImpl ** LocalizedResource ** RMAppAttemptImpl ** RMAppImpl ** RMContainerImpl ** RMNodeImpl thoughts? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829407#comment-13829407 ] Hadoop QA commented on YARN-1416: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615200/YARN-1416.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2510//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2510//console This message is automatically generated. InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch, YARN-1416.2.patch, YARN-1416.2.patch It might be worth checking why they are reporting this. Testcase : TestRMAppTransitions, TestRM there are large number of such errors. 
can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1430) InvalidStateTransition exceptions are ignored in state machines
[ https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829018#comment-13829018 ] Jason Lowe commented on YARN-1430: -- Before flipping the switch to change this, we need to carefully consider the consequences. I'm all for making this a fatal error for unit tests, but I'm not convinced this is a good thing for production environments. We have been running in production for quite some time now (0.23 instead of 2.x, but the code is very similar in many of these areas). We've seen invalid state transitions logged on our production machines and have filed quite a few JIRAs related to those. However I was often thankful the invalid state transition did not crash, because in the vast majority of these cases the system can continue to function in an acceptable manner. Sure, we might leak some resources related to an application, fail to aggregate some log or something similar, but I'd rather take that pain with a potential workaround than the alternative of bringing down the entire cluster each and every time it occurs. What I'm worried about here is a case where we don't see the error during testing but when we deploy to production some critical, frequent job consistently triggers an unhandled transition. If that's always fatal, now we're stuck in a state where the cluster cannot stay up very long until we scramble to develop and deploy a fix or have to rollback, and we have guaranteed downtime when it occurs. In almost all of these cases the invalid transition is going to be localized to just one app, one container, or one node. I'm not sure that kind of error is worth taking down an entire cluster outside of a testing setup. I feel this is similar to how most software products handle asserts -- they are fatal during development but not during production. 
InvalidStateTransition exceptions are ignored in state machines --- Key: YARN-1430 URL: https://issues.apache.org/jira/browse/YARN-1430 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi We have all state machines ignoring InvalidStateTransitions. These exceptions will get logged but will not crash the RM / NM. We definitely should crash it as they move the system into some invalid / unacceptable state. * Places where we hide this exception :- ** JobImpl ** TaskAttemptImpl ** TaskImpl ** NMClientAsyncImpl ** ApplicationImpl ** ContainerImpl ** LocalizedResource ** RMAppAttemptImpl ** RMAppImpl ** RMContainerImpl ** RMNodeImpl thoughts? -- This message was sent by Atlassian JIRA (v6.1#6144)
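The assert analogy in the comment above can be sketched in plain Java: `assert` statements are no-ops unless the JVM runs with `-ea`, which test harnesses typically enable, so an invalid transition would crash a test JVM while costing nothing in production. (Hypothetical illustration; the event/state names mirror the YARN-1416 log line but this is not the state-machine code.)

```java
public class AssertDemo {
  static String handle(String event, String state) {
    boolean known =
        !(event.equals("APP_UPDATE_SAVED") && state.equals("FAILED"));
    // Fatal under "java -ea" (tests), silently skipped without it (production).
    assert known : "can't handle " + event + " at " + state;
    return known ? "handled" : "ignored";
  }

  public static void main(String[] args) {
    System.out.println(handle("START", "NEW"));
  }
}
```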
[jira] [Commented] (YARN-1412) Allocating Containers on a particular Node in Yarn
[ https://issues.apache.org/jira/browse/YARN-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829264#comment-13829264 ] gaurav gupta commented on YARN-1412: Yes, all the nodes are on the same rack. Here is the experiment that I did to verify the theory: 1. Cluster size: 36 nodes 2. yarn.scheduler.capacity.node-locality-delay is set to 36 3. Asked for 36 containers with priority 0 4. I requested containers with (node=yes, rack=yes, relax-locality=true) But I still see that the containers are allocated on different nodes. Allocating Containers on a particular Node in Yarn -- Key: YARN-1412 URL: https://issues.apache.org/jira/browse/YARN-1412 Project: Hadoop YARN Issue Type: Bug Environment: centos, Hadoop 2.2.0 Reporter: gaurav gupta Summary of the problem: If I pass the node on which I want a container and set relax-locality to its default, which is true, I don't get back a container on the node specified even if resources are available on that node. It doesn't matter if I set the rack or not.
Here is the snippet of the code that I am using:

{code}
AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
String host = "h1";
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(memory);
nodes = new String[] {host};
// in order to request a host, we also have to request the rack
racks = new String[] {"/default-rack"};
List<ContainerRequest> containerRequests = new ArrayList<ContainerRequest>();
List<ContainerId> releasedContainers = new ArrayList<ContainerId>();
containerRequests.add(new ContainerRequest(capability, nodes, racks,
    Priority.newInstance(priority)));
if (containerRequests.size() > 0) {
  LOG.info("Asking RM for containers: " + containerRequests);
  for (ContainerRequest cr : containerRequests) {
    LOG.info("Requested container: {}", cr.toString());
    amRmClient.addContainerRequest(cr);
  }
}
for (ContainerId containerId : releasedContainers) {
  LOG.info("Released container, id={}", containerId.getId());
  amRmClient.releaseAssignedContainer(containerId);
}
return amRmClient.allocate(0);
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
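The behavior reported here follows from delay scheduling: with relax-locality=true, a node-specific request is a preference, and the scheduler may fall back to another node once enough scheduling opportunities have been missed. A toy model of that decision (a deliberate simplification for illustration, not the actual CapacityScheduler code):

```java
public class DelaySchedulingDemo {
  // Returns the node the container lands on, or null if the scheduler
  // keeps waiting for the preferred node.
  static String assign(String wantedNode, String offeredNode,
                       int missedOpportunities, int nodeLocalityDelay,
                       boolean relaxLocality) {
    if (offeredNode.equals(wantedNode)) {
      return offeredNode; // node-local: always take it
    }
    if (relaxLocality && missedOpportunities >= nodeLocalityDelay) {
      return offeredNode; // waited long enough: relax to another node
    }
    return null; // keep waiting for the preferred node
  }

  public static void main(String[] args) {
    System.out.println(assign("h1", "h2", 10, 36, true)); // still waiting
    System.out.println(assign("h1", "h2", 40, 36, true)); // relaxed to h2
    System.out.println(assign("h1", "h1", 0, 36, true));  // node-local h1
  }
}
```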
[jira] [Commented] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829251#comment-13829251 ] Jian He commented on YARN-1416: --- The reason for this invalid event transition is that inside testGetClientToken, it's manually controlling the attempt to move to some state such that some logic can be performed in one of the attempt transitions, which causes an unexpected event to be sent from the attempt. And the whole TestRMAppTransitions unit test suite just bypasses the attempt transition logic and manually sends the app event to trigger the app transition; I think this can be fine? I put in some comments describing the test purposes. bq. Do we know how many tests are reporting such exceptions but passing successfully? This is the only invalid event exception in TestRMAppTransitions; all others are fixed. No invalid event exception was found in TestRMAppAttemptTransitions. InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch It might be worth checking why they are reporting this. Testcase : TestRMAppTransitions, TestRM there are large number of such errors. can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)
[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828948#comment-13828948 ] Steve Loughran commented on YARN-149: - I want to warn that HADOOP-9905 is going to drop the ZK dependency from the core hadoop-client POM. If the YARN client is going to depend on ZK (that's the client, not the server) then it's going to have to explicitly add it. ResourceManager (RM) High-Availability (HA) --- Key: YARN-149 URL: https://issues.apache.org/jira/browse/YARN-149 Project: Hadoop YARN Issue Type: New Feature Reporter: Harsh J Assignee: Bikas Saha Attachments: YARN ResourceManager Automatic Failover-rev-07-21-13.pdf, YARN ResourceManager Automatic Failover-rev-08-04-13.pdf, rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf This jira tracks work needed to be done to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to leader and client re-direction to new leader. -- This message was sent by Atlassian JIRA (v6.1#6144)
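If hadoop-client stops pulling in ZooKeeper transitively, a YARN client module that talks to ZK directly would declare the dependency itself in its POM. A hypothetical sketch (the version shown is illustrative, not taken from the source):

```xml
<!-- Hypothetical explicit ZK dependency; version is illustrative. -->
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.4.5</version>
</dependency>
```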
[jira] [Created] (YARN-1432) Reduce phase is failing with shuffle error in kerberos enabled cluster
Ramgopal N created YARN-1432: Summary: Reduce phase is failing with shuffle error in kerberos enabled cluster Key: YARN-1432 URL: https://issues.apache.org/jira/browse/YARN-1432 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Ramgopal N {code} OS user: user3 kerberos user: hdfs Reducer is trying to read the map intermediate output using kerberos user(hdfs),but the owner of this file is OS user(user3) 2013-11-21 20:35:48,169 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error : java.io.IOException: Error Reading IndexFile at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:123) at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:68) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:595) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:506) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:144) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:99) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:523) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:507) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:444) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350) at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201) at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Owner 'user3' for path /home/user3/NodeAgentTmpDir/data/mapred/nm-local-dir/usercache/hdfs/appcache/application_1385040658134_0011/output/attempt_1385040658134_0011_m_00_0/file.out.index did not match expected owner 'hdfs' at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285) at org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174) at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158) at 
org.apache.hadoop.mapred.SpillRecord.&lt;init&gt;(SpillRecord.java:70) at org.apache.hadoop.mapred.SpillRecord.&lt;init&gt;(SpillRecord.java:62) at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119) ... 30 more {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
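The failing check at the bottom of the stack trace compares the on-disk owner of the spill index file with the user the shuffle request runs as; under Kerberos the remote principal ('hdfs') differs from the OS user that wrote the file ('user3'), so the open is rejected. A simplified illustration of that ownership check (not the Hadoop SecureIOUtils implementation):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OwnerCheckDemo {
  // Reject the file unless its on-disk owner matches the expected user.
  static void checkOwner(Path file, String expectedOwner) throws IOException {
    String actual = Files.getOwner(file).getName();
    if (!actual.equals(expectedOwner)) {
      throw new IOException("Owner '" + actual + "' for path " + file
          + " did not match expected owner '" + expectedOwner + "'");
    }
  }

  public static void main(String[] args) throws Exception {
    Path tmp = Files.createTempFile("spill", ".index");
    String me = Files.getOwner(tmp).getName();

    checkOwner(tmp, me); // passes: owners match
    System.out.println("owner check passed for " + me);

    try {
      checkOwner(tmp, me + "-other"); // guaranteed mismatch
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage().startsWith("Owner"));
    }
    Files.delete(tmp);
  }
}
```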
[jira] [Created] (YARN-1435) Custom script cannot be run because it lacks of executable bit at container level
Tassapol Athiapinya created YARN-1435: - Summary: Custom script cannot be run because it lacks of executable bit at container level Key: YARN-1435 URL: https://issues.apache.org/jira/browse/YARN-1435 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.2.1 Reporter: Tassapol Athiapinya Fix For: 2.2.1 Create custom shell script and use -shell_command to point to that script. Uploaded shell script won't be able to execute at container level because executable bit is missing when container fetches the shell script from HDFS. Distributed shell should grant executable bit in this case. -- This message was sent by Atlassian JIRA (v6.1#6144)
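The proposed fix amounts to granting the execute bit after the script is fetched into the container's working directory and before it is launched. A hypothetical sketch of that step (not the actual DistributedShell change):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class ExecBitDemo {
  // Add the owner-execute bit to an already-localized script.
  static void makeExecutable(Path script) throws IOException {
    Set<PosixFilePermission> perms = Files.getPosixFilePermissions(script);
    perms.add(PosixFilePermission.OWNER_EXECUTE);
    Files.setPosixFilePermissions(script, perms);
  }

  public static void main(String[] args) throws Exception {
    Path script = Files.createTempFile("custom_script", ".sh");
    System.out.println(Files.isExecutable(script)); // false: bit missing
    makeExecutable(script);
    System.out.println(Files.isExecutable(script)); // true: now runnable
    Files.delete(script);
  }
}
```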
[jira] [Commented] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829467#comment-13829467 ] Hudson commented on YARN-1320: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4784 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4784/]) YARN-1320. Fixed Distributed Shell application to respect custom log4j properties file. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1544364) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Log4jPropertyHelper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java Custom log4j properties in Distributed shell does not work properly. 
Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.3.0 Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.5.patch, YARN-1320.6.patch, YARN-1320.6.patch, YARN-1320.7.patch, YARN-1320.8.patch, YARN-1320.9.patch Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1425) TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument
[ https://issues.apache.org/jira/browse/YARN-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828916#comment-13828916 ] Hudson commented on YARN-1425: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1589 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1589/]) YARN-1425. TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument (Omkar Vinit Joshi via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543952) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument - Key: YARN-1425 URL: https://issues.apache.org/jira/browse/YARN-1425 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.3.0 Attachments: YARN-1425.1.patch, error.log TestRMRestart is failing on trunk. Fixing it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1266) inheriting Application client and History Protocol from base protocol and implement PB service and clients.
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1266: Attachment: YARN-1266-6.patch Thanks [~vinodkv] for the review. I agree with you. I am updating the patch. Thanks, Mayank inheriting Application client and History Protocol from base protocol and implement PB service and clients. --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch, YARN-1266-4.patch, YARN-1266-5.patch, YARN-1266-6.patch Adding ApplicationHistoryProtocolPBService to make the web apps work and changing yarn to run the AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1425) TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument
[ https://issues.apache.org/jira/browse/YARN-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828928#comment-13828928 ] Hudson commented on YARN-1425: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1615 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1615/]) YARN-1425. TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument (Omkar Vinit Joshi via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543952) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument - Key: YARN-1425 URL: https://issues.apache.org/jira/browse/YARN-1425 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.3.0 Attachments: YARN-1425.1.patch, error.log TestRMRestart is failing on trunk. Fixing it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1266) inheriting Application client and History Protocol from base protocol and implement PB service and clients.
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829495#comment-13829495 ] Hadoop QA commented on YARN-1266: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615224/YARN-1266-6.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2511//console This message is automatically generated. inheriting Application client and History Protocol from base protocol and implement PB service and clients. --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch, YARN-1266-4.patch, YARN-1266-5.patch, YARN-1266-6.patch Adding ApplicationHistoryProtocolPBService to make the web apps work and changing yarn to run the AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829239#comment-13829239 ] Xuan Gong commented on YARN-1320: - Did the test on a single-node cluster. Original: we have # Root logger option hadoop.root.logger=INFO,console and will not see any DEBUG messages. Create a custom log4j properties file, set # Root logger option hadoop.root.logger=DEBUG,console and use --log_properties custom.properties. We can see the DEBUG messages now. Part of the output: {code} 13/11/21 11:15:42 DEBUG service.AbstractService: Service: org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl entered state STOPPED 13/11/21 11:15:42 DEBUG ipc.Client: Stopping client 13/11/21 11:15:42 DEBUG ipc.Client: IPC Client (1122993107) connection to localhost/127.0.0.1:9105 from appattempt_1385060881865_0007_01: closed 13/11/21 11:15:42 DEBUG ipc.Client: IPC Client (1122993107) connection to localhost/127.0.0.1:9105 from appattempt_1385060881865_0007_01: stopped, remaining connections 0 13/11/21 11:15:42 DEBUG ipc.Client: IPC Client (1122993107) connection to localhost/127.0.0.1:54313 from xuan: closed 13/11/21 11:15:42 DEBUG ipc.Client: IPC Client (1122993107) connection to localhost/127.0.0.1:54313 from xuan: stopped, remaining connections 0 13/11/21 11:15:42 INFO distributedshell.ApplicationMaster: Application Master completed successfully. exiting {code} Custom log4j properties in Distributed shell does not work properly. 
Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.5.patch, YARN-1320.6.patch, YARN-1320.6.patch, YARN-1320.7.patch, YARN-1320.8.patch, YARN-1320.9.patch Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1314) Cannot pass more than 1 argument to shell command
[ https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1314: Attachment: YARN-1314.3.patch 1. Using the same approach as YARN-1303: basically, create a file that saves all of the client's input args (from --shell_args). The AM will read all the args and add them into the CLC. We try to let all containers run exactly the same args that the client gives, and let clients figure out when and where to do the correct escaping. 2. Did a little code formatting, since we had lots of duplicated code. Cannot pass more than 1 argument to shell command - Key: YARN-1314 URL: https://issues.apache.org/jira/browse/YARN-1314 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, YARN-1314.3.patch Distributed shell cannot accept more than 1 parameter in argument parts. All of these commands are treated as 1 parameter: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributed shell jar -shell_command echo -shell_args 'My name is Teddy' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributed shell jar -shell_command echo -shell_args ''My name' 'is Teddy'' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributed shell jar -shell_command echo -shell_args 'My name' 'is Teddy' -- This message was sent by Atlassian JIRA (v6.1#6144)
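The args-file approach described in the comment above can be sketched in plain Java: the client writes each --shell_args token on its own line, and the AM reads the lines back verbatim, so no shell re-escaping happens in between. The class and method names here are illustrative, not the actual patch code, and the one-token-per-line format assumes tokens contain no newlines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class ShellArgsFile {
    // Client side: persist the raw argument tokens, one per line.
    static void writeArgs(Path file, List<String> args) throws IOException {
        Files.write(file, args, StandardCharsets.UTF_8);
    }

    // AM side: recover the tokens exactly as the client supplied them.
    static List<String> readArgs(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    // Demonstrates that multi-word tokens survive the round trip intact,
    // which is exactly what the quoting in the bug report loses.
    static boolean roundTrip(String... args) throws IOException {
        Path f = Files.createTempFile("shellArgs", ".txt");
        writeArgs(f, Arrays.asList(args));
        return readArgs(f).equals(Arrays.asList(args));
    }
}
```

A token such as 'My name' stays a single token because it occupies a single line of the file rather than passing through another shell parse.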
[jira] [Commented] (YARN-1435) Custom script cannot be run because it lacks of executable bit at container level
[ https://issues.apache.org/jira/browse/YARN-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829506#comment-13829506 ] Xuan Gong commented on YARN-1435: - Currently, if we want to run a custom script in DS, we can do it like this: --shell_command sh --shell_script custom_script.sh Custom script cannot be run because it lacks of executable bit at container level - Key: YARN-1435 URL: https://issues.apache.org/jira/browse/YARN-1435 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.2.1 Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Create a custom shell script and use -shell_command to point to that script. The uploaded shell script won't be able to execute at the container level because the executable bit is missing when the container fetches the shell script from HDFS. Distributed shell should grant the executable bit in this case. -- This message was sent by Atlassian JIRA (v6.1#6144)
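The fix the report asks for, granting the executable bit after the script is localized, can be illustrated with plain java.nio file permissions. This is a stdlib sketch on a POSIX filesystem, not the actual NodeManager localizer code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.HashSet;
import java.util.Set;

public class GrantExec {
    // Add the owner-execute bit to whatever permissions the file already has.
    static void makeExecutable(Path script) throws IOException {
        Set<PosixFilePermission> perms =
            new HashSet<>(Files.getPosixFilePermissions(script));
        perms.add(PosixFilePermission.OWNER_EXECUTE);
        Files.setPosixFilePermissions(script, perms);
    }

    // Creates a temp "script" (no exec bit by default), grants the bit,
    // and reports whether it is now present.
    static boolean demo() throws IOException {
        Path f = Files.createTempFile("custom_script", ".sh");
        makeExecutable(f);
        return Files.getPosixFilePermissions(f)
            .contains(PosixFilePermission.OWNER_EXECUTE);
    }
}
```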
[jira] [Updated] (YARN-1435) Distributed Shell should not run other commands except sh, and run the custom script at the same time.
[ https://issues.apache.org/jira/browse/YARN-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1435: Summary: Distributed Shell should not run other commands except sh, and run the custom script at the same time. (was: Custom script cannot be run because it lacks of executable bit at container level) Distributed Shell should not run other commands except sh, and run the custom script at the same time. Key: YARN-1435 URL: https://issues.apache.org/jira/browse/YARN-1435 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.2.1 Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Create custom shell script and use -shell_command to point to that script. Uploaded shell script won't be able to execute at container level because executable bit is missing when container fetches the shell script from HDFS. Distributed shell should grant executable bit in this case. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1434) Single Job can affect fairshare of others
[ https://issues.apache.org/jira/browse/YARN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829377#comment-13829377 ] Carlo Curino commented on YARN-1434: This has been observed while modifying the mapreduce AM behavior for other reasons. If the AM aggressively returns containers, it seems to be able to create the illusion of being under-capacity while wasting resources for everyone. A second job running in a separate queue (which was supposed to receive 50% of the cluster resources) was starved (only getting about 30% of the resources). This should be confirmed independently, as the environment we observed this in had too much going on (i.e., this might be a false positive). If confirmed, this might be quite bad, as a single malevolent AM could affect the cluster utilization, possibly by a lot. [~sandyr], [~acmurthy] thoughts? Single Job can affect fairshare of others - Key: YARN-1434 URL: https://issues.apache.org/jira/browse/YARN-1434 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Priority: Minor A job receiving containers and deciding not to use them and yielding them back in the next heartbeat could significantly affect the amount of resources given to other jobs. This is because by yielding containers back the job always appears to be under-capacity (more than others), so it is picked to be the next to receive containers. Observed by Robert Grandl, to be independently confirmed. -- This message was sent by Atlassian JIRA (v6.1#6144)
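The starvation mechanism described above can be shown with a deliberately extreme toy model: a scheduler that always offers the next container to the job currently holding the least. A job that yields every container back never accumulates "usage", so it keeps winning the comparison. This is an illustrative sketch only, not FairScheduler internals; names and the scheduling rule are simplified assumptions.

```java
public class YieldIllusion {
    static final class Job {
        final boolean yieldsBack;  // returns every container on the next heartbeat
        int held;                  // containers currently counted against the job
        int offered;               // containers the scheduler has offered it
        Job(boolean yieldsBack) { this.yieldsBack = yieldsBack; }
    }

    // Offer `rounds` containers, each to the job that looks most under its
    // fair share (i.e., the one holding the least right now).
    static void run(Job a, Job b, int rounds) {
        for (int i = 0; i < rounds; i++) {
            Job pick = (a.held <= b.held) ? a : b;
            pick.offered++;
            if (!pick.yieldsBack) {
                pick.held++;   // honest job keeps the container
            }
            // A yielding job's `held` never grows, so it stays "under-capacity".
        }
    }
}
```

In this extreme model the yielding job captures every offer; the real observation (50% entitlement reduced to ~30%) is milder because actual containers are held briefly before being returned.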
[jira] [Updated] (YARN-1435) Distributed Shell should not run other commands except sh, and run the custom script at the same time.
[ https://issues.apache.org/jira/browse/YARN-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1435: Description: Currently, if we want to run a custom script in DS, we can do it like this: --shell_command sh --shell_script custom_script.sh But it may be better to separate running shell_command and shell_script was:Create custom shell script and use -shell_command to point to that script. Uploaded shell script won't be able to execute at container level because executable bit is missing when container fetches the shell script from HDFS. Distributed shell should grant executable bit in this case. Distributed Shell should not run other commands except sh, and run the custom script at the same time. Key: YARN-1435 URL: https://issues.apache.org/jira/browse/YARN-1435 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.2.1 Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Currently, if we want to run a custom script in DS, we can do it like this: --shell_command sh --shell_script custom_script.sh But it may be better to separate running shell_command and shell_script -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1435) Distributed Shell should not run other commands except sh, and run the custom script at the same time.
[ https://issues.apache.org/jira/browse/YARN-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829511#comment-13829511 ] Xuan Gong commented on YARN-1435: - We could let DS execute either the shell_command option or the shell_script option. The right DS command line should be that we provide either --shell_command or --shell_script. If we provide both options, we can throw an exception and say something like: Do not provide both options at the same time. Distributed Shell should not run other commands except sh, and run the custom script at the same time. Key: YARN-1435 URL: https://issues.apache.org/jira/browse/YARN-1435 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.2.1 Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Currently, if we want to run custom script at DS. We can do it like this : --shell_command sh --shell_script custom_script.sh But it may be better to separate running shell_command and shell_script -- This message was sent by Atlassian JIRA (v6.1#6144)
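The mutual-exclusion check proposed in the comment above can be sketched as a small validation method: accept exactly one of --shell_command / --shell_script and fail fast otherwise. The option names mirror the DS CLI, but the method and its exact messages are hypothetical, not the patch itself.

```java
public class ExclusiveOpts {
    // Returns the command the containers should run, enforcing that exactly
    // one of the two options was supplied (null means "not given").
    static String resolve(String shellCommand, String shellScript) {
        if (shellCommand != null && shellScript != null) {
            throw new IllegalArgumentException(
                "Do not provide --shell_command and --shell_script at the same time");
        }
        if (shellCommand == null && shellScript == null) {
            throw new IllegalArgumentException(
                "Provide either --shell_command or --shell_script");
        }
        // Per the retitled issue, a custom script is always run via sh.
        return shellScript != null ? "sh " + shellScript : shellCommand;
    }
}
```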
[jira] [Created] (YARN-1433) ContainerManagementProtocolProxy doesn't have the retry policy
Zhijie Shen created YARN-1433: - Summary: ContainerManagementProtocolProxy doesn't have the retry policy Key: YARN-1433 URL: https://issues.apache.org/jira/browse/YARN-1433 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen ContainerManagementProtocolProxy doesn't have a retry policy, but RMProxy does. Is there any special consideration about whether the retry policy is required or not? The same question applies to the Application History Server as well (YARN-967). Any idea? -- This message was sent by Atlassian JIRA (v6.1#6144)
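What a retry policy on the proxy would buy can be illustrated with a generic, stdlib-only retry loop: retry a failing call a bounded number of times with a fixed sleep, the same shape as the RetryPolicies usage in RMProxy. This is a sketch of the idea, not Hadoop's RetryPolicy API.

```java
import java.util.concurrent.Callable;

public class SimpleRetry {
    // Invoke `call` up to maxAttempts times, sleeping between failures;
    // rethrow the last failure once attempts are exhausted.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long sleepMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;              // remember the failure
                Thread.sleep(sleepMs); // back off before the next attempt
            }
        }
        throw last; // all attempts exhausted
    }
}
```

A proxy wrapped this way rides out transient connection failures to the NM instead of surfacing the first IOException to the caller.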
[jira] [Commented] (YARN-1426) YARN Components need to unregister their beans upon shutdown
[ https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829381#comment-13829381 ] Jonathan Eagles commented on YARN-1426: --- Test failures: - TestJobCleanup is from MAPREDUCE-5552. - Ran this test with and without my patch and both runs succeed on my desktop. YARN Components need to unregister their beans upon shutdown Key: YARN-1426 URL: https://issues.apache.org/jira/browse/YARN-1426 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 2.3.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1426.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
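The behavior YARN-1426 asks for, unregistering MBeans on shutdown, can be shown with the platform MBean server from the JDK: a component that registers a bean on start must unregister it on stop, otherwise a restarted component (for example in in-process test clusters) collides with the stale registration. The MBean interface and class below are illustrative stand-ins, not YARN's actual metrics beans.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class BeanLifecycle {
    // Standard MBean naming convention: Demo's management interface
    // must be called DemoMBean.
    public interface DemoMBean { int getValue(); }
    public static class Demo implements DemoMBean {
        @Override public int getValue() { return 42; }
    }

    // Register on "start", unregister on "stop"; report whether the bean
    // was visible while running and gone afterwards.
    static boolean startStopLeavesNoBean() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("demo:type=Demo");
        server.registerMBean(new Demo(), name);   // serviceStart()
        boolean visibleWhileRunning = server.isRegistered(name);
        server.unregisterMBean(name);             // serviceStop()
        return visibleWhileRunning && !server.isRegistered(name);
    }
}
```

Skipping the unregisterMBean call would make a second registerMBean with the same ObjectName throw InstanceAlreadyExistsException, which is the class of failure the JIRA is about.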
[jira] [Commented] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
[ https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828930#comment-13828930 ] Hudson commented on YARN-1053: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1615 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1615/]) YARN-1053. Diagnostic message from ContainerExitEvent is ignored in ContainerImpl (Omkar Vinit Joshi via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543973) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java Diagnostic message from ContainerExitEvent is ignored in ContainerImpl -- Key: YARN-1053 URL: https://issues.apache.org/jira/browse/YARN-1053 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0, 2.2.1 Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Labels: newbie Fix For: 2.3.0 Attachments: YARN-1053.1.patch, YARN-1053.20130809.patch If the container launch fails then we send ContainerExitEvent. This event contains exitCode and diagnostic message. Today we are ignoring diagnostic message while handling this event inside ContainerImpl. Fixing it as it is useful in diagnosing the failure. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1314) Cannot pass more than 1 argument to shell command
[ https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1314: Attachment: YARN-1314.4.patch fix test case failure Cannot pass more than 1 argument to shell command - Key: YARN-1314 URL: https://issues.apache.org/jira/browse/YARN-1314 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, YARN-1314.3.patch, YARN-1314.4.patch Distributed shell cannot accept more than 1 parameters in argument parts. All of these commands are treated as 1 parameter: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args 'My name is Teddy' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args ''My name' 'is Teddy'' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args 'My name' 'is Teddy' -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1314) Cannot pass more than 1 argument to shell command
[ https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1314: Attachment: YARN-1314.5.patch Cannot pass more than 1 argument to shell command - Key: YARN-1314 URL: https://issues.apache.org/jira/browse/YARN-1314 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, YARN-1314.3.patch, YARN-1314.4.1.patch, YARN-1314.4.patch, YARN-1314.5.patch Distributed shell cannot accept more than 1 parameters in argument parts. All of these commands are treated as 1 parameter: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args 'My name is Teddy' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args ''My name' 'is Teddy'' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args 'My name' 'is Teddy' -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1314) Cannot pass more than 1 argument to shell command
[ https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829531#comment-13829531 ] Xuan Gong commented on YARN-1314: - Increased the --num_containers value to let the test case find the correct log folder Cannot pass more than 1 argument to shell command - Key: YARN-1314 URL: https://issues.apache.org/jira/browse/YARN-1314 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, YARN-1314.3.patch, YARN-1314.4.1.patch, YARN-1314.4.patch, YARN-1314.5.patch Distributed shell cannot accept more than 1 parameters in argument parts. All of these commands are treated as 1 parameter: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args 'My name is Teddy' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args ''My name' 'is Teddy'' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args 'My name' 'is Teddy' -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829257#comment-13829257 ] Hadoop QA commented on YARN-1416: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615171/YARN-1416.2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2508//console This message is automatically generated. InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch It might be worth checking why they are reporting this. Testcase : TestRMAppTransitions, TestRM there are large number of such errors. can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1434) Single Job can affect fairshare of others
Carlo Curino created YARN-1434: -- Summary: Single Job can affect fairshare of others Key: YARN-1434 URL: https://issues.apache.org/jira/browse/YARN-1434 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Priority: Minor A job receiving containers and deciding not to use them and yielding them back in the next heartbeat could significantly affect the amount of resources given to other jobs. This is because by yielding containers back the job appears always to be under-capacity (more than others) so it is picked to be the next to receive containers. Observed by Robert Grandl, to be independently confirmed. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1314) Cannot pass more than 1 argument to shell command
[ https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829542#comment-13829542 ] Hadoop QA commented on YARN-1314: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615234/YARN-1314.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2513//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2513//console This message is automatically generated. Cannot pass more than 1 argument to shell command - Key: YARN-1314 URL: https://issues.apache.org/jira/browse/YARN-1314 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, YARN-1314.3.patch, YARN-1314.4.1.patch, YARN-1314.4.patch, YARN-1314.5.patch Distributed shell cannot accept more than 1 parameters in argument parts. 
All of these commands are treated as 1 parameter: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args 'My name is Teddy' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args ''My name' 'is Teddy'' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args 'My name' 'is Teddy' -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1416: -- Attachment: YARN-1416.2.patch InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch, YARN-1416.2.patch, YARN-1416.2.patch It might be worth checking why they are reporting this. Testcase : TestRMAppTransitions, TestRM there are large number of such errors. can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1314) Cannot pass more than 1 argument to shell command
[ https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829551#comment-13829551 ] Xuan Gong commented on YARN-1314: - Did the test on a single-node cluster, using --shell_command echo --shell_args HADOOP YARN MAPREDUCE. In launch_container.sh: {code} exec /bin/bash -c "echo HADOOP YARN MAPREDUCE" 1>/Users/xuan/dep/hadoop-3.0.0-SNAPSHOT/logs/application_1385060881865_0015/container_1385060881865_0015_01_02/stdout 2>/Users/xuan/dep/hadoop-3.0.0-SNAPSHOT/logs/application_1385060881865_0015/container_1385060881865_0015_01_02/stderr {code} In the container stdout log, it shows {code} HADOOP YARN MAPREDUCE {code} as expected Cannot pass more than 1 argument to shell command - Key: YARN-1314 URL: https://issues.apache.org/jira/browse/YARN-1314 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, YARN-1314.3.patch, YARN-1314.4.1.patch, YARN-1314.4.patch, YARN-1314.5.patch Distributed shell cannot accept more than 1 parameters in argument parts. All of these commands are treated as 1 parameter: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args 'My name is Teddy' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args ''My name' 'is Teddy'' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distrubuted shell jar -shell_command echo -shell_args 'My name' 'is Teddy' -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1266) Implement PB service and client wrappers for ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1266: -- Summary: Implement PB service and client wrappers for ApplicationHistoryProtocol (was: inheriting Application client and History Protocol from base protocol and implement PB service and clients.) Implement PB service and client wrappers for ApplicationHistoryProtocol --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch, YARN-1266-4.patch, YARN-1266-5.patch, YARN-1266-6.patch Adding ApplicationHistoryProtocolPBService to make the web apps work and changing yarn to run the AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829243#comment-13829243 ] Zhijie Shen commented on YARN-967: -- bq. I think we should use same to maintian consistency betwwn ahsclient and yarn cli in terms of polling interval. I think keeping lots of confs doesn't make sense. Please remove it, because it doesn't make sense to AHS, it's used by YarnClient#submitApplication {code} + @Override + protected void serviceInit(Configuration conf) throws Exception { +this.ahsAddress = getAHSAddress(conf); +statePollIntervalMillis = conf.getLong( +YarnConfiguration.YARN_CLIENT_APP_SUBMISSION_POLL_INTERVAL_MS, +YarnConfiguration.DEFAULT_YARN_CLIENT_APP_SUBMISSION_POLL_INTERVAL_MS); +super.serviceInit(conf); + } {code} bq. This is on purpose, as we first want to make call to RM and if app is not there then call AHS if not there then send exception to client. For attempt and contianer it only look into AHS and if not found send exception back to client. Thats the older behavior. The point is: 1. Before the patch, if the application is not found, ApplicationNotFoundException is thrown. 2. After the patch, if the application is not found in RM, then check AHS. If the application is not found in AHS, return null. The behavior is changed, such that it is not compatible. I suggest throwing ApplicationNotFoundException if the application is not found in AHS as well. It seems to be done in the patch of YARN-955. Similar changes should be applied to getApplicationAttemptReport, and getContainerReport. In addition, I also suggest looking the behavior of ClientRMService#getApplications, and make ApplicationHistoryClientSerivce to behave similarly. bq. For listapplications we decide not to get info from AHS , we shall do it once we will have filters added. We are leaving it for now. Ok, it's fine. We can fix it later. More comments: 1. 
The javadoc is still not fixed:
{code}
+   * Prints the application attempt report for an application id.
+   *
+   * @param applicationId
+   * @throws YarnException
+   */
+  private void printApplicationAttemptReport(String applicationAttemptId)
{code}
{code}
+  /**
+   * Prints the container report for an application attempt id.
+   *
+   * @param applicationAttemptId
+   * @throws YarnException
+   */
+  private void printContainerReport(String containerId) throws YarnException,
+      IOException {
{code}
2. Then, it's going to reuse the retry policy of the RM, which does not seem right. BTW, ContainerManagementProtocolProxy does not seem to have a retry policy either. Maybe we should simply create a proxy as HSProxies does?
{code}
+    RetryPolicy retryPolicy = createRetryPolicy(conf);
{code}
[YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-2.patch, YARN-967-3.patch, YARN-967-4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
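The lookup order argued for in the YARN-967 comment above (RM first, then AHS, and an exception rather than null when neither knows the application) can be sketched as follows. This is a hedged illustration, not code from the patch: the class and the map-based stand-ins for the RM and AHS lookups are hypothetical, and a plain IllegalStateException stands in for ApplicationNotFoundException to keep the sketch self-contained.

```java
import java.util.Map;

// Hypothetical sketch of the suggested lookup order: ask the
// ResourceManager first, fall back to the AHS, and throw (mirroring
// ApplicationNotFoundException) instead of returning null when neither
// side knows the application. The maps stand in for the real lookups.
public class AppReportLookup {
    public static String getApplicationReport(Map<String, String> rmApps,
                                              Map<String, String> ahsApps,
                                              String appId) {
        String report = rmApps.get(appId);      // 1. check the RM
        if (report == null) {
            report = ahsApps.get(appId);        // 2. fall back to the AHS
        }
        if (report == null) {
            // 3. keep the pre-patch contract: not-found is an exception, not null
            throw new IllegalStateException(
                "Application with id '" + appId + "' doesn't exist in RM or AHS.");
        }
        return report;
    }
}
```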
[jira] [Created] (YARN-1436) ZKRMStateStore should have separate configuration for retry period.
Omkar Vinit Joshi created YARN-1436: --- Summary: ZKRMStateStore should have separate configuration for retry period. Key: YARN-1436 URL: https://issues.apache.org/jira/browse/YARN-1436 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Problem: Today we have a single zkSessionTimeout period that is used both for the ZooKeeper session timeout and for the ZKRMStateStore retry policy. Proposed suggestion: Ideally we should have separate configuration knobs for these. The ideal value for zkSessionTimeout would be: number of ZooKeeper instances participating in the quorum * per-ZooKeeper session timeout; see
{code}
org.apache.zookeeper.ClientCnxn.ClientCnxn()...
connectTimeout = sessionTimeout / hostProvider.size();
{code}
The retry policy (maybe a retry time period or count) should get its own knob. -- This message was sent by Atlassian JIRA (v6.1#6144)
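The quorum arithmetic behind the suggested zkSessionTimeout value can be sketched as below. The config key names are hypothetical placeholders (not actual YarnConfiguration constants); the connectTimeout formula mirrors the ClientCnxn snippet quoted above.

```java
// Hypothetical sketch of the two separate knobs proposed above and the
// quorum arithmetic behind the suggested session timeout. The key names
// are placeholders, not real YarnConfiguration constants.
public class ZkStoreConf {
    static final String ZK_SESSION_TIMEOUT_MS = "yarn.resourcemanager.zk.session-timeout-ms"; // placeholder
    static final String ZK_RETRY_INTERVAL_MS  = "yarn.resourcemanager.zk.retry-interval-ms";  // placeholder

    /** Suggested session timeout: quorum size * per-host session timeout. */
    public static int sessionTimeoutMs(int quorumSize, int perHostTimeoutMs) {
        return quorumSize * perHostTimeoutMs;
    }

    /** Per-host connect timeout, as ZooKeeper's ClientCnxn computes it. */
    public static int connectTimeoutMs(int sessionTimeoutMs, int quorumSize) {
        return sessionTimeoutMs / quorumSize;
    }
}
```

With a 3-node quorum and a 10-second per-host timeout, the session timeout would be 30 seconds, and ClientCnxn would hand each host a 10-second connect timeout back.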
[jira] [Updated] (YARN-1436) ZKRMStateStore should have separate configuration for retry period.
[ https://issues.apache.org/jira/browse/YARN-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1436: Component/s: resourcemanager ZKRMStateStore should have separate configuration for retry period. --- Key: YARN-1436 URL: https://issues.apache.org/jira/browse/YARN-1436 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.1 Reporter: Omkar Vinit Joshi Assignee: Jian He Problem: Today we have a single zkSessionTimeout period that is used both for the ZooKeeper session timeout and for the ZKRMStateStore retry policy. Proposed suggestion: Ideally we should have separate configuration knobs for these. The ideal value for zkSessionTimeout would be: number of ZooKeeper instances participating in the quorum * per-ZooKeeper session timeout; see
{code}
org.apache.zookeeper.ClientCnxn.ClientCnxn()...
connectTimeout = sessionTimeout / hostProvider.size();
{code}
The retry policy (maybe a retry time period or count) should get its own knob. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1416: -- Attachment: YARN-1416.2.patch InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch It might be worth checking why they are reporting this. Testcases: TestRMAppTransitions, TestRM. There are a large number of such errors: can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
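One way to read the error above: the RMApp state machine has no transition registered for APP_UPDATE_SAVED in the FAILED state, so the dispatcher reports an invalid transition even though the tests pass. A minimal stand-in table (not the actual StateMachineFactory registration) showing what registering the event as a tolerated no-op would mean:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Minimal stand-in for a transition table (NOT the actual RMApp
// StateMachineFactory): FAILED tolerates APP_UPDATE_SAVED as a no-op,
// so the "can't handle ... at RMAppState.FAILED" report goes away.
public class TransitionTable {
    public enum State { FAILED }
    public enum Event { APP_UPDATE_SAVED, APP_REJECTED }

    private final Map<State, EnumSet<Event>> allowed = new EnumMap<>(State.class);

    public TransitionTable() {
        // register the late state-store callback as harmless in FAILED
        allowed.put(State.FAILED, EnumSet.of(Event.APP_UPDATE_SAVED));
    }

    public boolean canHandle(State s, Event e) {
        return allowed.getOrDefault(s, EnumSet.noneOf(Event.class)).contains(e);
    }
}
```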
[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)
[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829125#comment-13829125 ] Bikas Saha commented on YARN-149: - Please open a sub-task. A patch would be great or else someone else could pick it up too. ResourceManager (RM) High-Availability (HA) --- Key: YARN-149 URL: https://issues.apache.org/jira/browse/YARN-149 Project: Hadoop YARN Issue Type: New Feature Reporter: Harsh J Assignee: Bikas Saha Attachments: YARN ResourceManager Automatic Failover-rev-07-21-13.pdf, YARN ResourceManager Automatic Failover-rev-08-04-13.pdf, rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf This jira tracks work needed to be done to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to leader and client re-direction to new leader. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829246#comment-13829246 ] Zhijie Shen commented on YARN-955: -- This patch may need to be changed according to the comments in YARN-967: https://issues.apache.org/jira/browse/YARN-967?focusedCommentId=13829243&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13829243 [YARN-321] Implementation of ApplicationHistoryProtocol --- Key: YARN-955 URL: https://issues.apache.org/jira/browse/YARN-955 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-955-1.patch, YARN-955-2.patch, YARN-955-3.patch, YARN-955-4.patch, YARN-955-5.patch, YARN-955-6.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1303) Allow multiple commands separating with ; in distributed-shell
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828935#comment-13828935 ] Hudson commented on YARN-1303: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1615 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1615/]) YARN-1303. Reverted the wrong patch committed earlier and committing the correct patch now. In one go. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1544029) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java YARN-1303. Fixed DistributedShell to not fail with multiple commands separated by a semi-colon as shell-command. Contributed by Xuan Gong. 
(vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1544023) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java Allow multiple commands separating with ; in distributed-shell Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.3.0 Attachments: YARN-1303.1.patch, YARN-1303.2.patch, YARN-1303.3.patch, YARN-1303.3.patch, YARN-1303.4.patch, YARN-1303.4.patch, YARN-1303.5.patch, YARN-1303.6.patch, YARN-1303.7.patch, YARN-1303.8.1.patch, YARN-1303.8.2.patch, YARN-1303.8.patch, YARN-1303.9.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
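The idea behind allowing `ls; ls` can be sketched as below: if the whole shell_command string is handed to the shell as a single `-c` argument, the semicolon is interpreted by the shell inside the container rather than being lost in tokenization. This is an illustrative helper under that assumption, not the actual DistributedShell change:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (not the committed YARN-1303 patch): hand the whole
// semicolon-separated command string to the shell as one -c argument,
// so "ls; ls" is split by the shell in the container, not by the client.
public class ShellCommandBuilder {
    public static List<String> wrap(String shellCommand) {
        List<String> cmd = new ArrayList<>();
        cmd.add("bash");
        cmd.add("-c");
        cmd.add(shellCommand);   // "ls; ls" stays a single argument
        return cmd;
    }
}
```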
[jira] [Updated] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1416: -- Attachment: YARN-1416.2.patch InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch, YARN-1416.2.patch It might be worth checking why they are reporting this. Testcases: TestRMAppTransitions, TestRM. There are a large number of such errors: can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1430) InvalidStateTransition exceptions are ignored in state machines
[ https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829229#comment-13829229 ] Vinod Kumar Vavilapalli commented on YARN-1430: --- There are pros and cons to both approaches. If we completely ignore the errors, nobody knows about the problem. One solution to this is to have these invalid transitions bubble up to the UI, say on the RM UI, AM UI etc., in wild, bold and red colors. On the other side, I agree that crashing the RM all the time is going to be more and more painful in production environments. As for tests, I think we SHOULD clearly crash the tests, so that we can catch as many of these errors as quickly as possible. But as of today, we are treating them inconsistently: an invalid event to the scheduler crashes the RM, but an invalid event in RMNode doesn't. We need to be consistent. InvalidStateTransition exceptions are ignored in state machines --- Key: YARN-1430 URL: https://issues.apache.org/jira/browse/YARN-1430 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi We have all state machines ignoring InvalidStateTransitions. These exceptions get logged but do not crash the RM / NM. We definitely should crash it, as they move the system into some invalid / unacceptable state. * Places where we hide this exception:
** JobImpl
** TaskAttemptImpl
** TaskImpl
** NMClientAsyncImpl
** ApplicationImpl
** ContainerImpl
** LocalizedResource
** RMAppAttemptImpl
** RMAppImpl
** RMContainerImpl
** RMNodeImpl
thoughts? -- This message was sent by Atlassian JIRA (v6.1#6144)
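The consistent policy discussed above (crash in tests, surface loudly otherwise) could look roughly like this sketch; the class and its failFast flag are hypothetical, not part of the YARN dispatcher:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of one consistent policy: rethrow invalid
// transitions when failFast is set (tests), otherwise record them so
// they can be surfaced (e.g. on the RM web UI) instead of vanishing
// into the logs.
public class InvalidTransitionPolicy {
    private final boolean failFast;
    private final List<String> reported = new ArrayList<>();

    public InvalidTransitionPolicy(boolean failFast) {
        this.failFast = failFast;
    }

    public void onInvalidTransition(String state, String event) {
        String msg = "can't handle " + event + " at " + state;
        if (failFast) {
            throw new IllegalStateException(msg); // crash the test
        }
        reported.add(msg); // keep for display rather than dropping
    }

    public List<String> getReported() {
        return reported;
    }
}
```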
[jira] [Commented] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
[ https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828918#comment-13828918 ] Hudson commented on YARN-1053: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1589 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1589/]) YARN-1053. Diagnostic message from ContainerExitEvent is ignored in ContainerImpl (Omkar Vinit Joshi via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543973) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java Diagnostic message from ContainerExitEvent is ignored in ContainerImpl -- Key: YARN-1053 URL: https://issues.apache.org/jira/browse/YARN-1053 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0, 2.2.1 Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Labels: newbie Fix For: 2.3.0 Attachments: YARN-1053.1.patch, YARN-1053.20130809.patch If the container launch fails then we send a ContainerExitEvent. This event contains an exitCode and a diagnostic message. Today we are ignoring the diagnostic message while handling this event inside ContainerImpl. Fixing it, as it is useful in diagnosing the failure. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1303) Allow multiple commands separating with ; in distributed-shell
[ https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828923#comment-13828923 ] Hudson commented on YARN-1303: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1589 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1589/]) YARN-1303. Reverted the wrong patch committed earlier and committing the correct patch now. In one go. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1544029) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java YARN-1303. Fixed DistributedShell to not fail with multiple commands separated by a semi-colon as shell-command. Contributed by Xuan Gong. 
(vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1544023) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java Allow multiple commands separating with ; in distributed-shell Key: YARN-1303 URL: https://issues.apache.org/jira/browse/YARN-1303 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.3.0 Attachments: YARN-1303.1.patch, YARN-1303.2.patch, YARN-1303.3.patch, YARN-1303.3.patch, YARN-1303.4.patch, YARN-1303.4.patch, YARN-1303.5.patch, YARN-1303.6.patch, YARN-1303.7.patch, YARN-1303.8.1.patch, YARN-1303.8.2.patch, YARN-1303.8.patch, YARN-1303.9.patch In shell, we can do ls; ls to run 2 commands at once. In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829306#comment-13829306 ] Hadoop QA commented on YARN-1416: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615174/YARN-1416.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2509//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2509//console This message is automatically generated. InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch, YARN-1416.2.patch It might be worth checking why they are reporting this. Testcases: TestRMAppTransitions, TestRM. There are a large number of such errors: 
can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-946) Adding HDFS implementation for History Reader Interface
[ https://issues.apache.org/jira/browse/YARN-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-946: - Summary: Adding HDFS implementation for History Reader Interface (was: Adding HDFS implementation for Histrory Reader Interface) Adding HDFS implementation for History Reader Interface --- Key: YARN-946 URL: https://issues.apache.org/jira/browse/YARN-946 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal By default we decided to do the HDFS implementation for HistoryReader Interface. Thanks, Mayank -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829490#comment-13829490 ] Zhijie Shen commented on YARN-955: -- bq. As discussed, we will do the changes as part of YARN-967. +1. Let's unblock this ticket. [YARN-321] Implementation of ApplicationHistoryProtocol --- Key: YARN-955 URL: https://issues.apache.org/jira/browse/YARN-955 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-955-1.patch, YARN-955-2.patch, YARN-955-3.patch, YARN-955-4.patch, YARN-955-5.patch, YARN-955-6.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1436) ZKRMStateStore should have separate configuration for retry period.
[ https://issues.apache.org/jira/browse/YARN-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1436: Affects Version/s: 2.2.1 ZKRMStateStore should have separate configuration for retry period. --- Key: YARN-1436 URL: https://issues.apache.org/jira/browse/YARN-1436 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.1 Reporter: Omkar Vinit Joshi Assignee: Jian He Problem: Today we have a single zkSessionTimeout period that is used both for the ZooKeeper session timeout and for the ZKRMStateStore retry policy. Proposed suggestion: Ideally we should have separate configuration knobs for these. The ideal value for zkSessionTimeout would be: number of ZooKeeper instances participating in the quorum * per-ZooKeeper session timeout; see
{code}
org.apache.zookeeper.ClientCnxn.ClientCnxn()...
connectTimeout = sessionTimeout / hostProvider.size();
{code}
The retry policy (maybe a retry time period or count) should get its own knob. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1314) Cannot pass more than 1 argument to shell command
[ https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1314: Attachment: YARN-1314.4.1.patch Cannot pass more than 1 argument to shell command - Key: YARN-1314 URL: https://issues.apache.org/jira/browse/YARN-1314 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, YARN-1314.3.patch, YARN-1314.4.1.patch, YARN-1314.4.patch Distributed shell cannot accept more than 1 parameter in the arguments part. All of these commands are treated as 1 parameter: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributed shell jar -shell_command echo -shell_args 'My name is Teddy' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributed shell jar -shell_command echo -shell_args ''My name' 'is Teddy'' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributed shell jar -shell_command echo -shell_args 'My name' 'is Teddy' -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1314) Cannot pass more than 1 argument to shell command
[ https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829518#comment-13829518 ] Hadoop QA commented on YARN-1314: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615225/YARN-1314.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2512//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2512//console This message is automatically generated. 
Cannot pass more than 1 argument to shell command - Key: YARN-1314 URL: https://issues.apache.org/jira/browse/YARN-1314 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, YARN-1314.3.patch Distributed shell cannot accept more than 1 parameter in the arguments part. All of these commands are treated as 1 parameter: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributed shell jar -shell_command echo -shell_args 'My name is Teddy' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributed shell jar -shell_command echo -shell_args ''My name' 'is Teddy'' /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributed shell jar -shell_command echo -shell_args 'My name' 'is Teddy' -- This message was sent by Atlassian JIRA (v6.1#6144)
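For illustration only (this helper is hypothetical, not from any attached patch): one way to keep multi-word values such as 'My name' as distinct downstream arguments is to re-quote each -shell_args value before folding them into the container command line:

```java
// Hypothetical sketch: quote each shell_args value individually so that
// "My name" and "is Teddy" survive as two arguments when the joined
// string is later tokenized by a shell.
public class ShellArgs {
    public static String join(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (String a : args) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('"').append(a).append('"');
        }
        return sb.toString();
    }
}
```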
[jira] [Commented] (YARN-954) [YARN-321] History Service should create the webUI and wire it to HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829632#comment-13829632 ] Vinod Kumar Vavilapalli commented on YARN-954: -- Looked at the patch, mostly looks good!
- AppBlock.java has lots of spacing with dots on individual lines. Please fix that.
- Does displaying of appType work?
- Styling of app-attempts table and container table: Footer bar for the tables is missing.
- Appattempts page: Reorder the info data to be State first, then Master container, node, Tracking URL, Diagnostic info.
- Container page:
-- Title missing
-- Reorder the information: State, ExitStatus, Node, priority, Started, Elapsed, Resource: ( Memory, Vcores ), Logs, Diagnostics
-- Extra Underline after the table appearing
[YARN-321] History Service should create the webUI and wire it to HistoryStorage Key: YARN-954 URL: https://issues.apache.org/jira/browse/YARN-954 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: YARN-954-3.patch, YARN-954-v0.patch, YARN-954-v1.patch, YARN-954-v2.patch, YARN-954.4.patch, YARN-954.5.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-951) Add hard minimum resource capabilities for container launching
[ https://issues.apache.org/jira/browse/YARN-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved YARN-951. - Resolution: Won't Fix Add hard minimum resource capabilities for container launching -- Key: YARN-951 URL: https://issues.apache.org/jira/browse/YARN-951 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Wei Yan This is a follow-up of YARN-789, which enabled FairScheduler to handle zero-capability resource requests in one dimension (either zero CPU or zero memory). When resource enforcement is enabled (cgroups for CPU and ProcfsBasedProcessTree for memory) we cannot use zero, because the underlying container processes will be killed. We need to introduce an absolute or hard minimum: * For CPU: hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (i.e. 10) in the LinuxContainerExecutor, we ensure there are enough CPU cycles to run the sleep process. This absolute minimum would only kick in if zero is allowed; otherwise it will never kick in, as the shares for 1 CPU are 1024. * For Memory: hard enforcement is currently done by ProcfsBasedProcessTree.java; using an absolute minimum of 1 or 2 MB would take care of zero memory resources. And again, this absolute minimum would only kick in if zero is allowed; otherwise it will never kick in, as the memory increment is several MBs if not 1 GB. There would be no default for this hard minimum; if not set, no correction will be done. If set, then MAX(hard-minimum, container-resource-capability) will be used. Effectively there will not be any impact unless the hard minimum capabilities are explicitly set. And, even if set, unless the scheduler is configured to allow zero capabilities, the hard-minimum value will not kick in unless it is set to a value higher than the MIN capabilities for a container. 
Expected values, when set, would be 10 shares for CPU and 2 MB for memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
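The MAX(hard-minimum, container-resource-capability) rule described above is simple enough to state as code; a hard minimum of 0 (unset) leaves the requested capability untouched:

```java
// Sketch of the MAX(hard-minimum, requested) correction from YARN-951.
public class HardMinimum {
    /** hardMin == 0 means "unset": no correction is applied. */
    public static int effective(int hardMin, int requested) {
        return Math.max(hardMin, requested);
    }
}
```

For example, with the expected values above, a zero-CPU request would be raised to 10 shares, while a normal 512 MB memory request with no hard minimum set stays 512 MB.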
[jira] [Updated] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling
[ https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1404: - Description: Currently Hadoop Yarn expects to manage the lifecycle of the processes its applications run workload in. External frameworks/systems could benefit from sharing resources with other Yarn applications while running their workload within long-running processes owned by the external framework (in other words, running their workload outside of the context of a Yarn container process). Because Yarn provides robust and scalable resource management, it is desirable for some external systems to leverage the resource governance capabilities of Yarn (queues, capacities, scheduling, access control) while supplying their own resource enforcement. Impala is an example of such system. Impala uses Llama (http://cloudera.github.io/llama/) to request resources from Yarn. Impala runs an impalad process in every node of the cluster, when a user submits a query, the processing is broken into 'query fragments' which are run in multiple impalad processes leveraging data locality (similar to Map-Reduce Mappers processing a collocated HDFS block of input data). The execution of a 'query fragment' requires an amount of CPU and memory in the impalad. As the impalad shares the host with other services (HDFS DataNode, Yarn NodeManager, Hbase Region Server) and Yarn Applications (MapReduce tasks). To ensure cluster utilization that follow the Yarn scheduler policies and it does not overload the cluster nodes, before running a 'query fragment' in a node, Impala requests the required amount of CPU and memory from Yarn. Once the requested CPU and memory has been allocated, Impala starts running the 'query fragment' taking care that the 'query fragment' does not use more resources than the ones that have been allocated. 
Memory is book kept per 'query fragment' and the threads used for the processing of the 'query fragment' are placed under a cgroup to contain CPU utilization. Today, for all resources that have been asked to Yarn RM, a (container) process must be started via the corresponding NodeManager. Failing to do this, will result on the cancelation of the container allocation relinquishing the acquired resource capacity back to the pool of available resources. To avoid this, Impala starts a dummy container process doing 'sleep 10y'. Using a dummy container process has its drawbacks: * the dummy container process is in a cgroup with a given number of CPU shares that are not used and Impala is re-issuing those CPU shares to another cgroup for the thread running the 'query fragment'. The cgroup CPU enforcement works correctly because of the CPU controller implementation (but the formal specified behavior is actually undefined). * Impala may ask for CPU and memory independent of each other. Some requests may be only memory with no CPU or viceversa. Because a container requires a process, complete absence of memory or CPU is not possible even if the dummy process is 'sleep', a minimal amount of memory and CPU is required for the dummy process. Because of this it is desirable to be able to have a container without a backing process. was: Currently a container allocation requires to start a container process with the corresponding NodeManager's node. For applications that need to use the allocated resources out of band from Yarn this means that a dummy container process must be started. Impala/Llama is an example of such application which is currently starting a 'sleep 10y' (10 years) process as the container process. And the resource capabilities are used out of by and the Impala process collocated in the node. The Impala process ensures the processing associated to that resources do not exceed the capabilities of the container. 
Also, if the container is lost/preempted/killed, Impala stops using the corresponding resources. In addition, in the case of Llama, the current requirement of having a container process becomes complicated when hard resource enforcement (memory via the ContainersMonitor, or CPU via cgroups) is enabled, because Impala/Llama request CPU and memory independently of each other; some requests are CPU-only and others are memory-only. Unmanaged containers solve this problem: since there is no underlying process, allocations with zero CPU or zero memory become possible. Summary: Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling (was: Add support for unmanaged containers) Updated the summary and the description to better describe the use case driving this JIRA. I've closed YARN-951 as won't fix, as it is a workaround for the problem this JIRA is trying to address. I don't think there is a need for an umbrella JIRA, as this is the only change we need.
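The per-fragment accounting described above (admit a 'query fragment' only when it fits within the capacity Yarn has granted, release the charge when the fragment finishes) can be sketched with a small ledger. This is an illustrative model only, not Impala or Llama code; `FragmentMemoryLedger` and all of its method names are hypothetical:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative per-node ledger: a 'query fragment' is admitted only if its
// memory fits within what Yarn has granted, and it releases its charge when
// it finishes. Grants shrink on preemption.
public class FragmentMemoryLedger {
    private final AtomicLong grantedBytes = new AtomicLong(0); // capacity granted by Yarn
    private final AtomicLong usedBytes = new AtomicLong(0);    // capacity charged to running fragments

    public void onYarnGrant(long bytes)   { grantedBytes.addAndGet(bytes); }
    public void onYarnPreempt(long bytes) { grantedBytes.addAndGet(-bytes); }

    // Admit a fragment only if its memory fits in the remaining granted capacity.
    public boolean tryStartFragment(long bytes) {
        long prev;
        do {
            prev = usedBytes.get();
            if (prev + bytes > grantedBytes.get()) {
                return false; // would exceed what Yarn allocated
            }
        } while (!usedBytes.compareAndSet(prev, prev + bytes));
        return true;
    }

    public void finishFragment(long bytes) { usedBytes.addAndGet(-bytes); }
}
```

The compare-and-set loop keeps admission race-free across concurrent fragment starts without a lock; CPU-share bookkeeping for the cgroup side would follow the same pattern.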
[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling
[ https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829640#comment-13829640 ] Alejandro Abdelnur commented on YARN-1404: -- The proposal to address this JIRA is: * Allow a NULL ContainerLaunchContext in the startContainer() call; this signals that there is no process to be started for the container. * The ContainerLaunch logic would use a latch to block when there is no associated process. The latch is released on container completion (preemption, or termination by the AM). The changes to achieve this are minimal, and they do not alter the lifecycle of a container at all, neither in the RM nor in the NM. As previously mentioned by Bikas, this can be seen as a special case of the functionality that YARN-1040 proposes for managing multiple processes with the same container. The scope of work of YARN-1040 is significantly larger and requires API changes, while this JIRA requires no API changes, and the two are not incompatible with each other. Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling - Key: YARN-1404 URL: https://issues.apache.org/jira/browse/YARN-1404 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-1404.patch 
-- This message was sent by Atlassian JIRA (v6.1#6144)
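The latch idea in the proposal above (park the launch thread when there is no process, release it on container completion) can be sketched with plain JDK concurrency primitives. This is a standalone illustration, not the actual ContainerLaunch code; the class and method names are hypothetical:

```java
import java.util.concurrent.CountDownLatch;

// Illustrative stand-in for the proposed ContainerLaunch behavior when
// startContainer() carries a NULL ContainerLaunchContext: instead of exec'ing
// a process, the launch thread parks on a latch until the container completes.
public class ProcesslessContainer {
    private final CountDownLatch completed = new CountDownLatch(1);

    // Called by the launch thread when there is no process to start;
    // blocks until the container is finished, then reports a clean exit code.
    public int awaitCompletion() throws InterruptedException {
        completed.await();
        return 0;
    }

    // Called on AM-initiated stop or on preemption by the RM.
    public void markCompleted() {
        completed.countDown();
    }
}
```

Because the launch thread simply blocks instead of monitoring a child process, the rest of the container state machine can proceed unchanged, which matches the claim that the lifecycle is not altered in the RM or the NM.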
[jira] [Commented] (YARN-1434) Single Job can affect fairshare of others
[ https://issues.apache.org/jira/browse/YARN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829761#comment-13829761 ] Sandy Ryza commented on YARN-1434: -- This seems possible. To spell it out further: imagine an AM that, by fairness, receives a container on an NM heartbeat. If it retrieves the container from the RM and gives it back before any other NM can heartbeat, it will also, by fairness, receive the next container that the RM allocates. In this way, it could starve all the other applications on the cluster. An AM that deserves more than a single container could do this with a slower heartbeat interval. For the Fair Scheduler, YARN-1010, which decouples container allocations from node heartbeats, should solve this in most cases. With it, it becomes nearly impossible for an AM to return containers before the RM allocates the freed space to other applications. Single Job can affect fairshare of others - Key: YARN-1434 URL: https://issues.apache.org/jira/browse/YARN-1434 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Priority: Minor A job receiving containers, deciding not to use them, and yielding them back in the next heartbeat could significantly affect the amount of resources given to other jobs. This is because, by yielding containers back, the job always appears to be under capacity (more so than others), so it is picked as the next to receive containers. Observed by Robert Grandl, to be independently confirmed. -- This message was sent by Atlassian JIRA (v6.1#6144)
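The starvation scenario described in this issue can be modeled as a toy simulation: on each heartbeat, a fair scheduler hands the container to the most under-served app, and an app that instantly yields its container back stays at zero usage and therefore wins every allocation. This is a deliberately simplified model of fair-share selection, not Fair Scheduler code; all names are made up:

```java
// Toy model of the pathology: the scheduler always picks the app holding the
// fewest containers; app 0 returns each container immediately, so its usage
// never rises and it captures every allocation, starving apps 1..n-1.
public class YieldStarvation {
    static int[] usage;    // containers currently held per app
    static int[] received; // containers ever handed to each app

    static int pickMostStarved() {
        int best = 0;
        for (int i = 1; i < usage.length; i++) {
            if (usage[i] < usage[best]) best = i;
        }
        return best;
    }

    // Simulate `beats` node heartbeats, one container allocated per heartbeat;
    // app 0 is the misbehaving app that yields every container back at once.
    static int[] simulate(int apps, int beats) {
        usage = new int[apps];
        received = new int[apps];
        for (int b = 0; b < beats; b++) {
            int winner = pickMostStarved();
            usage[winner]++;
            received[winner]++;
            if (winner == 0) usage[0]--; // yielded back before the next heartbeat
        }
        return received;
    }
}
```

With three apps and ten heartbeats, app 0 receives all ten containers while the others receive none, which is exactly the "always appears under capacity" effect the issue describes; decoupling allocation from heartbeats (YARN-1010) breaks the tight retrieve-and-return loop this model assumes.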