[jira] [Commented] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2013-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828584#comment-13828584
 ] 

Hadoop QA commented on YARN-1426:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12615071/YARN-1426.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapred.TestJobCleanup

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2505//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2505//console

This message is automatically generated.

 YARN Components need to unregister their beans upon shutdown
 

 Key: YARN-1426
 URL: https://issues.apache.org/jira/browse/YARN-1426
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.3.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1426.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data

2013-11-21 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828809#comment-13828809
 ] 

Mayank Bansal commented on YARN-967:


Thanks [~zjshen] for review

bq. 1. Change yarn.cmd accordingly.
Done
bq. 2. Not necessary, no log is written in AHSClientImpl.
Done
bq,3. Where're the following configurations? Defined in other patch?
YARN-955
bq. 4. Should AHSClientImpl use YarnClient configurations?
I think we should use same to maintian consistency betwwn ahsclient and yarn 
cli in terms of polling interval. I think keeping lots of confs doesn't make 
sense.
bq. 5. Is the following condition correct?
Done
bq. 6. One important issue here is that the command change is incompatible. The 
users' old shell scripts will be break given the change here. It's good to make 
the command compatible. For example, by default, it's going to the info of the 
application(s). Or at least, we need to document the new behavior of the 
command. Vinod Kumar Vavilapalli, how do you say?
As discussed its backward compatible.
bq. 7. Rename it to appAttemptReportStr? Also the javadoc.
Done
bq. 8. Fix the above issue for printContainerReport as well.
Done
bq. 9. Does AHS RPC protocol throw not found exception as well? If not, I think 
it's good to do that to keep consistent. Maybe do the same for 
getApplicationAttemptReport and getContainerReport
This is on purpose, as we first want to make call to RM and if app is not there 
then call AHS if not there then send exception to client. For attempt and 
contianer it only look into AHS and if not found send exception back to client. 
Thats the older behavior.
bq. 10. Check getApplications as well. Make getApplicationAttempts and 
getContainers behave similarly. This and the one above are the server-side 
changes. Probably you'd like to coordinate your other patches.
bq. 11. For listApplications, if the users want the applications in 
FINISHED/FAILED/KILLED states, why not going to historyClient as well?
For listapplications we decide not to get info from AHS , we shall do it once 
we will have filters added. We are leaving it for now.
bq. 12. AHSProxy is using a bunch of RM configurations instead of AHS ones. By 
the way, it seems AHSProxy is almost the same as RMProxy. Is it possible to 
reuse the code instead of duplicating it?
Done
bq. 13. In YarnCLI, should we make getter for historyClient as well, like 
client?
Done
bq. 14. The mock doesn't need to be defined in get and invoked every time 
get is called. Define it once, it will behave the same in the following.
As discussed ignoring it
bq. 15. It's better to mock multiple attempts/containers to test gets.
Done
bq. 16. The modified part of ApplicationCLI needs to be tested as well.
Done

 [YARN-321] Command Line Interface(CLI) for Reading Application History 
 Storage Data
 ---

 Key: YARN-967
 URL: https://issues.apache.org/jira/browse/YARN-967
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-967-1.patch, YARN-967-2.patch, YARN-967-3.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data

2013-11-21 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-967:
---

Attachment: YARN-967-4.patch

Attaching latest patch

Thanks,
Mayank

 [YARN-321] Command Line Interface(CLI) for Reading Application History 
 Storage Data
 ---

 Key: YARN-967
 URL: https://issues.apache.org/jira/browse/YARN-967
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-967-1.patch, YARN-967-2.patch, YARN-967-3.patch, 
 YARN-967-4.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1425) TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument

2013-11-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828825#comment-13828825
 ] 

Hudson commented on YARN-1425:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #398 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/398/])
YARN-1425. TestRMRestart fails because MockRM.waitForState(AttemptId) uses 
current attempt instead of the attempt passed as argument (Omkar Vinit Joshi 
via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543952)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


 TestRMRestart fails because MockRM.waitForState(AttemptId) uses current 
 attempt instead of the attempt passed as argument
 -

 Key: YARN-1425
 URL: https://issues.apache.org/jira/browse/YARN-1425
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Fix For: 2.3.0

 Attachments: YARN-1425.1.patch, error.log


 TestRMRestart is failing on trunk. Fixing it. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ; in distributed-shell

2013-11-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828832#comment-13828832
 ] 

Hudson commented on YARN-1303:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #398 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/398/])
YARN-1303. Reverted the wrong patch committed earlier and committing the 
correct patch now. In one go. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1544029)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
YARN-1303. Fixed DistributedShell to not fail with multiple commands separated 
by a semi-colon as shell-command. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1544023)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 Allow multiple commands separating with ; in distributed-shell
 

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.3.0

 Attachments: YARN-1303.1.patch, YARN-1303.2.patch, YARN-1303.3.patch, 
 YARN-1303.3.patch, YARN-1303.4.patch, YARN-1303.4.patch, YARN-1303.5.patch, 
 YARN-1303.6.patch, YARN-1303.7.patch, YARN-1303.8.1.patch, 
 YARN-1303.8.2.patch, YARN-1303.8.patch, YARN-1303.9.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl

2013-11-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828827#comment-13828827
 ] 

Hudson commented on YARN-1053:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #398 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/398/])
YARN-1053. Diagnostic message from ContainerExitEvent is ignored in 
ContainerImpl (Omkar Vinit Joshi via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543973)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java


 Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
 --

 Key: YARN-1053
 URL: https://issues.apache.org/jira/browse/YARN-1053
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0, 2.2.1
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
Priority: Blocker
  Labels: newbie
 Fix For: 2.3.0

 Attachments: YARN-1053.1.patch, YARN-1053.20130809.patch


 If the container launch fails then we send ContainerExitEvent. This event 
 contains exitCode and diagnostic message. Today we are ignoring diagnostic 
 message while handling this event inside ContainerImpl. Fixing it as it is 
 useful in diagnosing the failure.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data

2013-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828839#comment-13828839
 ] 

Hadoop QA commented on YARN-967:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12615103/YARN-967-4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2507//console

This message is automatically generated.

 [YARN-321] Command Line Interface(CLI) for Reading Application History 
 Storage Data
 ---

 Key: YARN-967
 URL: https://issues.apache.org/jira/browse/YARN-967
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-967-1.patch, YARN-967-2.patch, YARN-967-3.patch, 
 YARN-967-4.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1430) InvalidStateTransition exceptions are ignored in state machines

2013-11-21 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829400#comment-13829400
 ] 

Omkar Vinit Joshi commented on YARN-1430:
-

I think for now we should add assert statements so that in test environment it 
will always fail making sure we are not missing some invalid transitions? 
YARN-1416 is one of those examples.

I agree with [~vinodkv] and [~jlowe]. Probably we should be consistent 
everywhere and should show somewhere these system critical errors without 
actually crashing daemons.

 InvalidStateTransition exceptions are ignored in state machines
 ---

 Key: YARN-1430
 URL: https://issues.apache.org/jira/browse/YARN-1430
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi

 We have all state machines ignoring InvalidStateTransitions. These exceptions 
 will get logged but will not crash the RM / NM. We definitely should crash it 
 as they move the system into some invalid / unacceptable state.
 * Places where we hide this exception :-
 ** JobImpl
 ** TaskAttemptImpl
 ** TaskImpl
 ** NMClientAsyncImpl
 ** ApplicationImpl
 ** ContainerImpl
 ** LocalizedResource
 ** RMAppAttemptImpl
 ** RMAppImpl
 ** RMContainerImpl
 ** RMNodeImpl
 thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass

2013-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829407#comment-13829407
 ] 

Hadoop QA commented on YARN-1416:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12615200/YARN-1416.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2510//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2510//console

This message is automatically generated.

 InvalidStateTransitions getting reported in multiple test cases even though 
 they pass
 -

 Key: YARN-1416
 URL: https://issues.apache.org/jira/browse/YARN-1416
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Jian He
 Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch, 
 YARN-1416.2.patch, YARN-1416.2.patch


 It might be worth checking why they are reporting this.
 Testcase : TestRMAppTransitions, TestRM
 there are large number of such errors.
 can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1430) InvalidStateTransition exceptions are ignored in state machines

2013-11-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829018#comment-13829018
 ] 

Jason Lowe commented on YARN-1430:
--

Before flipping the switch to change this, we need to carefully consider the 
consequences.  I'm all for making this a fatal error for unit tests, but I'm 
not convinced this is a good thing for production environments.

We have been running in production for quite some time now (0.23 instead of 
2.x, but the code is very similar in many of these areas).  We've seen invalid 
state transitions logged on our production machines and have filed quite a few 
JIRAs related to those.  However I was often thankful the invalid state 
transition did not crash, because in the vast majority of these cases the 
system can continue to function in an acceptable manner.  Sure, we might leak 
some resources related to an application, fail to aggregate some log or 
something similar, but I'd rather take that pain with a potential workaround 
than the alternative of bringing down the entire cluster each and every time it 
occurs.

What I'm worried about here is a case where we don't see the error during 
testing but when we deploy to production some critical, frequent job 
consistently triggers an unhandled transition.  If that's always fatal, now 
we're stuck in a state where the cluster cannot stay up very long until we 
scramble to develop and deploy a fix or have to rollback, and we have 
guaranteed downtime when it occurs.  In almost all of these cases the invalid 
transition is going to be localized to just one app, one container, or one 
node.  I'm not sure that kind of error is worth taking down an entire cluster 
outside of a testing setup.  I feel this is similar to how most software 
products handle asserts -- they are fatal during development but not during 
production.

 InvalidStateTransition exceptions are ignored in state machines
 ---

 Key: YARN-1430
 URL: https://issues.apache.org/jira/browse/YARN-1430
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi

 We have all state machines ignoring InvalidStateTransitions. These exceptions 
 will get logged but will not crash the RM / NM. We definitely should crash it 
 as they move the system into some invalid / unacceptable state.
 * Places where we hide this exception :-
 ** JobImpl
 ** TaskAttemptImpl
 ** TaskImpl
 ** NMClientAsyncImpl
 ** ApplicationImpl
 ** ContainerImpl
 ** LocalizedResource
 ** RMAppAttemptImpl
 ** RMAppImpl
 ** RMContainerImpl
 ** RMNodeImpl
 thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1412) Allocating Containers on a particular Node in Yarn

2013-11-21 Thread gaurav gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829264#comment-13829264
 ] 

gaurav gupta commented on YARN-1412:


Yes All the nodes are on the same Rack.

Here is the experiment that I did to verify the theory
1. Cluster size: 36 nodes
2. yarn.scheduler.capacity.node-locality-delay is set to 36
3. Asked for 36 containers with priority 0
4. I requested containers with (node=yes, rack=yes,relax-locality=true)

But I still see that the containers are allocated on different nodes.


 Allocating Containers on a particular Node in Yarn
 --

 Key: YARN-1412
 URL: https://issues.apache.org/jira/browse/YARN-1412
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: centos, Hadoop 2.2.0
Reporter: gaurav gupta

 Summary of the problem: 
  If I pass the node on which I want container and set relax locality default 
 which is true, I don't get back the container on the node specified even if 
 the resources are available on the node. It doesn't matter if I set rack or 
 not.
 Here is the snippet of the code that I am using
 AMRMClientContainerRequest amRmClient =  AMRMClient.createAMRMClient();;
 String host = h1;
 Resource capability = Records.newRecord(Resource.class);
 capability.setMemory(memory);
 nodes = new String[] {host};
 // in order to request a host, we also have to request the rack
 racks = new String[] {/default-rack};
  ListContainerRequest containerRequests = new 
 ArrayListContainerRequest();
 ListContainerId releasedContainers = new ArrayListContainerId();
 containerRequests.add(new ContainerRequest(capability, nodes, racks, 
 Priority.newInstance(priority)));
 if (containerRequests.size()  0) {
   LOG.info(Asking RM for containers:  + containerRequests);
   for (ContainerRequest cr : containerRequests) {
 LOG.info(Requested container: {}, cr.toString());
 amRmClient.addContainerRequest(cr);
   }
 }
 for (ContainerId containerId : releasedContainers) {
   LOG.info(Released container, id={}, containerId.getId());
   amRmClient.releaseAssignedContainer(containerId);
 }
 return amRmClient.allocate(0);



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass

2013-11-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829251#comment-13829251
 ] 

Jian He commented on YARN-1416:
---

The reason for this invalid event transition is that inside testGetClientToken, 
it's manually controlling attempt to move some state such that some logic can 
be performed in one of the attemptTransition, which cause an unexpected event 
sent from attempt.  and the whole TestRMAppTransitions unit tests are just 
bypassing attempt transition logic and just manually sending the app event to 
trigger the app transition,  I think this can be fine ?

I put some comments for describing the test purposes.

bq. Do we know how many tests are reporting such exceptions but passing 
successfully?
This is the only invalid event exception in TestRMAppTransitions, all others 
are fixed.
No invalid event exception found TestRMAppAttemptTransitions.


 InvalidStateTransitions getting reported in multiple test cases even though 
 they pass
 -

 Key: YARN-1416
 URL: https://issues.apache.org/jira/browse/YARN-1416
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Jian He
 Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch


 It might be worth checking why they are reporting this.
 Testcase : TestRMAppTransitions, TestRM
 there are large number of such errors.
 can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)

2013-11-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828948#comment-13828948
 ] 

Steve Loughran commented on YARN-149:
-

I want to warn that HADOOP-9905 is going to drop the ZK dependency from the 
core hadoop-client POM. If YARN client is going to depend on ZK -that's the 
client, not the server- then it's going to have to explicitly add it.

 ResourceManager (RM) High-Availability (HA)
 ---

 Key: YARN-149
 URL: https://issues.apache.org/jira/browse/YARN-149
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Harsh J
Assignee: Bikas Saha
 Attachments: YARN ResourceManager Automatic 
 Failover-rev-07-21-13.pdf, YARN ResourceManager Automatic 
 Failover-rev-08-04-13.pdf, rm-ha-phase1-approach-draft1.pdf, 
 rm-ha-phase1-draft2.pdf


 This jira tracks work needed to be done to support one RM instance failing 
 over to another RM instance so that we can have RM HA. Work includes leader 
 election, transfer of control to leader and client re-direction to new leader.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1432) Reduce phase is failing with shuffle error in kerberos enabled cluster

2013-11-21 Thread Ramgopal N (JIRA)
Ramgopal N created YARN-1432:


 Summary: Reduce phase is failing with shuffle error in kerberos 
enabled cluster
 Key: YARN-1432
 URL: https://issues.apache.org/jira/browse/YARN-1432
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Ramgopal N


{code}
OS user: user3
kerberos user: hdfs
Reducer is trying to read the map intermediate output using kerberos 
user(hdfs),but the owner of this file is OS user(user3)


2013-11-21 20:35:48,169 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle 
error :
java.io.IOException: Error Reading IndexFile
at 
org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:123)
at 
org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:68)
at 
org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:595)
at 
org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:506)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
at 
org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:144)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
at 
org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:99)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:523)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:507)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:444)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350)
at 
org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Owner 'user3' for path 
/home/user3/NodeAgentTmpDir/data/mapred/nm-local-dir/usercache/hdfs/appcache/application_1385040658134_0011/output/attempt_1385040658134_0011_m_00_0/file.out.index
 did not match expected owner 'hdfs'
at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
at 
org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174)
at 
org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158)
at org.apache.hadoop.mapred.SpillRecord.init(SpillRecord.java:70)
at org.apache.hadoop.mapred.SpillRecord.init(SpillRecord.java:62)
at 
org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
... 30 more
{code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1435) Custom script cannot be run because it lacks of executable bit at container level

2013-11-21 Thread Tassapol Athiapinya (JIRA)
Tassapol Athiapinya created YARN-1435:
-

 Summary: Custom script cannot be run because it lacks of 
executable bit at container level
 Key: YARN-1435
 URL: https://issues.apache.org/jira/browse/YARN-1435
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.2.1
Reporter: Tassapol Athiapinya
 Fix For: 2.2.1


Create custom shell script and use -shell_command to point to that script. 
Uploaded shell script won't be able to execute at container level because 
executable bit is missing when container fetches the shell script from HDFS. 
Distributed shell should grant executable bit in this case.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.

2013-11-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829467#comment-13829467
 ] 

Hudson commented on YARN-1320:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4784 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4784/])
YARN-1320. Fixed Distributed Shell application to respect custom log4j 
properties file. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1544364)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Log4jPropertyHelper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 Custom log4j properties in Distributed shell does not work properly.
 

 Key: YARN-1320
 URL: https://issues.apache.org/jira/browse/YARN-1320
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.3.0

 Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch, 
 YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, 
 YARN-1320.4.patch, YARN-1320.5.patch, YARN-1320.6.patch, YARN-1320.6.patch, 
 YARN-1320.7.patch, YARN-1320.8.patch, YARN-1320.9.patch


 Distributed shell cannot pick up custom log4j properties (specified with 
 -log_properties). It always uses default log4j properties.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1425) TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument

2013-11-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828916#comment-13828916
 ] 

Hudson commented on YARN-1425:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1589 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1589/])
YARN-1425. TestRMRestart fails because MockRM.waitForState(AttemptId) uses 
current attempt instead of the attempt passed as argument (Omkar Vinit Joshi 
via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543952)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


 TestRMRestart fails because MockRM.waitForState(AttemptId) uses current 
 attempt instead of the attempt passed as argument
 -

 Key: YARN-1425
 URL: https://issues.apache.org/jira/browse/YARN-1425
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Fix For: 2.3.0

 Attachments: YARN-1425.1.patch, error.log


 TestRMRestart is failing on trunk. Fixing it. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1266) inheriting Application client and History Protocol from base protocol and implement PB service and clients.

2013-11-21 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-1266:


Attachment: YARN-1266-6.patch

Thanks [~vinodkv] for review.

I agree with you.

I am updating the patch.

Thanks,
Mayank

 inheriting Application client and History Protocol from base protocol and 
 implement PB service and clients.
 ---

 Key: YARN-1266
 URL: https://issues.apache.org/jira/browse/YARN-1266
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch, 
 YARN-1266-4.patch, YARN-1266-5.patch, YARN-1266-6.patch


 Adding ApplicationHistoryProtocolPBService to make web apps to work and 
 changing yarn to run AHS as a seprate process



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1425) TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument

2013-11-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828928#comment-13828928
 ] 

Hudson commented on YARN-1425:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1615 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1615/])
YARN-1425. TestRMRestart fails because MockRM.waitForState(AttemptId) uses 
current attempt instead of the attempt passed as argument (Omkar Vinit Joshi 
via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543952)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


 TestRMRestart fails because MockRM.waitForState(AttemptId) uses current 
 attempt instead of the attempt passed as argument
 -

 Key: YARN-1425
 URL: https://issues.apache.org/jira/browse/YARN-1425
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Fix For: 2.3.0

 Attachments: YARN-1425.1.patch, error.log


 TestRMRestart is failing on trunk. Fixing it. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1266) inheriting Application client and History Protocol from base protocol and implement PB service and clients.

2013-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829495#comment-13829495
 ] 

Hadoop QA commented on YARN-1266:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12615224/YARN-1266-6.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2511//console

This message is automatically generated.

 inheriting Application client and History Protocol from base protocol and 
 implement PB service and clients.
 ---

 Key: YARN-1266
 URL: https://issues.apache.org/jira/browse/YARN-1266
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch, 
 YARN-1266-4.patch, YARN-1266-5.patch, YARN-1266-6.patch


 Adding ApplicationHistoryProtocolPBService to make web apps to work and 
 changing yarn to run AHS as a seprate process



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.

2013-11-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829239#comment-13829239
 ] 

Xuan Gong commented on YARN-1320:
-

Did the test on single node cluster.

Original:  
We have 
# Root logger option
hadoop.root.logger=INFO,console

We will not see any DEBUG messages.

Create a customer log4j.property and set 
# Root logger option
hadoop.root.logger=DEBUG,console

And use --log_properties customer.properties
We can see the DEBUG messages now. 

Part of the output : 
{code}
13/11/21 11:15:42 DEBUG service.AbstractService: Service: 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl entered state STOPPED
13/11/21 11:15:42 DEBUG ipc.Client: Stopping client
13/11/21 11:15:42 DEBUG ipc.Client: IPC Client (1122993107) connection to 
localhost/127.0.0.1:9105 from appattempt_1385060881865_0007_01: closed
13/11/21 11:15:42 DEBUG ipc.Client: IPC Client (1122993107) connection to 
localhost/127.0.0.1:9105 from appattempt_1385060881865_0007_01: stopped, 
remaining connections 0
13/11/21 11:15:42 DEBUG ipc.Client: IPC Client (1122993107) connection to 
localhost/127.0.0.1:54313 from xuan: closed
13/11/21 11:15:42 DEBUG ipc.Client: IPC Client (1122993107) connection to 
localhost/127.0.0.1:54313 from xuan: stopped, remaining connections 0
13/11/21 11:15:42 INFO distributedshell.ApplicationMaster: Application Master 
completed successfully. exiting
{code}

 Custom log4j properties in Distributed shell does not work properly.
 

 Key: YARN-1320
 URL: https://issues.apache.org/jira/browse/YARN-1320
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch, 
 YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, 
 YARN-1320.4.patch, YARN-1320.5.patch, YARN-1320.6.patch, YARN-1320.6.patch, 
 YARN-1320.7.patch, YARN-1320.8.patch, YARN-1320.9.patch


 Distributed shell cannot pick up custom log4j properties (specified with 
 -log_properties). It always uses default log4j properties.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1314) Cannot pass more than 1 argument to shell command

2013-11-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1314:


Attachment: YARN-1314.3.patch

1.Using the same approach as YARN-1303. Basically,  create a file that will 
save all the client's input args(from --shell_args). The AM will read all the 
args, and add them into CLC. We try to let all containers run the exactly the 
same args that client gives, and let clients to figure out when and where to do 
the correct escaping staff.
2. Did a little code formatting, since we are using lots of duplicate codes

 Cannot pass more than 1 argument to shell command
 -

 Key: YARN-1314
 URL: https://issues.apache.org/jira/browse/YARN-1314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, 
 YARN-1314.3.patch


 Distributed shell cannot accept more than 1 parameters in argument parts.
 All of these commands are treated as 1 parameter:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name
 is  Teddy'
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args ''My   name'
 'is  Teddy''
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name' 
'is  Teddy'



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1435) Custom script cannot be run because it lacks of executable bit at container level

2013-11-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829506#comment-13829506
 ] 

Xuan Gong commented on YARN-1435:
-

Currently, if we want to run custom script at DS. We can do it like this :
--shell_command sh --shell_script custom_script.sh


 Custom script cannot be run because it lacks of executable bit at container 
 level
 -

 Key: YARN-1435
 URL: https://issues.apache.org/jira/browse/YARN-1435
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.2.1
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1


 Create custom shell script and use -shell_command to point to that script. 
 Uploaded shell script won't be able to execute at container level because 
 executable bit is missing when container fetches the shell script from HDFS. 
 Distributed shell should grant executable bit in this case.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1435) Distributed Shell should not run other commands except sh, and run the custom script at the same time.

2013-11-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1435:


Summary: Distributed Shell should not run other commands except sh, and 
run the custom script at the same time.  (was: Custom script cannot be run 
because it lacks of executable bit at container level)

 Distributed Shell should not run other commands except sh, and run the 
 custom script at the same time.
 

 Key: YARN-1435
 URL: https://issues.apache.org/jira/browse/YARN-1435
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.2.1
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1


 Create custom shell script and use -shell_command to point to that script. 
 Uploaded shell script won't be able to execute at container level because 
 executable bit is missing when container fetches the shell script from HDFS. 
 Distributed shell should grant executable bit in this case.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1434) Single Job can affect fairshare of others

2013-11-21 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829377#comment-13829377
 ] 

Carlo Curino commented on YARN-1434:


This has been observed while modifying the mapreduce AM behavior for other 
reasons. If the AM aggressively returns containers, it seems to be able to 
create the illusion to be under-capacity while wasting resources for everyone. 
A second job running in a separate queue (which was supposed to receive 50% of 
the cluster resources) was starved (only getting about 30% of the resources). 
This should be confirmed independently as the environment we observed this in 
had too much going on (i.e., this might be a false positive). 

If confirmed, this might be quite bad, as a single malevolent AM could affect 
the cluster utilization possibly by a lot.
  
[~sandyr], [~acmurthy]  thoughts?

 Single Job can affect fairshare of others
 -

 Key: YARN-1434
 URL: https://issues.apache.org/jira/browse/YARN-1434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Carlo Curino
Priority: Minor

 A job receiving containers and deciding not to use them and yielding them 
 back in the next heartbeat could significantly affect the amount of resources 
 given to other jobs. 
 This is because by yielding containers back the job appears always to be 
 under-capacity (more than others) so it is picked to be the next to receive 
 containers.
 Observed by Robert Grandl, to be independently confirmed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1435) Distributed Shell should not run other commands except sh, and run the custom script at the same time.

2013-11-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1435:


Description: 
Currently, if we want to run custom script at DS. We can do it like this :
--shell_command sh --shell_script custom_script.sh
But it may be better to separate running shell_command and shell_script

  was:Create custom shell script and use -shell_command to point to that 
script. Uploaded shell script won't be able to execute at container level 
because executable bit is missing when container fetches the shell script from 
HDFS. Distributed shell should grant executable bit in this case.


 Distributed Shell should not run other commands except sh, and run the 
 custom script at the same time.
 

 Key: YARN-1435
 URL: https://issues.apache.org/jira/browse/YARN-1435
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.2.1
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1


 Currently, if we want to run custom script at DS. We can do it like this :
 --shell_command sh --shell_script custom_script.sh
 But it may be better to separate running shell_command and shell_script



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1435) Distributed Shell should not run other commands except sh, and run the custom script at the same time.

2013-11-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829511#comment-13829511
 ] 

Xuan Gong commented on YARN-1435:
-

We could let DS either execute shell_commands option or shell_script option. 
The right DS commandline should be that we provide either --shell_command or 
--shell_script.
If we provide both options, we can throw out exception, and say something like  
Do not provide both options at the same time.

 Distributed Shell should not run other commands except sh, and run the 
 custom script at the same time.
 

 Key: YARN-1435
 URL: https://issues.apache.org/jira/browse/YARN-1435
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.2.1
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1


 Currently, if we want to run custom script at DS. We can do it like this :
 --shell_command sh --shell_script custom_script.sh
 But it may be better to separate running shell_command and shell_script



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1433) ContainerManagementProtocolProxy doesn't have the retry policy

2013-11-21 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-1433:
-

 Summary: ContainerManagementProtocolProxy doesn't have the retry 
policy
 Key: YARN-1433
 URL: https://issues.apache.org/jira/browse/YARN-1433
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen


ContainerManagementProtocolProxy doesn't have the retry policy, but RMProxy 
has. Is there any special consideration about whether the retry policy is 
required or not. The same question is applied to Application History Server as 
well (YARN-967). Any idea?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2013-11-21 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829381#comment-13829381
 ] 

Jonathan Eagles commented on YARN-1426:
---

Test failures:
  - TestJobCleanup is from MAPREDUCE-5552.
  -- Ran this test with and without my patch and both succeed on my desktop.

 YARN Components need to unregister their beans upon shutdown
 

 Key: YARN-1426
 URL: https://issues.apache.org/jira/browse/YARN-1426
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.3.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1426.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl

2013-11-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828930#comment-13828930
 ] 

Hudson commented on YARN-1053:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1615 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1615/])
YARN-1053. Diagnostic message from ContainerExitEvent is ignored in 
ContainerImpl (Omkar Vinit Joshi via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543973)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java


 Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
 --

 Key: YARN-1053
 URL: https://issues.apache.org/jira/browse/YARN-1053
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0, 2.2.1
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
Priority: Blocker
  Labels: newbie
 Fix For: 2.3.0

 Attachments: YARN-1053.1.patch, YARN-1053.20130809.patch


 If the container launch fails then we send ContainerExitEvent. This event 
 contains exitCode and diagnostic message. Today we are ignoring diagnostic 
 message while handling this event inside ContainerImpl. Fixing it as it is 
 useful in diagnosing the failure.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1314) Cannot pass more than 1 argument to shell command

2013-11-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1314:


Attachment: YARN-1314.4.patch

fix test case failure

 Cannot pass more than 1 argument to shell command
 -

 Key: YARN-1314
 URL: https://issues.apache.org/jira/browse/YARN-1314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, 
 YARN-1314.3.patch, YARN-1314.4.patch


 Distributed shell cannot accept more than 1 parameters in argument parts.
 All of these commands are treated as 1 parameter:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name
 is  Teddy'
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args ''My   name'
 'is  Teddy''
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name' 
'is  Teddy'



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1314) Cannot pass more than 1 argument to shell command

2013-11-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1314:


Attachment: YARN-1314.5.patch

 Cannot pass more than 1 argument to shell command
 -

 Key: YARN-1314
 URL: https://issues.apache.org/jira/browse/YARN-1314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, 
 YARN-1314.3.patch, YARN-1314.4.1.patch, YARN-1314.4.patch, YARN-1314.5.patch


 Distributed shell cannot accept more than 1 parameters in argument parts.
 All of these commands are treated as 1 parameter:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name
 is  Teddy'
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args ''My   name'
 'is  Teddy''
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name' 
'is  Teddy'



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1314) Cannot pass more than 1 argument to shell command

2013-11-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829531#comment-13829531
 ] 

Xuan Gong commented on YARN-1314:
-

Increasing the --num_container numbers to let test case find the correct log 
folder

 Cannot pass more than 1 argument to shell command
 -

 Key: YARN-1314
 URL: https://issues.apache.org/jira/browse/YARN-1314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, 
 YARN-1314.3.patch, YARN-1314.4.1.patch, YARN-1314.4.patch, YARN-1314.5.patch


 Distributed shell cannot accept more than 1 parameters in argument parts.
 All of these commands are treated as 1 parameter:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name
 is  Teddy'
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args ''My   name'
 'is  Teddy''
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name' 
'is  Teddy'



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass

2013-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829257#comment-13829257
 ] 

Hadoop QA commented on YARN-1416:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12615171/YARN-1416.2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2508//console

This message is automatically generated.

 InvalidStateTransitions getting reported in multiple test cases even though 
 they pass
 -

 Key: YARN-1416
 URL: https://issues.apache.org/jira/browse/YARN-1416
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Jian He
 Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch


 It might be worth checking why they are reporting this.
 Testcase : TestRMAppTransitions, TestRM
 there are large number of such errors.
 can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1434) Single Job can affect fairshare of others

2013-11-21 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-1434:
--

 Summary: Single Job can affect fairshare of others
 Key: YARN-1434
 URL: https://issues.apache.org/jira/browse/YARN-1434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Carlo Curino
Priority: Minor


A job receiving containers and deciding not to use them and yielding them back 
in the next heartbeat could significantly affect the amount of resources given 
to other jobs. 

This is because by yielding containers back the job appears always to be 
under-capacity (more than others) so it is picked to be the next to receive 
containers.

Observed by Robert Grandl, to be independently confirmed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1314) Cannot pass more than 1 argument to shell command

2013-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829542#comment-13829542
 ] 

Hadoop QA commented on YARN-1314:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12615234/YARN-1314.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2513//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2513//console

This message is automatically generated.

 Cannot pass more than 1 argument to shell command
 -

 Key: YARN-1314
 URL: https://issues.apache.org/jira/browse/YARN-1314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, 
 YARN-1314.3.patch, YARN-1314.4.1.patch, YARN-1314.4.patch, YARN-1314.5.patch


 Distributed shell cannot accept more than 1 parameters in argument parts.
 All of these commands are treated as 1 parameter:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name
 is  Teddy'
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args ''My   name'
 'is  Teddy''
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name' 
'is  Teddy'



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass

2013-11-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1416:
--

Attachment: YARN-1416.2.patch

 InvalidStateTransitions getting reported in multiple test cases even though 
 they pass
 -

 Key: YARN-1416
 URL: https://issues.apache.org/jira/browse/YARN-1416
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Jian He
 Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch, 
 YARN-1416.2.patch, YARN-1416.2.patch


 It might be worth checking why they are reporting this.
 Testcase : TestRMAppTransitions, TestRM
 there are large number of such errors.
 can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1314) Cannot pass more than 1 argument to shell command

2013-11-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829551#comment-13829551
 ] 

Xuan Gong commented on YARN-1314:
-

Did test on single node cluster:
Using
--shell_command echo --shell_args HADOOP YARN MAPREDUCE

In launch_container.sh ,
{code}
exec /bin/bash -c echo HADOOP YARN MAPREDUCE 
1/Users/xuan/dep/hadoop-3.0.0-SNAPSHOT/logs/application_1385060881865_0015/container_1385060881865_0015_01_02/stdout
 
2/Users/xuan/dep/hadoop-3.0.0-SNAPSHOT/logs/application_1385060881865_0015/container_1385060881865_0015_01_02/stderr
 
{code}

In container stdout log:
it shows 
{code}
HADOOP YARN MAPREDUCE
{code}
as expected

 Cannot pass more than 1 argument to shell command
 -

 Key: YARN-1314
 URL: https://issues.apache.org/jira/browse/YARN-1314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, 
 YARN-1314.3.patch, YARN-1314.4.1.patch, YARN-1314.4.patch, YARN-1314.5.patch


 Distributed shell cannot accept more than 1 parameters in argument parts.
 All of these commands are treated as 1 parameter:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name
 is  Teddy'
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args ''My   name'
 'is  Teddy''
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name' 
'is  Teddy'



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1266) Implement PB service and client wrappers for ApplicationHistoryProtocol

2013-11-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1266:
--

Summary: Implement PB service and client wrappers for 
ApplicationHistoryProtocol  (was: inheriting Application client and History 
Protocol from base protocol and implement PB service and clients.)

 Implement PB service and client wrappers for ApplicationHistoryProtocol
 ---

 Key: YARN-1266
 URL: https://issues.apache.org/jira/browse/YARN-1266
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch, 
 YARN-1266-4.patch, YARN-1266-5.patch, YARN-1266-6.patch


 Adding ApplicationHistoryProtocolPBService to make web apps to work and 
 changing yarn to run AHS as a seprate process



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data

2013-11-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829243#comment-13829243
 ] 

Zhijie Shen commented on YARN-967:
--

bq. I think we should use same to maintian consistency betwwn ahsclient and 
yarn cli in terms of polling interval. I think keeping lots of confs doesn't 
make sense.

Please remove it, because it doesn't make sense to AHS, it's used by 
YarnClient#submitApplication
{code}
+  @Override
+  protected void serviceInit(Configuration conf) throws Exception {
+this.ahsAddress = getAHSAddress(conf);
+statePollIntervalMillis = conf.getLong(
+YarnConfiguration.YARN_CLIENT_APP_SUBMISSION_POLL_INTERVAL_MS,
+YarnConfiguration.DEFAULT_YARN_CLIENT_APP_SUBMISSION_POLL_INTERVAL_MS);
+super.serviceInit(conf);
+  }
{code}

bq. This is on purpose, as we first want to make call to RM and if app is not 
there then call AHS if not there then send exception to client. For attempt and 
contianer it only look into AHS and if not found send exception back to client. 
Thats the older behavior.

The point is:
1. Before the patch, if the application is not found, 
ApplicationNotFoundException is thrown.
2. After the patch, if the application is not found in RM, then check AHS. If 
the application is not found in AHS, return null.

The behavior is changed, such that it is not compatible. I suggest throwing 
ApplicationNotFoundException if the application is not found in AHS as well. It 
seems to be done in the patch of YARN-955. Similar changes should be applied to 
getApplicationAttemptReport, and getContainerReport.

In addition, I also suggest looking the behavior of 
ClientRMService#getApplications, and make ApplicationHistoryClientSerivce to 
behave similarly.

bq. For listapplications we decide not to get info from AHS , we shall do it 
once we will have filters added. We are leaving it for now.

Ok, it's fine. We can fix it later.

More comments:

1. The javadoc is still not fixed
{code}
+   * Prints the application attempt report for an application id.
+   * 
+   * @param applicationId
+   * @throws YarnException
+   */
+  private void printApplicationAttemptReport(String applicationAttemptId)
{code}
{code}
+  /**
+   * Prints the container report for an application attempt id.
+   * 
+   * @param applicationAttemptId
+   * @throws YarnException
+   */
+  private void printContainerReport(String containerId) throws YarnException,
+  IOException {
{code}

2. Then, it's going to reuse the retry policy of RM, which seems not to be 
good. BTW, ContainerManagementProtocolProxy seems not to have the retry policy 
as well. Maybe we should simply create a proxy as HSProxies does?
{code}
+RetryPolicy retryPolicy = createRetryPolicy(conf);
{code}

 [YARN-321] Command Line Interface(CLI) for Reading Application History 
 Storage Data
 ---

 Key: YARN-967
 URL: https://issues.apache.org/jira/browse/YARN-967
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-967-1.patch, YARN-967-2.patch, YARN-967-3.patch, 
 YARN-967-4.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1436) ZKRMStateStore should have separate configuration for retry period.

2013-11-21 Thread Omkar Vinit Joshi (JIRA)
Omkar Vinit Joshi created YARN-1436:
---

 Summary: ZKRMStateStore should have separate configuration for 
retry period.
 Key: YARN-1436
 URL: https://issues.apache.org/jira/browse/YARN-1436
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Jian He


Problem :- Today we have zkSessionTimeout period which is getting used for 
zookeeper session timeout and for ZKRMStateStore based retry policy. 

Proposed suggestion :- Ideally we should have different configuration knobs for 
this. 
Ideal values for 
zkSessionTimeout should be :- number of zookeeper instances participating in 
quorum * per zookeeper session timeout. see
{code}
org.apache.zookeeper.ClientCnxn.ClientCnxn()..
connectTimeout = sessionTimeout / hostProvider.size();
{code}
retry policy... (may be retry time period or count)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1436) ZKRMStateStore should have separate configuration for retry period.

2013-11-21 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1436:


Component/s: resourcemanager

 ZKRMStateStore should have separate configuration for retry period.
 ---

 Key: YARN-1436
 URL: https://issues.apache.org/jira/browse/YARN-1436
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.1
Reporter: Omkar Vinit Joshi
Assignee: Jian He

 Problem :- Today we have zkSessionTimeout period which is getting used for 
 zookeeper session timeout and for ZKRMStateStore based retry policy. 
 Proposed suggestion :- Ideally we should have different configuration knobs 
 for this. 
 Ideal values for 
 zkSessionTimeout should be :- number of zookeeper instances participating in 
 quorum * per zookeeper session timeout. see
 {code}
 org.apache.zookeeper.ClientCnxn.ClientCnxn()..
 connectTimeout = sessionTimeout / hostProvider.size();
 {code}
 retry policy... (may be retry time period or count)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass

2013-11-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1416:
--

Attachment: YARN-1416.2.patch

 InvalidStateTransitions getting reported in multiple test cases even though 
 they pass
 -

 Key: YARN-1416
 URL: https://issues.apache.org/jira/browse/YARN-1416
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Jian He
 Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch


 It might be worth checking why they are reporting this.
 Testcase : TestRMAppTransitions, TestRM
 there are large number of such errors.
 can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)

2013-11-21 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829125#comment-13829125
 ] 

Bikas Saha commented on YARN-149:
-

Please open a sub-task. A patch would be great or else someone else could pick 
it up too.

 ResourceManager (RM) High-Availability (HA)
 ---

 Key: YARN-149
 URL: https://issues.apache.org/jira/browse/YARN-149
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Harsh J
Assignee: Bikas Saha
 Attachments: YARN ResourceManager Automatic 
 Failover-rev-07-21-13.pdf, YARN ResourceManager Automatic 
 Failover-rev-08-04-13.pdf, rm-ha-phase1-approach-draft1.pdf, 
 rm-ha-phase1-draft2.pdf


 This jira tracks work needed to be done to support one RM instance failing 
 over to another RM instance so that we can have RM HA. Work includes leader 
 election, transfer of control to leader and client re-direction to new leader.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol

2013-11-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829246#comment-13829246
 ] 

Zhijie Shen commented on YARN-955:
--

This patch may need to be changed according to the comments in YARN-967:

https://issues.apache.org/jira/browse/YARN-967?focusedCommentId=13829243page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13829243

 [YARN-321] Implementation of ApplicationHistoryProtocol
 ---

 Key: YARN-955
 URL: https://issues.apache.org/jira/browse/YARN-955
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-955-1.patch, YARN-955-2.patch, YARN-955-3.patch, 
 YARN-955-4.patch, YARN-955-5.patch, YARN-955-6.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ; in distributed-shell

2013-11-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828935#comment-13828935
 ] 

Hudson commented on YARN-1303:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1615 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1615/])
YARN-1303. Reverted the wrong patch committed earlier and committing the 
correct patch now. In one go. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1544029)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
YARN-1303. Fixed DistributedShell to not fail with multiple commands separated 
by a semi-colon as shell-command. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1544023)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 Allow multiple commands separating with ; in distributed-shell
 

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.3.0

 Attachments: YARN-1303.1.patch, YARN-1303.2.patch, YARN-1303.3.patch, 
 YARN-1303.3.patch, YARN-1303.4.patch, YARN-1303.4.patch, YARN-1303.5.patch, 
 YARN-1303.6.patch, YARN-1303.7.patch, YARN-1303.8.1.patch, 
 YARN-1303.8.2.patch, YARN-1303.8.patch, YARN-1303.9.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass

2013-11-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1416:
--

Attachment: YARN-1416.2.patch

 InvalidStateTransitions getting reported in multiple test cases even though 
 they pass
 -

 Key: YARN-1416
 URL: https://issues.apache.org/jira/browse/YARN-1416
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Jian He
 Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch, 
 YARN-1416.2.patch


 It might be worth checking why they are reporting this.
 Testcase : TestRMAppTransitions, TestRM
 there are large number of such errors.
 can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1430) InvalidStateTransition exceptions are ignored in state machines

2013-11-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829229#comment-13829229
 ] 

Vinod Kumar Vavilapalli commented on YARN-1430:
---

There are pros and cons to both approaches.

If we completely ignore the errors, nobody knows about the problem. One 
solution to this is have these invalid transitions bubble up to the UI, say on 
RM UI, AM UI etc in wild, bold and red colors.

On the other side, I agree that crashing RM all the time is going to be more 
and more painful in production environments.

As for tests, I think we SHOULD clearly crash the tests, so that we can catch 
as many of these errors as quickly as possible.

But as of today, we are treating them inconsistently. An invalid event to the 
scheduler crashes the RM but an invalid event in RMNode isn't. We need to be 
consistent.

 InvalidStateTransition exceptions are ignored in state machines
 ---

 Key: YARN-1430
 URL: https://issues.apache.org/jira/browse/YARN-1430
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi

 We have all state machines ignoring InvalidStateTransitions. These exceptions 
 will get logged but will not crash the RM / NM. We definitely should crash it 
 as they move the system into some invalid / unacceptable state.
 * Places where we hide this exception :-
 ** JobImpl
 ** TaskAttemptImpl
 ** TaskImpl
 ** NMClientAsyncImpl
 ** ApplicationImpl
 ** ContainerImpl
 ** LocalizedResource
 ** RMAppAttemptImpl
 ** RMAppImpl
 ** RMContainerImpl
 ** RMNodeImpl
 thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl

2013-11-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828918#comment-13828918
 ] 

Hudson commented on YARN-1053:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1589 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1589/])
YARN-1053. Diagnostic message from ContainerExitEvent is ignored in 
ContainerImpl (Omkar Vinit Joshi via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543973)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java


 Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
 --

 Key: YARN-1053
 URL: https://issues.apache.org/jira/browse/YARN-1053
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0, 2.2.1
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
Priority: Blocker
  Labels: newbie
 Fix For: 2.3.0

 Attachments: YARN-1053.1.patch, YARN-1053.20130809.patch


 If the container launch fails then we send ContainerExitEvent. This event 
 contains exitCode and diagnostic message. Today we are ignoring diagnostic 
 message while handling this event inside ContainerImpl. Fixing it as it is 
 useful in diagnosing the failure.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ; in distributed-shell

2013-11-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828923#comment-13828923
 ] 

Hudson commented on YARN-1303:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1589 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1589/])
YARN-1303. Reverted the wrong patch committed earlier and committing the 
correct patch now. In one go. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1544029)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
YARN-1303. Fixed DistributedShell to not fail with multiple commands separated 
by a semi-colon as shell-command. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1544023)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 Allow multiple commands separating with ; in distributed-shell
 

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.3.0

 Attachments: YARN-1303.1.patch, YARN-1303.2.patch, YARN-1303.3.patch, 
 YARN-1303.3.patch, YARN-1303.4.patch, YARN-1303.4.patch, YARN-1303.5.patch, 
 YARN-1303.6.patch, YARN-1303.7.patch, YARN-1303.8.1.patch, 
 YARN-1303.8.2.patch, YARN-1303.8.patch, YARN-1303.9.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass

2013-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829306#comment-13829306
 ] 

Hadoop QA commented on YARN-1416:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12615174/YARN-1416.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2509//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2509//console

This message is automatically generated.

 InvalidStateTransitions getting reported in multiple test cases even though 
 they pass
 -

 Key: YARN-1416
 URL: https://issues.apache.org/jira/browse/YARN-1416
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Jian He
 Attachments: YARN-1416.1.patch, YARN-1416.1.patch, YARN-1416.2.patch, 
 YARN-1416.2.patch


 It might be worth checking why they are reporting this.
 Testcase : TestRMAppTransitions, TestRM
 there are large number of such errors.
 can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-946) Adding HDFS implementation for History Reader Interface

2013-11-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-946:
-

Summary: Adding HDFS implementation for History Reader Interface  (was: 
Adding HDFS implementation for Histrory Reader Interface)

 Adding HDFS implementation for History Reader Interface
 ---

 Key: YARN-946
 URL: https://issues.apache.org/jira/browse/YARN-946
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal

 By default we decided to do the HDFS implementation for HistoryReader 
 Interface.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol

2013-11-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829490#comment-13829490
 ] 

Zhijie Shen commented on YARN-955:
--

bq. As discussed , We will do the changes as part of YARN-967.

+1. Let's unblock this ticket.

 [YARN-321] Implementation of ApplicationHistoryProtocol
 ---

 Key: YARN-955
 URL: https://issues.apache.org/jira/browse/YARN-955
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-955-1.patch, YARN-955-2.patch, YARN-955-3.patch, 
 YARN-955-4.patch, YARN-955-5.patch, YARN-955-6.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1436) ZKRMStateStore should have separate configuration for retry period.

2013-11-21 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1436:


Affects Version/s: 2.2.1

 ZKRMStateStore should have separate configuration for retry period.
 ---

 Key: YARN-1436
 URL: https://issues.apache.org/jira/browse/YARN-1436
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.1
Reporter: Omkar Vinit Joshi
Assignee: Jian He

 Problem :- Today we have zkSessionTimeout period which is getting used for 
 zookeeper session timeout and for ZKRMStateStore based retry policy. 
 Proposed suggestion :- Ideally we should have different configuration knobs 
 for this. 
 Ideal values for 
 zkSessionTimeout should be :- number of zookeeper instances participating in 
 quorum * per zookeeper session timeout. see
 {code}
 org.apache.zookeeper.ClientCnxn.ClientCnxn()..
 connectTimeout = sessionTimeout / hostProvider.size();
 {code}
 retry policy... (may be retry time period or count)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1314) Cannot pass more than 1 argument to shell command

2013-11-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1314:


Attachment: YARN-1314.4.1.patch

 Cannot pass more than 1 argument to shell command
 -

 Key: YARN-1314
 URL: https://issues.apache.org/jira/browse/YARN-1314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, 
 YARN-1314.3.patch, YARN-1314.4.1.patch, YARN-1314.4.patch


 Distributed shell cannot accept more than 1 parameters in argument parts.
 All of these commands are treated as 1 parameter:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name
 is  Teddy'
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args ''My   name'
 'is  Teddy''
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name' 
'is  Teddy'



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1314) Cannot pass more than 1 argument to shell command

2013-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829518#comment-13829518
 ] 

Hadoop QA commented on YARN-1314:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12615225/YARN-1314.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell:

  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2512//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2512//console

This message is automatically generated.

 Cannot pass more than 1 argument to shell command
 -

 Key: YARN-1314
 URL: https://issues.apache.org/jira/browse/YARN-1314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1314.1.patch, YARN-1314.1.patch, YARN-1314.2.patch, 
 YARN-1314.3.patch


 Distributed shell cannot accept more than 1 parameters in argument parts.
 All of these commands are treated as 1 parameter:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name
 is  Teddy'
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args ''My   name'
 'is  Teddy''
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distrubuted shell jar -shell_command echo -shell_args 'My   name' 
'is  Teddy'



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-954) [YARN-321] History Service should create the webUI and wire it to HistoryStorage

2013-11-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829632#comment-13829632
 ] 

Vinod Kumar Vavilapalli commented on YARN-954:
--

Looked at the patch, mostly looks good!
 - AppBlock.java has lots of spacing with dots on individual lines. Please fix 
that.
 - Does displaying of appType work?
 - Styling of app-attempts table and container table: Footer bar for the tables 
is missing.
 - Appattempts page: Reorder the info data to be State first, then Master 
container, node, Tracking URL, Diagnostic info.
 - Container page:
-- Title missing
-- Reorder the information: State, ExitStatus, Node, priority, Started, 
Elapsed, Resource: ( Memory, Vcores ), Logs, Diagnostics
-- Extra Underline after the table appearing

 [YARN-321] History Service should create the webUI and wire it to 
 HistoryStorage
 

 Key: YARN-954
 URL: https://issues.apache.org/jira/browse/YARN-954
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-954-3.patch, YARN-954-v0.patch, YARN-954-v1.patch, 
 YARN-954-v2.patch, YARN-954.4.patch, YARN-954.5.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-951) Add hard minimum resource capabilities for container launching

2013-11-21 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur resolved YARN-951.
-

Resolution: Won't Fix

 Add hard minimum resource capabilities for container launching
 --

 Key: YARN-951
 URL: https://issues.apache.org/jira/browse/YARN-951
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Wei Yan

 This is a follow up of YARN-789, which enabled FairScheduler to handle zero 
 capabilities resource requests in one dimension (either zero CPU or zero 
 memory).
 When resource enforcement is enabled (cgroups for CPU and 
 ProcfsBasedProcessTree for memory) we cannot use zero because the underlying 
 container processes will be killed.
 We need to introduce an absolute or hard minimum:
 * For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an 
 absolute minimum of a few CPU shares (ie 10) in the LinuxContainerExecutor we 
 ensure there is enough CPU cycles to run the sleep process. This absolute 
 minimum would only kick-in if zero is allowed, otherwise will never kick in 
 as the shares for 1 CPU are 1024.
 * For Memory. Hard enforcement is currently done by the 
 ProcfsBasedProcessTree.java, using a minimum absolute of 1 or 2 MBs would 
 take care of zero memory resources. And again, this absolute minimum would 
 only kick-in if zero is allowed, otherwise will never kick in as the 
 increment memory is in several MBs if not 1GB.
 There would be no default for this hard minimum, if not set no correction 
 will be done. If set, then the MAX(hard-minimum, 
 container-resource-capability) will be used. 
 Effectively there will not be any impact unless the hard minimum capabilities 
 are explicitly set.
 And, even if set, unless the scheduler is configured to allow zero 
 capabilities, the hard-minimum value will not kick in unless is set to a 
 value higher than the MIN capabilities for a container.
 Expected values, when set, would be 10 shares for CPU and 2 MB for memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-11-21 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1404:
-

Description: 
Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
applications run workload in. External frameworks/systems could benefit from 
sharing resources with other Yarn applications while running their workload 
within long-running processes owned by the external framework (in other words, 
running their workload outside of the context of a Yarn container process). 

Because Yarn provides robust and scalable resource management, it is desirable 
for some external systems to leverage the resource governance capabilities of 
Yarn (queues, capacities, scheduling, access control) while supplying their own 
resource enforcement.

Impala is an example of such system. Impala uses Llama 
(http://cloudera.github.io/llama/) to request resources from Yarn.

Impala runs an impalad process in every node of the cluster, when a user 
submits a query, the processing is broken into 'query fragments' which are run 
in multiple impalad processes leveraging data locality (similar to Map-Reduce 
Mappers processing a collocated HDFS block of input data).

The execution of a 'query fragment' requires an amount of CPU and memory in the 
impalad. As the impalad shares the host with other services (HDFS DataNode, 
Yarn NodeManager, Hbase Region Server) and Yarn Applications (MapReduce tasks).

To ensure cluster utilization that follow the Yarn scheduler policies and it 
does not overload the cluster nodes, before running a 'query fragment' in a 
node, Impala requests the required amount of CPU and memory from Yarn. Once the 
requested CPU and memory has been allocated, Impala starts running the 'query 
fragment' taking care that the 'query fragment' does not use more resources 
than the ones that have been allocated. Memory is book kept per 'query 
fragment' and the threads used for the processing of the 'query fragment' are 
placed under a cgroup to contain CPU utilization.

Today, for all resources that have been asked to Yarn RM, a (container) process 
must be started via the corresponding NodeManager. Failing to do this, will 
result on the cancelation of the container allocation relinquishing the 
acquired resource capacity back to the pool of available resources. To avoid 
this, Impala starts a dummy container process doing 'sleep 10y'.

Using a dummy container process has its drawbacks:

* the dummy container process is in a cgroup with a given number of CPU shares 
that are not used and Impala is re-issuing those CPU shares to another cgroup 
for the thread running the 'query fragment'. The cgroup CPU enforcement works 
correctly because of the CPU controller implementation (but the formal 
specified behavior is actually undefined).
* Impala may ask for CPU and memory independent of each other. Some requests 
may be only memory with no CPU or viceversa. Because a container requires a 
process, complete absence of memory or CPU is not possible even if the dummy 
process is 'sleep', a minimal amount of memory and CPU is required for the 
dummy process.

Because of this it is desirable to be able to have a container without a 
backing process.

  was:
Currently a container allocation requires to start a container process with the 
corresponding NodeManager's node.

For applications that need to use the allocated resources out of band from Yarn 
this means that a dummy container process must be started.

Impala/Llama is an example of such application which is currently starting a 
'sleep 10y' (10 years) process as the container process. And the resource 
capabilities are used out of by and the Impala process collocated in the node. 
The Impala process ensures the processing associated to that resources do not 
exceed the capabilities of the container. Also, if the container is 
lost/preempted/killed, Impala stops using the corresponding resources.

In addition, in the case of Llama, the current requirement of having a 
container process, gets complicates when hard resource enforcement (memory 
-ContainersMonitor- or cpu -via cgroups-) is enabled because Impala/Llama 
request resources with CPU and memory independently of each other. Some 
requests are CPU only and others are memory only. Unmanaged containers solve 
this problem as there is no underlying process with zero CPU or zero memory.



Summary: Enable external systems/frameworks to share resources with 
Hadoop leveraging Yarn resource scheduling  (was: Add support for unmanaged 
containers)

Updated the summary and the description to better describe the use case driving 
this JIRA.

I've closed YARN-951 as won't fix as it is a workaround of the problem this 
JIRA is trying to address.

I don't think there is a need for an umbrella JIRA as this is the only change 
we need.


 Enable external systems/frameworks to share 

[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-11-21 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829640#comment-13829640
 ] 

Alejandro Abdelnur commented on YARN-1404:
--

The proposal to address this JIRA is:

* Allow a NULL ContainerLaunchContext in the startContainer() call, this 
signals the is not process to be started with the container.
* The ContainerLaunch logic would use a latch to lock when there is not 
associated process. The latch will be released on container completion 
(preemption or terminated by the AM)

The changes to achieve this are minimal and they do not alter at all the 
lifecycle of a container, nor in the RM, nor in the NM.

As previously mentioned by Bikas, this can be seen as a special case of the 
functionality that YARN-1040 is proposing for managing multiple processes with 
the same container. 

The scope of work of YARN-1040 is significantly larger and requires API 
changes, while this JIRA does not require API changes and the changes are not 
incompatible with each other.



 Enable external systems/frameworks to share resources with Hadoop leveraging 
 Yarn resource scheduling
 -

 Key: YARN-1404
 URL: https://issues.apache.org/jira/browse/YARN-1404
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-1404.patch


 Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
 applications run workload in. External frameworks/systems could benefit from 
 sharing resources with other Yarn applications while running their workload 
 within long-running processes owned by the external framework (in other 
 words, running their workload outside of the context of a Yarn container 
 process). 
 Because Yarn provides robust and scalable resource management, it is 
 desirable for some external systems to leverage the resource governance 
 capabilities of Yarn (queues, capacities, scheduling, access control) while 
 supplying their own resource enforcement.
 Impala is an example of such system. Impala uses Llama 
 (http://cloudera.github.io/llama/) to request resources from Yarn.
 Impala runs an impalad process in every node of the cluster, when a user 
 submits a query, the processing is broken into 'query fragments' which are 
 run in multiple impalad processes leveraging data locality (similar to 
 Map-Reduce Mappers processing a collocated HDFS block of input data).
 The execution of a 'query fragment' requires an amount of CPU and memory in 
 the impalad. As the impalad shares the host with other services (HDFS 
 DataNode, Yarn NodeManager, Hbase Region Server) and Yarn Applications 
 (MapReduce tasks).
 To ensure cluster utilization that follow the Yarn scheduler policies and it 
 does not overload the cluster nodes, before running a 'query fragment' in a 
 node, Impala requests the required amount of CPU and memory from Yarn. Once 
 the requested CPU and memory has been allocated, Impala starts running the 
 'query fragment' taking care that the 'query fragment' does not use more 
 resources than the ones that have been allocated. Memory is book kept per 
 'query fragment' and the threads used for the processing of the 'query 
 fragment' are placed under a cgroup to contain CPU utilization.
 Today, for all resources that have been asked to Yarn RM, a (container) 
 process must be started via the corresponding NodeManager. Failing to do 
 this, will result on the cancelation of the container allocation 
 relinquishing the acquired resource capacity back to the pool of available 
 resources. To avoid this, Impala starts a dummy container process doing 
 'sleep 10y'.
 Using a dummy container process has its drawbacks:
 * the dummy container process is in a cgroup with a given number of CPU 
 shares that are not used and Impala is re-issuing those CPU shares to another 
 cgroup for the thread running the 'query fragment'. The cgroup CPU 
 enforcement works correctly because of the CPU controller implementation (but 
 the formal specified behavior is actually undefined).
 * Impala may ask for CPU and memory independent of each other. Some requests 
 may be only memory with no CPU or viceversa. Because a container requires a 
 process, complete absence of memory or CPU is not possible even if the dummy 
 process is 'sleep', a minimal amount of memory and CPU is required for the 
 dummy process.
 Because of this it is desirable to be able to have a container without a 
 backing process.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1434) Single Job can affect fairshare of others

2013-11-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829761#comment-13829761
 ] 

Sandy Ryza commented on YARN-1434:
--

This seems possible.  To further spell this out:
Imagine an AM that, by fairness, receives a container on an NM heartbeat.  If 
it retrieves the container from the RM and gives it back before any other NM 
can heartbeat, it will also, by fairness, receive the next container that the 
RM allocates.  In this way, it could starve all the other applications on the 
cluster.  An AM that deserves more than a single container could do this with a 
slower heartbeat interval.

For the Fair Scheduler, YARN-1010, which decouples container allocations from 
node heartbeats, should solve this in most cases.  With it, it is nearly 
impossible for an AM to return containers before the RM allocates other free 
space to other applications.

 Single Job can affect fairshare of others
 -

 Key: YARN-1434
 URL: https://issues.apache.org/jira/browse/YARN-1434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Carlo Curino
Priority: Minor

 A job receiving containers and deciding not to use them and yielding them 
 back in the next heartbeat could significantly affect the amount of resources 
 given to other jobs. 
 This is because by yielding containers back the job appears always to be 
 under-capacity (more than others) so it is picked to be the next to receive 
 containers.
 Observed by Robert Grandl, to be independently confirmed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)