[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running

2013-10-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784848#comment-13784848
 ] 

Hitesh Shah commented on YARN-1131:
---

Sounds good. Would you mind opening jiras for the open comments? Also, the test 
failure needs to be addressed.

 $ yarn logs should return a message log aggregation is during progress if 
 YARN application is running
 -

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-1131.1.txt


 In the case when log aggregation is enabled, if a user submits a MapReduce job 
 and runs $ yarn logs -applicationId <app ID> while the YARN application is 
 running, the command will return no message and drop the user back to the 
 shell. It would be nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if an invalid application ID is given, the YARN CLI should 
 say that the application ID is incorrect rather than throwing a 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread "main" java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}
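 A minimal sketch of the kind of up-front validation the CLI could do 
 (illustrative only; the class name and regex below are assumptions, not the 
 actual LogDumper code):
 {code}
 // Hypothetical sketch: check the applicationId argument before parsing it,
 // and print a friendly message instead of letting NoSuchElementException
 // escape to the user.
 import java.util.regex.Pattern;

 public class AppIdCheck {
   // Expected form: application_<clusterTimestamp>_<sequenceNumber>
   private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

   public static void main(String[] args) {
     String appIdStr = args.length > 0 ? args[0] : "";
     if (!APP_ID.matcher(appIdStr).matches()) {
       System.err.println("Invalid ApplicationId specified: " + appIdStr);
       System.exit(-1);
     }
     System.out.println("ApplicationId looks well-formed: " + appIdStr);
   }
 }
 {code}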



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running

2013-10-03 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784850#comment-13784850
 ] 

Siddharth Seth commented on YARN-1131:
--

Will open the follow-up jiras. Running this through Jenkins again. I haven't seen 
the specific test fail or time out in my local runs.

 $yarn logs command should return an appropriate error message if YARN 
 application is still running
 --

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-1131.1.txt, YARN-1131.2.txt


 In the case when log aggregation is enabled, if a user submits a MapReduce job 
 and runs $ yarn logs -applicationId <app ID> while the YARN application is 
 running, the command will return no message and drop the user back to the 
 shell. It would be nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if an invalid application ID is given, the YARN CLI should 
 say that the application ID is incorrect rather than throwing a 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread "main" java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running

2013-10-03 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-1131:
--

Summary: $yarn logs command should return an appropriate error message if 
YARN application is still running  (was: $ yarn logs should return a message 
log aggregation is during progress if YARN application is running)

 $yarn logs command should return an appropriate error message if YARN 
 application is still running
 --

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-1131.1.txt, YARN-1131.2.txt


 In the case when log aggregation is enabled, if a user submits a MapReduce job 
 and runs $ yarn logs -applicationId <app ID> while the YARN application is 
 running, the command will return no message and drop the user back to the 
 shell. It would be nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if an invalid application ID is given, the YARN CLI should 
 say that the application ID is incorrect rather than throwing a 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread "main" java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading

2013-10-03 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784864#comment-13784864
 ] 

Siddharth Seth commented on YARN-890:
-

+1. Resources should not be rounded up.
Is there a similar round-up in the actual allocation code, which may cause 
additional containers to be allocated to a queue?
Should the CS be allowing nodes to register if the nm-memory.mb is not a 
multiple of minimum-allocation-mb, or should it just be rounding down at 
registration?
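To make the two behaviors concrete, here is a small illustrative calculation 
(plain arithmetic only, not RM code): with memory-mb = 4192 and 
minimum-allocation-mb = 1024, rounding up reports 5120 MB while rounding down 
reports 4096 MB.
{code}
public class RoundingDemo {
  static long roundUp(long value, long step)   { return ((value + step - 1) / step) * step; }
  static long roundDown(long value, long step) { return (value / step) * step; }

  public static void main(String[] args) {
    long nodeMemoryMb = 4192, minAllocMb = 1024;
    System.out.println("round up:   " + roundUp(nodeMemoryMb, minAllocMb) + " MB");   // 5120 MB
    System.out.println("round down: " + roundDown(nodeMemoryMb, minAllocMb) + " MB"); // 4096 MB
  }
}
{code}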

 The roundup for memory values on resource manager UI is misleading
 --

 Key: YARN-890
 URL: https://issues.apache.org/jira/browse/YARN-890
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Trupti Dhavle
Assignee: Xuan Gong
 Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, 
 YARN-890.1.patch, YARN-890.2.patch


 From the yarn-site.xml, I see the following values:
 <property>
   <name>yarn.nodemanager.resource.memory-mb</name>
   <value>4192</value>
 </property>
 <property>
   <name>yarn.scheduler.maximum-allocation-mb</name>
   <value>4192</value>
 </property>
 <property>
   <name>yarn.scheduler.minimum-allocation-mb</name>
   <value>1024</value>
 </property>
 However the resourcemanager UI shows total memory as 5MB 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading

2013-10-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784878#comment-13784878
 ] 

Zhijie Shen commented on YARN-890:
--

bq. Is there a similar round-up in the actual allocation code, which may cause 
additional containers to be allocated to a queue?

Checked this before. It seems it's only for the web UI.

bq. or should it just be rounding down at registration?

That sounds like it makes sense. Allocated memory will always be a multiple of 
minimum-allocation-mb; therefore, the available memory will be as well.

In this sense, minimum-allocation-mb is effectively treated as 1 unit of memory: 
we allocate n units of memory to a container, the cluster has m units remaining, 
and so on. Perhaps we could simplify the internal memory accounting this way. 
Just thinking out loud.
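A rough sketch of that "1 unit = minimum-allocation-mb" accounting (illustrative 
only, not scheduler code):
{code}
public class UnitAccounting {
  public static void main(String[] args) {
    long minAllocMb = 1024;
    long nodeCapacityMb = 4096;      // already a multiple after rounding down
    long containerRequestMb = 1536;  // normalized up to 2 units

    long capacityUnits  = nodeCapacityMb / minAllocMb;                        // 4 units
    long requestUnits   = (containerRequestMb + minAllocMb - 1) / minAllocMb; // 2 units
    long remainingUnits = capacityUnits - requestUnits;                       // 2 units

    System.out.println("capacity=" + capacityUnits + " units, allocated="
        + requestUnits + " units, remaining=" + remainingUnits + " units");
  }
}
{code}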


 The roundup for memory values on resource manager UI is misleading
 --

 Key: YARN-890
 URL: https://issues.apache.org/jira/browse/YARN-890
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Trupti Dhavle
Assignee: Xuan Gong
 Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, 
 YARN-890.1.patch, YARN-890.2.patch


 From the yarn-site.xml, I see the following values:
 <property>
   <name>yarn.nodemanager.resource.memory-mb</name>
   <value>4192</value>
 </property>
 <property>
   <name>yarn.scheduler.maximum-allocation-mb</name>
   <value>4192</value>
 </property>
 <property>
   <name>yarn.scheduler.minimum-allocation-mb</name>
   <value>1024</value>
 </property>
 However the resourcemanager UI shows total memory as 5MB 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-10-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784887#comment-13784887
 ] 

Bikas Saha commented on YARN-1197:
--

I don't think Sandy meant that the AM first tells the NM to decrease the size 
and then the NM informs the RM. He meant the AM asks the RM. The RM 
decreases/increases the size and then the AM informs the NM about the change. 
RM-NM communication happens via heartbeat, which may occur some time later. 

For decreasing resources, if the RM is to consider the freed resource available 
only after the AM informs the NM and the NM heartbeats with the RM, then this 
change may become more complicated, since the current schedulers don't expect any 
lag in their allocations. This will also delay the allocation of the free space 
to others. Also, this delay is determined by when the AM syncs with the NM. 
That's not a good property. We should probably assume the decrease to be 
effective immediately and the RM-NM sync should enforce that. The downside is that 
for the duration of the heartbeat interval, the node may get overbooked, but 
that should not be a problem in practice since the container would already be 
using a lower value of resources before the AM asked for its capacity to be 
decreased.
The same problem does not hold for increasing resources.

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: yarn-1197.pdf


 Currently, YARN cannot merge several containers on one node into a big 
 container, which would let us incrementally ask for resources, merge them into 
 a bigger one, and launch our processes. The user scenario is described in the 
 comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running

2013-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784894#comment-13784894
 ] 

Hadoop QA commented on YARN-1131:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606532/YARN-1131.2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

org.apache.hadoop.yarn.client.cli.TestLogsCLI

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2076//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2076//console

This message is automatically generated.

 $yarn logs command should return an appropriate error message if YARN 
 application is still running
 --

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-1131.1.txt, YARN-1131.2.txt


 In the case when log aggregation is enabled, if a user submits a MapReduce job 
 and runs $ yarn logs -applicationId <app ID> while the YARN application is 
 running, the command will return no message and drop the user back to the 
 shell. It would be nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if an invalid application ID is given, the YARN CLI should 
 say that the application ID is incorrect rather than throwing a 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread "main" java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-10-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784906#comment-13784906
 ] 

Wangda Tan commented on YARN-1197:
--

{quote}
For decreasing resources, if the RM is to consider the freed resource available 
only after the AM informs the NM and the NM heartbeats with the RM, then this 
change may become more complicated, since the current schedulers don't expect any 
lag in their allocations. This will also delay the allocation of the free space 
to others. Also, this delay is determined by when the AM syncs with the NM. 
That's not a good property. We should probably assume the decrease to be 
effective immediately and the RM-NM sync should enforce that. The downside is that 
for the duration of the heartbeat interval, the node may get overbooked, but 
that should not be a problem in practice since the container would already be 
using a lower value of resources before the AM asked for its capacity to be 
decreased.
{quote}

I think it makes sense. Having the AM tell the NM first would prevent the RM from 
leveraging the freed resources, which is not good for a heavily loaded cluster. 
I'll update the document per our discussion and start breaking down the tasks. 
Please let me know if you have any other comments.

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: yarn-1197.pdf


 Currently, YARN cannot merge several containers on one node into a big 
 container, which would let us incrementally ask for resources, merge them into 
 a bigger one, and launch our processes. The user scenario is described in the 
 comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-10-03 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784914#comment-13784914
 ] 

Sandy Ryza commented on YARN-1197:
--

bq. I don't think Sandy meant that the AM first tells the NM to decrease the 
size and then the NM informs the RM.
You're right about what I meant.  Though thinking about this more, is there any 
reason a container shrink needs to get permission from the RM?  Should we 
not treat giving up part of a container in the same way we treat giving up an 
entire container, i.e. the app unilaterally decides when to do it?  If we 
need to respect properties like yarn.scheduler.minimum-allocation-mb, the 
NodeManagers could pick these up and enforce them by rejecting shrink requests.

bq. The downside is that for the duration of the heartbeat interval, the node 
may get overbooked, but that should not be a problem in practice since the 
container would already be using a lower value of resources before the AM asked 
for its capacity to be decreased.
Accepting overbooking in this context seems to me like it would open up a bunch 
of race conditions and compromise a bunch of useful assumptions an 
administrator can make about what's running on a node at a given time.  Do the 
uses of container shrinking require such low latency (which we would also 
achieve by avoiding the round trip to the RM)?
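A tiny sketch of the NM-side check being suggested (the class, method name, and 
signature are hypothetical, not an existing YARN API):
{code}
public class ShrinkValidator {
  // Reject a shrink request that would take the container below the
  // configured minimum allocation, or that is not actually a shrink.
  static boolean acceptShrink(long currentMb, long targetMb, long minAllocMb) {
    return targetMb >= minAllocMb && targetMb < currentMb;
  }

  public static void main(String[] args) {
    System.out.println(acceptShrink(4096, 2048, 1024)); // true
    System.out.println(acceptShrink(4096,  512, 1024)); // false: below the minimum
  }
}
{code}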

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: yarn-1197.pdf


 Currently, YARN cannot merge several containers on one node into a big 
 container, which would let us incrementally ask for resources, merge them into 
 a bigger one, and launch our processes. The user scenario is described in the 
 comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-10-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784924#comment-13784924
 ] 

Bikas Saha commented on YARN-1197:
--

So the suggestion is that increase goes 
AM(request) -> RM(allocation) -> AM(increase) -> NM, and decrease goes 
AM(decrease) -> NM(inform) -> RM(consider free) -> AM (confirmation from RM similar 
to completedContainerStatus)?

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: yarn-1197.pdf


 Currently, YARN cannot merge several containers on one node into a big 
 container, which would let us incrementally ask for resources, merge them into 
 a bigger one, and launch our processes. The user scenario is described in the 
 comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-425) coverage fix for yarn api

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784985#comment-13784985
 ] 

Hudson commented on YARN-425:
-

FAILURE: Integrated in Hadoop-Yarn-trunk #351 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/351/])
YARN-425. coverage fix for yarn api (Aleksey Gorshkov via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528641)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceManagerAdministrationProtocolPBClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestYarnApiClasses.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources/config-with-security.xml


 coverage fix for yarn api
 -

 Key: YARN-425
 URL: https://issues.apache.org/jira/browse/YARN-425
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Fix For: 3.0.0, 2.3.0

 Attachments: YARN-425-branch-0.23-d.patch, 
 YARN-425-branch-0.23.patch, YARN-425-branch-0.23-v1.patch, 
 YARN-425-branch-2-b.patch, YARN-425-branch-2-c.patch, 
 YARN-425-branch-2.patch, YARN-425-branch-2-v1.patch, YARN-425-trunk-a.patch, 
 YARN-425-trunk-b.patch, YARN-425-trunk-c.patch, YARN-425-trunk-d.patch, 
 YARN-425-trunk.patch, YARN-425-trunk-v1.patch, YARN-425-trunk-v2.patch


 coverage fix for yarn api
 patch YARN-425-trunk-a.patch for trunk
 patch YARN-425-branch-2.patch for branch-2
 patch YARN-425-branch-0.23.patch for branch-0.23



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1141) Updating resource requests should be decoupled with updating blacklist

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784987#comment-13784987
 ] 

Hudson commented on YARN-1141:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #351 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/351/])
YARN-1141. Updating resource requests should be decoupled with updating 
blacklist (Zhijie Shen via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528632)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


 Updating resource requests should be decoupled with updating blacklist
 --

 Key: YARN-1141
 URL: https://issues.apache.org/jira/browse/YARN-1141
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.1.2-beta

 Attachments: YARN-1141.1.patch, YARN-1141.2.patch, YARN-1141.3.patch


 Currently, in CapacityScheduler and FifoScheduler, the blacklist is updated 
 together with the resource requests, and only when the incoming resource requests 
 are not empty. Therefore, when the incoming resource requests are empty, the 
 blacklist will not be updated even when the blacklist additions and removals are 
 not empty.
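 An illustrative sketch of the decoupling being described (not the actual 
 scheduler code; the class and method names here are hypothetical):
 {code}
 import java.util.List;

 class AllocateHandler {
   void handle(List<String> resourceRequests,
               List<String> blacklistAdditions, List<String> blacklistRemovals) {
     if (!resourceRequests.isEmpty()) {
       updateResourceRequests(resourceRequests);
     }
     // Decoupled: the blacklist is updated even when resourceRequests is empty.
     if (!blacklistAdditions.isEmpty() || !blacklistRemovals.isEmpty()) {
       updateBlacklist(blacklistAdditions, blacklistRemovals);
     }
   }

   void updateResourceRequests(List<String> requests) { /* ... */ }
   void updateBlacklist(List<String> additions, List<String> removals) { /* ... */ }
 }
 {code}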



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-876) Node resource is added twice when node comes back from unhealthy to healthy

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784983#comment-13784983
 ] 

Hudson commented on YARN-876:
-

FAILURE: Integrated in Hadoop-Yarn-trunk #351 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/351/])
YARN-876. Node resource is added twice when node comes back from unhealthy. 
(Peng Zhang via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528660)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


 Node resource is added twice when node comes back from unhealthy to healthy
 ---

 Key: YARN-876
 URL: https://issues.apache.org/jira/browse/YARN-876
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: PengZhang
Assignee: PengZhang
 Fix For: 2.1.2-beta

 Attachments: YARN-876.patch


 When an unhealthy node restarts, its resource may be added twice in the scheduler.
 The first time is at the node's reconnection, while the node's final state is still 
 UNHEALTHY.
 The second time is at the node's update, when the node's state changes from 
 UNHEALTHY to HEALTHY.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784984#comment-13784984
 ] 

Hudson commented on YARN-1213:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #351 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/351/])
YARN-1213. Restore config to ban submitting to undeclared pools in the Fair 
Scheduler. (Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528696)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Restore config to ban submitting to undeclared pools in the Fair Scheduler
 --

 Key: YARN-1213
 URL: https://issues.apache.org/jira/browse/YARN-1213
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.2-beta

 Attachments: YARN-1213.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-677) Increase coverage to FairScheduler

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784988#comment-13784988
 ] 

Hudson commented on YARN-677:
-

FAILURE: Integrated in Hadoop-Yarn-trunk #351 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/351/])
YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via 
jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528524)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
 Fix For: 3.0.0, 2.3.0

 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-876) Node resource is added twice when node comes back from unhealthy to healthy

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785080#comment-13785080
 ] 

Hudson commented on YARN-876:
-

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1541 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1541/])
YARN-876. Node resource is added twice when node comes back from unhealthy. 
(Peng Zhang via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528660)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


 Node resource is added twice when node comes back from unhealthy to healthy
 ---

 Key: YARN-876
 URL: https://issues.apache.org/jira/browse/YARN-876
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: PengZhang
Assignee: PengZhang
 Fix For: 2.1.2-beta

 Attachments: YARN-876.patch


 When an unhealthy node restarts, its resource may be added twice in the scheduler.
 The first time is at the node's reconnection, while the node's final state is still 
 UNHEALTHY.
 The second time is at the node's update, when the node's state changes from 
 UNHEALTHY to HEALTHY.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-425) coverage fix for yarn api

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785082#comment-13785082
 ] 

Hudson commented on YARN-425:
-

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1541 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1541/])
YARN-425. coverage fix for yarn api (Aleksey Gorshkov via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528641)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceManagerAdministrationProtocolPBClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestYarnApiClasses.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources/config-with-security.xml


 coverage fix for yarn api
 -

 Key: YARN-425
 URL: https://issues.apache.org/jira/browse/YARN-425
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Fix For: 3.0.0, 2.3.0

 Attachments: YARN-425-branch-0.23-d.patch, 
 YARN-425-branch-0.23.patch, YARN-425-branch-0.23-v1.patch, 
 YARN-425-branch-2-b.patch, YARN-425-branch-2-c.patch, 
 YARN-425-branch-2.patch, YARN-425-branch-2-v1.patch, YARN-425-trunk-a.patch, 
 YARN-425-trunk-b.patch, YARN-425-trunk-c.patch, YARN-425-trunk-d.patch, 
 YARN-425-trunk.patch, YARN-425-trunk-v1.patch, YARN-425-trunk-v2.patch


 coverage fix for yarn api
 patch YARN-425-trunk-a.patch for trunk
 patch YARN-425-branch-2.patch for branch-2
 patch YARN-425-branch-0.23.patch for branch-0.23



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785081#comment-13785081
 ] 

Hudson commented on YARN-1213:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1541 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1541/])
YARN-1213. Restore config to ban submitting to undeclared pools in the Fair 
Scheduler. (Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528696)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Restore config to ban submitting to undeclared pools in the Fair Scheduler
 --

 Key: YARN-1213
 URL: https://issues.apache.org/jira/browse/YARN-1213
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.2-beta

 Attachments: YARN-1213.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-677) Increase coverage to FairScheduler

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785085#comment-13785085
 ] 

Hudson commented on YARN-677:
-

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1541 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1541/])
YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via 
jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528524)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
 Fix For: 3.0.0, 2.3.0

 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785204#comment-13785204
 ] 

Hudson commented on YARN-1213:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1567 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1567/])
YARN-1213. Restore config to ban submitting to undeclared pools in the Fair 
Scheduler. (Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528696)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Restore config to ban submitting to undeclared pools in the Fair Scheduler
 --

 Key: YARN-1213
 URL: https://issues.apache.org/jira/browse/YARN-1213
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.2-beta

 Attachments: YARN-1213.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-425) coverage fix for yarn api

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785205#comment-13785205
 ] 

Hudson commented on YARN-425:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1567 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1567/])
YARN-425. coverage fix for yarn api (Aleksey Gorshkov via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528641)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceManagerAdministrationProtocolPBClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestYarnApiClasses.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources/config-with-security.xml


 coverage fix for yarn api
 -

 Key: YARN-425
 URL: https://issues.apache.org/jira/browse/YARN-425
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Fix For: 3.0.0, 2.3.0

 Attachments: YARN-425-branch-0.23-d.patch, 
 YARN-425-branch-0.23.patch, YARN-425-branch-0.23-v1.patch, 
 YARN-425-branch-2-b.patch, YARN-425-branch-2-c.patch, 
 YARN-425-branch-2.patch, YARN-425-branch-2-v1.patch, YARN-425-trunk-a.patch, 
 YARN-425-trunk-b.patch, YARN-425-trunk-c.patch, YARN-425-trunk-d.patch, 
 YARN-425-trunk.patch, YARN-425-trunk-v1.patch, YARN-425-trunk-v2.patch


 coverage fix for yarn api
 patch YARN-425-trunk-a.patch for trunk
 patch YARN-425-branch-2.patch for branch-2
 patch YARN-425-branch-0.23.patch for branch-0.23



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-876) Node resource is added twice when node comes back from unhealthy to healthy

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785203#comment-13785203
 ] 

Hudson commented on YARN-876:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1567 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1567/])
YARN-876. Node resource is added twice when node comes back from unhealthy. 
(Peng Zhang via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528660)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


 Node resource is added twice when node comes back from unhealthy to healthy
 ---

 Key: YARN-876
 URL: https://issues.apache.org/jira/browse/YARN-876
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: PengZhang
Assignee: PengZhang
 Fix For: 2.1.2-beta

 Attachments: YARN-876.patch


 When an unhealthy node restarts, its resource may be added twice in the scheduler.
 The first time is at the node's reconnection, while the node's final state is still 
 UNHEALTHY.
 The second time is at the node's update, when the node's state changes from 
 UNHEALTHY to HEALTHY.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-677) Increase coverage to FairScheduler

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785208#comment-13785208
 ] 

Hudson commented on YARN-677:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1567 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1567/])
YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via 
jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528524)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
 Fix For: 3.0.0, 2.3.0

 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1141) Updating resource requests should be decoupled with updating blacklist

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785207#comment-13785207
 ] 

Hudson commented on YARN-1141:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1567 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1567/])
YARN-1141. Updating resource requests should be decoupled with updating 
blacklist (Zhijie Shen via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528632)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


 Updating resource requests should be decoupled with updating blacklist
 --

 Key: YARN-1141
 URL: https://issues.apache.org/jira/browse/YARN-1141
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.1.2-beta

 Attachments: YARN-1141.1.patch, YARN-1141.2.patch, YARN-1141.3.patch


 Currently, in CapacityScheduler and FifoScheduler, the blacklist is updated 
 together with the resource requests, and only when the incoming resource requests 
 are not empty. Therefore, when the incoming resource requests are empty, the 
 blacklist will not be updated even when the blacklist additions and removals are 
 not empty.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1267) Refactor cgroup logic out of LCE into a standalone binary

2013-10-03 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1267:
-

Target Version/s: 2.3.0
   Fix Version/s: (was: 2.3.0)

 Refactor cgroup logic out of LCE into a standalone binary
 -

 Key: YARN-1267
 URL: https://issues.apache.org/jira/browse/YARN-1267
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.2-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik

 As discussed in YARN-1253, we should consider decoupling cgroups handling from 
 the LCE. YARN-3 initially had a proposal on how this could be done; we should 
 see if any of that makes sense in the current state of things.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-677) Increase coverage to FairScheduler

2013-10-03 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785276#comment-13785276
 ] 

Jonathan Eagles commented on YARN-677:
--

Thanks, Sandy. Let me take a look at the coverage numbers from before this patch 
went in. In the meantime I will revert it until I can prove we need this coverage 
patch.

 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
 Fix For: 3.0.0, 2.3.0

 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (YARN-1197) Support changing resources of an allocated container

2013-10-03 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785293#comment-13785293
 ] 

Alejandro Abdelnur edited comment on YARN-1197 at 10/3/13 3:47 PM:
---

[~gp.leftnoteasy], thanks for your previous answer, it makes sense. 

We thought about this a while ago in the context of Llama for 
Impala-Yarn integration.

Along the lines of what Sandy suggested, just a couple of extra comments.

For decreasing, the AM can request the correction, effective immediately, to the NM. 
The NM reports the container correction and the new free space to the RM in the 
next heartbeat. Regarding enforcing the minimum, configuration properties are 
scheduler specific, so the minimum would have to come to the NM from the RM as 
part of the registration response.

For increasing, the AM must go to the RM first to avoid the race conditions 
already mentioned. To reduce the changes in the RM to a minimum, I was thinking of 
the following approach:

  * AM does a regular new allocation request with the desired delta 
capability increases with relaxedLocality=false (no changes on the AM-RM 
protocol/logic).
  * AM waits for the delta container allocation from the RM.
  * When the AM receives the delta container allocation, using a new AM-NM API, it 
updates the original container with the delta container.
  * The NM makes the necessary corrections locally to the original container, 
adding the capabilities of the delta container.
  * The NM notifies the RM to merge the original container with the delta 
container.
  * The RM updates the original container and drops the delta container.

The complete list of changes for this approach would be:

* AM-NM API
  ** decreaseContainer(ContainerId original, Resources)
  ** increaseContainer(ContainerId original, ContainerId delta)
* NM-RM API
  ** decreaseContainer(ContainerId original, Resources)
  ** registration() -> +minimumcontainersize
  ** mergeContainers(ContainerId originalKeep, ContainerId deltaDiscard)
* NM logic
  ** needs to correct capabilities enforcement for +/- delta
* RM logic
  ** needs to update container resources when receiving a NM's 
decreaseContainer() call
  ** needs to update original container resources and delete delta container 
resources when receiving a NM's mergeContainer() call
* RM scheduler API
  ** it should expose methods for decreaseContainer() and mergeContainers() 
functionality





was (Author: tucu00):
[~gp.leftnoteasy], thanks for your previous answer, it makes sense. 

We've been thinking about this a while ago in the context of Llama for 
Impala-Yarn integration.

Along the lines of what Sandy suggested, just a couple of extra comments.

For decreasing AM can request the correction, effective immediately to the NM. 
the NM reports the container correction and new free space to the RM in the 
next heartbeat. Regarding enforcing minimum, configuration properties are 
scheduler specific, so the minimum should have to come to the NM from the RM as 
part of the registration response.

For increasing the AM must go to the RM first o avoid the race conditions 
already mentioned. To reduce the changes in the RM to a minimum I was thinking 
the following approach:

  * AM does a regular new allocation request with the desired delta 
capabilities increases with relaxedLocality=false (no changes on the AM-RM 
protocol/logic).
  * AM waits for the delta container allocation from the RM.
  * When AM receives the delta container allocation, using a new AM-NM API, it 
updates the original container with the delta container.
  * The NM makes the necessary corrections locally to the original container 
adding the capabilities o the delta container.
  * The NM notifies the RM to merge the original container with the delta 
container.
  * The RM updates the original container and drops the delta container.

The complete list of changes for this approach would be:

* AM-NM API
  ** decreaseContainer(ContainerId original, Resources)
  ** increateContainer(ContainerId original, ContainerId delta)
* NM-RM API
  ** decreaseContainer(ContainerId original, Resources)
  ** registration() - +minimumcontainersize
  ** mergeContainers(ContainerId originalKeep, ContainerId deltaDiscard)
* NM logic
  * needs to correct capabilities enforcement for +/- delta
* RM logic
  ** needs to update container resources when receiving a NM's 
decreaseContainer() call
  ** needs to update original container resources and delete delta container 
resources when receiving a NM's mergeContainer() call
* RM scheduler API
  ** it should expose methods for decreaseContainer() and mergeContainers() 
functionality




 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
   

[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-10-03 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785293#comment-13785293
 ] 

Alejandro Abdelnur commented on YARN-1197:
--

[~gp.leftnoteasy], thanks for your previous answer, it makes sense. 

We thought about this a while ago in the context of Llama for 
Impala-Yarn integration.

Along the lines of what Sandy suggested, just a couple of extra comments.

For decreasing, the AM can request the correction, effective immediately, to the NM. 
The NM reports the container correction and the new free space to the RM in the 
next heartbeat. Regarding enforcing the minimum, configuration properties are 
scheduler specific, so the minimum would have to come to the NM from the RM as 
part of the registration response.

For increasing, the AM must go to the RM first to avoid the race conditions 
already mentioned. To reduce the changes in the RM to a minimum, I was thinking of 
the following approach:

  * AM does a regular new allocation request with the desired delta 
capability increases with relaxedLocality=false (no changes on the AM-RM 
protocol/logic).
  * AM waits for the delta container allocation from the RM.
  * When the AM receives the delta container allocation, using a new AM-NM API, it 
updates the original container with the delta container.
  * The NM makes the necessary corrections locally to the original container, 
adding the capabilities of the delta container.
  * The NM notifies the RM to merge the original container with the delta 
container.
  * The RM updates the original container and drops the delta container.

The complete list of changes for this approach would be:

* AM-NM API
  * decreaseContainer(ContainerId original, Resources)
  * increateContainer(ContainerId original, ContainerId delta)
* NM-RM API
  * decreaseContainer(ContainerId original, Resources)
  * registration() - +minimumcontainersize
  * mergeContainers(ContainerId originalKeep, ContainerId deltaDiscard)
* NM logic
  * needs to correct capabilities enforcement for +/- delta
* RM logic
  * needs to update container resources when receiving a NM's 
decreaseContainer() call
  * needs to update original container resources and delete delta container 
resources when receiving a NM's mergeContainer() call
* RM scheduler API
  * it should expose methods for decreaseContainer() and mergeContainers() 
functionality




 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: yarn-1197.pdf


 Currently, YARN cannot support merge several containers in one node to a big 
 container, which can make us incrementally ask resources, merge them to a 
 bigger one, and launch our processes. The user scenario is described in the 
 comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-677) Increase coverage to FairScheduler

2013-10-03 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-677:
-

Fix Version/s: (was: 2.3.0)
   (was: 3.0.0)

 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-677) Increase coverage to FairScheduler

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785305#comment-13785305
 ] 

Hudson commented on YARN-677:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #4525 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4525/])
Revert YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis 
Y via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528914)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-867) Isolation of failures in aux services

2013-10-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785306#comment-13785306
 ] 

Hitesh Shah commented on YARN-867:
--

[~xgong] [~bikassaha] [~vinodkv] It seems like this fix is getting quite 
complex and the introduction of container failure on service event handling has 
a possibility of introducing a lot of different race conditions.

I propose the following:

   - Add code to catch Throwable whenever an aux service is invoked to handle 
the container related events (app init, container start, container stop, app 
cleanup), and do not fail the container if an exception is thrown. 
   - A simpler check could be done to match the service metadata from the 
ContainerLaunchContext and ensure that the service is configured on the NM in 
question. 

Using the above, at the very least, we can catch issues related to 
mis-configured NMs where the shuffle service is not configured. This is way 
simpler as it could be done with a simple synchronous check when handling the 
startContainers rpc call. This could be targeted to 2.1.2/2.2.0.
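
As a rough illustration of the two pieces (not code from the attached patches; 
the collection and helper names are assumptions), the NM-side logic could look 
like:

{code}
// 1) Synchronous check while handling startContainers: every service named in
//    the launch context's service data must be a configured aux service.
for (String serviceName : launchContext.getServiceData().keySet()) {
  if (!configuredAuxServices.contains(serviceName)) {   // hypothetical set
    throw new YarnException("The auxService " + serviceName
        + " is not configured on this NodeManager");
  }
}

// 2) Catch Throwable around the aux-service event handling so an exception
//    never reaches (and kills) the NM's async dispatcher; log and move on.
try {
  handleAuxServiceEvent(event);   // stands in for app init/start/stop/cleanup
} catch (Throwable t) {
  LOG.error("Aux service failed to handle " + event + "; ignoring", t);
}
{code}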

As for the failing containers, I propose that we target fixing the feedback of 
failed containers back to the AM on service handling errors in 2.3.0. For the 
2.3.0 targeted jira, I would prefer to increase the scope of this to design for 
differentiating critical vs non-critical services so as to have the framework 
in place to understand which service's errors result in failed containers. 

Comments? 




 Isolation of failures in aux services 
 --

 Key: YARN-867
 URL: https://issues.apache.org/jira/browse/YARN-867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
 YARN-867.4.patch, YARN-867.5.patch, YARN-867.sampleCode.2.patch


 Today, a malicious application can bring down the NM by sending bad data to a 
 service. For example, sending data to the ShuffleService such that it results 
 any non-IOException will cause the NM's async dispatcher to exit as the 
 service's INIT APP event is not handled properly. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-867) Isolation of failures in aux services

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-867:
---

Attachment: YARN-867.6.patch

Simply log and take no action when catching Throwable wherever an aux service 
is invoked to handle the container related events (app init, container start, 
container stop, app cleanup).

 Isolation of failures in aux services 
 --

 Key: YARN-867
 URL: https://issues.apache.org/jira/browse/YARN-867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
 YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, 
 YARN-867.sampleCode.2.patch


 Today, a malicious application can bring down the NM by sending bad data to a 
 service. For example, sending data to the ShuffleService such that it results 
 any non-IOException will cause the NM's async dispatcher to exit as the 
 service's INIT APP event is not handled properly. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-677) Increase coverage to FairScheduler

2013-10-03 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785354#comment-13785354
 ] 

Sandy Ryza commented on YARN-677:
-

Thanks, Jonathan.

 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-867) Isolation of failures in aux services

2013-10-03 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785364#comment-13785364
 ] 

Alejandro Abdelnur commented on YARN-867:
-

The try/catch should be around each aux service method invocation so that a 
failure of a given service does not affect delivery to the other services.
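
In other words, a minimal sketch (the loop variable, the 'services' collection 
and the dispatch helper are assumed for illustration):

{code}
// Per-service isolation: one misbehaving service cannot stop the event from
// being delivered to the remaining services.
for (AuxiliaryService service : services) {          // 'services' assumed
  try {
    dispatchToService(service, event);               // per-event callback
  } catch (Throwable t) {
    LOG.error("Aux service failed on " + event + "; continuing with the rest", t);
  }
}
{code}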

 Isolation of failures in aux services 
 --

 Key: YARN-867
 URL: https://issues.apache.org/jira/browse/YARN-867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
 YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, 
 YARN-867.sampleCode.2.patch


 Today, a malicious application can bring down the NM by sending bad data to a 
 service. For example, sending data to the ShuffleService such that it results 
 any non-IOException will cause the NM's async dispatcher to exit as the 
 service's INIT APP event is not handled properly. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-867) Isolation of failures in aux services

2013-10-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785373#comment-13785373
 ] 

Bikas Saha commented on YARN-867:
-

bq. Using the above, at the very least, we can catch issues related to 
mis-configured NMs where the shuffle service is not configured. This is way 
simpler as it could be done with a simple synchronous check when handling the 
startContainers rpc call. This could be targeted to 2.1.2/2.2.0.

@hitesh, I agree. In that case shall we re-target this jira to 2.3 and use 
YARN-1256 to fix the misconfigured service and exception logging?


 Isolation of failures in aux services 
 --

 Key: YARN-867
 URL: https://issues.apache.org/jira/browse/YARN-867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
 YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, 
 YARN-867.sampleCode.2.patch


 Today, a malicious application can bring down the NM by sending bad data to a 
 service. For example, sending data to the ShuffleService such that it results 
 any non-IOException will cause the NM's async dispatcher to exit as the 
 service's INIT APP event is not handled properly. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-867) Isolation of failures in aux services

2013-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785371#comment-13785371
 ] 

Hadoop QA commented on YARN-867:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606599/YARN-867.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2077//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2077//console

This message is automatically generated.

 Isolation of failures in aux services 
 --

 Key: YARN-867
 URL: https://issues.apache.org/jira/browse/YARN-867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
 YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, 
 YARN-867.sampleCode.2.patch


 Today, a malicious application can bring down the NM by sending bad data to a 
 service. For example, sending data to the ShuffleService such that it results 
 any non-IOException will cause the NM's async dispatcher to exit as the 
 service's INIT APP event is not handled properly. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-867) Isolation of failures in aux services

2013-10-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785380#comment-13785380
 ] 

Hitesh Shah commented on YARN-867:
--

+1 to Bikas's suggestion.

 Isolation of failures in aux services 
 --

 Key: YARN-867
 URL: https://issues.apache.org/jira/browse/YARN-867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
 YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, 
 YARN-867.sampleCode.2.patch


 Today, a malicious application can bring down the NM by sending bad data to a 
 service. For example, sending data to the ShuffleService such that it results 
 any non-IOException will cause the NM's async dispatcher to exit as the 
 service's INIT APP event is not handled properly. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1219) FSDownload changes file suffix making FileUtil.unTar() throw exception

2013-10-03 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-1219:


Hadoop Flags: Reviewed

+1 for the patch.  I verified on both Mac and Windows.  I plan to commit this 
later today.

 FSDownload changes file suffix making FileUtil.unTar() throw exception
 --

 Key: YARN-1219
 URL: https://issues.apache.org/jira/browse/YARN-1219
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.1.1-beta, 2.1.2-beta
Reporter: shanyu zhao
Assignee: shanyu zhao
 Fix For: 2.1.2-beta

 Attachments: YARN-1219.patch


 While running a Hive join operation on Yarn, I saw the exception described 
 below. This is caused by FSDownload copying the files into a temp file and 
 changing the suffix to .tmp before unpacking it. In unpack(), it uses 
 FileUtil.unTar(), which determines whether the file is gzipped by looking at 
 the file suffix:
 {code}
 boolean gzipped = inFile.toString().endsWith("gz");
 {code}
 To fix this problem, we can remove the .tmp from the temp file name.
 Here is the detailed exception:
 org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240)
   at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676)
   at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625)
   at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy

2013-10-03 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785381#comment-13785381
 ] 

Andrey Klochkov commented on YARN-465:
--

The robot failed when testing the branch-2 patch against trunk; this is 
expected.

 fix coverage  org.apache.hadoop.yarn.server.webproxy
 

 Key: YARN-465
 URL: https://issues.apache.org/jira/browse/YARN-465
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Attachments: YARN-465-branch-0.23-a.patch, 
 YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, 
 YARN-465-branch-2--n3.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, 
 YARN-465-trunk--n3.patch, YARN-465-trunk.patch


 fix coverage  org.apache.hadoop.yarn.server.webproxy
 patch YARN-465-trunk.patch for trunk
 patch YARN-465-branch-2.patch for branch-2
 patch YARN-465-branch-0.23.patch for branch-0.23
 There is issue in branch-0.23 . Patch does not creating .keep file.
 For fix it need to run commands:
 mkdir 
 yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
 touch 
 yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-677) Increase coverage to FairScheduler

2013-10-03 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov reassigned YARN-677:


Assignee: Andrey Klochkov

 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
Assignee: Andrey Klochkov
 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-1256:
---

Assignee: Xuan Gong

 NM silently ignores non-existent service in StartContainerRequest
 -

 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.1.2-beta


 A container can set token service metadata for a service, say 
 shuffle_service. If that service does not exist then the errors is silently 
 ignored. Later, when the next container wants to access data written to 
 shuffle_service by the first task, then it fails because the service does not 
 have the token that was supposed to be set by the first task.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1256:


Attachment: YARN-1256.1.patch

Simply logging and taking no action

 NM silently ignores non-existent service in StartContainerRequest
 -

 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-1256.1.patch


 A container can set token service metadata for a service, say 
 shuffle_service. If that service does not exist then the errors is silently 
 ignored. Later, when the next container wants to access data written to 
 shuffle_service by the first task, then it fails because the service does not 
 have the token that was supposed to be set by the first task.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-10-03 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785403#comment-13785403
 ] 

Alejandro Abdelnur commented on YARN-1197:
--

Bikas, makes sense, thanks for summarizing.

On the decrease: given that we also do a round trip AM-NM-RM-AM, why not 
make it a bit more symmetric:

  * AM asks the RM to decrease a container
  * RM notifies the NM about the container decrease on the next heartbeat

With this approach the RM can enforce the MIN on an AM decrease and reject it 
if it falls below the MIN. Also, there is no need to notify the AM of the 
decrease taking place, as the AM requested it. And since it is a decrease, the 
AM can instruct the container to shrink even if the RM has not told the NM yet. 
Furthermore, I would expect an AM to instruct a container to shrink before 
asking YARN, to avoid a race condition that could kill the container for using 
more resources than it should.

Also, by doing this there would be no difference in the free-resource 
bookkeeping between the RM and the NMs, which may be handy to avoid 
complicating things for YARN-311.

Thoughts?
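
For illustration only, the symmetric decrease flow could be sketched like this 
(the request/response methods are hypothetical, not part of any posted patch; 
Resource.newInstance and Resources.lessThan are existing YARN utilities):

{code}
// 1) AM -> RM: piggy-back the decrease on the regular allocate() heartbeat.
allocateRequest.addDecreaseRequest(containerId, Resource.newInstance(512, 1));

// 2) RM: enforce the scheduler minimum before accepting the decrease.
if (Resources.lessThan(resourceCalculator, clusterResource,
    requestedCapability, minimumAllocation)) {
  // reject: cannot shrink below yarn.scheduler.minimum-allocation-*
}

// 3) RM -> NM: ship the accepted decrease in the next heartbeat response,
//    keeping RM and NM free-resource bookkeeping in sync.
heartbeatResponse.addContainerToDecrease(containerId, requestedCapability);
{code}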

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: yarn-1197.pdf


 Currently, YARN cannot support merge several containers in one node to a big 
 container, which can make us incrementally ask resources, merge them to a 
 bigger one, and launch our processes. The user scenario is described in the 
 comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-677) Increase coverage to FairScheduler

2013-10-03 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785407#comment-13785407
 ] 

Andrey Klochkov commented on YARN-677:
--

I looked at the difference in coverage before and after the patch. There are 2 
test methods added: 
1. testSchedulerHandleFailWithExternalEvents checks that 
FairScheduler.handle() throws RuntimeException when supplied with a wrong event 
type. The actual check is missing, so it seems the test will pass in any case. 
This is a very minor addition to the coverage. If we want to keep it, I can add 
the check (along the lines of the sketch below) and update the patch.
2. testAggregateCapacityTrackingWithPreemptionEnabled -- not sure about the 
intention. I see that it adds coverage to the 
FairScheduler.preemptTasksIfNecessary() method, but it basically just sleeps so 
that the method gets invoked; preemption never happens and the test does not 
make any checks. I think we can skip this one.

Should we keep #1? 
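
For reference, the missing check for #1 could be as simple as the following 
sketch (how the unexpected event is constructed is left to the existing test 
setup; 'scheduler' and 'unexpectedEvent' are assumed):

{code}
// Assert that FairScheduler.handle() rejects an event type it does not handle.
try {
  scheduler.handle(unexpectedEvent);   // an event the FairScheduler cannot handle
  Assert.fail("FairScheduler.handle() should have thrown a RuntimeException");
} catch (RuntimeException e) {
  // expected
}
{code}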

 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
Assignee: Andrey Klochkov
 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy

2013-10-03 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov reassigned YARN-465:


Assignee: Andrey Klochkov  (was: Aleksey Gorshkov)

 fix coverage  org.apache.hadoop.yarn.server.webproxy
 

 Key: YARN-465
 URL: https://issues.apache.org/jira/browse/YARN-465
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
Assignee: Andrey Klochkov
 Attachments: YARN-465-branch-0.23-a.patch, 
 YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, 
 YARN-465-branch-2--n3.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, 
 YARN-465-trunk--n3.patch, YARN-465-trunk.patch


 fix coverage  org.apache.hadoop.yarn.server.webproxy
 patch YARN-465-trunk.patch for trunk
 patch YARN-465-branch-2.patch for branch-2
 patch YARN-465-branch-0.23.patch for branch-0.23
 There is issue in branch-0.23 . Patch does not creating .keep file.
 For fix it need to run commands:
 mkdir 
 yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
 touch 
 yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-867) Isolation of failures in aux services

2013-10-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785412#comment-13785412
 ] 

Vinod Kumar Vavilapalli commented on YARN-867:
--

bq. @hitesh, I agree. In that case shall we re-target this jira to 2.3 and use 
YARN-1256 to fix the misconfigured service and exception logging?
+1.

+1 also to the earlier suggestion - it is too late to put more state machine 
changes into 2.1.2.

 Isolation of failures in aux services 
 --

 Key: YARN-867
 URL: https://issues.apache.org/jira/browse/YARN-867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
 YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, 
 YARN-867.sampleCode.2.patch


 Today, a malicious application can bring down the NM by sending bad data to a 
 service. For example, sending data to the ShuffleService such that it results 
 any non-IOException will cause the NM's async dispatcher to exit as the 
 service's INIT APP event is not handled properly. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1232) Configuration to support multiple RMs

2013-10-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785428#comment-13785428
 ] 

Bikas Saha commented on YARN-1232:
--

Sorry. My bad. I got confused.

Patch looks good overall.

This should either be RM_ID = RM_PREFIX + id or RM_HA_ID = RM_HA_PREFIX + id. 
Let's be consistent.
{code}
+  public static final String RM_ID = RM_HA_PREFIX + "id";
{code}

Wherever possible, can we simply always call HAUtil methods and let HAUtil 
handle the if/else? This would help reduce the bunch of if-else blocks 
scattered in the code.
{code}
+if (HAUtil.isHAEnabled(this)) {
+  address = HAUtil.getConfValueForRMId(name, defaultAddress, this);
+} else {
+  address = get(name, defaultAddress);
+}
{code}
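
For example, a sketch of pushing the if/else into HAUtil (the helper name and 
signature below are made up for illustration; they are not in the patch):

{code}
// Hypothetical HAUtil helper so call sites never branch on HA themselves.
public static String getConfValue(String name, String defaultValue,
    Configuration conf) {
  return isHAEnabled(conf)
      ? getConfValueForRMId(name, defaultValue, conf)
      : conf.get(name, defaultValue);
}

// Call sites then collapse to:
//   address = HAUtil.getConfValue(name, defaultAddress, this);
{code}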

Let's make a mental note that when new AlwaysOn services (say RPC) are added, 
they need to use the updated conf.
{code}
+  void setConf(Configuration configuration) {
+conf = configuration;
+  }
{code}

Minor nits:
getRMServiceIds()/getRMId() - it would help if they both either had service in 
the name or both skipped it.

getConfValueForRMId(String prefix/getConfValueForRMId(String 
prefix/setConfValue(String prefix - String rmId instead of prefix?


 Configuration to support multiple RMs
 -

 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, 
 yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch


 We should augment the configuration to allow users specify two RMs and the 
 individual RPC addresses for them.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1232) Configuration to support multiple RMs

2013-10-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785438#comment-13785438
 ] 

Karthik Kambatla commented on YARN-1232:


bq. getConfValueForRMId(String prefix/getConfValueForRMId(String 
prefix/setConfValue(String prefix - String rmId instead of prefix?

Didn't quite understand this comment. Other comments make sense, will try and 
accommodate them.

 Configuration to support multiple RMs
 -

 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, 
 yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch


 We should augment the configuration to allow users specify two RMs and the 
 individual RPC addresses for them.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1232) Configuration to support multiple RMs

2013-10-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785441#comment-13785441
 ] 

Bikas Saha commented on YARN-1232:
--

I meant let's use String rmId instead of String prefix in the arguments to 
those methods, to clarify that we expect the rm-id to be sent as an argument 
and not some arbitrary prefix. Isn't that the case?

 Configuration to support multiple RMs
 -

 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, 
 yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch


 We should augment the configuration to allow users specify two RMs and the 
 individual RPC addresses for them.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-10-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785444#comment-13785444
 ] 

Bikas Saha commented on YARN-1197:
--

Sandy had some arguments about why this has race conditions wrt when the RM can 
start allocating the freed-up resources. Can you please look at the comments 
above to check whether it's the same thing or not.

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: yarn-1197.pdf


 Currently, YARN cannot support merge several containers in one node to a big 
 container, which can make us incrementally ask resources, merge them to a 
 bigger one, and launch our processes. The user scenario is described in the 
 comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Moved] (YARN-1269) QueueACLs doesn't work as root allows *

2013-10-03 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen moved MAPREDUCE-5557 to YARN-1269:
--

Key: YARN-1269  (was: MAPREDUCE-5557)
Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 QueueACLs doesn't work as root allows *
 ---

 Key: YARN-1269
 URL: https://issues.apache.org/jira/browse/YARN-1269
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Even if we specify acl for default queue, say user1, user2 can still submit 
 and kill applications on default queue, because the queue checked user2 don't 
 have the access to it, it then checked whether user2 has the access to it's 
 parent recursively, and finally it found user2 have the access to root.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1269) QueueACLs doesn't work as root allows *

2013-10-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785452#comment-13785452
 ] 

Zhijie Shen commented on YARN-1269:
---

We need to configure root not to accept *. However, the following case will 
still have a problem.

{code}
<property>
  <name>yarn.scheduler.capacity.root.queue1.acl_submit_applications</name>
  <value>user1</value>
  <description>
    The ACL of who can submit jobs to the default queue.
  </description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.queue2.acl_submit_applications</name>
  <value>user2</value>
  <description>
    The ACL of who can submit jobs to the default queue.
  </description>
</property>
{code}

If we have the two queues, we definitely don't want to set the users of root to 
be the union of the users of both queues. Otherwise, user1 and user2 would have 
access to both queues.

Maybe we should not check the parent queue's ACL if the parent queue is root?
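
A minimal sketch of that idea (purely illustrative; the queue/ACL accessors and 
isRootQueue() are assumptions, not the actual CapacityScheduler code):

{code}
// Stop walking up the queue hierarchy at the root, so root's open "*" ACL
// cannot override a child queue's restriction.
boolean hasAccess(QueueACL acl, UserGroupInformation user, Queue queue) {
  for (Queue q = queue; q != null && !isRootQueue(q); q = q.getParent()) {
    if (q.getAcl(acl).isUserAllowed(user)) {
      return true;
    }
  }
  return false;   // never fall back to root's "*"
}
{code}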

 QueueACLs doesn't work as root allows *
 ---

 Key: YARN-1269
 URL: https://issues.apache.org/jira/browse/YARN-1269
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Even if we specify acl for default queue, say user1, user2 can still submit 
 and kill applications on default queue, because the queue checked user2 don't 
 have the access to it, it then checked whether user2 has the access to it's 
 parent recursively, and finally it found user2 have the access to root.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1232) Configuration to support multiple RMs

2013-10-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785455#comment-13785455
 ] 

Karthik Kambatla commented on YARN-1232:


Oh. The prefix is a config key - e.g. yarn.resourcemanager.address. By 
getConfValueForRMId, we mean get the value of this key for the specific ID 
mentioned in the Configuration.

 Configuration to support multiple RMs
 -

 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, 
 yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch


 We should augment the configuration to allow users specify two RMs and the 
 individual RPC addresses for them.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1256:


Attachment: YARN-1256.2.patch

Add logic to fail the container start if the aux service cannot be found.
Remove the (null == service) check from AuxService#handle, since we have 
already checked in startContainer.

 NM silently ignores non-existent service in StartContainerRequest
 -

 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-1256.1.patch, YARN-1256.2.patch


 A container can set token service metadata for a service, say 
 shuffle_service. If that service does not exist then the errors is silently 
 ignored. Later, when the next container wants to access data written to 
 shuffle_service by the first task, then it fails because the service does not 
 have the token that was supposed to be set by the first task.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785520#comment-13785520
 ] 

Hadoop QA commented on YARN-1256:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606633/YARN-1256.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2078//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2078//console

This message is automatically generated.

 NM silently ignores non-existent service in StartContainerRequest
 -

 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-1256.1.patch, YARN-1256.2.patch


 A container can set token service metadata for a service, say 
 shuffle_service. If that service does not exist then the errors is silently 
 ignored. Later, when the next container wants to access data written to 
 shuffle_service by the first task, then it fails because the service does not 
 have the token that was supposed to be set by the first task.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1199) Make NM/RM Versions Available

2013-10-03 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785526#comment-13785526
 ] 

Jonathan Eagles commented on YARN-1199:
---

+1. Thanks, Mit.

 Make NM/RM Versions Available
 -

 Key: YARN-1199
 URL: https://issues.apache.org/jira/browse/YARN-1199
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch, 
 YARN-1199.patch


 Now as we have the NM and RM Versions available, we can display the YARN 
 version of nodes running in the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1270) TestSLSRunner test is failing

2013-10-03 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1270:


Summary: TestSLSRunner test is failing  (was: TestSLSRunner is failing)

 TestSLSRunner test is failing
 -

 Key: YARN-1270
 URL: https://issues.apache.org/jira/browse/YARN-1270
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Mit Desai

 The test TestSLSRunner, added in the YARN-1021 patch, is now failing.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2013-10-03 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785547#comment-13785547
 ] 

Mit Desai commented on YARN-1021:
-

Hey Wei, FYI the test TestSLSRunner is failing. I have created a new JIRA for 
it: YARN-1270.

 Yarn Scheduler Load Simulator
 -

 Key: YARN-1021
 URL: https://issues.apache.org/jira/browse/YARN-1021
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.3.0

 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
 YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf


 The Yarn Scheduler is a fertile area of interest with different 
 implementations, e.g., Fifo, Capacity and Fair  schedulers. Meanwhile, 
 several optimizations are also made to improve scheduler performance for 
 different scenarios and workload. Each scheduler algorithm has its own set of 
 features, and drives scheduling decisions by many factors, such as fairness, 
 capacity guarantee, resource availability, etc. It is very important to 
 evaluate a scheduler algorithm very well before we deploy it in a production 
 cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling 
 algorithm. Evaluating in a real cluster is always time and cost consuming, 
 and it is also very hard to find a large-enough cluster. Hence, a simulator 
 which can predict how well a scheduler algorithm for some specific workload 
 would be quite useful.
 We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
 clusters and application loads in a single machine. This would be invaluable 
 in furthering Yarn by providing a tool for researchers and developers to 
 prototype new scheduler features and predict their behavior and performance 
 with reasonable amount of confidence, there-by aiding rapid innovation.
 The simulator will exercise the real Yarn ResourceManager removing the 
 network factor by simulating NodeManagers and ApplicationMasters via handling 
 and dispatching NM/AMs heartbeat events from within the same JVM.
 To keep tracking of scheduler behavior and performance, a scheduler wrapper 
 will wrap the real scheduler.
 The simulator will produce real time metrics while executing, including:
 * Resource usages for whole cluster and each queue, which can be utilized to 
 configure cluster and queue's capacity.
 * The detailed application execution trace (recorded in relation to simulated 
 time), which can be analyzed to understand/validate the  scheduler behavior 
 (individual jobs turn around time, throughput, fairness, capacity guarantee, 
 etc).
 * Several key metrics of scheduler algorithm, such as time cost of each 
 scheduler operation (allocate, handle, etc), which can be utilized by Hadoop 
 developers to find the code spots and scalability limits.
 The simulator will provide real time charts showing the behavior of the 
 scheduler and its performance.
 A short demo is available http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
 how to use simulator to simulate Fair Scheduler and Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-10-03 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785550#comment-13785550
 ] 

Sandy Ryza commented on YARN-1197:
--

To summarize what I wrote above: YARN is already asymmetrical wrt acquiring and 
releasing resources.  I don't think the minimum allocation logic is enough to 
justify a round trip to the RM.  It will require adding more new states that 
will make the whole thing more confusing and bug-prone.  We can either push 
down this logic into the NodeManager or just handle it on the RM side, i.e. 
refuse to free any resources in the scheduler for a container that decreases 
from 1024 to 1023 mb.
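
To put numbers on the RM-side option (a sketch only; roundUp is the usual 
ceiling-to-a-multiple helper, written out here for clarity, not scheduler code):

{code}
// With a 1024 MB minimum/increment, shrinking a container from 1024 MB to
// 1023 MB frees nothing, because the scheduler accounts in whole increments.
int increment = 1024;                    // yarn.scheduler.minimum-allocation-mb
int before = roundUp(1024, increment);   // 1024
int after  = roundUp(1023, increment);   // 1024
int freed  = before - after;             // 0 -> nothing for the RM to hand out

static int roundUp(int value, int step) {
  return ((value + step - 1) / step) * step;
}
{code}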

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: yarn-1197.pdf


 Currently, YARN cannot support merge several containers in one node to a big 
 container, which can make us incrementally ask resources, merge them to a 
 bigger one, and launch our processes. The user scenario is described in the 
 comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1199) Make NM/RM Versions Available

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785551#comment-13785551
 ] 

Hudson commented on YARN-1199:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4526 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4526/])
YARN-1199. Make NM/RM Versions Available (Mit Desai via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529003)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestRMNMInfo.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMNMInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java


 Make NM/RM Versions Available
 -

 Key: YARN-1199
 URL: https://issues.apache.org/jira/browse/YARN-1199
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch, 
 YARN-1199.patch


 Now as we have the NM and RM Versions available, we can display the YARN 
 version of nodes running in the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-10-03 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785556#comment-13785556
 ] 

Robert Joseph Evans commented on YARN-624:
--

[~curino] Sorry about the late reply.  I have not really tested this much with 
Storm on YARN.  In most of our experiments the amount of time it takes to get 
nodes is negligible.  But we have not really done anything serious with it, and 
adding new nodes right now is a manual operation.

 Support gang scheduling in the AM RM protocol
 -

 Key: YARN-624
 URL: https://issues.apache.org/jira/browse/YARN-624
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 Per discussion on YARN-392 and elsewhere, gang scheduling, in which a 
 scheduler runs a set of tasks when they can all be run at the same time, 
 would be a useful feature for YARN schedulers to support.
 Currently, AMs can approximate this by holding on to containers until they 
 get all the ones they need.  However, this lends itself to deadlocks when 
 different AMs are waiting on the same containers.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-10-03 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785570#comment-13785570
 ] 

Carlo Curino commented on YARN-624:
---

Got it... thanks anyway. Please keep us posted if you get some concrete numbers 
with Storm or Giraph... 

 Support gang scheduling in the AM RM protocol
 -

 Key: YARN-624
 URL: https://issues.apache.org/jira/browse/YARN-624
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 Per discussion on YARN-392 and elsewhere, gang scheduling, in which a 
 scheduler runs a set of tasks when they can all be run at the same time, 
 would be a useful feature for YARN schedulers to support.
 Currently, AMs can approximate this by holding on to containers until they 
 get all the ones they need.  However, this lends itself to deadlocks when 
 different AMs are waiting on the same containers.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-10-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785571#comment-13785571
 ] 

Bikas Saha commented on YARN-1256:
--

Can we do a null check for serviceData? And can we please create a new 
exception similar to InvalidContainerException, e.g. InvalidAuxServiceException.
{code}
+Map<String, ByteBuffer> serviceData = getAuxServiceMetaData();
+for (Map.Entry<String, ByteBuffer> meta : launchContext.getServiceData()
+.entrySet()) {
+  if (null == serviceData.get(meta.getKey())) {
+throw new YarnException("The auxService:" + meta.getKey()
++ " does not exist");
+  }
+}
{code}
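
For what it's worth, a revised sketch folding in both suggestions 
(InvalidAuxServiceException is the new exception being proposed here and does 
not exist yet):

{code}
Map<String, ByteBuffer> configured = getAuxServiceMetaData();
Map<String, ByteBuffer> requested = launchContext.getServiceData();
// Null check first: a container with no service data is perfectly valid.
if (requested != null) {
  for (String serviceName : requested.keySet()) {
    if (configured.get(serviceName) == null) {
      throw new InvalidAuxServiceException(
          "The auxService " + serviceName + " does not exist");
    }
  }
}
{code}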

 NM silently ignores non-existent service in StartContainerRequest
 -

 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-1256.1.patch, YARN-1256.2.patch


 A container can set token service metadata for a service, say 
 shuffle_service. If that service does not exist then the errors is silently 
 ignored. Later, when the next container wants to access data written to 
 shuffle_service by the first task, then it fails because the service does not 
 have the token that was supposed to be set by the first task.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions

2013-10-03 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker reassigned YARN-658:
--

Assignee: Robert Parker

 Command to kill a YARN application does not work with newer Ubuntu versions
 ---

 Key: YARN-658
 URL: https://issues.apache.org/jira/browse/YARN-658
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 2.0.4-alpha
Reporter: David Yan
Assignee: Robert Parker
 Attachments: AppMaster.stderr, 
 yarn-david-nodemanager-david-ubuntu.out, 
 yarn-david-resourcemanager-david-ubuntu.out


 After issuing a KillApplicationRequest, the application keeps running on the 
 system even though the state is changed to KILLED.  It happens on both Ubuntu 
 12.10 and 13.04, but works fine on Ubuntu 12.04.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions

2013-10-03 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker resolved YARN-658.


Resolution: Duplicate

 Command to kill a YARN application does not work with newer Ubuntu versions
 ---

 Key: YARN-658
 URL: https://issues.apache.org/jira/browse/YARN-658
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 2.0.4-alpha
Reporter: David Yan
Assignee: Robert Parker
 Attachments: AppMaster.stderr, 
 yarn-david-nodemanager-david-ubuntu.out, 
 yarn-david-resourcemanager-david-ubuntu.out


 After issuing a KillApplicationRequest, the application keeps running on the 
 system even though the state is changed to KILLED.  It happens on both Ubuntu 
 12.10 and 13.04, but works fine on Ubuntu 12.04.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785589#comment-13785589
 ] 

Hudson commented on YARN-890:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #4528 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4528/])
YARN-890. Ensure CapacityScheduler doesn't round-up metric for available 
resources. Contributed by Xuan Gong & Hitesh Shah. (acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529015)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java


 The roundup for memory values on resource manager UI is misleading
 --

 Key: YARN-890
 URL: https://issues.apache.org/jira/browse/YARN-890
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Trupti Dhavle
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, 
 YARN-890.1.patch, YARN-890.2.patch


 From the yarn-site.xml, I see following values-
 <property>
 <name>yarn.nodemanager.resource.memory-mb</name>
 <value>4192</value>
 </property>
 <property>
 <name>yarn.scheduler.maximum-allocation-mb</name>
 <value>4192</value>
 </property>
 <property>
 <name>yarn.scheduler.minimum-allocation-mb</name>
 <value>1024</value>
 </property>
 However the resourcemanager UI shows total memory as 5MB 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1149:


Attachment: YARN-1149.7.patch

Removed some transitions on the ApplicationImpl state machine from the previous 
patch, since those transitions cannot happen.

 NM throws InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 -

 Key: YARN-1149
 URL: https://issues.apache.org/jira/browse/YARN-1149
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ramya Sunil
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
 YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch


 When nodemanager receives a kill signal when an application has finished 
 execution but log aggregation has not kicked in, 
 InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
 {noformat}
 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
 finished : application_1377459190746_0118
 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
 log-file for app application_1377459190746_0118 at 
 /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp
 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
 to complete for application_1377459190746_0118
 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
 container container_1377459190746_0118_01_04. Current good log dirs are 
 /tmp/yarn/local
 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
 log-file for app application_1377459190746_0118
 2013-08-25 20:45:00,925 WARN  application.Application 
 (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)   
 at java.lang.Thread.run(Thread.java:662)
 2013-08-25 20:45:00,926 INFO  application.Application 
 (ApplicationImpl.java:handle(430)) - Application 
 application_1377459190746_0118 transitioned from RUNNING to null
 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(463)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
  is interrupted. Exiting.
 2013-08-25 20:45:00,938 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
 server on 8040
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1256:


Attachment: YARN-1256.3.patch

Add a null check for the AuxService, and create a new InvalidAuxServiceException 
for a missing AuxService.
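
A minimal sketch of the shape of such a check, based on the snippet quoted earlier 
in this thread; the exception name and helper methods follow that snippet, and the 
attached patch is authoritative:

{code}
// Sketch only: reject the container launch when it references an aux service
// the NodeManager does not know about, instead of silently ignoring it.
Map<String, ByteBuffer> serviceData = getAuxServiceMetaData();
for (Map.Entry<String, ByteBuffer> meta :
    launchContext.getServiceData().entrySet()) {
  if (serviceData.get(meta.getKey()) == null) {
    throw new InvalidAuxServiceException("The auxService: " + meta.getKey()
        + " does not exist");
  }
}
{code}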

 NM silently ignores non-existent service in StartContainerRequest
 -

 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch


 A container can set token service metadata for a service, say 
 shuffle_service. If that service does not exist then the error is silently 
 ignored. Later, when the next container wants to access data written to 
 shuffle_service by the first task, then it fails because the service does not 
 have the token that was supposed to be set by the first task.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

2013-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785609#comment-13785609
 ] 

Hadoop QA commented on YARN-1149:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606658/YARN-1149.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2079//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2079//console

This message is automatically generated.

 NM throws InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 -

 Key: YARN-1149
 URL: https://issues.apache.org/jira/browse/YARN-1149
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ramya Sunil
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
 YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch


 When nodemanager receives a kill signal when an application has finished 
 execution but log aggregation has not kicked in, 
 InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
 {noformat}
 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
 finished : application_1377459190746_0118
 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
 log-file for app application_1377459190746_0118 at 
 /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp
 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
 to complete for application_1377459190746_0118
 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
 container container_1377459190746_0118_01_04. Current good log dirs are 
 /tmp/yarn/local
 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
 log-file for app application_1377459190746_0118
 2013-08-25 20:45:00,925 WARN  application.Application 
 (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)   
 at 

[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

2013-10-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785622#comment-13785622
 ] 

Hitesh Shah commented on YARN-1149:
---

{code}
+/**
+ * Application is killed by ResourceManager
+ */
+ON_SHUTDOWN, 
+
+/**
+ * Application is killed as NodeManager is shut down
+ */
+BY_RESOURCEMANAGER
{code}

  -  descriptions reversed 
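
For clarity, a sketch of the swap this comment is asking for, with the constant 
names left unchanged and only the javadoc descriptions put in the intended order:

{code}
/**
 * Application is killed as NodeManager is shut down
 */
ON_SHUTDOWN,

/**
 * Application is killed by ResourceManager
 */
BY_RESOURCEMANAGER
{code}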

{code}
+  default:
+    LOG.warn("Invalid eventType: " + eventType);
+}
{code}

   - earlier comment on invalid event type not addressed? 

{code}
+  @SuppressWarnings("unchecked")
+  static class NonTransition implements
+      SingleArcTransition<ApplicationImpl, ApplicationEvent> {
+    @Override
+    public void transition(ApplicationImpl app, ApplicationEvent event) {
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("The event: " + event.getType()
+            + " is invalid in current state : " + app.getApplicationState());
+      }
+    }
+  }
+
{code}
   
- it may be better not to have a non-transition. The current message reads as if 
this is an error that is being ignored, with no reason given as to why it is ignored. 

 NM throws InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 -

 Key: YARN-1149
 URL: https://issues.apache.org/jira/browse/YARN-1149
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ramya Sunil
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
 YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch


 When nodemanager receives a kill signal when an application has finished 
 execution but log aggregation has not kicked in, 
 InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
 {noformat}
 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
 finished : application_1377459190746_0118
 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
 log-file for app application_1377459190746_0118 at 
 /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp
 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
 to complete for application_1377459190746_0118
 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
 container container_1377459190746_0118_01_04. Current good log dirs are 
 /tmp/yarn/local
 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
 log-file for app application_1377459190746_0118
 2013-08-25 20:45:00,925 WARN  application.Application 
 (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)   
 at java.lang.Thread.run(Thread.java:662)
 2013-08-25 20:45:00,926 INFO  application.Application 
 (ApplicationImpl.java:handle(430)) - Application 
 application_1377459190746_0118 transitioned from RUNNING to null
 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(463)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
  is interrupted. Exiting.
 2013-08-25 

[jira] [Updated] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1256:


Attachment: YARN-1256.4.patch

Add a null check for the AuxService data requested from the CLC 
(ContainerLaunchContext).

 NM silently ignores non-existent service in StartContainerRequest
 -

 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, 
 YARN-1256.4.patch


 A container can set token service metadata for a service, say 
 shuffle_service. If that service does not exist then the error is silently 
 ignored. Later, when the next container wants to access data written to 
 shuffle_service by the first task, then it fails because the service does not 
 have the token that was supposed to be set by the first task.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running

2013-10-03 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1131:
-

Attachment: YARN-1131.3.txt

Updated the patch to get the tests working, also added one more test for when 
an app is not known by the RM.

 $yarn logs command should return an appropriate error message if YARN 
 application is still running
 --

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt


 In the case when log aggregation is enabled, if a user submits MapReduce job 
 and runs $ yarn logs -applicationId app ID while the YARN application is 
 running, the command will return no message and return user back to shell. It 
 is nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if invalid application ID is given, YARN CLI should say 
 that the application ID is incorrect rather than throwing 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread main java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785627#comment-13785627
 ] 

Hadoop QA commented on YARN-1256:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606665/YARN-1256.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2080//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2080//console

This message is automatically generated.

 NM silently ignores non-existent service in StartContainerRequest
 -

 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, 
 YARN-1256.4.patch


 A container can set token service metadata for a service, say 
 shuffle_service. If that service does not exist then the error is silently 
 ignored. Later, when the next container wants to access data written to 
 shuffle_service by the first task, then it fails because the service does not 
 have the token that was supposed to be set by the first task.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-10-03 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785628#comment-13785628
 ] 

Alejandro Abdelnur commented on YARN-1197:
--

Bikas, yep, there is a race condition in the AM-RM-NM path for decreasing. At least 
in the FS, due to continuous scheduling (YARN-1010), the RM could 
allocate the freed space to an AM before the NM heartbeats and gets the info. 
This does not happen if allocations are tied to the corresponding NM 
heartbeat. Thanks.

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: yarn-1197.pdf


 Currently, YARN cannot support merging several containers on one node into a big 
 container, which would let us incrementally ask for resources, merge them into a 
 bigger one, and launch our processes. The user scenario is described in the 
 comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1236) FairScheduler setting queue name in RMApp is not working

2013-10-03 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785638#comment-13785638
 ] 

Alejandro Abdelnur commented on YARN-1236:
--

+1

 FairScheduler setting queue name in RMApp is not working 
 -

 Key: YARN-1236
 URL: https://issues.apache.org/jira/browse/YARN-1236
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1236.patch


 The fair scheduler sometimes picks a different queue than the one an 
 application was submitted to, such as when user-as-default-queue is turned 
 on.  It needs to update the queue name in the RMApp so that this choice will 
 be reflected in the UI.
 This isn't working because the scheduler is looking up the RMApp by 
 application attempt id instead of app id and failing to find it.
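 
 A minimal sketch of the lookup fix described above, assuming the usual
 RMContext/RMApp names; the attached patch is authoritative:
 {code}
 // Sketch only: resolve the RMApp by ApplicationId, not by the attempt id,
 // before pushing the queue the scheduler actually placed the app in.
 RMApp rmApp = rmContext.getRMApps().get(attemptId.getApplicationId());
 if (rmApp != null) {
   rmApp.setQueue(queue.getName());
 }
 {code}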



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect

2013-10-03 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1265:
-

Attachment: YARN-1265-1.patch

 Fair Scheduler chokes on unhealthy node reconnect
 -

 Key: YARN-1265
 URL: https://issues.apache.org/jira/browse/YARN-1265
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1265-1.patch, YARN-1265.patch


 Only nodes in the RUNNING state are tracked by schedulers.  When a node 
 reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if 
 it's in the RUNNING state.  The FairScheduler doesn't guard against this.
 I think the best way to fix this is to check to see whether a node is RUNNING 
 before telling the scheduler to remove it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect

2013-10-03 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785669#comment-13785669
 ] 

Sandy Ryza commented on YARN-1265:
--

Uploaded a patch that, instead of the above, changes the Fair Scheduler's 
behavior to mimic the Capacity Scheduler.
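
A rough sketch of what mimicking the Capacity Scheduler could look like in the 
Fair Scheduler's node-removal path, with field and class names assumed rather 
than taken from the patch:

{code}
// Sketch only: if the scheduler is not tracking this node (e.g. it was never
// added because it was unhealthy), ignore the removal instead of failing.
private synchronized void removeNode(RMNode rmNode) {
  FSSchedulerNode node = nodes.get(rmNode.getNodeID());
  if (node == null) {
    LOG.info("Attempt to remove untracked node " + rmNode.getNodeID()
        + "; ignoring");
    return;
  }
  // ... existing removal logic ...
}
{code}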

 Fair Scheduler chokes on unhealthy node reconnect
 -

 Key: YARN-1265
 URL: https://issues.apache.org/jira/browse/YARN-1265
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1265-1.patch, YARN-1265.patch


 Only nodes in the RUNNING state are tracked by schedulers.  When a node 
 reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if 
 it's in the RUNNING state.  The FairScheduler doesn't guard against this.
 I think the best way to fix this is to check to see whether a node is RUNNING 
 before telling the scheduler to remove it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-621) RM triggers web auth failure before first job

2013-10-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785670#comment-13785670
 ] 

Vinod Kumar Vavilapalli commented on YARN-621:
--

Tx for the clarification. The patch now makes sense to me. +1, checking this in.

 RM triggers web auth failure before first job
 -

 Key: YARN-621
 URL: https://issues.apache.org/jira/browse/YARN-621
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Allen Wittenauer
Assignee: Omkar Vinit Joshi
Priority: Critical
 Attachments: YARN-621.20131001.1.patch


 On a secure YARN setup, before the first job is executed, going to the web 
 interface of the resource manager triggers authentication errors.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-621) RM triggers web auth failure before first job

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785673#comment-13785673
 ] 

Hudson commented on YARN-621:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #4529 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4529/])
YARN-621. Changed YARN web app to not add paths that can cause duplicate 
additions of authenticated filters, thereby causing kerberos replay errors. 
Contributed by Omkar Vinit Joshi. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529030)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApps.java


 RM triggers web auth failure before first job
 -

 Key: YARN-621
 URL: https://issues.apache.org/jira/browse/YARN-621
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Allen Wittenauer
Assignee: Omkar Vinit Joshi
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-621.20131001.1.patch


 On a secure YARN setup, before the first job is executed, going to the web 
 interface of the resource manager triggers authentication errors.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect

2013-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785683#comment-13785683
 ] 

Hadoop QA commented on YARN-1265:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606676/YARN-1265-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2083//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2083//console

This message is automatically generated.

 Fair Scheduler chokes on unhealthy node reconnect
 -

 Key: YARN-1265
 URL: https://issues.apache.org/jira/browse/YARN-1265
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1265-1.patch, YARN-1265.patch


 Only nodes in the RUNNING state are tracked by schedulers.  When a node 
 reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if 
 it's in the RUNNING state.  The FairScheduler doesn't guard against this.
 I think the best way to fix this is to check to see whether a node is RUNNING 
 before telling the scheduler to remove it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1271) Text file busy errors launching containers again

2013-10-03 Thread Sandy Ryza (JIRA)
Sandy Ryza created YARN-1271:


 Summary: Text file busy errors launching containers again
 Key: YARN-1271
 URL: https://issues.apache.org/jira/browse/YARN-1271
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza


The error is shown below in the comments.

MAPREDUCE-2374 fixed this by removing -c when running the container launch 
script.  It looks like the -c got brought back during the windows branch 
merge, so we should remove it again.
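
A minimal sketch of the described change in the container executor, with variable 
names assumed rather than taken from the patch:

{code}
// Sketch only: launch the generated script directly so bash reads the file
// itself, rather than via "bash -c <path>", which execs the script as a
// program and can hit "Text file busy" while the file is still open for write.
String[] command = new String[] { "bash", launchScriptPath };
// instead of: new String[] { "bash", "-c", launchScriptPath }
ShellCommandExecutor shExec = new ShellCommandExecutor(command, workDir);
shExec.execute();
{code}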



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1271) Text file busy errors launching containers again

2013-10-03 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785688#comment-13785688
 ] 

Sandy Ryza commented on YARN-1271:
--

{code}
Exception from container-launch: 
org.apache.hadoop.util.Shell$ExitCodeException: bash: 
/data/5/yarn/nm/usercache/jenkins/appcache/application_1380783835333_0011/container_1380783835333_0011_01_000476/default_container_executor.sh:
 /bin/bash: bad interpreter: Text file busy

at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
at org.apache.hadoop.util.Shell.run(Shell.java:373)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:258)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:74)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
{code}

 Text file busy errors launching containers again
 --

 Key: YARN-1271
 URL: https://issues.apache.org/jira/browse/YARN-1271
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 The error is shown below in the comments.
 MAPREDUCE-2374 fixed this by removing -c when running the container launch 
 script.  It looks like the -c got brought back during the windows branch 
 merge, so we should remove it again.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1149:


Attachment: YARN-1149.8.patch

Address all the comments

 NM throws InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 -

 Key: YARN-1149
 URL: https://issues.apache.org/jira/browse/YARN-1149
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ramya Sunil
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
 YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, 
 YARN-1149.8.patch


 When nodemanager receives a kill signal when an application has finished 
 execution but log aggregation has not kicked in, 
 InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
 {noformat}
 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
 finished : application_1377459190746_0118
 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
 log-file for app application_1377459190746_0118 at 
 /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp
 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
 to complete for application_1377459190746_0118
 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
 container container_1377459190746_0118_01_04. Current good log dirs are 
 /tmp/yarn/local
 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
 log-file for app application_1377459190746_0118
 2013-08-25 20:45:00,925 WARN  application.Application 
 (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)   
 at java.lang.Thread.run(Thread.java:662)
 2013-08-25 20:45:00,926 INFO  application.Application 
 (ApplicationImpl.java:handle(430)) - Application 
 application_1377459190746_0118 transitioned from RUNNING to null
 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(463)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
  is interrupted. Exiting.
 2013-08-25 20:45:00,938 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
 server on 8040
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1149:


Attachment: YARN-1149.9.patch

 NM throws InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 -

 Key: YARN-1149
 URL: https://issues.apache.org/jira/browse/YARN-1149
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ramya Sunil
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
 YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, 
 YARN-1149.8.patch, YARN-1149.9.patch


 When nodemanager receives a kill signal when an application has finished 
 execution but log aggregation has not kicked in, 
 InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
 {noformat}
 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
 finished : application_1377459190746_0118
 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
 log-file for app application_1377459190746_0118 at 
 /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp
 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
 to complete for application_1377459190746_0118
 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
 container container_1377459190746_0118_01_04. Current good log dirs are 
 /tmp/yarn/local
 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
 log-file for app application_1377459190746_0118
 2013-08-25 20:45:00,925 WARN  application.Application 
 (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)   
 at java.lang.Thread.run(Thread.java:662)
 2013-08-25 20:45:00,926 INFO  application.Application 
 (ApplicationImpl.java:handle(430)) - Application 
 application_1377459190746_0118 transitioned from RUNNING to null
 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(463)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
  is interrupted. Exiting.
 2013-08-25 20:45:00,938 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
 server on 8040
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1271) Text file busy errors launching containers again

2013-10-03 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1271:
-

Attachment: YARN-1271.patch

 Text file busy errors launching containers again
 --

 Key: YARN-1271
 URL: https://issues.apache.org/jira/browse/YARN-1271
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1271.patch


 The error is shown below in the comments.
 MAPREDUCE-2374 fixed this by removing -c when running the container launch 
 script.  It looks like the -c got brought back during the windows branch 
 merge, so we should remove it again.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1256:


Attachment: YARN-1256.5.patch

 NM silently ignores non-existent service in StartContainerRequest
 -

 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, 
 YARN-1256.4.patch, YARN-1256.5.patch


 A container can set token service metadata for a service, say 
 shuffle_service. If that service does not exist then the error is silently 
 ignored. Later, when the next container wants to access data written to 
 shuffle_service by the first task, then it fails because the service does not 
 have the token that was supposed to be set by the first task.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

2013-10-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785714#comment-13785714
 ] 

Hitesh Shah commented on YARN-1149:
---

+1. Latest patch looks good. Will commit after jenkins blesses it. 

 NM throws InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 -

 Key: YARN-1149
 URL: https://issues.apache.org/jira/browse/YARN-1149
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ramya Sunil
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
 YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, 
 YARN-1149.8.patch, YARN-1149.9.patch


 When nodemanager receives a kill signal when an application has finished 
 execution but log aggregation has not kicked in, 
 InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
 {noformat}
 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
 finished : application_1377459190746_0118
 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
 log-file for app application_1377459190746_0118 at 
 /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp
 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
 to complete for application_1377459190746_0118
 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
 container container_1377459190746_0118_01_04. Current good log dirs are 
 /tmp/yarn/local
 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
 log-file for app application_1377459190746_0118
 2013-08-25 20:45:00,925 WARN  application.Application 
 (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)   
 at java.lang.Thread.run(Thread.java:662)
 2013-08-25 20:45:00,926 INFO  application.Application 
 (ApplicationImpl.java:handle(430)) - Application 
 application_1377459190746_0118 transitioned from RUNNING to null
 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(463)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
  is interrupted. Exiting.
 2013-08-25 20:45:00,938 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
 server on 8040
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1167:


Attachment: YARN-1167.3.patch

Remove rpcPort change

 Submitted distributed shell application shows appMasterHost = empty
 ---

 Key: YARN-1167
 URL: https://issues.apache.org/jira/browse/YARN-1167
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch


 Submit distributed shell application. Once the application reaches the 
 RUNNING state, the app master host should not be empty. In reality, it is empty.
 ==console logs==
 distributedshell.Client: Got application report from ASM for, appId=12, 
 clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, 
 appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, 
 distributedFinalState=UNDEFINED, 
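 
 A small sketch of the kind of fix this implies on the distributed shell AM side,
 assuming the standard AMRMClient registration call and hypothetical local
 variable names; the attached patches are authoritative:
 {code}
 // Sketch only: register the AM with its real hostname so the RM's application
 // report no longer shows an empty appMasterHost.
 String appMasterHostname = NetUtils.getHostname();
 RegisterApplicationMasterResponse response = amRMClient
     .registerApplicationMaster(appMasterHostname, appMasterRpcPort,
         appMasterTrackingUrl);
 {code}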



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

2013-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785717#comment-13785717
 ] 

Hadoop QA commented on YARN-1149:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606681/YARN-1149.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2084//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2084//console

This message is automatically generated.

 NM throws InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 -

 Key: YARN-1149
 URL: https://issues.apache.org/jira/browse/YARN-1149
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ramya Sunil
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
 YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, 
 YARN-1149.8.patch, YARN-1149.9.patch


 When nodemanager receives a kill signal when an application has finished 
 execution but log aggregation has not kicked in, 
 InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
 {noformat}
 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
 finished : application_1377459190746_0118
 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
 log-file for app application_1377459190746_0118 at 
 /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp
 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
 to complete for application_1377459190746_0118
 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
 container container_1377459190746_0118_01_04. Current good log dirs are 
 /tmp/yarn/local
 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
 log-file for app application_1377459190746_0118
 2013-08-25 20:45:00,925 WARN  application.Application 
 (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
 at 
 

[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running

2013-10-03 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785718#comment-13785718
 ] 

Siddharth Seth commented on YARN-1131:
--

If another state does get added to YarnApplicationState, we don't know whether 
it is a final state or not. I'd prefer falling back to trying to find the 
logs on disk, which is what happens right now.
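
A rough sketch of the fallback being described, with the helper name 
(dumpAllContainersLogs) assumed for illustration only:

{code}
// Sketch only: print a friendly message for states known to be non-final, but
// for anything else (including states added later) fall back to looking for
// aggregated logs on disk rather than failing outright.
YarnApplicationState state = report.getYarnApplicationState();
if (state == YarnApplicationState.RUNNING
    || state == YarnApplicationState.ACCEPTED) {
  System.out.println("Application has not completed."
      + " Logs are only available after an application completes.");
}
// Unknown or finished states: try the aggregated logs on disk.
return dumpAllContainersLogs(appId, appOwner, System.out);
{code}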

 $yarn logs command should return an appropriate error message if YARN 
 application is still running
 --

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt


 In the case when log aggregation is enabled, if a user submits MapReduce job 
 and runs $ yarn logs -applicationId app ID while the YARN application is 
 running, the command will return no message and return user back to shell. It 
 is nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if invalid application ID is given, YARN CLI should say 
 that the application ID is incorrect rather than throwing 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread main java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty

2013-10-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1167:


Attachment: YARN-1167.4.patch

 Submitted distributed shell application shows appMasterHost = empty
 ---

 Key: YARN-1167
 URL: https://issues.apache.org/jira/browse/YARN-1167
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, 
 YARN-1167.4.patch


 Submit a distributed shell application. Once the application reaches the 
 RUNNING state, the app master host should not be empty. In reality, it is empty.
 ==console logs==
 distributedshell.Client: Got application report from ASM for, appId=12, 
 clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, 
 appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, 
 distributedFinalState=UNDEFINED, 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running

2013-10-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785725#comment-13785725
 ] 

Hitesh Shah commented on YARN-1131:
---

+1. Will commit once jenkins Ok's the patch.

 $yarn logs command should return an appropriate error message if YARN 
 application is still running
 --

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt


 In the case when log aggregation is enabled, if a user submits a MapReduce job 
 and runs $ yarn logs -applicationId <app ID> while the YARN application is 
 running, the command returns no message and drops the user back to the shell. It 
 would be better to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if an invalid application ID is given, the YARN CLI should say 
 that the application ID is incorrect rather than throwing 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread main java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1271) Text file busy errors launching containers again

2013-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785733#comment-13785733
 ] 

Hadoop QA commented on YARN-1271:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606685/YARN-1271.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2086//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2086//console

This message is automatically generated.

 Text file busy errors launching containers again
 --

 Key: YARN-1271
 URL: https://issues.apache.org/jira/browse/YARN-1271
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1271.patch


 The error is shown below in the comments.
 MAPREDUCE-2374 fixed this by removing -c when running the container launch 
 script. It looks like the -c got brought back during the Windows branch 
 merge, so we should remove it again.
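
A hedged illustration of the shape of the change (not the actual Shell or 
ContainerExecutor code): with -c, bash exec()s the script path as an external 
program, which can fail with ETXTBSY ("Text file busy") if the file is still 
open for writing; without -c, bash opens and interprets the script itself.
{code}
public class LaunchCommand {
  // Illustrative only: build the container launch command without -c.
  public static String[] buildCommand(String scriptPath) {
    // Before (problematic): new String[] { "bash", "-c", scriptPath }
    // After: pass the script as a file argument so bash reads it rather
    // than exec()ing it as a separate program.
    return new String[] { "bash", scriptPath };
  }
}
{code}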



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty

2013-10-03 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785736#comment-13785736
 ] 

Omkar Vinit Joshi commented on YARN-1167:
-

+1 lgtm. Thanks [~xgong]

 Submitted distributed shell application shows appMasterHost = empty
 ---

 Key: YARN-1167
 URL: https://issues.apache.org/jira/browse/YARN-1167
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, 
 YARN-1167.4.patch


 Submit a distributed shell application. Once the application reaches the 
 RUNNING state, the app master host should not be empty. In reality, it is empty.
 ==console logs==
 distributedshell.Client: Got application report from ASM for, appId=12, 
 clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, 
 appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, 
 distributedFinalState=UNDEFINED, 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785741#comment-13785741
 ] 

Hadoop QA commented on YARN-1256:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606687/YARN-1256.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2085//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2085//console

This message is automatically generated.

 NM silently ignores non-existent service in StartContainerRequest
 -

 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, 
 YARN-1256.4.patch, YARN-1256.5.patch


 A container can set token service metadata for a service, say 
 shuffle_service. If that service does not exist, the error is silently 
 ignored. Later, when the next container wants to access data written to 
 shuffle_service by the first task, it fails because the service does not 
 have the token that was supposed to be set by the first task.
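
A minimal sketch of the kind of check the fix introduces; the names here are 
hypothetical, while the committed change touches AuxServices.java and 
ContainerManagerImpl.java (see the commit message later in the thread).
{code}
import java.util.Map;
import java.util.Set;

public class AuxServiceCheck {
  // Reject a start-container request that references an auxiliary service
  // the NodeManager does not have registered, instead of silently ignoring it.
  public static void validateServiceData(Map<String, byte[]> serviceData,
                                          Set<String> registeredServices) {
    for (String serviceName : serviceData.keySet()) {
      if (!registeredServices.contains(serviceName)) {
        throw new IllegalArgumentException("Auxiliary service " + serviceName
            + " does not exist on this NodeManager");
      }
    }
  }
}
{code}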



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty

2013-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785744#comment-13785744
 ] 

Hadoop QA commented on YARN-1167:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606697/YARN-1167.4.patch
  against trunk revision .

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2088//console

This message is automatically generated.

 Submitted distributed shell application shows appMasterHost = empty
 ---

 Key: YARN-1167
 URL: https://issues.apache.org/jira/browse/YARN-1167
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, 
 YARN-1167.4.patch


 Submit a distributed shell application. Once the application reaches the 
 RUNNING state, the app master host should not be empty. In reality, it is empty.
 ==console logs==
 distributedshell.Client: Got application report from ASM for, appId=12, 
 clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, 
 appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, 
 distributedFinalState=UNDEFINED, 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

2013-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785745#comment-13785745
 ] 

Hadoop QA commented on YARN-1149:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606684/YARN-1149.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2087//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2087//console

This message is automatically generated.

 NM throws InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 -

 Key: YARN-1149
 URL: https://issues.apache.org/jira/browse/YARN-1149
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ramya Sunil
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
 YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, 
 YARN-1149.8.patch, YARN-1149.9.patch


 When the nodemanager receives a kill signal after an application has finished 
 execution but before log aggregation has kicked in, 
 InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
 {noformat}
 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
 finished : application_1377459190746_0118
 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
 log-file for app application_1377459190746_0118 at 
 /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp
 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
 to complete for application_1377459190746_0118
 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
 container container_1377459190746_0118_01_04. Current good log dirs are 
 /tmp/yarn/local
 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
 log-file for app application_1377459190746_0118
 2013-08-25 20:45:00,925 WARN  application.Application 
 (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
 at 
 

[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785746#comment-13785746
 ] 

Hudson commented on YARN-1256:
--

FAILURE: Integrated in Hadoop-trunk-Commit #4531 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4531/])
YARN-1256. NM silently ignores non-existent service in StartContainerRequest 
(Xuan Gong via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529039)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AuxiliaryServiceHelper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerManagerWithLCE.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java


 NM silently ignores non-existent service in StartContainerRequest
 -

 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, 
 YARN-1256.4.patch, YARN-1256.5.patch


 A container can set token service metadata for a service, say 
 shuffle_service. If that service does not exist, the error is silently 
 ignored. Later, when the next container wants to access data written to 
 shuffle_service by the first task, it fails because the service does not 
 have the token that was supposed to be set by the first task.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1271) Text file busy errors launching containers again

2013-10-03 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785752#comment-13785752
 ] 

Aaron T. Myers commented on YARN-1271:
--

+1, looks good to me. This is the same fix we used in MAPREDUCE-2374.

Thanks, Sandy.

 Text file busy errors launching containers again
 --

 Key: YARN-1271
 URL: https://issues.apache.org/jira/browse/YARN-1271
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1271.patch


 The error is shown below in the comments.
 MAPREDUCE-2374 fixed this by removing -c when running the container launch 
 script. It looks like the -c got brought back during the Windows branch 
 merge, so we should remove it again.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty

2013-10-03 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785755#comment-13785755
 ] 

Omkar Vinit Joshi commented on YARN-1167:
-

Thanks [~vinodkv] for pointing it out. The test case is wrong; it is not actually 
testing the distributed shell code. Check TestDistributedShell.java.
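
As a sketch of what the test ought to verify (the assertion helper below is 
hypothetical; YarnClient, ApplicationReport, and their getters are the real API):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class AppMasterHostCheck {
  // Once the application reports RUNNING, its host field should be populated
  // rather than empty, which is exactly what YARN-1167 is about.
  static void assertHostSet(YarnClient client, ApplicationId appId) throws Exception {
    ApplicationReport report = client.getApplicationReport(appId);
    if (report.getYarnApplicationState() == YarnApplicationState.RUNNING) {
      String host = report.getHost();
      if (host == null || host.isEmpty() || "N/A".equals(host)) {
        throw new AssertionError("appMasterHost is empty for " + appId);
      }
    }
  }
}
{code}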

 Submitted distributed shell application shows appMasterHost = empty
 ---

 Key: YARN-1167
 URL: https://issues.apache.org/jira/browse/YARN-1167
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, 
 YARN-1167.4.patch


 Submit a distributed shell application. Once the application reaches the 
 RUNNING state, the app master host should not be empty. In reality, it is empty.
 ==console logs==
 distributedshell.Client: Got application report from ASM for, appId=12, 
 clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, 
 appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, 
 distributedFinalState=UNDEFINED, 



--
This message was sent by Atlassian JIRA
(v6.1#6144)

