[jira] [Created] (YARN-2229) Making ContainerId long type

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created YARN-2229:


 Summary: Making ContainerId long type
 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA


On YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch, 
and the lower 22 bits are for the sequence number of the ids. This preserves the 
semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
{{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
{{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the 
RM restarts 1024 times.
To avoid this problem, it's better to make containerId a long. We need to define 
the new containerId format while preserving backward compatibility on this 
JIRA.
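
For illustration, here is a minimal sketch of the 10/22-bit packing described above 
and why the epoch wraps after 1024 RM restarts. The class and method names are 
hypothetical, not the actual ContainerId code:

{code}
// Hypothetical illustration of the current int-based format: 10 epoch bits
// on top of 22 sequence-number bits. Not the actual ContainerId implementation.
public final class PackedContainerId {
  private static final int SEQ_BITS = 22;
  private static final int SEQ_MASK = (1 << SEQ_BITS) - 1;   // 0x3FFFFF

  /** Pack epoch and sequence number into a single 32-bit id. */
  static int pack(int epoch, int sequence) {
    // Only 10 bits are available for the epoch, so it wraps once it passes
    // 2^10 - 1 = 1023, i.e. after 1024 RM restarts -- the overflow concern.
    return (epoch << SEQ_BITS) | (sequence & SEQ_MASK);
  }

  static int epochOf(int id) {
    return id >>> SEQ_BITS;
  }

  static int sequenceOf(int id) {
    return id & SEQ_MASK;
  }
}
{code}

Widening the id to a long would leave far more room for the epoch while keeping the 
low bits as the sequence number, which is the direction proposed here.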



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046722#comment-14046722
 ] 

Hudson commented on YARN-2201:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5794 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5794/])
YARN-2201. Made TestRMWebServicesAppsModification be independent of the changes 
on yarn-default.xml. Contributed by Varun Vasudev. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606285)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java
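
One common way to achieve that kind of independence is for the test to pin the 
properties its assertions depend on instead of inheriting whatever yarn-default.xml 
ships. A minimal sketch of that pattern, with hypothetical names and not necessarily 
what the committed patch does:

{code}
// Hypothetical test-setup sketch: pin the scheduler and ACL settings so a
// change to yarn-default.xml cannot alter the HTTP status codes under test.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ExplicitConfSketch {
  static YarnConfiguration explicitConf() {
    YarnConfiguration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.RM_SCHEDULER,
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
    conf.setBoolean(YarnConfiguration.YARN_ACL_ENABLE, false);
    return conf;
  }
}
{code}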


> TestRMWebServicesAppsModification dependent on yarn-default.xml
> ---
>
> Key: YARN-2201
> URL: https://issues.apache.org/jira/browse/YARN-2201
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Varun Vasudev
>  Labels: test
> Fix For: 2.5.0
>
> Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
> apache-yarn-2201.2.patch, apache-yarn-2201.3.patch
>
>
> TestRMWebServicesAppsModification.java has some errors that are 
> yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
> seeing the following errors:
> 1) Changing yarn.resourcemanager.scheduler.class from 
> capacity.CapacityScheduler to fair.FairScheduler gives the error:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 3.22 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> 2) Changing yarn.acl.enable from false to true results in the following 
> errors:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.986 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
> testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.258 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
> testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.263 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 0.214 sec  <<< FAILURE!
> java.lang.A

[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046721#comment-14046721
 ] 

Zhijie Shen commented on YARN-2201:
---

Committed to trunk, branch-2. Thanks Varun for the patch, and Ray for review!

> TestRMWebServicesAppsModification dependent on yarn-default.xml
> ---
>
> Key: YARN-2201
> URL: https://issues.apache.org/jira/browse/YARN-2201
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Varun Vasudev
>  Labels: test
> Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
> apache-yarn-2201.2.patch, apache-yarn-2201.3.patch
>
>
> TestRMWebServicesAppsModification.java has some errors that are 
> yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
> seeing the following errors:
> 1) Changing yarn.resourcemanager.scheduler.class from 
> capacity.CapacityScheduler to fair.FairScheduler gives the error:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 3.22 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> 2) Changing yarn.acl.enable from false to true results in the following 
> errors:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.986 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
> testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.258 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
> testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.263 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 0.214 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidId(TestRMWebServicesAppsModification.java:482)
> I'm opening this JIRA as a discussion for the best way to fix this.  I've got 
> a few ideas, but I wo

[jira] [Created] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple

2014-06-27 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2228:
-

 Summary: TimelineServer should load pseudo authentication filter 
when authentication = simple
 Key: YARN-2228
 URL: https://issues.apache.org/jira/browse/YARN-2228
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


When Kerberos authentication is not enabled, we should let the timeline server 
work with the pseudo authentication filter. In this way, the server is able to 
detect the request user by checking "user.name".

On the other hand, the timeline client should append "user.name" in the non-secure 
case as well, so that ACLs keep working in this case. 
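
A minimal sketch of the client-side half of this idea, assuming the simple/pseudo 
authentication convention where the caller's identity travels as a "user.name" query 
parameter; the helper below is illustrative, not the TimelineClient API:

{code}
// Hypothetical helper: append the caller's identity as the "user.name" query
// parameter, which the pseudo authentication filter inspects when Kerberos
// is not enabled.
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.security.UserGroupInformation;

public final class PseudoAuthUrls {
  static URI withUserName(URI base) throws IOException {
    String user = UserGroupInformation.getCurrentUser().getShortUserName();
    String sep = (base.getQuery() == null) ? "?" : "&";
    return URI.create(base.toString() + sep + "user.name=" + user);
  }
}
{code}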



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046707#comment-14046707
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

The test failure is not related.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, 
> YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, 
> YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046703#comment-14046703
 ] 

Hadoop QA commented on YARN-2052:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652938/YARN-2052.11.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4129//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4129//console

This message is automatically generated.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, 
> YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, 
> YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046647#comment-14046647
 ] 

Hadoop QA commented on YARN-614:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652934/YARN-614.13.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4128//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4128//console

This message is automatically generated.

> Separate AM failures from hardware failure or YARN error and do not count 
> them to AM retry count
> 
>
> Key: YARN-614
> URL: https://issues.apache.org/jira/browse/YARN-614
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Fix For: 2.5.0
>
> Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch, 
> YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2052:
-

Attachment: YARN-2052.11.patch

Updated the patch to address the comments:
* Bumped up the version of FileSystemRMStateStore.
* Refactored {{getAndIncrement}} of FileSystemStateStore/ZKRMStateStore to 
remove the duplicated check of the epoch znode/file.
* Renamed RMEpoch.java to Epoch.java and RMEpochPBImpl.java to 
EpochPBImpl.java. For consistency, updated the file/znode name of 
EPOCH_NODE from "RMEpochNode" to "EpochNode".
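
For context, a minimal sketch of the epoch get-and-increment pattern being discussed, 
with hypothetical storage primitives; the real FileSystemRMStateStore/ZKRMStateStore 
implementations differ in how they read, write, and retry against the epoch file/znode:

{code}
// Hypothetical sketch: read the persisted epoch (treating a missing node as 0),
// return it, and store the incremented value so the next RM restart observes a
// fresh epoch. The existence check is done once, per the review comment.
import java.io.IOException;

abstract class EpochStoreSketch {
  protected abstract boolean epochNodeExists() throws IOException;  // file or znode
  protected abstract long readEpoch() throws IOException;
  protected abstract void writeEpoch(long epoch) throws IOException;

  public long getAndIncrementEpoch() throws IOException {
    long current = 0;
    if (epochNodeExists()) {
      current = readEpoch();
    }
    writeEpoch(current + 1);
    return current;
  }
}
{code}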

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, 
> YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, 
> YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046590#comment-14046590
 ] 

Wangda Tan commented on YARN-1408:
--

Hi [~sunilg],
Thanks for working out this patch so fast!

*A major problem I've seen:*
The ResourceRequest stored in RMContainerImpl should include the rack/any RRs as well.
Currently, only one ResourceRequest is stored in RMContainerImpl, which may not 
be enough for recovering in the following cases:
Case 1: An RR may contain other fields like relaxLocality, etc. Assume an RR is 
node-local with relaxLocality=true (the default), and its rack-local/any RRs have 
relaxLocality=false. In your current implementation, you cannot fully recover the 
original RRs.
Case 2: The rack-local RR will be missing. Assume an RR is node-local; when 
resource allocation happens, the outstanding rack-local/any numContainers is 
decreased. You can check AppSchedulingInfo#allocateNodeLocal for the logic of 
how the outstanding rack/any #containers are decreased.

*My thoughts about how to implement this:*
In FiCaScheduler#allocate, appSchedulingInfo.allocate will be invoked. You can 
edit appSchedulingInfo.allocate to return a list of RRs, including node/rack/any 
if possible.
Pass such RRs to RMContainerImpl.

And could you please elaborate on this?
bq. AM would have asked for NodeLocal in another Hosts, which may not be able 
to recover.

Does it make sense to you? I'll review minor issues and test cases in the next 
cycle.

Thanks,
Wangda
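
A rough sketch of the direction suggested above, with hypothetical signatures (the 
actual AppSchedulingInfo.allocate return type and RMContainerImpl constructor may 
look different):

{code}
// Hypothetical interfaces illustrating the suggestion: allocate() returns all
// ResourceRequests (node/rack/any) consumed by the allocation, and the
// RMContainer keeps that list so relaxLocality and rack/any RRs can be
// recovered after preemption.
import java.util.List;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

interface AppSchedulingInfoSketch {
  List<ResourceRequest> allocate(/* node type, priority, container, ... */);
}

interface RMContainerSketch {
  void setResourceRequests(List<ResourceRequest> requests);
}
{code}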

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Assign a big jobA on queue a which uses the full cluster capacity
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity
> JobA tasks which use queue b's capacity are preempted and killed.
> This caused the problem below:
> 1. A new container got allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was preempted immediately by the preemption policy.
> Here the ACQUIRED at KILLED invalid-state exception came when the next AM 
> heartbeat reached the RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out after 30 minutes, as this container 
> was already killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-614:
-

Attachment: YARN-614.13.patch

Renamed a unit test name.

> Separate AM failures from hardware failure or YARN error and do not count 
> them to AM retry count
> 
>
> Key: YARN-614
> URL: https://issues.apache.org/jira/browse/YARN-614
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Fix For: 2.5.0
>
> Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch, 
> YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046572#comment-14046572
 ] 

Jian He commented on YARN-614:
--

+1

> Separate AM failures from hardware failure or YARN error and do not count 
> them to AM retry count
> 
>
> Key: YARN-614
> URL: https://issues.apache.org/jira/browse/YARN-614
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Fix For: 2.5.0
>
> Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.7.patch, 
> YARN-614.8.patch, YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046566#comment-14046566
 ] 

Vinod Kumar Vavilapalli commented on YARN-2225:
---

It breaks compatibility w.r.t. behavior - it asks existing users who care about 
the check to turn it on explicitly.

bq. In spirit, virtual memory check has been a pain and we end up recommending 
users to turn it off.
I have had a different experience. It is indeed a pain for testing, both in 
Hadoop and in higher-level frameworks, but it has been invaluable in real-life 
clusters for stopping runaway jobs - specifically the non-Java ones - from 
affecting the cluster.

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better if we 
> limit the swapping of the task using swappiness instead. This patch will turn 
> this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
> they need to.
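
For reference, the check is governed by yarn.nodemanager.vmem-check-enabled; a minimal 
sketch of setting it explicitly (shown turned off, per this proposal) rather than 
relying on the shipped default:

{code}
// Minimal sketch: set the virtual-memory check explicitly instead of relying
// on the shipped default. Shown disabled, per the proposal in this JIRA.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class VmemCheckConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.NM_VMEM_CHECK_ENABLED, false);
    // Equivalent yarn-site.xml property: yarn.nodemanager.vmem-check-enabled
    System.out.println(YarnConfiguration.NM_VMEM_CHECK_ENABLED + " = "
        + conf.getBoolean(YarnConfiguration.NM_VMEM_CHECK_ENABLED, true));
  }
}
{code}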



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2227) Move containerMgrProxy from RM's AMLaunch to get rid of issues that new client talking with old server

2014-06-27 Thread Junping Du (JIRA)
Junping Du created YARN-2227:


 Summary: Move containerMgrProxy from RM's AMLaunch to get rid of 
issues that new client talking with old server
 Key: YARN-2227
 URL: https://issues.apache.org/jira/browse/YARN-2227
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Junping Du
Assignee: Junping Du


In rolling-upgrade semantics, we should handle the case where an old client 
talks with new servers, as long as only compatible changes happen in the RPC 
protocol. Under these semantics, there is no guarantee that a new client can 
talk with an old server, which requires us to pay special attention to the 
upgrade sequence. Even so, it is still hard to deal with NM-RM communication, 
because both sides act as both client and server: in the regular heartbeat, the 
NM is the client and the RM is the server; when the RM launches the AM, it goes 
through containerMgrProxy, and the RM is the client while the NM is the server. 
We should get rid of this situation, e.g. by removing containerMgrProxy from the 
RM and using another way to launch the container.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2227) Move containerMgrProxy from RM's AMLaunch to get rid of issues that new client talking with old server

2014-06-27 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2227:
-

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-666

> Move containerMgrProxy from RM's AMLaunch to get rid of issues that new 
> client talking with old server
> --
>
> Key: YARN-2227
> URL: https://issues.apache.org/jira/browse/YARN-2227
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>
> In rolling-upgrade semantics, we should handle the case where an old client 
> talks with new servers, as long as only compatible changes happen in the RPC 
> protocol. Under these semantics, there is no guarantee that a new client can 
> talk with an old server, which requires us to pay special attention to the 
> upgrade sequence. Even so, it is still hard to deal with NM-RM communication, 
> because both sides act as both client and server: in the regular heartbeat, 
> the NM is the client and the RM is the server; when the RM launches the AM, it 
> goes through containerMgrProxy, and the RM is the client while the NM is the 
> server. We should get rid of this situation, e.g. by removing containerMgrProxy 
> from the RM and using another way to launch the container.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046560#comment-14046560
 ] 

Wangda Tan commented on YARN-2104:
--

Thanks [~maysamyabandeh] and [~jlowe] for review and commit!

> Scheduler queue filter failed to work because index of queue column changed
> ---
>
> Key: YARN-2104
> URL: https://issues.apache.org/jira/browse/YARN-2104
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 3.0.0, 2.5.0
>
> Attachments: YARN-2104.patch
>
>
> YARN-563 added,
> {code}
> + th(".type", "Application Type").
> {code}
> to the application table, which moves the queue's column index from 3 to 4. And in 
> the scheduler page, the queue's column index is hard-coded to 3 when filtering 
> applications by queue name,
> {code}
>   "if (q == 'root') q = '';",
>   "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
>   "$('#apps').dataTable().fnFilter(q, 3, true);",
> {code}
> So queue filter will not work for application page.
> Reproduce steps: (Thanks Bo Yang for pointing this)
> {code}
> 1) In default setup, there’s a default queue under root queue
> 2) Run an arbitrary application, you can find it in “Applications” page
> 3) Click “Default” queue in scheduler page
> 4) Click “Applications”, no application will show here
> 5) Click “Root” queue in scheduler page
> 6) Click “Applications”, application will show again
> {code}
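
For reference, a sketch of the corrected filter line, assuming (as the description 
above implies) that the queue column now sits at index 4; the committed patch may 
differ in detail:

{code}
  // queue column index updated from 3 to 4 after the "Application Type" column
  "if (q == 'root') q = '';",
  "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
  "$('#apps').dataTable().fnFilter(q, 4, true);",
{code}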



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2052:
-

Attachment: (was: YARN-2052.11.patch)

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, 
> YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, 
> YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046557#comment-14046557
 ] 

Jian He commented on YARN-2052:
---

Can you rename RMEpoch.java to Epoch.java, and similarly RMEpochPBImpl too?

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, 
> YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, 
> YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2052:
-

Attachment: YARN-2052.11.patch

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, 
> YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, 
> YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046544#comment-14046544
 ] 

Hadoop QA commented on YARN-2052:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652917/YARN-2052.10.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4127//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4127//console

This message is automatically generated.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, 
> YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, 
> YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores

2014-06-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046531#comment-14046531
 ] 

Jian He commented on YARN-2226:
---

Actually, the FileSystem and ZK state stores have separate versions because they 
might diverge at some point. Closing this as invalid.

> RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores
> -
>
> Key: YARN-2226
> URL: https://issues.apache.org/jira/browse/YARN-2226
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>  Labels: newbie
>
> We need all state store impls to be versioned. Should move 
> ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning 
> applies to all stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores

2014-06-27 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-2226.
---

Resolution: Invalid

> RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores
> -
>
> Key: YARN-2226
> URL: https://issues.apache.org/jira/browse/YARN-2226
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>  Labels: newbie
>
> We need all state store impls to be versioned. Should move 
> ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning 
> applies to all stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046530#comment-14046530
 ] 

Jian He commented on YARN-2052:
---

- Actually, the FileSystem and ZK state stores have separate versions because 
they might diverge at some point; we should bump up the FileSystem version too 
in this patch.
- These two calls are duplicated in getAndIncrement of 
FileSystemStateStore/ZKRMStateStore; we can consolidate them into one:
{{fs.exists(epochNodePath)}} / {{existsWithRetries(epochNodePath, true) != null}}

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, 
> YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, 
> YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046528#comment-14046528
 ] 

Hudson commented on YARN-2104:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5792 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5792/])
YARN-2104. Scheduler queue filter failed to work because index of queue column 
changed. Contributed by Wangda Tan (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606265)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java


> Scheduler queue filter failed to work because index of queue column changed
> ---
>
> Key: YARN-2104
> URL: https://issues.apache.org/jira/browse/YARN-2104
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 3.0.0, 2.5.0
>
> Attachments: YARN-2104.patch
>
>
> YARN-563 added,
> {code}
> + th(".type", "Application Type").
> {code}
> to the application table, which moves the queue's column index from 3 to 4. And in 
> the scheduler page, the queue's column index is hard-coded to 3 when filtering 
> applications by queue name,
> {code}
>   "if (q == 'root') q = '';",
>   "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
>   "$('#apps').dataTable().fnFilter(q, 3, true);",
> {code}
> So queue filter will not work for application page.
> Reproduce steps: (Thanks Bo Yang for pointing this)
> {code}
> 1) In default setup, there’s a default queue under root queue
> 2) Run an arbitrary application, you can find it in “Applications” page
> 3) Click “Default” queue in scheduler page
> 4) Click “Applications”, no application will show here
> 5) Click “Root” queue in scheduler page
> 6) Click “Applications”, application will show again
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046512#comment-14046512
 ] 

Jason Lowe commented on YARN-2104:
--

+1 lgtm.  The test failure is unrelated.  Committing this.

> Scheduler queue filter failed to work because index of queue column changed
> ---
>
> Key: YARN-2104
> URL: https://issues.apache.org/jira/browse/YARN-2104
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2104.patch
>
>
> YARN-563 added,
> {code}
> + th(".type", "Application Type").
> {code}
> to the application table, which moves the queue's column index from 3 to 4. And in 
> the scheduler page, the queue's column index is hard-coded to 3 when filtering 
> applications by queue name,
> {code}
>   "if (q == 'root') q = '';",
>   "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
>   "$('#apps').dataTable().fnFilter(q, 3, true);",
> {code}
> So queue filter will not work for application page.
> Reproduce steps: (Thanks Bo Yang for pointing this)
> {code}
> 1) In default setup, there’s a default queue under root queue
> 2) Run an arbitrary application, you can find it in “Applications” page
> 3) Click “Default” queue in scheduler page
> 4) Click “Applications”, no application will show here
> 5) Click “Root” queue in scheduler page
> 6) Click “Applications”, application will show again
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2052:
-

Attachment: YARN-2052.10.patch

[~jianhe], good catch. Updated MemoryRMStateStore and its tests.
[~vinodkv], yes, let's do this on YARN-2226.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
> YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, 
> YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, 
> YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2226:
--

Assignee: (was: Vinod Kumar Vavilapalli)
  Labels: newbie  (was: )

> RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores
> -
>
> Key: YARN-2226
> URL: https://issues.apache.org/jira/browse/YARN-2226
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>  Labels: newbie
>
> We need all state store impls to be versioned. Should move 
> ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning 
> applies to all stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046504#comment-14046504
 ] 

Vinod Kumar Vavilapalli commented on YARN-2052:
---

Not related to this patch, but I think CURRENT_VERSION_INFO shouldn't be in 
ZKRMStateStore. Filed YARN-2226.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
> YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
> YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-2226:
-

 Summary: RMStateStore versioning (CURRENT_VERSION_INFO) should 
apply to all stores
 Key: YARN-2226
 URL: https://issues.apache.org/jira/browse/YARN-2226
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


We need all state store impls to be versioned. Should move 
ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning applies 
to all stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046494#comment-14046494
 ] 

Karthik Kambatla edited comment on YARN-2225 at 6/27/14 10:48 PM:
--

According to our compatibility guide, "The default values of Hadoop-defined 
properties can be changed across minor/major releases, but will remain the same 
across point releases within a minor release."

So, in letter, we can't target 2.4.1 or 2.5.1, but can target 2.5 or 2.6. In 
spirit, virtual memory check has been a pain and we end up recommending users 
to turn it off. 


was (Author: kkambatl):
According to your compatibility guide, "The default values of Hadoop-defined 
properties can be changed across minor/major releases, but will remain the same 
across point releases within a minor release."

So, in letter, we can't target 2.4.1 or 2.5.1, but can target 2.5 or 2.6. In 
spirit, virtual memory check has been a pain and we end up recommending users 
to turn it off. 

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better if we 
> limit the swapping of the task using swappiness instead. This patch will turn 
> this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
> they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046494#comment-14046494
 ] 

Karthik Kambatla commented on YARN-2225:


According to your compatibility guide, "The default values of Hadoop-defined 
properties can be changed across minor/major releases, but will remain the same 
across point releases within a minor release."

So, in letter, we can't target 2.4.1 or 2.5.1, but can target 2.5 or 2.6. In 
spirit, virtual memory check has been a pain and we end up recommending users 
to turn it off. 

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better if we 
> limit the swapping of the task using swappiness instead. This patch will turn 
> this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
> they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046489#comment-14046489
 ] 

Vinod Kumar Vavilapalli commented on YARN-2225:
---

-1 for changing the default.. This breaks compatibility.

bq. The virtual memory check may not be the best way to isolate applications. 
Virtual memory is not the constrained resource.
I still see a lot of apps that need isolation w.r.t. vmem.

It's not about which resource is constrained, it is about isolation. We already 
identify physical memory as constrained and use that as the main scheduling 
dimension. 

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better if we 
> limit the swapping of the task using swappiness instead. This patch will turn 
> this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
> they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046480#comment-14046480
 ] 

Hadoop QA commented on YARN-2225:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652908/YARN-2225.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4126//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4126//console

This message is automatically generated.

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better if we 
> limit the swapping of the task using swappiness instead. This patch will turn 
> this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
> they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046469#comment-14046469
 ] 

Hadoop QA commented on YARN-2224:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652903/YARN-2224.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4125//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4125//console

This message is automatically generated.

> Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective 
> of the default settings
> -
>
> Key: YARN-2224
> URL: https://issues.apache.org/jira/browse/YARN-2224
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2224.patch
>
>
> If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the 
> test will fail. Make the test independent of the default settings: it should 
> simply verify that once the setting is turned on, the memory check actually 
> happens. See YARN-2225, which suggests turning the default off.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2225:


Attachment: YARN-2225.patch

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better to limit 
> task swapping via swappiness instead. This patch turns the check off by 
> default and lets users turn it on if they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-2225:
---

Assignee: Anubhav Dhoot

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better to limit 
> task swapping via swappiness instead. This patch turns the check off by 
> default and lets users turn it on if they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2225:


Description: The virtual memory check may not be the best way to isolate 
applications. Virtual memory is not the constrained resource. It would be 
better if we limit the swapping of the task using swappiness instead. This patch 
will turn this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn 
it on if they need to.  (was: The virtual memory check may not be the best way 
to isolate applications. Virtual memory is not the constrained resource. It 
would be better if we limit the swapping of the task using swappiness instead. 
This patch will turn this off by default and let users turn it on if they need 
to.)

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better to limit 
> task swapping via swappiness instead. This patch turns 
> DEFAULT_NM_VMEM_CHECK_ENABLED off by default and lets users turn it on if 
> they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings

2014-06-27 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046446#comment-14046446
 ] 

Anubhav Dhoot commented on YARN-2224:
-

Once the test is made resilient, we can decide in YARN-2225 whether to turn the 
default off.

> Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective 
> of the default settings
> -
>
> Key: YARN-2224
> URL: https://issues.apache.org/jira/browse/YARN-2224
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2224.patch
>
>
> If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the 
> test will fail. Make the test independent of the default settings: it should 
> simply verify that once the setting is turned on, the memory check actually 
> happens. See YARN-2225, which suggests turning the default off.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2225:
---

 Summary: Turn the virtual memory check to be off by default
 Key: YARN-2225
 URL: https://issues.apache.org/jira/browse/YARN-2225
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot


The virtual memory check may not be the best way to isolate applications. 
Virtual memory is not the constrained resource. It would be better to limit 
task swapping via swappiness instead. This patch turns the check off by default 
and lets users turn it on if they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings

2014-06-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2224:


Description: If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to 
false the test will fail. Make the test pass not rely on the default settings 
but just let it verify that once the setting is turned on it actually does the 
memory check. See YARN-2225 which suggests we turn the default off.  (was: If 
the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false the test will 
fail. Make the test pass not rely on the default settings but just let it 
verify that once the setting is turned on it actually does the memory check. )

> Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective 
> of the default settings
> -
>
> Key: YARN-2224
> URL: https://issues.apache.org/jira/browse/YARN-2224
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2224.patch
>
>
> If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the 
> test will fail. Make the test independent of the default settings: it should 
> simply verify that once the setting is turned on, the memory check actually 
> happens. See YARN-2225, which suggests turning the default off.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings

2014-06-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2224:


Attachment: YARN-2224.patch

Sets the flag to true so that the test does not fail if the default is set to 
false.
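
In other words (a sketch of the idea, not the attached YARN-2224.patch), the
test supplies its own configuration with the check pinned on; the property key
is assumed to be the one behind DEFAULT_NM_VMEM_CHECK_ENABLED:

{code}
// Sketch only: pin the virtual-memory check on inside the test configuration so
// the kill-on-overflow assertion no longer depends on the shipped default.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class VmemCheckTestConf {
  static Configuration vmemCheckEnabledConf() {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean("yarn.nodemanager.vmem-check-enabled", true);  // assumed key
    return conf;
  }
}
{code}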

> Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective 
> of the default settings
> -
>
> Key: YARN-2224
> URL: https://issues.apache.org/jira/browse/YARN-2224
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2224.patch
>
>
> If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the 
> test will fail. Make the test independent of the default settings: it should 
> simply verify that once the setting is turned on, the memory check actually 
> happens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings

2014-06-27 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2224:
---

 Summary: Let 
TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of 
the default settings
 Key: YARN-2224
 URL: https://issues.apache.org/jira/browse/YARN-2224
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot


If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the test 
will fail. Make the test independent of the default settings: it should simply 
verify that once the setting is turned on, the memory check actually happens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046425#comment-14046425
 ] 

Maysam Yabandeh commented on YARN-2104:
---

+1
Worked for us, and the failed unit test appears to be unrelated to this patch.

> Scheduler queue filter failed to work because index of queue column changed
> ---
>
> Key: YARN-2104
> URL: https://issues.apache.org/jira/browse/YARN-2104
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2104.patch
>
>
> YARN-563 added,
> {code}
> + th(".type", "Application Type").
> {code}
> to the application table, which moves the queue column's index from 3 to 4. In 
> the scheduler page, the queue column index is hard-coded to 3 when filtering 
> applications by queue name,
> {code}
>   "if (q == 'root') q = '';",
>   "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
>   "$('#apps').dataTable().fnFilter(q, 3, true);",
> {code}
> So the queue filter will not work on the application page.
> Steps to reproduce (thanks to Bo Yang for pointing this out):
> {code}
> 1) In the default setup, there's a default queue under the root queue
> 2) Run an arbitrary application; you can find it in the "Applications" page
> 3) Click the "Default" queue in the scheduler page
> 4) Click "Applications"; no application will show here
> 5) Click the "Root" queue in the scheduler page
> 6) Click "Applications"; the application will show again
> {code}
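
The attached patch is not reproduced here, but the description implies a small
change; a sketch of it, assuming the queue column now sits at index 4 after the
"Application Type" column was added:

{code}
  // Sketch of the implied fix: point the filter at the new queue column index.
  "if (q == 'root') q = '';",
  "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
  "$('#apps').dataTable().fnFilter(q, 4, true);",
{code}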



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046419#comment-14046419
 ] 

Jian He commented on YARN-2052:
---

Patch looks good overall. Can you also update MemoryStateStore so that we can 
test that the containerId issued by the new RM is correct? Thanks.
{code}
-assertEquals(4, schedulerAttempt.getNewContainerId());
+assertEquals(1, schedulerAttempt.getNewContainerId());
{code}

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
> YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
> YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by taking the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high-churn activity, the RM does not store the sequence number per app. So 
> after a restart it does not know what the new sequence number should be for 
> new allocations.
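
For illustration, this is how a container id is assembled from the attempt id
plus that sequence number (a public-API sketch with made-up values, not the
attached patch):

{code}
// Illustration only: a container id is the app attempt id plus a monotonically
// increasing sequence number, which is exactly the number the RM forgets on restart.
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class ContainerIdSketch {
  public static void main(String[] args) {
    // Illustrative cluster timestamp, app id, and attempt id.
    ApplicationId appId = ApplicationId.newInstance(1398450350082L, 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 1);
    int nextSequenceNumber = 5;  // not persisted per app by the RM today
    ContainerId cid = ContainerId.newInstance(attemptId, nextSequenceNumber);
    System.out.println(cid);     // e.g. container_1398450350082_0001_01_000005
  }
}
{code}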



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2223) NPE on ResourceManager recover

2014-06-27 Thread Jon Bringhurst (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-2223:
-

Description: 
I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is 
https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461).

Both clusters have the same config (other than hostnames). Both are running on 
JDK8u5 (I'm not sure if this is a factor here).

One cluster started up without any errors. The other started up with the 
following error on the RM:

{noformat}
18:33:45,463  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 1 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,465  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0001 with 8 attempts and final state = KILLED
18:33:45,468  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_01 with final state: KILLED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_02 with final state: FAILED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_03 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_04 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_05 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_06 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_07 with final state: FAILED
18:33:45,481  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_08 with final state: FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_01 
State change from NEW to KILLED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_02 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_03 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_04 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_05 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_06 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_07 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_08 
State change from NEW to FAILED
18:33:45,485  INFO RMAppImpl:639 - application_1398450350082_0001 State change 
from NEW to KILLED
18:33:45,485  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 2 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,485  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0002 with 8 attempts and final state = KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_01 with final state: KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_02 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_03 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_04 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_05 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_06 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_07 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_08 with final state: FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_01 
State change from NEW to KILLED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_02 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_03 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_04 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_05 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_06 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_07 
State change from NEW to FAILED
18:

[jira] [Created] (YARN-2223) NPE on ResourceManager recover

2014-06-27 Thread Jon Bringhurst (JIRA)
Jon Bringhurst created YARN-2223:


 Summary: NPE on ResourceManager recover
 Key: YARN-2223
 URL: https://issues.apache.org/jira/browse/YARN-2223
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Jon Bringhurst


I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is 
https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461).

Both clusters have the same config (other than hostnames). Both are running on 
JDK8u5 (I'm not sure if this is a factor here).

One cluster started up without any errors. The other started up with the 
following error on the RM:

{noformat}
18:33:45,463  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 1 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,465  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0001 with 8 attempts and final state = KILLED
18:33:45,468  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_01 with final state: KILLED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_02 with final state: FAILED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_03 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_04 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_05 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_06 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_07 with final state: FAILED
18:33:45,481  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_08 with final state: FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_01 
State change from NEW to KILLED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_02 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_03 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_04 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_05 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_06 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_07 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_08 
State change from NEW to FAILED
18:33:45,485  INFO RMAppImpl:639 - application_1398450350082_0001 State change 
from NEW to KILLED
18:33:45,485  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 2 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,485  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0002 with 8 attempts and final state = KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_01 with final state: KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_02 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_03 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_04 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_05 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_06 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_07 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_08 with final state: FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_01 
State change from NEW to KILLED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_02 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_03 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_04 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_05 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_06 
State c

Re: Anyone know how to mock a secured hdfs for unit test?

2014-06-27 Thread Chris Nauroth
Hi David and Kai,

There are a couple of challenges with this, but I just figured out a pretty
decent setup while working on HDFS-2856.  That code isn't committed yet,
but if you open patch version 5 attached to that issue and look for the
TestSaslDataTransfer class, then you'll see how it works.  Most of the
logic for bootstrapping a MiniKDC and setting up the right HDFS
configuration properties is in an abstract base class named
SaslDataTransferTestCase.
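
For anyone who wants the rough shape of that bootstrap before the patch lands,
here is a minimal sketch using the hadoop-minikdc artifact; the principal names
and paths are just placeholders, and the real SaslDataTransferTestCase on
HDFS-2856 additionally wires the keytab and principals into the HDFS
configuration:

{code}
// Rough sketch of starting a MiniKDC for a secure-mode test.
import java.io.File;
import java.util.Properties;
import org.apache.hadoop.minikdc.MiniKdc;

public class MiniKdcBootstrapSketch {
  public static void main(String[] args) throws Exception {
    File workDir = new File(System.getProperty("java.io.tmpdir"), "minikdc-test");
    workDir.mkdirs();
    Properties kdcConf = MiniKdc.createConf();
    MiniKdc kdc = new MiniKdc(kdcConf, workDir);
    kdc.start();
    File keytab = new File(workDir, "test.keytab");
    kdc.createPrincipal(keytab, "hdfs/localhost", "HTTP/localhost");  // placeholder principals
    System.out.println("MiniKDC realm: " + kdc.getRealm());
    kdc.stop();
  }
}
{code}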

I hope this helps.

There are a few other open issues out there related to tests in secure
mode.  I know of HDFS-4312 and HDFS-5410.  It would be great to get more
regular test coverage with something that more closely approximates a
secured deployment.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, Jun 26, 2014 at 7:27 AM, Zheng, Kai  wrote:

> Hi David,
>
> Quite some time ago I opened HADOOP-9952 and planned to create secured
> MiniClusters by making use of MiniKDC. Unfortunately, I haven't had the chance
> to work on it since then. If you need something like that and would like to
> contribute, please let me know and I'll see if there is anything I can help
> with. Thanks.
>
> Regards,
> Kai
>
> -Original Message-
> From: Liu, David [mailto:liujion...@gmail.com]
> Sent: Thursday, June 26, 2014 10:12 PM
> To: hdfs-...@hadoop.apache.org; hdfs-iss...@hadoop.apache.org;
> yarn-...@hadoop.apache.org; yarn-issues@hadoop.apache.org;
> mapreduce-...@hadoop.apache.org; secur...@hadoop.apache.org
> Subject: Anyone know how to mock a secured hdfs for unit test?
>
> Hi all,
>
> I need to test my code, which reads data from secured HDFS. Is there any
> library to mock secured HDFS? Can MiniDFSCluster do the work?
> Any suggestions are appreciated.
>
>
> Thanks
>



[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046350#comment-14046350
 ] 

Hadoop QA commented on YARN-614:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652876/YARN-614.12.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4124//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4124//console

This message is automatically generated.

> Separate AM failures from hardware failure or YARN error and do not count 
> them to AM retry count
> 
>
> Key: YARN-614
> URL: https://issues.apache.org/jira/browse/YARN-614
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Fix For: 2.5.0
>
> Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.7.patch, 
> YARN-614.8.patch, YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service

2014-06-27 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1713:
---

Priority: Blocker  (was: Major)
Target Version/s: 2.5.0

> Implement getnewapplication and submitapp as part of RM web service
> ---
>
> Key: YARN-1713
> URL: https://issues.apache.org/jira/browse/YARN-1713
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
> Attachments: apache-yarn-1713.3.patch, apache-yarn-1713.4.patch, 
> apache-yarn-1713.5.patch, apache-yarn-1713.6.patch, apache-yarn-1713.7.patch, 
> apache-yarn-1713.8.patch, apache-yarn-1713.cumulative.2.patch, 
> apache-yarn-1713.cumulative.3.patch, apache-yarn-1713.cumulative.4.patch, 
> apache-yarn-1713.cumulative.patch, apache-yarn-1713.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1695) Implement the rest (writable APIs) of RM web-services

2014-06-27 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1695:
---

Priority: Blocker  (was: Major)

> Implement the rest (writable APIs) of RM web-services
> -
>
> Key: YARN-1695
> URL: https://issues.apache.org/jira/browse/YARN-1695
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Varun Vasudev
>Priority: Blocker
>
> MAPREDUCE-2863 added the REST web-services to RM and NM. But all the APIs 
> added there were only focused on obtaining information from the cluster. We 
> need to have the following REST APIs to finish the feature
>  - Application submission/termination (Priority): This unblocks easy client 
> interaction with a YARN cluster
>  - Application Client protocol: For resource scheduling by apps written in an 
> arbitrary language. Will have to think about throughput concerns
>  - ContainerManagement Protocol: Again for arbitrary language apps.
> One important thing to note here is that we already have client libraries for 
> all three protocols that do some heavy lifting. One part of the effort is to 
> figure out whether they can be made any thinner and/or how the web services 
> will implement the same functionality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1373) Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046328#comment-14046328
 ] 

Vinod Kumar Vavilapalli commented on YARN-1373:
---

Since YARN-1210, we have always had the app and app attempt move to the RUNNING 
state after the RM restarts. That's why it is a dup.

> Transition RMApp and RMAppAttempt state to RUNNING after restart for 
> recovered running apps
> ---
>
> Key: YARN-1373
> URL: https://issues.apache.org/jira/browse/YARN-1373
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
>
> Currently the RM moves recovered app attempts to a terminal recovered state 
> and starts a new attempt. Instead, it will have to transition the last 
> attempt to a running state such that it can proceed as normal once the 
> running attempt has resynced with the ApplicationMasterService (YARN-1365 and 
> YARN-1366). If the RM had started the application container before dying, 
> then the AM would be up and trying to contact the RM. The RM may also have 
> died before launching the container. In that case, the RM should wait for the 
> AM liveness period and issue a kill for the stored master container. It 
> should transition this attempt to some RECOVER_ERROR state and proceed to 
> start a new attempt.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-614:
---

Attachment: YARN-614.12.patch

> Separate AM failures from hardware failure or YARN error and do not count 
> them to AM retry count
> 
>
> Key: YARN-614
> URL: https://issues.apache.org/jira/browse/YARN-614
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Fix For: 2.5.0
>
> Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.7.patch, 
> YARN-614.8.patch, YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046300#comment-14046300
 ] 

Xuan Gong commented on YARN-614:


Not sure why 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions fails; 
it passed on my local machine.
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter 
is not related.
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart 
fails because of a time-out: I added more logic to the test case, so I need to 
increase the time-out.

Submitted a new patch to kick Jenkins again.

> Separate AM failures from hardware failure or YARN error and do not count 
> them to AM retry count
> 
>
> Key: YARN-614
> URL: https://issues.apache.org/jira/browse/YARN-614
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Fix For: 2.5.0
>
> Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.7.patch, YARN-614.8.patch, 
> YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler

2014-06-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046219#comment-14046219
 ] 

Hudson commented on YARN-2204:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5790 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5790/])
YARN-2204. Addendum patch. TestAMRestart#testAMRestartWithExistingContainers 
assumes CapacityScheduler. (Robert Kanter via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606168)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


> TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
> ---
>
> Key: YARN-2204
> URL: https://issues.apache.org/jira/browse/YARN-2204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Trivial
> Fix For: 2.5.0
>
> Attachments: YARN-2204.patch, YARN-2204_addendum.patch, 
> YARN-2204_addendum.patch
>
>
> TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046217#comment-14046217
 ] 

Hadoop QA commented on YARN-614:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652862/YARN-614.11.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4123//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4123//console

This message is automatically generated.

> Separate AM failures from hardware failure or YARN error and do not count 
> them to AM retry count
> 
>
> Key: YARN-614
> URL: https://issues.apache.org/jira/browse/YARN-614
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Fix For: 2.5.0
>
> Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.7.patch, YARN-614.8.patch, 
> YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046199#comment-14046199
 ] 

Hadoop QA commented on YARN-1408:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652860/Yarn-1408.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4122//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4122//console

This message is automatically generated.

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Submit a big jobA to queue a which uses the full cluster capacity
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity
> A jobA task that was using queue b's capacity was preempted and killed.
> This caused the following problem:
> 1. A new container got allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was immediately preempted.
> The ACQUIRED-at-KILLED invalid state exception came when the next AM 
> heartbeat reached the RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out after 30 minutes, as the container had 
> already been killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs
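
For completeness, the preemption setup above can also be expressed
programmatically; a sketch using the exact property keys and policy class named
in the report (the class-path constants in YarnConfiguration are equivalent but
not shown here):

{code}
// Sketch: the same preemption setup as in the report, set on a Configuration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PreemptionSetupSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean("yarn.resourcemanager.scheduler.monitor.enable", true);
    conf.set("yarn.resourcemanager.scheduler.monitor.policies",
        "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity."
            + "ProportionalCapacityPreemptionPolicy");
    System.out.println(conf.get("yarn.resourcemanager.scheduler.monitor.policies"));
  }
}
{code}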



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046176#comment-14046176
 ] 

Hadoop QA commented on YARN-1366:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652857/YARN-1366.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

org.apache.hadoop.yarn.client.api.impl.TestAMRMClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4121//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4121//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4121//console

This message is automatically generated.

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.patch, 
> YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM. Resync means resetting the allocate RPC 
> sequence number to 0 and the AM should send its entire outstanding request to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.
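
A rough sketch of the resync flow the description calls for, using the public
AM-RM protocol records; the exception-driven re-register follows the comments
on this issue, and the host/port/tracking-URL values are placeholders, not the
attached patch:

{code}
// Rough sketch, not the attached patch: on resync the AM re-registers, resets
// the allocate response id to 0, and resends its entire outstanding ask.
import java.util.List;
import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException;

public class ResyncSketch {
  private int responseId = 0;

  AllocateResponse allocateWithResync(ApplicationMasterProtocol rm,
      List<ResourceRequest> outstandingAsks, List<ContainerId> toRelease)
      throws Exception {
    try {
      AllocateRequest req =
          AllocateRequest.newInstance(responseId, 0.1f, outstandingAsks, toRelease, null);
      AllocateResponse resp = rm.allocate(req);
      responseId = resp.getResponseId();
      return resp;
    } catch (ApplicationMasterNotRegisteredException e) {
      // RM restarted: re-register, reset the sequence number, resend everything.
      rm.registerApplicationMaster(
          RegisterApplicationMasterRequest.newInstance("am-host", 0, ""));
      responseId = 0;
      return allocateWithResync(rm, outstandingAsks, toRelease);
    }
  }
}
{code}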



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-614:
---

Attachment: YARN-614.11.patch

Added more testcases

> Separate AM failures from hardware failure or YARN error and do not count 
> them to AM retry count
> 
>
> Key: YARN-614
> URL: https://issues.apache.org/jira/browse/YARN-614
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Fix For: 2.5.0
>
> Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.10.patch, YARN-614.11.patch, YARN-614.7.patch, YARN-614.8.patch, 
> YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: Yarn-1408.5.patch

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Submit a big jobA to queue a which uses the full cluster capacity
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity
> A jobA task that was using queue b's capacity was preempted and killed.
> This caused the following problem:
> 1. A new container got allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was immediately preempted.
> The ACQUIRED-at-KILLED invalid state exception came when the next AM 
> heartbeat reached the RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out after 30 minutes, as the container had 
> already been killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: (was: Yarn-1408.5.patch)

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Submit a big jobA to queue a which uses the full cluster capacity
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity
> A jobA task that was using queue b's capacity was preempted and killed.
> This caused the following problem:
> 1. A new container got allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was immediately preempted.
> The ACQUIRED-at-KILLED invalid state exception came when the next AM 
> heartbeat reached the RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out after 30 minutes, as the container had 
> already been killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-27 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1366:
-

Attachment: YARN-1366.5.patch

I updated the patch with the following incremental changes:
1. Re-register the AMRMClient if unregister throws 
ApplicationMasterNotRegisteredException.
2. Unregister will be called only if the AM is registered.

Please review the updated patch.

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.patch, 
> YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM. Resync means resetting the allocate RPC 
> sequence number to 0 and the AM should send its entire outstanding request to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-06-27 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046113#comment-14046113
 ] 

Eric Payne commented on YARN-415:
-

Test failures for TestRMApplicationHistoryWriter predate this patch.

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
> YARN-415.201406262136.txt, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.
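
To make the unit concrete, a tiny worked example of the formula above, with
made-up container sizes and lifetimes:

{code}
// Worked example of the formula quoted above: charge = sum over containers of
// reserved memory (MB) * container lifetime (seconds), i.e. MB-seconds.
public class MemorySecondsExample {
  public static void main(String[] args) {
    long[][] containers = {
        // {reserved memory in MB, lifetime in seconds} -- illustrative numbers
        {2048, 120},
        {4096, 300},
    };
    long mbSeconds = 0;
    for (long[] c : containers) {
      mbSeconds += c[0] * c[1];
    }
    // 2048*120 + 4096*300 = 245760 + 1228800 = 1474560 MB-seconds
    System.out.println("App memory charge: " + mbSeconds + " MB-seconds");
  }
}
{code}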



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-27 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046109#comment-14046109
 ] 

Ray Chiang commented on YARN-2201:
--

+1 for the latest patch.  The tests are now independent of changes in 
yarn-default.xml.

> TestRMWebServicesAppsModification dependent on yarn-default.xml
> ---
>
> Key: YARN-2201
> URL: https://issues.apache.org/jira/browse/YARN-2201
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Varun Vasudev
>  Labels: test
> Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
> apache-yarn-2201.2.patch, apache-yarn-2201.3.patch
>
>
> TestRMWebServicesAppsModification.java has some errors that are 
> yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
> seeing the following errors:
> 1) Changing yarn.resourcemanager.scheduler.class from 
> capacity.CapacityScheduler to fair.FairScheduler gives the error:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 3.22 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> 2) Changing yarn.acl.enable from false to true results in the following 
> errors:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.986 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
> testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.258 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
> testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.263 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 0.214 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidId(TestRMWebServicesAppsModification.java:482)
> I'm opening this JIRA as a discussion for the best way to fix this.  I've got 
> a few ideas,

[jira] [Commented] (YARN-896) Roll up for long-lived services in YARN

2014-06-27 Thread john lilley (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046065#comment-14046065
 ] 

john lilley commented on YARN-896:
--

Greetings!  Arun pointed me to this JIRA to see if this could potentially meet 
our needs.  We are an ISV that currently ships a data-quality/integration suite 
running as a native YARN application.  We are finding several use cases that 
would benefit from being able to manage a per-node persistent service.  
MapReduce has its “shuffle auxiliary service”, but it isn’t straightforward to 
add auxiliary services because they cannot be loaded from HDFS, so we’d have to 
manage the distribution of JARs across nodes (please tell me if I’m wrong 
here…).

This seems to be addressing a lot of the issues around persistent services, and 
frankly I'm out of my depth in this discussion.  But if you all can help me 
understand if this might help our situation, I'd be happy to have our team put 
shoulder to the wheel and help advance the development.  Please comment on our 
contemplated use case and help me understand if this is the right place to be.

Our software doesn't use MapReduce.  It is a pure YARN application that is 
basically a peer to MapReduce.  There are a lot of reasons for this decision, 
but the main one is that we have a large code base that already executes data 
transformations in a single-server environment, and we wanted to produce a 
product without rewriting huge swaths of code.  Given that, our software takes 
care of many things usually delegated to MapReduce, including distributed 
sort/partition (i.e. "the shuffle").  However, MapReduce has a special place in 
the ecosystem, in that it creates an auxiliary service to handle the 
distribution of shuffle data to reducers.  It doesn't look like third-party 
apps have an easy time installing aux services.  The JARs for any such service 
must be in Hadoop's classpath on all nodes at startup, creating both a 
management issue and a trust/security issue.  Currently our software places 
temporary data into HDFS for this purpose, but we've found that HDFS has a huge 
overhead in terms of performance and file handles, even at low replication.  We 
desire to replace the use of HDFS with a lighter-weight service to manage temp 
files and distribute their data.


> Roll up for long-lived services in YARN
> ---
>
> Key: YARN-896
> URL: https://issues.apache.org/jira/browse/YARN-896
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Robert Joseph Evans
>
> YARN is intended to be general purpose, but it is missing some features to be 
> able to truly support long lived applications and long lived containers.
> This ticket is intended to
>  # discuss what is needed to support long lived processes
>  # track the resulting JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046028#comment-14046028
 ] 

Tsuyoshi OZAWA commented on YARN-2034:
--

+1(non-binding)

> Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
> 
>
> Key: YARN-2034
> URL: https://issues.apache.org/jira/browse/YARN-2034
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Chen He
>Priority: Minor
>  Labels: documentation
> Attachments: YARN-2034.patch, YARN-2034.patch
>
>
> The description in yarn-default.xml for 
> yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
> local directory, but according to the code it's a setting for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046019#comment-14046019
 ] 

Hadoop QA commented on YARN-2034:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652832/YARN-2034.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4120//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4120//console

This message is automatically generated.

> Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
> 
>
> Key: YARN-2034
> URL: https://issues.apache.org/jira/browse/YARN-2034
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Chen He
>Priority: Minor
>  Labels: documentation
> Attachments: YARN-2034.patch, YARN-2034.patch
>
>
> The description in yarn-default.xml for 
> yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
> local directory, but according to the code it's a setting for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2222) Helper script: looping a test until it fails

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created YARN-2222:


 Summary: Helper script: looping a test until it fails
 Key: YARN-2222
 URL: https://issues.apache.org/jira/browse/YARN-2222
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA


Some tests can fail intermittently because of timing bugs. To reproduce such a 
failure, it's useful to add a script that launches the specified test repeatedly 
until it fails.
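
For illustration, a minimal Java sketch of the kind of loop such a helper performs, 
assuming a Maven/Surefire build; it simply reruns the named test until the exit code 
is non-zero. This is not the script attached to this JIRA.

{code}
// Hypothetical loop that reruns a single test until it fails (non-zero exit code).
import java.io.IOException;

public class LoopTestUntilFailure {
  public static void main(String[] args) throws IOException, InterruptedException {
    String testClass = args.length > 0 ? args[0] : "TestApplicationMasterService";
    int run = 0;
    while (true) {
      run++;
      Process p = new ProcessBuilder("mvn", "test", "-Dtest=" + testClass)
          .inheritIO()
          .start();
      if (p.waitFor() != 0) {           // stop at the first failing run
        System.out.println("Test failed on run " + run);
        break;
      }
    }
  }
}
{code}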



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-06-27 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2034:
--

Attachment: YARN-2034.patch

Resubmitting to trigger Hadoop QA.

> Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
> 
>
> Key: YARN-2034
> URL: https://issues.apache.org/jira/browse/YARN-2034
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Chen He
>Priority: Minor
>  Labels: documentation
> Attachments: YARN-2034.patch, YARN-2034.patch
>
>
> The description in yarn-default.xml for 
> yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
> local directory, but according to the code it's a setting for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-06-27 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2034:
--

Labels: documentation  (was: )

> Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
> 
>
> Key: YARN-2034
> URL: https://issues.apache.org/jira/browse/YARN-2034
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Chen He
>Priority: Minor
>  Labels: documentation
> Attachments: YARN-2034.patch
>
>
> The description in yarn-default.xml for 
> yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
> local directory, but according to the code it's a setting for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045991#comment-14045991
 ] 

Hadoop QA commented on YARN-2034:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644210/YARN-2034.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4118//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4118//console

This message is automatically generated.

> Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
> 
>
> Key: YARN-2034
> URL: https://issues.apache.org/jira/browse/YARN-2034
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Chen He
>Priority: Minor
> Attachments: YARN-2034.patch
>
>
> The description in yarn-default.xml for 
> yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
> local directory, but according to the code it's a setting for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2178) TestApplicationMasterService sometimes fails in trunk

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045974#comment-14045974
 ] 

Tsuyoshi OZAWA commented on YARN-2178:
--

[~mitdesai] [~ted_yu] FYI: I use this bash script to reproduce timing bugs: 
https://github.com/oza/failchecker

{code}
$ ./failchecker TestApplicationMasterService
{code}

This script runs the specified test iteratively until it fails.

> TestApplicationMasterService sometimes fails in trunk
> -
>
> Key: YARN-2178
> URL: https://issues.apache.org/jira/browse/YARN-2178
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
>  Labels: test
>
> From https://builds.apache.org/job/Hadoop-Yarn-trunk/587/ :
> {code}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
> Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 55.763 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
> testInvalidContainerReleaseRequest(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService)
>   Time elapsed: 41.336 sec  <<< FAILURE!
> java.lang.AssertionError: AppAttempt state is not correct (timedout) 
> expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:401)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService.testInvalidContainerReleaseRequest(TestApplicationMasterService.java:143)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045970#comment-14045970
 ] 

Hadoop QA commented on YARN-1408:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652828/Yarn-1408.5.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4119//console

This message is automatically generated.

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Assign a big jobA on queue a which uses full cluster capacity
> Step 2: Submitted a jobB to queue b  which would use less than 20% of cluster 
> capacity
> The JobA task which uses queue b's capacity has been preempted and killed.
> This caused the problem below:
> 1. New Container has got allocated for jobA in Queue A as per node update 
> from an NM.
> 2. This container has been preempted immediately as per preemption.
> Here ACQUIRED at KILLED Invalid State exception came when the next AM 
> heartbeat reached RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out for 30 minutes, as this container 
> was already killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: Yarn-1408.5.patch

Hi [~vinodkv] [~leftnoteasy]
Please find initial patch.

Some information about the patch.
* While recovering a ResourceRequest, if a matching entry is found in the 
scheduling info then its container count is incremented; otherwise it is added 
as a new entry (a rough sketch of this rule follows below).
* A new off-rack request is also added while recovering, if the stored request 
is not already off-rack.
* The AM may have asked for node-local containers on other hosts, which may not 
be recoverable.

Kindly review.
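
A rough Java sketch of the merge rule in the first bullet above; the scheduling info 
here is just a map keyed by an invented AskKey (priority, resource name, capability), 
so the names are illustrative and do not come from the attached patch.

{code}
// Illustrative sketch: if an equivalent request already exists, bump its container
// count; otherwise add a new entry. Also add an off-rack (ANY) entry when recovering
// a request that was not off-rack.
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class RequestRecoverySketch {
  static class AskKey {
    final int priority; final String resourceName; final int capabilityMB;
    AskKey(int p, String r, int c) { priority = p; resourceName = r; capabilityMB = c; }
    @Override public boolean equals(Object o) {
      if (!(o instanceof AskKey)) return false;
      AskKey k = (AskKey) o;
      return priority == k.priority && capabilityMB == k.capabilityMB
          && resourceName.equals(k.resourceName);
    }
    @Override public int hashCode() { return Objects.hash(priority, resourceName, capabilityMB); }
  }

  private final Map<AskKey, Integer> schedulingInfo = new HashMap<>();

  /** Recover one preempted container's request back into the pending asks. */
  void recover(AskKey key) {
    schedulingInfo.merge(key, 1, Integer::sum);        // increment if present, else add
    if (!"*".equals(key.resourceName)) {               // "*" stands in for the off-rack/ANY request
      schedulingInfo.merge(new AskKey(key.priority, "*", key.capabilityMB), 1, Integer::sum);
    }
  }
}
{code}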

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Assign a big jobA on queue a which uses full cluster capacity
> Step 2: Submitted a jobB to queue b  which would use less than 20% of cluster 
> capacity
> The JobA task which uses queue b's capacity has been preempted and killed.
> This caused the problem below:
> 1. New Container has got allocated for jobA in Queue A as per node update 
> from an NM.
> 2. This container has been preempted immediately as per preemption.
> Here ACQUIRED at KILLED Invalid State exception came when the next AM 
> heartbeat reached RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out for 30 minutes, as this container 
> was already killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045959#comment-14045959
 ] 

Jason Lowe commented on YARN-1341:
--

Agree it's not ideal to discuss handling state store errors for all NM 
components in this JIRA.  In general I'd prefer to discuss and address each 
case with the corresponding JIRA, e.g.: application state store errors 
discussed and addressed in YARN-1354, container state store errors in 
YARN-1337, etc.  If we feel there's significant utility to committing a JIRA 
before all the issues are addressed then we can file one or more followup JIRAs 
to track those outstanding issues.  That's the normal process we follow with 
other features/fixes as well.  

So if we follow that process then we're back to the discussion about RM master 
keys not being able to be stored in the state store.  The choices we've 
discussed are:

1) Log an error, update the master key in memory, and continue
2) Log an error, _not_ update the master key in memory, and continue
3) Log an error and tear down the NM

I'd prefer 1) since that is the option that preserves the most work in all 
scenarios I can think of, and I don't know of a scenario where 2) would handle 
it better.  However I could be convinced given the right scenario.  I'd really 
rather avoid 3) since that seems like a severe way to "handle" the error and 
guarantees work is lost.

Oh there is one more handling scenario we briefly discussed where we flag the 
NM as "undesirable".  When that occurs we don't shoot the containers that are 
running, but we avoid adding new containers since the node is having issues 
(i.e.: a drain-decommission).  I feel that would be a separate JIRA since it 
needs YARN-914, and we'd still need to decide how to handle the error until the 
decommission is complete (i.e.: choice 1 or 2 above).
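
To make option 1 concrete, here is a minimal Java sketch with placeholder types (the 
real NM secret manager and state store classes are not shown): the store failure is 
logged, the master key is still updated in memory, and the NM keeps running.

{code}
// Illustrative sketch of option 1: log the store failure, still update the
// master key in memory, and keep the NM running. Types are placeholders.
import java.io.IOException;
import java.util.logging.Logger;

class MasterKeyUpdaterSketch {
  interface StateStore { void storeMasterKey(byte[] key) throws IOException; }

  private static final Logger LOG = Logger.getLogger(MasterKeyUpdaterSketch.class.getName());
  private final StateStore store;
  private volatile byte[] currentMasterKey;

  MasterKeyUpdaterSketch(StateStore store) { this.store = store; }

  void onNewMasterKey(byte[] newKey) {
    try {
      store.storeMasterKey(newKey);
    } catch (IOException e) {
      // Option 1: do not tear down the NM; preserve running work and only log the failure.
      LOG.severe("Failed to persist RM master key to NM state store: " + e);
    }
    currentMasterKey = newKey; // update in memory regardless of the store outcome
  }
}
{code}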

> Recover NMTokens upon nodemanager restart
> -
>
> Key: YARN-1341
> URL: https://issues.apache.org/jira/browse/YARN-1341
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, 
> YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-570) Time strings are formated in different timezone

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045945#comment-14045945
 ] 

Tsuyoshi OZAWA commented on YARN-570:
-

{quote}
The format of JavaScript Date.toLocaleString() varies by the browser. 
{quote}

One alternative to make the formats the same is to change {{renderHadoopDate}} to 
return the same format as {{yarn.util.Times#format()}} does instead of using 
{{Date#toLocaleString}}. [~ajisakaa], [~qwertymaniac], what do you think?
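
For illustration, a minimal Java sketch of rendering timestamps once, server-side, with a 
single fixed pattern so every page shows the same timezone; the pattern string below is an 
assumption chosen to resemble the "10-Apr-2013 16:29:56" style, not a quote of 
{{yarn.util.Times#format()}}.

{code}
// Hypothetical sketch: one fixed, server-side timestamp format for all web pages.
import java.text.SimpleDateFormat;
import java.util.Date;

class ConsistentTimeFormat {
  // SimpleDateFormat is not thread-safe, so keep one instance per thread.
  private static final ThreadLocal<SimpleDateFormat> FORMAT =
      ThreadLocal.withInitial(() -> new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss"));

  static String format(long epochMillis) {
    return epochMillis <= 0 ? "N/A" : FORMAT.get().format(new Date(epochMillis));
  }
}
{code}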

> Time strings are formated in different timezone
> ---
>
> Key: YARN-570
> URL: https://issues.apache.org/jira/browse/YARN-570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.2.0
>Reporter: Peng Zhang
>Assignee: Akira AJISAKA
> Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch
>
>
> Time strings on different page are displayed in different timezone.
> If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
> "Wed, 10 Apr 2013 08:29:56 GMT"
> If it is formatted by format() in yarn.util.Times, it appears as "10-Apr-2013 
> 16:29:56"
> Same value, but different timezone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045926#comment-14045926
 ] 

Tsuyoshi OZAWA commented on YARN-1514:
--

[~kkambatl], could you take a look at this JIRA? This perf tool is useful and I 
hope to include this feature in the 2.5.0 release.

> Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
> 
>
> Key: YARN-1514
> URL: https://issues.apache.org/jira/browse/YARN-1514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.5.0
>
> Attachments: YARN-1514.1.patch, YARN-1514.2.patch, 
> YARN-1514.wip-2.patch, YARN-1514.wip.patch
>
>
> ZKRMStateStore is very sensitive to ZNode-related operations as discussed in 
> YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is 
> called when an RM-HA cluster fails over. Therefore, its execution time 
> impacts the failover time of RM-HA.
> We need a utility to benchmark the execution time of ZKRMStateStore#loadState 
> as a development tool.
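
For illustration, a bare-bones Java sketch of the measurement such a utility performs, 
written against an invented LoadableStore interface rather than the real ZKRMStateStore 
API; a real tool would also need to populate the store with representative data first.

{code}
// Illustrative benchmark harness: time how long a state-store load takes on average.
import java.util.concurrent.TimeUnit;

class LoadStateBenchmarkSketch {
  interface LoadableStore { void loadState() throws Exception; } // placeholder, not the YARN API

  static long averageLoadMillis(LoadableStore store, int iterations) throws Exception {
    long totalNanos = 0;
    for (int i = 0; i < iterations; i++) {
      long start = System.nanoTime();
      store.loadState();                 // the operation whose latency drives RM failover time
      totalNanos += System.nanoTime() - start;
    }
    return TimeUnit.NANOSECONDS.toMillis(totalNanos / Math.max(1, iterations));
  }
}
{code}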



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1872) DistributedShell occasionally keeps running endlessly

2014-06-27 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-1872:
--

Summary: DistributedShell occasionally keeps running endlessly  (was: 
TestDistributedShell occasionally fails in trunk)

> DistributedShell occasionally keeps running endlessly
> -
>
> Key: YARN-1872
> URL: https://issues.apache.org/jira/browse/YARN-1872
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Hong Zhiguo
> Attachments: TestDistributedShell.out, YARN-1872.patch
>
>
> From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :
> TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
> TestDistributedShell#testDSShell timed out.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045799#comment-14045799
 ] 

Tsuyoshi OZAWA commented on YARN-2130:
--

The test failure of TestRMApplicationHistoryWriter is not related and the issue 
is filed as YARN-2216. 

> Cleanup: Adding getRMAppManager, getQueueACLsManager, 
> getApplicationACLsManager to RMContext
> 
>
> Key: YARN-2130
> URL: https://issues.apache.org/jira/browse/YARN-2130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, 
> YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045795#comment-14045795
 ] 

Hadoop QA commented on YARN-2130:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652792/YARN-2130.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 17 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4117//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4117//console

This message is automatically generated.

> Cleanup: Adding getRMAppManager, getQueueACLsManager, 
> getApplicationACLsManager to RMContext
> 
>
> Key: YARN-2130
> URL: https://issues.apache.org/jira/browse/YARN-2130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, 
> YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-570) Time strings are formated in different timezone

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045772#comment-14045772
 ] 

Tsuyoshi OZAWA commented on YARN-570:
-

[~qwertymaniac], thank you for the review. If we make the time format completely 
consistent, we need to change lots of places to use the same format function. As 
a first, temporary fix that addresses this issue, Akira's patch looks good to 
me. What do you think?

I think the timezone difference frequently confuses users, so we should fix it 
in the next release (2.5.0).

> Time strings are formated in different timezone
> ---
>
> Key: YARN-570
> URL: https://issues.apache.org/jira/browse/YARN-570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.2.0
>Reporter: Peng Zhang
>Assignee: Akira AJISAKA
> Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch
>
>
> Time strings on different page are displayed in different timezone.
> If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
> "Wed, 10 Apr 2013 08:29:56 GMT"
> If it is formatted by format() in yarn.util.Times, it appears as "10-Apr-2013 
> 16:29:56"
> Same value, but different timezone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2130:
-

Attachment: YARN-2130.6.patch

Rebased on trunk.

> Cleanup: Adding getRMAppManager, getQueueACLsManager, 
> getApplicationACLsManager to RMContext
> 
>
> Key: YARN-2130
> URL: https://issues.apache.org/jira/browse/YARN-2130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, 
> YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045745#comment-14045745
 ] 

Hadoop QA commented on YARN-2142:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652788/trust001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

  {color:red}-1 javac{color}.  The applied patch generated 1266 javac 
compiler warnings (more than the trunk's current 1258 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-auth.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4116//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4116//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4116//console

This message is automatically generated.

> Add one service to check the nodes' TRUST status 
> -
>
> Key: YARN-2142
> URL: https://issues.apache.org/jira/browse/YARN-2142
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager, scheduler
>Affects Versions: 2.2.0
> Environment: OS:Ubuntu 13.04; 
> JAVA:OpenJDK 7u51-2.4.4-0
>Reporter: anders
>Priority: Minor
>  Labels: patch
> Fix For: 2.2.0
>
> Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
> trust001.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Because of the critical computing environment, we must test every node's TRUST 
> status in the cluster (we can get the TRUST status from the API of the OAT 
> server), so I added this feature into Hadoop's scheduling.
> With the TRUST check service, a node can get its own TRUST status and
> then, through the heartbeat, send the TRUST status to the resource manager for 
> scheduling.
> In the scheduling step, if the node's TRUST status is 'false', it will be 
> abandoned until its TRUST status turns to 'true'.
> ***The logic of this feature is similar to the node's health check service.
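
Purely as an illustration of the flow described above (not the attached patch), a small 
Java sketch with invented names: the node queries an OAT server for its TRUST status, 
reports it with the heartbeat, and the scheduler skips untrusted nodes, similar to the 
health-check service.

{code}
// Hypothetical sketch of the described flow; OatClient and NodeHeartbeat are invented.
class TrustCheckSketch {
  interface OatClient { boolean isTrusted(String nodeId); }

  static class NodeHeartbeat {
    final String nodeId;
    final boolean trusted;
    NodeHeartbeat(String nodeId, boolean trusted) { this.nodeId = nodeId; this.trusted = trusted; }
  }

  /** Node side: attach the TRUST status to the periodic heartbeat. */
  static NodeHeartbeat buildHeartbeat(String nodeId, OatClient oat) {
    return new NodeHeartbeat(nodeId, oat.isTrusted(nodeId));
  }

  /** Scheduler side: skip untrusted nodes until their status turns back to true. */
  static boolean schedulableOn(NodeHeartbeat hb) {
    return hb.trusted;
  }
}
{code}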



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045742#comment-14045742
 ] 

Hadoop QA commented on YARN-2104:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652783/YARN-2104.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4113//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4113//console

This message is automatically generated.

> Scheduler queue filter failed to work because index of queue column changed
> ---
>
> Key: YARN-2104
> URL: https://issues.apache.org/jira/browse/YARN-2104
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2104.patch
>
>
> YARN-563 added,
> {code}
> + th(".type", "Application Type”).
> {code}
> to the application table, which changes the queue column’s index from 3 to 4. And on 
> the scheduler page, the queue column index is hard-coded to 3 when filtering 
> applications by queue name,
> {code}
>   "if (q == 'root') q = '';",
>   "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
>   "$('#apps').dataTable().fnFilter(q, 3, true);",
> {code}
> So the queue filter will not work on the applications page.
> Reproduce steps (thanks to Bo Yang for pointing this out):
> {code}
> 1) In default setup, there’s a default queue under root queue
> 2) Run an arbitrary application, you can find it in “Applications” page
> 3) Click “Default” queue in scheduler page
> 4) Click “Applications”, no application will show here
> 5) Click “Root” queue in scheduler page
> 6) Click “Applications”, application will show again
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-570) Time strings are formated in different timezone

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045740#comment-14045740
 ] 

Hadoop QA commented on YARN-570:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644756/YARN-570.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4115//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4115//console

This message is automatically generated.

> Time strings are formated in different timezone
> ---
>
> Key: YARN-570
> URL: https://issues.apache.org/jira/browse/YARN-570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.2.0
>Reporter: Peng Zhang
>Assignee: Akira AJISAKA
> Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch
>
>
> Time strings on different page are displayed in different timezone.
> If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
> "Wed, 10 Apr 2013 08:29:56 GMT"
> If it is formatted by format() in yarn.util.Times, it appears as "10-Apr-2013 
> 16:29:56"
> Same value, but different timezone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-27 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045726#comment-14045726
 ] 

Varun Vasudev commented on YARN-2201:
-

Test failure is unrelated.

> TestRMWebServicesAppsModification dependent on yarn-default.xml
> ---
>
> Key: YARN-2201
> URL: https://issues.apache.org/jira/browse/YARN-2201
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Varun Vasudev
>  Labels: test
> Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
> apache-yarn-2201.2.patch, apache-yarn-2201.3.patch
>
>
> TestRMWebServicesAppsModification.java has some errors that are 
> yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
> seeing the following errors:
> 1) Changing yarn.resourcemanager.scheduler.class from 
> capacity.CapacityScheduler to fair.FairScheduler gives the error:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 3.22 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> 2) Changing yarn.acl.enable from false to true results in the following 
> errors:
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
> testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.986 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
> testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.258 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
> testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 2.263 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
> testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
>   Time elapsed: 0.214 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidId(TestRMWebServicesAppsModification.java:482)
> I'm opening this JIRA as a discussion for the best way to fix this.  I've got 
> a few ideas, but I would like to get some feedback about potentially

[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-27 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Attachment: trust001.patch

Modified the XML.

> Add one service to check the nodes' TRUST status 
> -
>
> Key: YARN-2142
> URL: https://issues.apache.org/jira/browse/YARN-2142
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager, scheduler
>Affects Versions: 2.2.0
> Environment: OS:Ubuntu 13.04; 
> JAVA:OpenJDK 7u51-2.4.4-0
>Reporter: anders
>Priority: Minor
>  Labels: patch
> Fix For: 2.2.0
>
> Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
> trust001.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Because of the critical computing environment, we must test every node's TRUST 
> status in the cluster (we can get the TRUST status from the API of the OAT 
> server), so I added this feature into Hadoop's scheduling.
> With the TRUST check service, a node can get its own TRUST status and
> then, through the heartbeat, send the TRUST status to the resource manager for 
> scheduling.
> In the scheduling step, if the node's TRUST status is 'false', it will be 
> abandoned until its TRUST status turns to 'true'.
> ***The logic of this feature is similar to the node's health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-27 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Attachment: (was: trust.patch)

> Add one service to check the nodes' TRUST status 
> -
>
> Key: YARN-2142
> URL: https://issues.apache.org/jira/browse/YARN-2142
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager, scheduler
>Affects Versions: 2.2.0
> Environment: OS:Ubuntu 13.04; 
> JAVA:OpenJDK 7u51-2.4.4-0
>Reporter: anders
>Priority: Minor
>  Labels: patch
> Fix For: 2.2.0
>
> Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
> trust001.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Because of the critical computing environment, we must test every node's TRUST 
> status in the cluster (we can get the TRUST status from the API of the OAT 
> server), so I added this feature into Hadoop's scheduling.
> With the TRUST check service, a node can get its own TRUST status and
> then, through the heartbeat, send the TRUST status to the resource manager for 
> scheduling.
> In the scheduling step, if the node's TRUST status is 'false', it will be 
> abandoned until its TRUST status turns to 'true'.
> ***The logic of this feature is similar to the node's health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045720#comment-14045720
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

The test failure of TestRMApplicationHistoryWriter is filed as YARN-2216. This 
failure is not related to this JIRA.

[~jianhe] [~vinodkv], can you take a look, please?

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
> YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
> YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.
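
For illustration only, one possible direction is sketched below in Java: fold a value 
that changes on every RM restart (an epoch) into the id, so the in-memory sequence can 
restart from zero without colliding with ids issued before the restart. The bit layout 
here is arbitrary and is not the scheme adopted by this JIRA.

{code}
// Hypothetical id generator: combine a per-restart epoch with an in-memory counter.
import java.util.concurrent.atomic.AtomicInteger;

class EpochContainerIdGeneratorSketch {
  private final long epoch;                      // persisted and incremented once per RM restart
  private final AtomicInteger sequence = new AtomicInteger();

  EpochContainerIdGeneratorSketch(long epoch) { this.epoch = epoch; }

  long nextContainerId() {
    // Arbitrary layout for illustration: epoch in the high 32 bits, sequence in the low 32 bits.
    return (epoch << 32) | (sequence.incrementAndGet() & 0xFFFFFFFFL);
  }
}
{code}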



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1810) YARN RM Webapp Application page Issue

2014-06-27 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045717#comment-14045717
 ] 

Peng Zhang commented on YARN-1810:
--

OK, I created JIRA: https://issues.apache.org/jira/browse/YARN-2221

> YARN RM Webapp Application page Issue
> -
>
> Key: YARN-1810
> URL: https://issues.apache.org/jira/browse/YARN-1810
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.3.0
>Reporter: Ethan Setnik
> Attachments: Screen Shot 2014-03-10 at 3.59.54 PM.png, Screen Shot 
> 2014-03-11 at 1.40.12 PM.png
>
>
> When browsing the ResourceManager's web interface I am presented with the 
> attached screenshot.
> I can't understand why it does not show the applications, even though there 
> is no search text.  The application counts show the correct values relative 
> to the submissions, successes, and failures.
> Also see the text in the screenshot:
> "Showing 0 to 0 of 0 entries (filtered from 19 total entries)"



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2221) WebUI: RM scheduler page's queue filter status will affect application page

2014-06-27 Thread Peng Zhang (JIRA)
Peng Zhang created YARN-2221:


 Summary: WebUI: RM scheduler page's queue filter status will 
affect application page
 Key: YARN-2221
 URL: https://issues.apache.org/jira/browse/YARN-2221
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Peng Zhang
Priority: Minor


The apps queue filter added by clicking a queue bar on the scheduler page affects 
the display of the applications page.
No filter query is shown on the applications page, which is confusing.
Also, we cannot reset the filter query on the applications page; we must go 
back to the scheduler page and click the "root" queue to reset it.

Reproduce steps: 
{code}
1) Configure two queues under root( A & B)
2) Run some apps using queue A and B respectively
3) Click “A” queue in scheduler page
4) Click “Applications”, only apps of queue A show
5) Click “B” queue in scheduler page
6) Click “Applications”, only apps of queue B show
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-27 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Attachment: trust.patch

> Add one service to check the nodes' TRUST status 
> -
>
> Key: YARN-2142
> URL: https://issues.apache.org/jira/browse/YARN-2142
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager, scheduler
>Affects Versions: 2.2.0
> Environment: OS:Ubuntu 13.04; 
> JAVA:OpenJDK 7u51-2.4.4-0
>Reporter: anders
>Priority: Minor
>  Labels: patch
> Fix For: 2.2.0
>
> Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
> trust.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Because of the critical computing environment, we must test every node's TRUST 
> status in the cluster (we can get the TRUST status from the API of the OAT 
> server), so I added this feature into Hadoop's scheduling.
> With the TRUST check service, a node can get its own TRUST status and
> then, through the heartbeat, send the TRUST status to the resource manager for 
> scheduling.
> In the scheduling step, if the node's TRUST status is 'false', it will be 
> abandoned until its TRUST status turns to 'true'.
> ***The logic of this feature is similar to the node's health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045713#comment-14045713
 ] 

Hadoop QA commented on YARN-2142:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652787/trust.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4114//console

This message is automatically generated.

> Add one service to check the nodes' TRUST status 
> -
>
> Key: YARN-2142
> URL: https://issues.apache.org/jira/browse/YARN-2142
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager, scheduler
>Affects Versions: 2.2.0
> Environment: OS:Ubuntu 13.04; 
> JAVA:OpenJDK 7u51-2.4.4-0
>Reporter: anders
>Priority: Minor
>  Labels: patch
> Fix For: 2.2.0
>
> Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
> trust.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Because of the critical computing environment, we must test every node's TRUST 
> status in the cluster (we can get the TRUST status from the API of the OAT 
> server), so I added this feature into Hadoop's scheduling.
> With the TRUST check service, a node can get its own TRUST status and
> then, through the heartbeat, send the TRUST status to the resource manager for 
> scheduling.
> In the scheduling step, if the node's TRUST status is 'false', it will be 
> abandoned until its TRUST status turns to 'true'.
> ***The logic of this feature is similar to the node's health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045708#comment-14045708
 ] 

Peng Zhang commented on YARN-2104:
--

Looks good to me.

> Scheduler queue filter failed to work because index of queue column changed
> ---
>
> Key: YARN-2104
> URL: https://issues.apache.org/jira/browse/YARN-2104
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2104.patch
>
>
> YARN-563 added
> {code}
> + th(".type", "Application Type").
> {code}
> to the application table, which moved the queue column's index from 3 to 4. 
> On the scheduler page, however, the queue column index is hard-coded to 3 
> when filtering applications by queue name:
> {code}
>   "if (q == 'root') q = '';",
>   "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
>   "$('#apps').dataTable().fnFilter(q, 3, true);",
> {code}
> So the queue filter no longer works on the applications page.
> Reproduce steps (thanks to Bo Yang for pointing this out):
> {code}
> 1) In the default setup, there is a "default" queue under the root queue
> 2) Run an arbitrary application; you can find it on the "Applications" page
> 3) Click the "Default" queue in the scheduler page
> 4) Click "Applications"; no application will show
> 5) Click the "Root" queue in the scheduler page
> 6) Click "Applications"; the application will show again
> {code}
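A hard-coded column index will break again the next time a column is added. A 
small illustrative helper (not the actual webapp code; the header list in 
{{main}} is assumed) that looks the queue column up by its title instead:

{code}
// Illustrative only -- not the RM webapp code. Resolve the queue column index
// from the rendered header titles instead of hard-coding it, so adding a
// column (as YARN-563 did) cannot silently break the generated fnFilter call.
import java.util.Arrays;
import java.util.List;

public class QueueColumnIndex {
  static int indexOf(List<String> headers, String title) {
    int i = headers.indexOf(title);
    if (i < 0) {
      throw new IllegalStateException("No '" + title + "' column in apps table");
    }
    return i;
  }

  public static void main(String[] args) {
    // Assumed header order after YARN-563; the real apps table may differ.
    List<String> headers = Arrays.asList(
        "ID", "User", "Name", "Application Type", "Queue", "StartTime");
    System.out.println(indexOf(headers, "Queue"));  // prints 4
  }
}
{code}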



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1810) YARN RM Webapp Application page Issue

2014-06-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045697#comment-14045697
 ] 

Wangda Tan commented on YARN-1810:
--

I've uploaded a simple fix to YARN-2104, please review!
[~peng.zhang], good suggestion. Could you create a JIRA to track it?

> YARN RM Webapp Application page Issue
> -
>
> Key: YARN-1810
> URL: https://issues.apache.org/jira/browse/YARN-1810
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.3.0
>Reporter: Ethan Setnik
> Attachments: Screen Shot 2014-03-10 at 3.59.54 PM.png, Screen Shot 
> 2014-03-11 at 1.40.12 PM.png
>
>
> When browsing the ResourceManager's web interface I am presented with the 
> attached screenshot.
> I can't understand why it does not show the applications, even though there 
> is no search text.  The application counts show the correct values relative 
> to the submissions, successes, and failures.
> Also see the text in the screenshot:
> "Showing 0 to 0 of 0 entries (filtered from 19 total entries)"



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2104:
-

Attachment: YARN-2104.patch

Attached a simple fix for this.

> Scheduler queue filter failed to work because index of queue column changed
> ---
>
> Key: YARN-2104
> URL: https://issues.apache.org/jira/browse/YARN-2104
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2104.patch
>
>
> YARN-563 added
> {code}
> + th(".type", "Application Type").
> {code}
> to the application table, which moved the queue column's index from 3 to 4. 
> On the scheduler page, however, the queue column index is hard-coded to 3 
> when filtering applications by queue name:
> {code}
>   "if (q == 'root') q = '';",
>   "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
>   "$('#apps').dataTable().fnFilter(q, 3, true);",
> {code}
> So the queue filter no longer works on the applications page.
> Reproduce steps (thanks to Bo Yang for pointing this out):
> {code}
> 1) In the default setup, there is a "default" queue under the root queue
> 2) Run an arbitrary application; you can find it on the "Applications" page
> 3) Click the "Default" queue in the scheduler page
> 4) Click "Applications"; no application will show
> 5) Click the "Root" queue in the scheduler page
> 6) Click "Applications"; the application will show again
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045681#comment-14045681
 ] 

Hadoop QA commented on YARN-2052:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652767/YARN-2052.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4112//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4112//console

This message is automatically generated.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
> YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
> YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high-churn activity, the RM does not store the sequence number per app, so 
> after a restart it does not know what the sequence number for new allocations 
> should be.
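A minimal illustration of why this breaks (not RM code; the allocator class is 
made up for the example): the sequence counter lives only in memory, so a 
restarted RM effectively starts from a fresh allocator and can hand out ids 
that collide with those of still-running containers.

{code}
import java.util.concurrent.atomic.AtomicInteger;

public class ContainerIdAllocatorDemo {
  // Stand-in for the per-app sequence counter kept only in RM memory.
  static class Allocator {
    private final AtomicInteger sequence = new AtomicInteger(0);
    int nextId() {
      return sequence.incrementAndGet();
    }
  }

  public static void main(String[] args) {
    Allocator beforeRestart = new Allocator();
    int idBefore = beforeRestart.nextId();      // 1

    Allocator afterRestart = new Allocator();   // counter state lost on restart
    int idAfter = afterRestart.nextId();        // 1 again, collides with idBefore

    System.out.println(idBefore == idAfter);    // true
  }
}
{code}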



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1810) YARN RM Webapp Application page Issue

2014-06-27 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045679#comment-14045679
 ] 

Peng Zhang commented on YARN-1810:
--

I updated the column index in {{$('#apps').dataTable().fnFilter(q, 3, true);}} 
from 3 to 4; after that, clicking the "default" queue bar no longer makes the 
applications disappear.

But I found that this fnFilter query carries over to the "Applications" page. 
With multiple queues, if I click one of them on the scheduler page and then go 
to the Applications page, only applications of the clicked queue are shown and 
the others are filtered out. Since no filter query is shown on the page, this 
can cause confusion.



> YARN RM Webapp Application page Issue
> -
>
> Key: YARN-1810
> URL: https://issues.apache.org/jira/browse/YARN-1810
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.3.0
>Reporter: Ethan Setnik
> Attachments: Screen Shot 2014-03-10 at 3.59.54 PM.png, Screen Shot 
> 2014-03-11 at 1.40.12 PM.png
>
>
> When browsing the ResourceManager's web interface I am presented with the 
> attached screenshot.
> I can't understand why it does not show the applications, even though there 
> is no search text.  The application counts show the correct values relative 
> to the submissions, successes, and failures.
> Also see the text in the screenshot:
> "Showing 0 to 0 of 0 entries (filtered from 19 total entries)"



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2163) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo().

2014-06-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045668#comment-14045668
 ] 

Wangda Tan commented on YARN-2163:
--

Thanks [~raviprak] for review and commit!

> WebUI: Order of AppId in apps table should be consistent with 
> ApplicationId.compareTo().
> 
>
> Key: YARN-2163
> URL: https://issues.apache.org/jira/browse/YARN-2163
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Minor
> Fix For: 3.0.0, 2.5.0
>
> Attachments: YARN-2163.patch, apps page.png
>
>
> Currently, AppId is treated as numeric, so the applications table is sorted 
> by the int-typed id only (the cluster timestamp is not included); see the 
> attached screenshot. The order of AppId on the web page should be consistent 
> with ApplicationId.compareTo().
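A small sketch of the ordering the description asks for, comparing the cluster 
timestamp first and the sequential id second, using a stand-in value class 
rather than the real ApplicationId:

{code}
// Illustrative only: a plain value class instead of the real ApplicationId.
import java.util.Comparator;

public class AppIdOrdering {
  static class AppId {
    final long clusterTimestamp;
    final int id;
    AppId(long clusterTimestamp, int id) {
      this.clusterTimestamp = clusterTimestamp;
      this.id = id;
    }
  }

  // Cluster timestamp first, sequential id second.
  static final Comparator<AppId> LIKE_COMPARE_TO =
      Comparator.<AppId>comparingLong(a -> a.clusterTimestamp)
                .thenComparingInt(a -> a.id);

  public static void main(String[] args) {
    AppId a = new AppId(1403000000000L, 12);
    AppId b = new AppId(1403100000000L, 3);
    // Sorting by the int id alone would put b first; comparing the cluster
    // timestamp first keeps a (the older cluster) first.
    System.out.println(LIKE_COMPARE_TO.compare(a, b) < 0);  // true
  }
}
{code}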



--
This message was sent by Atlassian JIRA
(v6.2#6252)