[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085274#comment-15085274
 ] 

Hadoop QA commented on YARN-4479:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 43s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 39s 
{color} | {color:red} branch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 56s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 40s 
{color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 29s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 25s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 209m 2s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|

[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2016-01-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085297#comment-15085297
 ] 

Rohith Sharma K S commented on YARN-4479:
-

Test case failures are unrelated to this patch. These test failures will be handled in 
YARN-4478.

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch, 
> 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch, 
> 0005-YARN-4479.patch, 0006-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for pending applications and 
> active applications. When priority is configured for applications, 
> high-priority applications get activated first during recovery. It is 
> possible that a low-priority job was submitted earlier and was already in the 
> running state. 
> This causes the low-priority job to starve after recovery.
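As a rough illustration of the idea above (not the attached patches), recovered 
applications could be activated in submission order rather than by priority; the 
comparator below is a minimal sketch with assumed ordering semantics:
{code:title=RecoveryOrderSketch.java}
import java.util.Comparator;

import org.apache.hadoop.yarn.api.records.ApplicationId;

/**
 * Sketch only: activate recovered applications in submission order
 * (cluster timestamp, then application id) instead of by priority, so an
 * already-running low-priority application is not starved during recovery.
 */
public final class RecoveryOrderSketch {
  public static final Comparator<ApplicationId> RECOVERY_ORDER =
      Comparator.<ApplicationId>comparingLong(ApplicationId::getClusterTimestamp)
          .thenComparingInt(ApplicationId::getId);
}
{code}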



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4548) TestCapacityScheduler.testRecoverRequestAfterPreemption fails with NPE

2016-01-06 Thread Akihiro Suda (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akihiro Suda updated YARN-4548:
---
Attachment: yarn-4548.log

Reproduced it again and collected the log ({{yarn-4548.log}}).

> TestCapacityScheduler.testRecoverRequestAfterPreemption fails with NPE
> --
>
> Key: YARN-4548
> URL: https://issues.apache.org/jira/browse/YARN-4548
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Akihiro Suda
> Attachments: yarn-4548.log
>
>
> {code}
> testRecoverRequestAfterPreemption(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler)
>   Time elapsed: 5.552 sec 
> <<< ERROR!
> java.lang.NullPointerException: null
>at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testRecoverRequestAfterPreemption(TestCapacitySch
> eduler.java:1263)
> {code}
> https://github.com/apache/hadoop/blob/d36b6e045f317c94e97cb41a163aa974d161a404/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java#L1260-L1263
> Jenkins also hit this two months ago: 
> https://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3C1100047319.7290.1446252743553.JavaMail.jenkins@crius%3E
> My Hadoop version: 4e4b3a8465a8433e78e015cb1ce7e0dc1ebeb523 (Dec 30, 2015)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4549) Containers stuck in KILLING state

2016-01-06 Thread Danil Serdyuchenko (JIRA)
Danil Serdyuchenko created YARN-4549:


 Summary: Containers stuck in KILLING state
 Key: YARN-4549
 URL: https://issues.apache.org/jira/browse/YARN-4549
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Danil Serdyuchenko


We are running Samza 0.8 on YARN 2.7.1 with {{LinuxContainerExecutor}} as the 
container-executor with a cgroups configuration. We also have NM recovery enabled.

We observe a lot of containers that get stuck in the KILLING state after the 
NM tries to kill them. The container remains running indefinitely, which causes 
some duplication as new containers are brought up to replace them. Looking 
through the logs, the NM can't seem to get the container PID.

{noformat}
16/01/05 05:16:44 INFO containermanager.ContainerManagerImpl: Stopping 
container with container Id: container_1448454866800_0023_01_05
16/01/05 05:16:44 INFO nodemanager.NMAuditLogger: USER=ec2-user 
IP=10.51.111.243OPERATION=Stop Container Request
TARGET=ContainerManageImpl  RESULT=SUCCESS  
APPID=application_1448454866800_0023
CONTAINERID=container_1448454866800_0023_01_05
16/01/05 05:16:44 INFO container.ContainerImpl: Container 
container_1448454866800_0023_01_05 transitioned from RUNNING to KILLING
16/01/05 05:16:44 INFO launcher.ContainerLaunch: Cleaning up container 
container_1448454866800_0023_01_05
16/01/05 05:16:47 INFO launcher.ContainerLaunch: Could not get pid for 
container_1448454866800_0023_01_05. Waited for 2000 ms.
{noformat}

The PID files for each container seem to be present on the node. We weren't 
able to consistently replicate this and are hoping that someone has come across 
this before.
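For context, the "Could not get pid ... Waited for 2000 ms" message above 
corresponds to a bounded wait for the container's pid file. A minimal sketch of 
that kind of wait, with assumed names rather than the NM's actual 
ContainerLaunch code:
{code:title=PidFileWaitSketch.java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public final class PidFileWaitSketch {
  /**
   * Polls for the container's pid file until it appears with content or the
   * deadline passes. Returns null when the pid could not be obtained, which is
   * the situation logged above; the caller then has to clean up without a pid.
   */
  public static String waitForPid(Path pidFile, long maxWaitMs)
      throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + maxWaitMs;
    while (System.currentTimeMillis() < deadline) {
      if (Files.exists(pidFile)) {
        String pid =
            new String(Files.readAllBytes(pidFile), StandardCharsets.UTF_8).trim();
        if (!pid.isEmpty()) {
          return pid;
        }
      }
      Thread.sleep(100);
    }
    return null;
  }
}
{code}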



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN

2016-01-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085426#comment-15085426
 ] 

Varun Saxena commented on YARN-4224:


It won't clash as per the current APIs.
But activeFlows also looks fine to me. We can discuss this during the meeting.

> Support fetching entities by UID and change the REST interface to conform to 
> current REST APIs' in YARN
> ---
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch, 
> YARN-4224-feature-YARN-2928.wip.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4306) Test failure: TestClientRMTokens

2016-01-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085444#comment-15085444
 ] 

Rohith Sharma K S commented on YARN-4306:
-

Verified the test by patching HADOOP-12687 on Ubuntu. The test cases are 
passing. We can wait for HADOOP-12687 to get committed and then close this JIRA.

> Test failure: TestClientRMTokens
> 
>
> Key: YARN-4306
> URL: https://issues.apache.org/jira/browse/YARN-4306
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Reporter: Sunil G
>Assignee: Sunil G
>
> Tests are failing locally as well. As part of the HADOOP-12321 Jenkins run, 
> I see the same error:
> {noformat}testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.638 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:363)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:316)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization

2016-01-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085449#comment-15085449
 ] 

Rohith Sharma K S commented on YARN-4318:
-

It seems the root cause of the test failure is the same as HADOOP-12687. Verified 
the test by patching HADOOP-12687 on Ubuntu. The test cases are passing. We can 
wait for HADOOP-12687 to get committed and then close this JIRA. 

> Test failure: TestAMAuthorization
> -
>
> Key: YARN-4318
> URL: https://issues.apache.org/jira/browse/YARN-4318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0
> Environment: jenkins
>Reporter: Tsuyoshi Ozawa
>Assignee: Kuhu Shukla
>
> {quote}
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.208 sec  <<< ERROR!
> java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
> destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For 
> more details see:  http://wiki.apache.org/hadoop/UnknownHost
>   at org.apache.hadoop.ipc.Client$Connection.(Client.java:403)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4546) ResourceManager crash due to scheduling opportunity overflow

2016-01-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085534#comment-15085534
 ] 

Junping Du commented on YARN-4546:
--

The test failures are not related, and I believe there are several JIRAs to track 
them now. Committing the patch.

> ResourceManager crash due to scheduling opportunity overflow
> 
>
> Key: YARN-4546
> URL: https://issues.apache.org/jira/browse/YARN-4546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.1
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-4546.001.patch
>
>
> If a resource request lingers long enough unsatisfied then the scheduling 
> opportunities count for the request can overflow and cause an RM crash.
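As an illustration of the overflow described above, a per-priority counter can 
simply be capped so that repeated increments never wrap negative. This is a 
sketch with assumed names, not the attached YARN-4546.001.patch:
{code:title=SchedulingOpportunityCounterSketch.java}
import java.util.HashMap;
import java.util.Map;

/** Sketch only: an overflow-safe scheduling-opportunity counter. */
public final class SchedulingOpportunityCounterSketch<P> {
  private final Map<P, Integer> counts = new HashMap<>();

  /** Increment, but never past Integer.MAX_VALUE, so the count cannot wrap negative. */
  public void addSchedulingOpportunity(P priority) {
    int current = counts.getOrDefault(priority, 0);
    if (current < Integer.MAX_VALUE) {
      counts.put(priority, current + 1);
    }
  }

  public int count(P priority) {
    return counts.getOrDefault(priority, 0);
  }
}
{code}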



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4546) ResourceManager crash due to scheduling opportunity overflow

2016-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1508#comment-1508
 ] 

Hudson commented on YARN-4546:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9056 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9056/])
YARN-4546. ResourceManager crash due to scheduling opportunity overflow. 
(junping_du: rev c1462a67ff7bb632df50e1c52de971cced56c6a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerApplicationAttempt.java
* hadoop-yarn-project/CHANGES.txt


> ResourceManager crash due to scheduling opportunity overflow
> 
>
> Key: YARN-4546
> URL: https://issues.apache.org/jira/browse/YARN-4546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.1
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-4546.001.patch
>
>
> If a resource request lingers long enough unsatisfied then the scheduling 
> opportunities count for the request can overflow and cause an RM crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask

2016-01-06 Thread Konstantinos Karanasos (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-4335:
-
Attachment: YARN-4335.003.patch

Updating the patch:
# Turned one of the new APIs from Stable to Unstable
# Added more information to the javadoc of ExecutionType

> Allow ResourceRequests to specify ExecutionType of a request ask
> 
>
> Key: YARN-4335
> URL: https://issues.apache.org/jira/browse/YARN-4335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch, 
> YARN-4335.003.patch
>
>
> YARN-2882 introduced container types that are internal (not user-facing) and 
> are used by the ContainerManager during execution at the NM.
> With this JIRA we are introducing (user-facing) resource request types that 
> are used by the AM to specify the type of the ResourceRequest.
> We will initially support two resource request types: CONSERVATIVE and 
> OPTIMISTIC.
> CONSERVATIVE resource requests will be handed internally to containers of 
> GUARANTEED type, whereas OPTIMISTIC resource requests will be handed to 
> QUEUEABLE containers.
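A minimal sketch of the mapping described above; the enum and method names come 
from the wording of the description, not from the attached patches:
{code:title=ExecutionTypeMappingSketch.java}
public final class ExecutionTypeMappingSketch {
  /** User-facing request types named in the description. */
  public enum RequestExecutionType { CONSERVATIVE, OPTIMISTIC }

  /** Internal container types introduced by YARN-2882. */
  public enum ContainerType { GUARANTEED, QUEUEABLE }

  /** CONSERVATIVE requests map to GUARANTEED containers, OPTIMISTIC to QUEUEABLE. */
  public static ContainerType toContainerType(RequestExecutionType type) {
    return type == RequestExecutionType.CONSERVATIVE
        ? ContainerType.GUARANTEED
        : ContainerType.QUEUEABLE;
  }
}
{code}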



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3940) Application moveToQueue should check NodeLabel permission

2016-01-06 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3940:
---
Attachment: 0003-YARN-3940.patch

[~leftnoteasy]

Attaching patch for review
* Added the check that the destination labels are a subset of the source labels, 
and also a resource usage check (see the sketch below)
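Based on the issue description below, the core validation is that the 
destination queue must be able to access every label used by the application's 
resource requests; the sketch below expresses that check with assumed names, 
not the attached patch:
{code:title=MoveQueueLabelCheckSketch.java}
import java.util.Set;

/** Sketch only: label-permission validation for moveToQueue (assumed names). */
public final class MoveQueueLabelCheckSketch {
  /**
   * The move should be rejected when the destination queue cannot access every
   * node label used by the application's resource requests.
   */
  public static boolean destinationHasAccess(Set<String> appRequestedLabels,
      Set<String> destQueueAccessibleLabels) {
    return destQueueAccessibleLabels.containsAll(appRequestedLabels);
  }
}
{code}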

> Application moveToQueue should check NodeLabel permission 
> --
>
> Key: YARN-3940
> URL: https://issues.apache.org/jira/browse/YARN-3940
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-3940.patch, 0002-YARN-3940.patch, 
> 0003-YARN-3940.patch
>
>
> Configure the capacity scheduler 
> Configure node labels and submit an application with {{queue=A Label=X}}
> Move the application to queue {{B}}, which does not have access to label x
> {code}
> 2015-07-20 19:46:19,626 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application attempt appattempt_1437385548409_0005_01 released container 
> container_e08_1437385548409_0005_01_02 on node: host: 
> host-10-19-92-117:64318 #containers=1 available= 
> used= with event: KILL
> 2015-07-20 19:46:20,970 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1437385548409_0005_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, queue=b1 doesn't have permission to access all labels in 
> resource request. labelExpression of resource request=x. Queue labels=y
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:515)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168)
> {code}
> The same exception will be thrown until the *heartbeat timeout*.
> Then the application state will be updated to *FAILED*.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-06 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reopened YARN-2902:


Reopening the issue to run QA for the branch-2.6 patch.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.
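One way to picture the gap described above is as a cleanup-eligibility 
predicate. The sketch below is illustrative only, with placeholder types and a 
hypothetical stale-window rule; it is not the NM's localization classes or the 
attached patches:
{code:title=DownloadingCleanupSketch.java}
/** Illustrative only: when may a cached local resource be reclaimed? */
public final class DownloadingCleanupSketch {
  public enum ResourceState { DOWNLOADING, LOCALIZED, FAILED }

  /**
   * Per the description, today's cleaner never reclaims DOWNLOADING entries,
   * even at zero references. A stale-window check is one way an orphaned
   * DOWNLOADING entry could eventually become eligible for cleanup.
   */
  public static boolean eligibleForRemoval(ResourceState state, int refCount,
      long lastProgressMs, long nowMs, long staleWindowMs) {
    if (refCount > 0) {
      return false;                       // still in use, never remove
    }
    if (state != ResourceState.DOWNLOADING) {
      return true;                        // normal zero-reference cleanup path
    }
    // Orphaned download: only reclaim once it has shown no progress for a while.
    return nowMs - lastProgressMs > staleWindowMs;
  }
}
{code}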



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3934) Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used

2016-01-06 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated YARN-3934:
--
Attachment: YARN-3934-1.patch

Here's a first attempt at the fix.  We cannot know with certainty what ZK has 
set for jute.maxbuffer on the server side, so we have to make the assumption 
that it matches what is on the client side (in this case the RM).  I've set up 
the code to read the property as a system property, which is how we normally 
specify it.  There may be a desire to standardize it into the YARN config later 
on, but I think that's outside the scope of fixing this.  Without the patch, 
the ZK connection is broken and retried by default *1000* times, so the RM 
doesn't go down for a while and all applications are blocked from submission.  I 
think it's probably worth revisiting that default value as well, but I'd like 
some feedback from reviewers on whether we should open a separate JIRA for that.
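A minimal sketch of the size check described above, assuming the client-side 
{{jute.maxbuffer}} system property mirrors the server's limit; the class and 
method names are illustrative, not the attached YARN-3934-1.patch:
{code:title=ZkStateSizeCheckSketch.java}
import org.apache.hadoop.yarn.exceptions.YarnException;

/** Sketch only: reject application state that would exceed ZK's node-size limit. */
public final class ZkStateSizeCheckSketch {
  // ZooKeeper's default jute.maxbuffer is 0xfffff bytes (~1 MB); we assume the
  // server uses the same value as this client-side system property.
  private static final int DEFAULT_JUTE_MAX_BUFFER = 0xfffff;

  public static void checkFits(byte[] serializedAppState) throws YarnException {
    int limit = Integer.getInteger("jute.maxbuffer", DEFAULT_JUTE_MAX_BUFFER);
    if (serializedAppState.length > limit) {
      throw new YarnException("Serialized application state ("
          + serializedAppState.length + " bytes) exceeds jute.maxbuffer ("
          + limit + " bytes); rejecting at submission instead of failing the store");
    }
  }
}
{code}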

> Application with large ApplicationSubmissionContext can cause RM to exit when 
> ZK store is used
> --
>
> Key: YARN-3934
> URL: https://issues.apache.org/jira/browse/YARN-3934
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Dustin Cote
> Attachments: YARN-3934-1.patch
>
>
> Use the following steps to test.
> 1. Set up ZK as the RM HA store.
> 2. Submit a job that refers to lots of distributed cache files with long HDFS 
> paths, which will cause the app state size to exceed ZK's max object size 
> limit.
> 3. The RM can't write to ZK and exits with the following exception.
> {noformat}
> 2015-07-10 22:21:13,002 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:944)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:941)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1083)
> {noformat}
> In this case, RM could have rejected the app during submitApplication RPC if 
> the size of ApplicationSubmissionContext is too large.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4550) some tests in TestContainerLanch fails on non-english locale environment

2016-01-06 Thread Takashi Ohnishi (JIRA)
Takashi Ohnishi created YARN-4550:
-

 Summary: some tests in TestContainerLanch fails on non-english 
locale environment
 Key: YARN-4550
 URL: https://issues.apache.org/jira/browse/YARN-4550
 Project: Hadoop YARN
  Issue Type: Test
  Components: nodemanager
Affects Versions: 2.7.1
 Environment: CentOS 7 with below locale configurations:
{code}
$ locale
LANG=ja_JP.UTF-8
LC_CTYPE="ja_JP.UTF-8"
LC_NUMERIC="ja_JP.UTF-8"
LC_TIME="ja_JP.UTF-8"
LC_COLLATE="ja_JP.UTF-8"
LC_MONETARY="ja_JP.UTF-8"
LC_MESSAGES="ja_JP.UTF-8"
LC_PAPER="ja_JP.UTF-8"
LC_NAME="ja_JP.UTF-8"
LC_ADDRESS="ja_JP.UTF-8"
LC_TELEPHONE="ja_JP.UTF-8"
LC_MEASUREMENT="ja_JP.UTF-8"
LC_IDENTIFICATION="ja_JP.UTF-8"
LC_ALL=
{code}
Reporter: Takashi Ohnishi
Priority: Minor


The tests listed below fail.

* testErrorLogOnContainerExitWithMultipleFiles
* testErrorLogOnContainerExitWithCustomPattern
* testErrorLogOnContainerExitForCase
* testErrorLogOnContainerExit
* testErrorLogOnContainerExitForExt

The failures happen in the same place.

{code}
java.lang.AssertionError: Should contain contents of error Log
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:633)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:602)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitWithFailure(ContainerLaunch.java:438)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:359)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.verifyTailErrorLogOnContainerExit(TestContainerLaunch.java:597)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testErrorLogOnContainerExitWithMultipleFiles(TestContainerLaunch.java:528)
{code}

All these tests call verifyTailErrorLogOnContainerExit and it calls below code.

{code:title=TestContainerLaunch.java}
 632 Assert.assertTrue("Should contain contents of error Log",
 633     exitEvent.getDiagnosticInfo().contains(
 634         INVALID_JAVA_HOME + "/bin/java: No such file or directory"));
 635   }
{code}

In an environment with a non-English locale, this fails because the error message 
is returned with non-English text like the one below.
{code}
2016-01-06 23:27:45,427 INFO  [main] containermanager.BaseContainerManagerTest 
(TestContainerLaunch.java:handle(622)) - Diagnostic Info : Container exited 
with a non-zero exit code 127. Error files: stderr.log, stdout.
Last 4096 bytes of stderr.log :
/bin/bash: /no/jvm/here/bin/java: そのようなファイルやディレクトリはありません
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4550) some tests in TestContainerLanch fails on non-english locale environment

2016-01-06 Thread Takashi Ohnishi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085606#comment-15085606
 ] 

Takashi Ohnishi commented on YARN-4550:
---

I found that I can avoid these failures with 

{code}
$ LANG=C man test
{code}
 or 
{code}
$ man test -Duser.language=en
{code}

> some tests in TestContainerLanch fails on non-english locale environment
> 
>
> Key: YARN-4550
> URL: https://issues.apache.org/jira/browse/YARN-4550
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager
>Affects Versions: 2.7.1
> Environment: CentOS 7 with below locale configurations:
> {code}
> $ locale
> LANG=ja_JP.UTF-8
> LC_CTYPE="ja_JP.UTF-8"
> LC_NUMERIC="ja_JP.UTF-8"
> LC_TIME="ja_JP.UTF-8"
> LC_COLLATE="ja_JP.UTF-8"
> LC_MONETARY="ja_JP.UTF-8"
> LC_MESSAGES="ja_JP.UTF-8"
> LC_PAPER="ja_JP.UTF-8"
> LC_NAME="ja_JP.UTF-8"
> LC_ADDRESS="ja_JP.UTF-8"
> LC_TELEPHONE="ja_JP.UTF-8"
> LC_MEASUREMENT="ja_JP.UTF-8"
> LC_IDENTIFICATION="ja_JP.UTF-8"
> LC_ALL=
> {code}
>Reporter: Takashi Ohnishi
>Priority: Minor
>
> The tests listed below fail.
> * testErrorLogOnContainerExitWithMultipleFiles
> * testErrorLogOnContainerExitWithCustomPattern
> * testErrorLogOnContainerExitForCase
> * testErrorLogOnContainerExit
> * testErrorLogOnContainerExitForExt
> The failures happen in same place.
> {code}
> java.lang.AssertionError: Should contain contents of error Log
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:633)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:602)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitWithFailure(ContainerLaunch.java:438)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:359)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.verifyTailErrorLogOnContainerExit(TestContainerLaunch.java:597)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testErrorLogOnContainerExitWithMultipleFiles(TestContainerLaunch.java:528)
> {code}
> All these tests call verifyTailErrorLogOnContainerExit and it calls below 
> code.
> {code:title=TestContainerLaunch.java}
>  632 Assert.assertTrue("Should contain contents of error Log",
>
>  633 exitEvent.getDiagnosticInfo().contains(
>  634 INVALID_JAVA_HOME + "/bin/java: No such file or 
> directory"));
>  635   }
> {code}
> In environment with non-english locale, this fails because the error message 
> returned with non-english text like below.
> {code}
> 2016-01-06 23:27:45,427 INFO  [main] 
> containermanager.BaseContainerManagerTest 
> (TestContainerLaunch.java:handle(622)) - Diagnostic Info : Container exited 
> with a non-zero exit code 127. Error files: stderr.log, stdout.
> Last 4096 bytes of stderr.log :
> /bin/bash: /no/jvm/here/bin/java: そのようなファイルやディレクトリはありません
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4550) some tests in TestContainerLanch fails on non-english locale environment

2016-01-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085607#comment-15085607
 ] 

Steve Loughran commented on YARN-4550:
--

Funny. We should look for just the {{INVALID_JAVA_HOME + "/bin/java"}} string.
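For illustration, the assertion quoted in the description could be relaxed to 
match only the locale-independent prefix; this is a sketch against the test's 
existing variables, not a reviewed patch:
{code}
// Only match the path prefix; the "No such file or directory" suffix is
// locale-dependent and differs under e.g. ja_JP.UTF-8.
Assert.assertTrue("Should contain contents of error Log",
    exitEvent.getDiagnosticInfo().contains(INVALID_JAVA_HOME + "/bin/java"));
{code}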

> some tests in TestContainerLanch fails on non-english locale environment
> 
>
> Key: YARN-4550
> URL: https://issues.apache.org/jira/browse/YARN-4550
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager
>Affects Versions: 2.7.1
> Environment: CentOS 7 with below locale configurations:
> {code}
> $ locale
> LANG=ja_JP.UTF-8
> LC_CTYPE="ja_JP.UTF-8"
> LC_NUMERIC="ja_JP.UTF-8"
> LC_TIME="ja_JP.UTF-8"
> LC_COLLATE="ja_JP.UTF-8"
> LC_MONETARY="ja_JP.UTF-8"
> LC_MESSAGES="ja_JP.UTF-8"
> LC_PAPER="ja_JP.UTF-8"
> LC_NAME="ja_JP.UTF-8"
> LC_ADDRESS="ja_JP.UTF-8"
> LC_TELEPHONE="ja_JP.UTF-8"
> LC_MEASUREMENT="ja_JP.UTF-8"
> LC_IDENTIFICATION="ja_JP.UTF-8"
> LC_ALL=
> {code}
>Reporter: Takashi Ohnishi
>Priority: Minor
>
> The tests listed below fail.
> * testErrorLogOnContainerExitWithMultipleFiles
> * testErrorLogOnContainerExitWithCustomPattern
> * testErrorLogOnContainerExitForCase
> * testErrorLogOnContainerExit
> * testErrorLogOnContainerExitForExt
> The failures happen in same place.
> {code}
> java.lang.AssertionError: Should contain contents of error Log
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:633)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:602)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitWithFailure(ContainerLaunch.java:438)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:359)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.verifyTailErrorLogOnContainerExit(TestContainerLaunch.java:597)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testErrorLogOnContainerExitWithMultipleFiles(TestContainerLaunch.java:528)
> {code}
> All these tests call verifyTailErrorLogOnContainerExit and it calls below 
> code.
> {code:title=TestContainerLaunch.java}
>  632 Assert.assertTrue("Should contain contents of error Log",
>
>  633 exitEvent.getDiagnosticInfo().contains(
>  634 INVALID_JAVA_HOME + "/bin/java: No such file or 
> directory"));
>  635   }
> {code}
> In environment with non-english locale, this fails because the error message 
> returned with non-english text like below.
> {code}
> 2016-01-06 23:27:45,427 INFO  [main] 
> containermanager.BaseContainerManagerTest 
> (TestContainerLaunch.java:handle(622)) - Diagnostic Info : Container exited 
> with a non-zero exit code 127. Error files: stderr.log, stdout.
> Last 4096 bytes of stderr.log :
> /bin/bash: /no/jvm/here/bin/java: そのようなファイルやディレクトリはありません
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-1593) support out-of-proc AuxiliaryServices

2016-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-1593:


Assignee: Junping Du

> support out-of-proc AuxiliaryServices
> -
>
> Key: YARN-1593
> URL: https://issues.apache.org/jira/browse/YARN-1593
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, rolling upgrade
>Reporter: Ming Ma
>Assignee: Junping Du
>
> AuxiliaryServices such as ShuffleHandler currently run in the same process as 
> the NM. There are some benefits to hosting them in dedicated processes.
> 1. NM rolling restart. If we want to upgrade YARN, an NM restart will force a 
> ShuffleHandler restart. If ShuffleHandler runs as a separate process, 
> ShuffleHandler can continue to run during NM restart. The NM can reconnect to 
> the running ShuffleHandler after restart.
> 2. Resource management. It is possible that other types of AuxiliaryServices 
> will be implemented. AuxiliaryServices are considered YARN-application 
> specific and could consume lots of resources. Running AuxiliaryServices in 
> separate processes allows easier resource management. The NM could potentially 
> stop a specific AuxiliaryService process from running if it consumes resources 
> way above its allocation.
> Here are some high-level ideas:
> 1. The NM provides a hosting process for each AuxiliaryService. The existing 
> AuxiliaryService API doesn't change.
> 2. The hosting process provides an RPC server for the AuxiliaryService proxy 
> object inside the NM to connect to.
> 3. When we rolling-restart the NM, the existing AuxiliaryService processes 
> will continue to run. The NM could reconnect to the running AuxiliaryService 
> processes upon restart.
> 4. Policy and resource management of AuxiliaryServices. So far we don't have 
> an immediate need for this. An AuxiliaryService could run inside a container, 
> its resource utilization could be taken into account by the RM, and the RM 
> could detect when a specific type of application overutilizes cluster 
> resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4551) Address the duplication between StatusUpdateWhenHealthy and StatusUpdateWhenUnhealthy transitions

2016-01-06 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-4551:
--

 Summary: Address the duplication between StatusUpdateWhenHealthy 
and StatusUpdateWhenUnhealthy transitions
 Key: YARN-4551
 URL: https://issues.apache.org/jira/browse/YARN-4551
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.8.0
Reporter: Karthik Kambatla
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4550) some tests in TestContainerLanch fails on non-english locale environment

2016-01-06 Thread Takashi Ohnishi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085634#comment-15085634
 ] 

Takashi Ohnishi commented on YARN-4550:
---

s/man/mvn/g

> some tests in TestContainerLanch fails on non-english locale environment
> 
>
> Key: YARN-4550
> URL: https://issues.apache.org/jira/browse/YARN-4550
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager
>Affects Versions: 2.7.1
> Environment: CentOS 7 with below locale configurations:
> {code}
> $ locale
> LANG=ja_JP.UTF-8
> LC_CTYPE="ja_JP.UTF-8"
> LC_NUMERIC="ja_JP.UTF-8"
> LC_TIME="ja_JP.UTF-8"
> LC_COLLATE="ja_JP.UTF-8"
> LC_MONETARY="ja_JP.UTF-8"
> LC_MESSAGES="ja_JP.UTF-8"
> LC_PAPER="ja_JP.UTF-8"
> LC_NAME="ja_JP.UTF-8"
> LC_ADDRESS="ja_JP.UTF-8"
> LC_TELEPHONE="ja_JP.UTF-8"
> LC_MEASUREMENT="ja_JP.UTF-8"
> LC_IDENTIFICATION="ja_JP.UTF-8"
> LC_ALL=
> {code}
>Reporter: Takashi Ohnishi
>Priority: Minor
>
> The tests listed below fail.
> * testErrorLogOnContainerExitWithMultipleFiles
> * testErrorLogOnContainerExitWithCustomPattern
> * testErrorLogOnContainerExitForCase
> * testErrorLogOnContainerExit
> * testErrorLogOnContainerExitForExt
> The failures happen in same place.
> {code}
> java.lang.AssertionError: Should contain contents of error Log
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:633)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:602)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitWithFailure(ContainerLaunch.java:438)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:359)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.verifyTailErrorLogOnContainerExit(TestContainerLaunch.java:597)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testErrorLogOnContainerExitWithMultipleFiles(TestContainerLaunch.java:528)
> {code}
> All these tests call verifyTailErrorLogOnContainerExit and it calls below 
> code.
> {code:title=TestContainerLaunch.java}
>  632 Assert.assertTrue("Should contain contents of error Log",
>
>  633 exitEvent.getDiagnosticInfo().contains(
>  634 INVALID_JAVA_HOME + "/bin/java: No such file or 
> directory"));
>  635   }
> {code}
> In environment with non-english locale, this fails because the error message 
> returned with non-english text like below.
> {code}
> 2016-01-06 23:27:45,427 INFO  [main] 
> containermanager.BaseContainerManagerTest 
> (TestContainerLaunch.java:handle(622)) - Diagnostic Info : Container exited 
> with a non-zero exit code 127. Error files: stderr.log, stdout.
> Last 4096 bytes of stderr.log :
> /bin/bash: /no/jvm/here/bin/java: そのようなファイルやディレクトリはありません
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4550) some tests in TestContainerLanch fails on non-english locale environment

2016-01-06 Thread Takashi Ohnishi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085662#comment-15085662
 ] 

Takashi Ohnishi commented on YARN-4550:
---

> Steve Loughran

Ah! 
If we do so, these failures do not happen.
I will create a patch.

> some tests in TestContainerLanch fails on non-english locale environment
> 
>
> Key: YARN-4550
> URL: https://issues.apache.org/jira/browse/YARN-4550
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager
>Affects Versions: 2.7.1
> Environment: CentOS 7 with below locale configurations:
> {code}
> $ locale
> LANG=ja_JP.UTF-8
> LC_CTYPE="ja_JP.UTF-8"
> LC_NUMERIC="ja_JP.UTF-8"
> LC_TIME="ja_JP.UTF-8"
> LC_COLLATE="ja_JP.UTF-8"
> LC_MONETARY="ja_JP.UTF-8"
> LC_MESSAGES="ja_JP.UTF-8"
> LC_PAPER="ja_JP.UTF-8"
> LC_NAME="ja_JP.UTF-8"
> LC_ADDRESS="ja_JP.UTF-8"
> LC_TELEPHONE="ja_JP.UTF-8"
> LC_MEASUREMENT="ja_JP.UTF-8"
> LC_IDENTIFICATION="ja_JP.UTF-8"
> LC_ALL=
> {code}
>Reporter: Takashi Ohnishi
>Priority: Minor
>
> The tests listed below fail.
> * testErrorLogOnContainerExitWithMultipleFiles
> * testErrorLogOnContainerExitWithCustomPattern
> * testErrorLogOnContainerExitForCase
> * testErrorLogOnContainerExit
> * testErrorLogOnContainerExitForExt
> The failures happen in same place.
> {code}
> java.lang.AssertionError: Should contain contents of error Log
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:633)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:602)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitWithFailure(ContainerLaunch.java:438)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:359)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.verifyTailErrorLogOnContainerExit(TestContainerLaunch.java:597)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testErrorLogOnContainerExitWithMultipleFiles(TestContainerLaunch.java:528)
> {code}
> All these tests call verifyTailErrorLogOnContainerExit and it calls below 
> code.
> {code:title=TestContainerLaunch.java}
>  632 Assert.assertTrue("Should contain contents of error Log",
>
>  633 exitEvent.getDiagnosticInfo().contains(
>  634 INVALID_JAVA_HOME + "/bin/java: No such file or 
> directory"));
>  635   }
> {code}
> In environment with non-english locale, this fails because the error message 
> returned with non-english text like below.
> {code}
> 2016-01-06 23:27:45,427 INFO  [main] 
> containermanager.BaseContainerManagerTest 
> (TestContainerLaunch.java:handle(622)) - Diagnostic Info : Container exited 
> with a non-zero exit code 127. Error files: stderr.log, stdout.
> Last 4096 bytes of stderr.log :
> /bin/bash: /no/jvm/here/bin/java: そのようなファイルやディレクトリはありません
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4550) some tests in TestContainerLanch fails on non-english locale environment

2016-01-06 Thread Takashi Ohnishi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takashi Ohnishi updated YARN-4550:
--
Attachment: YARN-4550.1.patch

> some tests in TestContainerLanch fails on non-english locale environment
> 
>
> Key: YARN-4550
> URL: https://issues.apache.org/jira/browse/YARN-4550
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager
>Affects Versions: 2.7.1
> Environment: CentOS 7 with below locale configurations:
> {code}
> $ locale
> LANG=ja_JP.UTF-8
> LC_CTYPE="ja_JP.UTF-8"
> LC_NUMERIC="ja_JP.UTF-8"
> LC_TIME="ja_JP.UTF-8"
> LC_COLLATE="ja_JP.UTF-8"
> LC_MONETARY="ja_JP.UTF-8"
> LC_MESSAGES="ja_JP.UTF-8"
> LC_PAPER="ja_JP.UTF-8"
> LC_NAME="ja_JP.UTF-8"
> LC_ADDRESS="ja_JP.UTF-8"
> LC_TELEPHONE="ja_JP.UTF-8"
> LC_MEASUREMENT="ja_JP.UTF-8"
> LC_IDENTIFICATION="ja_JP.UTF-8"
> LC_ALL=
> {code}
>Reporter: Takashi Ohnishi
>Priority: Minor
> Attachments: YARN-4550.1.patch
>
>
> The tests listed below fail.
> * testErrorLogOnContainerExitWithMultipleFiles
> * testErrorLogOnContainerExitWithCustomPattern
> * testErrorLogOnContainerExitForCase
> * testErrorLogOnContainerExit
> * testErrorLogOnContainerExitForExt
> The failures happen in same place.
> {code}
> java.lang.AssertionError: Should contain contents of error Log
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:633)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:602)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitWithFailure(ContainerLaunch.java:438)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:359)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.verifyTailErrorLogOnContainerExit(TestContainerLaunch.java:597)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testErrorLogOnContainerExitWithMultipleFiles(TestContainerLaunch.java:528)
> {code}
> All these tests call verifyTailErrorLogOnContainerExit and it calls below 
> code.
> {code:title=TestContainerLaunch.java}
>  632 Assert.assertTrue("Should contain contents of error Log",
>
>  633 exitEvent.getDiagnosticInfo().contains(
>  634 INVALID_JAVA_HOME + "/bin/java: No such file or 
> directory"));
>  635   }
> {code}
> In environment with non-english locale, this fails because the error message 
> returned with non-english text like below.
> {code}
> 2016-01-06 23:27:45,427 INFO  [main] 
> containermanager.BaseContainerManagerTest 
> (TestContainerLaunch.java:handle(622)) - Diagnostic Info : Container exited 
> with a non-zero exit code 127. Error files: stderr.log, stdout.
> Last 4096 bytes of stderr.log :
> /bin/bash: /no/jvm/here/bin/java: そのようなファイルやディレクトリはありません
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085669#comment-15085669
 ] 

Hadoop QA commented on YARN-2902:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
0s {color} | {color:green} branch-2.6 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} branch-2.6 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} branch-2.6 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} branch-2.6 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} branch-2.6 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} branch-2.6 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s 
{color} | {color:green} branch-2.6 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-yarn-server-nodemanager in branch-2.6 failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} branch-2.6 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s 
{color} | {color:red} The patch has 4465 line(s) that end in whitespace. Use 
git apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 1m 59s 
{color} | {color:red} The patch has 272 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 4s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 8s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 32s 
{color} | {color:red} Patch generated 75 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 39s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:date2016-01-06 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780377/YARN-2902-branch-2.6.01.patch
 |
| JIRA Issue | YARN-2902 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 9e273af6efbe 3.1

[jira] [Commented] (YARN-4551) Address the duplication between StatusUpdateWhenHealthy and StatusUpdateWhenUnhealthy transitions

2016-01-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085675#comment-15085675
 ] 

Sunil G commented on YARN-4551:
---

Yes [~kasha], this will definitely clean up some code there. I could help here 
if you are not planning to take it up.

> Address the duplication between StatusUpdateWhenHealthy and 
> StatusUpdateWhenUnhealthy transitions
> -
>
> Key: YARN-4551
> URL: https://issues.apache.org/jira/browse/YARN-4551
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Priority: Minor
>  Labels: newbie
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-06 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085677#comment-15085677
 ] 

Nathan Roberts commented on YARN-1011:
--

bq. This is one of the reasons I was proposing the notion of a max threshold 
which is less than 1. If the utilization goes to 100%, we clearly know there is 
contention. Since we measure resource utilization in resource-seconds (if not, 
we should update it), bursty spikes alone wouldn't take utilization over 100%. 
So, we shouldn't see a utilization greater than 100%.

Just to make sure I understand. When you say max threshold < 1 are you saying 
an NM could not advertise 48 vcores if there are only 24 vcores physically 
available? I think we have to support going above 1.0. We already go above 1.0 
on our clusters, even without this feature. What I'm thinking this feature will 
allow us to do is to go significantly above 1.0, especially for resources like 
memory where we have to be much more careful about not hitting 100%. 

One use case that I'm really hoping this feature can support is a batch cluster 
(loose SLAs) with very high utilization. For this use case, I'd like the 
following to be true:
- nodes can be at 100% CPU, 100% Network, or 100% Disk for long periods of time 
(several minutes). Memory could get to something like 80% before corrective 
action would be required. During these periods, no containers get shot to shed 
load. Nodemanagers might reduce their available resource advertised to the RM, 
but nothing would need to be killed.
- Both GUARANTEED and OPPORTUNISTIC containers get their fair share of 
resources. They're both drawing from the same capacity and user-limit from the 
RM's point of view so I feel like they should be given their fair set of 
resources on the nodes they execute on. The real point of being designated 
OPPORTUNISTIC in this use case is that the NM knows which containers to kill 
when it needs to shed load.  

Another use case is where you have a mixture of jobs, some with tight SLAs, 
some with looser SLAs. This one is mentioned in previous comments and is also 
very important. It requires a different set of thresholds and a different level 
of fairness controls. 

So, I just think things have to be configurable enough to handle both types of 
clusters. 


> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3940) Application moveToQueue should check NodeLabel permission

2016-01-06 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3940:
---
Attachment: 0004-YARN-3940.patch

> Application moveToQueue should check NodeLabel permission 
> --
>
> Key: YARN-3940
> URL: https://issues.apache.org/jira/browse/YARN-3940
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-3940.patch, 0002-YARN-3940.patch, 
> 0003-YARN-3940.patch, 0004-YARN-3940.patch
>
>
> Configure the capacity scheduler.
> Configure node labels and submit an application with {{queue=A Label=X}}.
> Move the application to queue {{B}}, which does not have access to label {{x}}.
> {code}
> 2015-07-20 19:46:19,626 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application attempt appattempt_1437385548409_0005_01 released container 
> container_e08_1437385548409_0005_01_02 on node: host: 
> host-10-19-92-117:64318 #containers=1 available= 
> used= with event: KILL
> 2015-07-20 19:46:20,970 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1437385548409_0005_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, queue=b1 doesn't have permission to access all labels in 
> resource request. labelExpression of resource request=x. Queue labels=y
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:515)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168)
> {code}
> The same exception will be thrown until the *heartbeat timeout*.
> Then the application state will be updated to *FAILED*.
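
For illustration, the kind of check being proposed could look roughly like the 
sketch below; the class and method names are simplified stand-ins, not the 
actual CapacityScheduler code.

{code}
import java.util.Set;

// Simplified stand-in for the proposed validation: before moving an app to a
// new queue, verify the target queue can access the label the app requests.
public class MoveToQueueLabelCheck {

  public static void checkLabelPermission(String appLabelExpression,
      Set<String> targetQueueLabels) {
    if (appLabelExpression == null || appLabelExpression.isEmpty()) {
      return; // no label requested, nothing to check
    }
    if (!targetQueueLabels.contains(appLabelExpression)) {
      // Reject the move up front instead of letting every later heartbeat fail
      // with InvalidResourceRequestException as in the log above.
      throw new IllegalArgumentException("Target queue cannot access label "
          + appLabelExpression + "; queue labels=" + targetQueueLabels);
    }
  }
}
{code}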



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4550) Some tests in TestContainerLaunch fail on non-English locale environments

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085730#comment-15085730
 ] 

Hadoop QA commented on YARN-4550:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 48s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 19s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 7s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780767/YARN-4550.1.patch |
| JIRA Issue | YARN-4550 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d3deee3aa61d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / c1462a6 |
| Default Java | 1.7.0_91 |
| Multi-JDK versions |

[jira] [Commented] (YARN-4549) Containers stuck in KILLING state

2016-01-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085737#comment-15085737
 ] 

Jason Lowe commented on YARN-4549:
--

Did the kill occur shortly after the container was started?  I'm wondering if 
the pid file somehow appeared _after_ the attempt to kill.  What does {{ls -l 
--full-time}} show for the pid file, and how does that correlate to the 
timestamps in the NM log?  Also just to verify it's in the right place, where 
is the pid file located relative to the yarn local directory root?

You mentioned NM recovery is enabled.  Does this only occur on containers that 
were recovered on NM startup or also for containers that are started and killed 
within the same NM session?


> Containers stuck in KILLING state
> -
>
> Key: YARN-4549
> URL: https://issues.apache.org/jira/browse/YARN-4549
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Danil Serdyuchenko
>
> We are running samza 0.8 on YARN 2.7.1 with {{LinuxContainerExecutor}} as the 
> container-executor with cgroups configuration. Also we have NM recovery 
> enabled.
> We observe a lot of containers that get stuck in the KILLING state after the 
> NM tries to kill them. The container remains running indefinitely, which 
> causes some duplication as new containers are brought up to replace them. 
> Looking through the logs, the NM can't seem to get the container PID.
> {noformat}
> 16/01/05 05:16:44 INFO containermanager.ContainerManagerImpl: Stopping 
> container with container Id: container_1448454866800_0023_01_05
> 16/01/05 05:16:44 INFO nodemanager.NMAuditLogger: USER=ec2-user 
> IP=10.51.111.243OPERATION=Stop Container Request
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1448454866800_0023
> CONTAINERID=container_1448454866800_0023_01_05
> 16/01/05 05:16:44 INFO container.ContainerImpl: Container 
> container_1448454866800_0023_01_05 transitioned from RUNNING to KILLING
> 16/01/05 05:16:44 INFO launcher.ContainerLaunch: Cleaning up container 
> container_1448454866800_0023_01_05
> 16/01/05 05:16:47 INFO launcher.ContainerLaunch: Could not get pid for 
> container_1448454866800_0023_01_05. Waited for 2000 ms.
> {noformat}
> The PID files for each container seem to be present on the node. We weren't 
> able to consistently replicate this and are hoping that someone has come 
> across this before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085738#comment-15085738
 ] 

Karthik Kambatla commented on YARN-1011:


bq. Just to make sure I understand. When you say max threshold < 1, are you 
saying an NM could not advertise 48 vcores if there are only 24 vcores 
physically available?
You can continue to advertise more vcores. 

Consider a cluster with nodes of 1 physical core. Let us say each node 
advertises 10 *vcores*. Today, let us say your CPU utilization under these 
settings is 50% running 10 containers. All these containers in this context 
would be GUARANTEED containers. I am proposing we set a max threshold for the 
RM over-allocating containers to 95%. This essentially means the RM allocates 
OPPORTUNISTIC containers on this node (that has been previously fully 
allocated) until we hit the utilization threshold of 95% - say, running 19 
containers. At this point, if one container's usage goes higher, taking us 
beyond 95%, we kill enough OPPORTUNISTIC containers to bring this under 95%. 
Maybe the max allowed threshold could be higher - 99%. I am wary of setting it 
to 100% unless we have some other way of differentiating "running comfortably 
at 100%" vs "contention at 100%", because both look the same. Also, I am 
assuming people would be very happy with 95% utilization if we achieve that :)

bq. nodes can be at 100% CPU, 100% Network, or 100% Disk for long periods of 
time (several minutes). Memory could get to something like 80% before 
corrective action would be required. 
I am beginning to see the need for different thresholds for different 
resources. While I wouldn't necessarily shoot for 100, I can see someone 
configuring it to 95% CPU, 85% network (as this could spike significantly with 
shuffle etc.), 90% disk, 80% memory. And, we would stop over-allocating the 
moment we hit *any one* of these thresholds. 

Should we keep it simple to begin with and have one config, and add other 
configs in the future? Or, do you think the config-per-resource should be there 
from the get go? 
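
A minimal sketch of what per-resource thresholds could look like, for 
illustration only; the class and the example values below are hypothetical, 
not existing YARN code or defaults.

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: per-resource over-allocation cutoffs, as fractions of
// node capacity. Over-allocation stops once any one resource crosses its cutoff.
public class OverAllocationThresholds {

  private final Map<String, Double> thresholds = new HashMap<>();

  public OverAllocationThresholds() {
    // Example values taken from the discussion above; assumptions, not defaults.
    thresholds.put("cpu", 0.95);
    thresholds.put("network", 0.85);
    thresholds.put("disk", 0.90);
    thresholds.put("memory", 0.80);
  }

  /** True if any single resource utilization (fraction of capacity) crossed its cutoff. */
  public boolean shouldStopOverAllocating(Map<String, Double> utilization) {
    for (Map.Entry<String, Double> e : utilization.entrySet()) {
      Double limit = thresholds.get(e.getKey());
      if (limit != null && e.getValue() >= limit) {
        return true;
      }
    }
    return false;
  }
}
{code}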

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085738#comment-15085738
 ] 

Karthik Kambatla edited comment on YARN-1011 at 1/6/16 4:12 PM:


bq. Just to make sure I understand. When you say max threshold < 1 are you 
saying an NM could not advertise 48 vcores if there are only 24 vcores 
physically available?

You can continue to advertise more vcores. 

Consider a cluster with nodes of 1 physical core. Let us say each node 
advertises 10 *vcores*. Today, let us say your CPU utilization under these 
settings is 50% running 10 containers. All these containers in this context 
would be GUARANTEED containers. I am proposing we set a max threshold for the 
RM over-allocating containers to 95%. This essentially means the RM allocates 
OPPORTUNISTIC containers on this node (that has been previously fully 
allocated) until we hit the utilization threshold of 95% - say, running 19 
containers. At this point, if one container's usage goes higher, taking us 
beyond 95%, we kill enough OPPORTUNISTIC containers to bring this under 95%. 
Maybe the max allowed threshold could be higher - 99%. I am wary of setting it 
to 100% unless we have some other way of differentiating "running comfortably 
at 100%" vs "contention at 100%", because both look the same. Also, I am 
assuming people would be very happy with 95% utilization if we achieve that :)

bq. nodes can be at 100% CPU, 100% Network, or 100% Disk for long periods of 
time (several minutes). Memory could get to something like 80% before 
corrective action would be required. 
I am beginning to see the need for different thresholds for different 
resources. While I wouldn't necessarily shoot for 100, I can see someone 
configuring it to 95% CPU, 85% network (as this could spike significantly with 
shuffle etc.), 90% disk, 80% memory. And, we would stop over-allocating the 
moment we hit *any one* of these thresholds. 

Should we keep it simple to begin with and have one config, and add other 
configs in the future? Or, do you think the config-per-resource should be there 
from the get go? 


was (Author: kasha):
bq, Just to make sure I understand. When you say max threshold < 1 are you 
saying an NM could not advertise 48 vcores if there are only 24 vcores 
physically available?
You can continue to advertise more vcores. 

Consider a cluster with nodes of 1 physical core. Let us say each node 
advertises 10 *vcores*. Today, let us say your CPU utilization under these 
settings is 50% running 10 containers. All these containers in this context 
would be GUARANTEED containers. I am proposing we set a max threshold for the 
RM over-allocating containers to 95%.This essentially means, the RM allocates 
OPPORTUNISTIC containers on this node (that has been previously fully 
allocated) until we hit the utilization threshold of 95% - say, running 19 
containers. At this point if one container's usage goes higher taking us beyond 
95%, we kill enough OPPORTUNISTIC containers to bring this under 95%. May be, 
the max allowed threshold could be higher - 99%. I am wary of setting it to 
100% unless we have some other way of differentiating "running comfortably at 
100%" vs "contention at 100%" because both look the same.  Also, I am assuming 
people would be very happy with 95% utilization if we achieve that :)

bq. nodes can be at 100% CPU, 100% Network, or 100% Disk for long periods of 
time (several minutes). Memory could get to something like 80% before 
corrective action would be required. 
I am beginning to see the need for different thresholds for different 
resources. While I wouldn't necessarily shoot for 100, I can see someone 
configuring it to 95% CPU, 85% network (as this could spike significantly with 
shuffle etc.), 90% disk, 80% memory. And, we would stop over-allocating the 
moment we hit *any one* of these thresholds. 

Should we keep it simple to begin with and have one config, and add other 
configs in the future? Or, do you think the config-per-resource should be there 
from the get go? 

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4542) Cleanup AHS code and configuration

2016-01-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085751#comment-15085751
 ] 

Naganarasimha G R commented on YARN-4542:
-

Thanks, but just one query:
bq. We should consider cleaning up AHS-related configuration and code later
So do we need to work on it now or later? I faintly remember [~zjshen] also 
raised some JIRAs related to this, but due to compatibility issues they were 
not worked on.

> Cleanup AHS code and configuration
> --
>
> Key: YARN-4542
> URL: https://issues.apache.org/jira/browse/YARN-4542
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>
> ATS (Application Timeline Server/Service; we already have many versions so 
> far) has long been designed and implemented to replace AHS. We should 
> consider cleaning up AHS-related configuration and code later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3849) Too much preemption activity causing continuous killing of containers across queues

2016-01-06 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3849:
--
Attachment: 0004-YARN-3849-branch2-7.patch

Attaching the branch2.6 patch. Locally, all test cases were passing.
[~djp]/[~rohithsharma], could you please take a look?

> Too much preemption activity causing continuous killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-7.patch, 
> 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
>
>
> Two queues are used. Each queue is given a capacity of 0.5. The Dominant 
> Resource policy is used.
> 1. An app is submitted in QueueA, which consumes the full cluster capacity.
> 2. After an app is submitted in QueueB, there is some demand, and preemption 
> is invoked in QueueA.
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we 
> observed that all containers other than the AM get killed in QueueA.
> 4. Now the app in QueueB tries to take over the cluster with the current free 
> space. But there is some updated demand from the app in QueueA, which lost 
> its containers earlier, and preemption now kicks in for QueueB.
> The scenario in steps 3 and 4 keeps repeating in a loop, so none of the apps 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3849) Too much preemption activity causing continuous killing of containers across queues

2016-01-06 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3849:
--
Attachment: 0004-YARN-3849-branch2-6.patch

> Too much preemption activity causing continuous killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-6.patch, 
> 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
>
>
> Two queues are used. Each queue is given a capacity of 0.5. The Dominant 
> Resource policy is used.
> 1. An app is submitted in QueueA, which consumes the full cluster capacity.
> 2. After an app is submitted in QueueB, there is some demand, and preemption 
> is invoked in QueueA.
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we 
> observed that all containers other than the AM get killed in QueueA.
> 4. Now the app in QueueB tries to take over the cluster with the current free 
> space. But there is some updated demand from the app in QueueA, which lost 
> its containers earlier, and preemption now kicks in for QueueB.
> The scenario in steps 3 and 4 keeps repeating in a loop, so none of the apps 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3849) Too much preemption activity causing continuous killing of containers across queues

2016-01-06 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3849:
--
Attachment: (was: 0004-YARN-3849-branch2-7.patch)

> Too much preemption activity causing continuous killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-6.patch, 
> 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
>
>
> Two queues are used. Each queue is given a capacity of 0.5. The Dominant 
> Resource policy is used.
> 1. An app is submitted in QueueA, which consumes the full cluster capacity.
> 2. After an app is submitted in QueueB, there is some demand, and preemption 
> is invoked in QueueA.
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we 
> observed that all containers other than the AM get killed in QueueA.
> 4. Now the app in QueueB tries to take over the cluster with the current free 
> space. But there is some updated demand from the app in QueueA, which lost 
> its containers earlier, and preemption now kicks in for QueueB.
> The scenario in steps 3 and 4 keeps repeating in a loop, so none of the apps 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3940) Application moveToQueue should check NodeLabel permission

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085828#comment-15085828
 ] 

Hadoop QA commented on YARN-3940:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 113, now 114). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 17s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 4s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 24s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 136m 53s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  LeafQueue is incompatible with expected argument type String in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkQueuePartition(ApplicationId,
 LeafQueue)  At CapacityScheduler.java:argument type String in 
org.apache.hadoop.yarn.s

[jira] [Created] (YARN-4552) NM ResourceLocalizationService should check and initialize local filecache dir (and log dir) even if NM recovery is enabled.

2016-01-06 Thread Junping Du (JIRA)
Junping Du created YARN-4552:


 Summary: NM ResourceLocalizationService should check and 
initialize local filecache dir (and log dir) even if NM recovery is enabled.
 Key: YARN-4552
 URL: https://issues.apache.org/jira/browse/YARN-4552
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical


In some cases, users clean up the localized file cache for 
debugging/troubleshooting purposes while the NM is down. However, after 
bringing the NM back (with recovery enabled), job submission can fail with an 
exception like the one below:
{noformat}
Diagnostics: java.io.FileNotFoundException: File /disk/12/yarn/local/filecache 
does not exist.
{noformat}
This is because we only create the filecache dir when recovery is not enabled 
while ResourceLocalizationService is initialized/started.
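
A minimal sketch of the kind of check being proposed, using plain java.io for 
brevity; the real service deals with Hadoop's configured local dirs and 
permissions, so this is illustrative only.

{code}
import java.io.File;
import java.io.IOException;

// Sketch: ensure the local filecache dir exists at service start, whether or
// not NM recovery is enabled. The path below comes from the error above and is
// only an example; the real code would cover every configured local dir.
public class LocalFilecacheCheck {

  public static void ensureDirExists(String path) throws IOException {
    File dir = new File(path);
    if (!dir.isDirectory() && !dir.mkdirs()) {
      throw new IOException("Could not create local dir " + path);
    }
  }

  public static void main(String[] args) throws IOException {
    ensureDirExists("/disk/12/yarn/local/filecache");
  }
}
{code}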



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085840#comment-15085840
 ] 

Karthik Kambatla commented on YARN-1011:


bq. Let's say job capacity is 1 container and the job asks for 2. It gets 1 
normal container and 1 opportunistic container. Now it releases its 1 normal 
container. At this point what happens to the opportunistic container. It is 
clearly running at lower priority on the node and as such we are not giving the 
job its guaranteed capacity. 
Momentarily, yes. The RM/NM ensemble (let us discuss that separately) realizes 
this and adjusts by promoting the opportunistic container. Is this different 
from what happens today? Today, the job is allocated one container since that 
is its capacity. Once that is done, it allocates another. Between the first one 
finishing and second one launching, we are not giving the job its guaranteed 
capacity. 

bq. The question is not about finding an optimal solution for this problem (and 
there may not be one). The issue here is to crisply define the semantics around 
scheduling in the design. Whatever the semantics are, we should clearly know 
what they are. IMO, the exact semantics of scheduling should be in the docs.
Agreed. I'll add something to the design doc once we capture everyone's 
concerns/suggestions here on JIRA, and maybe we can iterate. 

bq. Because of that complexity, I'm not 100% convinced that disfavoring 
OPPORTUNISTIC containers (e.g. low value for cpu_shares) is something that buys 
us very much. 
I don't necessarily see it as disfavoring OPPORTUNISTIC containers. Without 
over-allocation these containers wouldn't even have started. While we are 
optimizing for utilization and throughput, we are just making sure we don't 
adversely affect containers that have been launched prior with promises of 
isolation. 

The low value of cpu_shares only kicks in when the node is highly contended, 
and is intended to be a fail-safe. As long as there are free resources (which I 
believe is the most common case), these OPPORTUNISTIC containers should get a 
sizeable CPU share. No? 

bq. So, hopefully we can make the policy quite configurable so that the amount 
of disfavoring can be tuned for various workloads.
I agree that we might eventually need a configurable policy, but making the 
policy configurable might not be as straight-forward. I am definitely open to 
inputs on simple ways of doing this. Also, it is hard to comment on the 
effectiveness of a simple-but-not-so-configurable policy without implementing 
it and running sample workloads against it.

The simple policy I had in mind was:
# Update the SchedulerNode#getAvailable to include resources that could be 
opportunistically allocated. i.e., max(what_it_says_today, threshold * 
resource). It should be easy to support per-resource thresholds here. 
# At allocate time, label an allocation OPPORTUNISTIC if it takes the 
cumulative allocation over the advertised capacity.
# When space frees up on nodes, NMs send candidate containers for promotion on 
the heartbeat. The RM consults a policy to come up with a list of yes/no 
decisions for each of these candidates. Initially, I would like for the default 
to be yes without any reconsideration. This favors continuing the execution of 
a container over preempting it. 

Based on what we see, we could tweak this simple policy or come up with more 
sophisticated policies. 
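
A rough, self-contained sketch of steps 1 and 2 is below. The class and field 
names are made up for illustration (this is not the real SchedulerNode API), 
and the headroom formula is one reading of max(what_it_says_today, threshold * 
resource).

{code}
// Illustrative only: a node that advertises extra headroom for over-allocation.
public class OverAllocatingNode {

  private final long capacity;    // advertised node capacity, e.g. in MB
  private final double threshold; // over-allocation threshold, e.g. 0.95
  private long allocated;         // resources handed out so far
  private long utilized;          // measured utilization reported by the NM

  public OverAllocatingNode(long capacity, double threshold) {
    this.capacity = capacity;
    this.threshold = threshold;
  }

  /** Step 1: available includes room that could be opportunistically allocated. */
  public long getAvailable() {
    long unallocated = Math.max(0, capacity - allocated);
    long utilizationHeadroom = Math.max(0, (long) (threshold * capacity) - utilized);
    return Math.max(unallocated, utilizationHeadroom);
  }

  /** Step 2: an allocation is OPPORTUNISTIC once it takes us past advertised capacity. */
  public boolean allocateIsOpportunistic(long request) {
    allocated += request;
    return allocated > capacity;
  }

  public void updateUtilization(long utilized) {
    this.utilized = utilized;
  }
}
{code}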

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4552) NM ResourceLocalizationService should check and initialize local filecache dir (and log dir) even if NM recovery is enabled.

2016-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4552:
-
Attachment: YARN-4552.patch

Uploading a quick patch to fix it. It doesn't include any tests so far.

> NM ResourceLocalizationService should check and initialize local filecache 
> dir (and log dir) even if NM recovery is enabled.
> ---
>
> Key: YARN-4552
> URL: https://issues.apache.org/jira/browse/YARN-4552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4552.patch
>
>
> In some cases, users clean up the localized file cache for 
> debugging/troubleshooting purposes while the NM is down. However, after 
> bringing the NM back (with recovery enabled), job submission can fail with 
> an exception like the one below:
> {noformat}
> Diagnostics: java.io.FileNotFoundException: File 
> /disk/12/yarn/local/filecache does not exist.
> {noformat}
> This is because we only create the filecache dir when recovery is not 
> enabled while ResourceLocalizationService is initialized/started.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085856#comment-15085856
 ] 

Hadoop QA commented on YARN-2902:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
23s {color} | {color:green} branch-2.6 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} branch-2.6 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} branch-2.6 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} branch-2.6 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} branch-2.6 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} branch-2.6 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
56s {color} | {color:green} branch-2.6 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-yarn-server-nodemanager in branch-2.6 failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} branch-2.6 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 2s 
{color} | {color:red} The patch has 4758 line(s) that end in whitespace. Use 
git apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 1m 48s 
{color} | {color:red} The patch has 272 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
56s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s 
{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 0s {color} | 
{color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 0s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 32s 
{color} | {color:red} Patch generated 75 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 31m 7s {color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:date2016-01-06 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780377/YARN-2902-branch-2.6.01.patch
 |
| JIRA Issue | Y

[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085859#comment-15085859
 ] 

Junping Du commented on YARN-4265:
--

Thanks [~gtCarrera9] for updating the patch.
bq. Since this is separate work from introducing the whole new storage (as in 
this JIRA), maybe we can address this as a new JIRA?
+1. Addressing this in a separate JIRA sounds good to me, and I can take a 
look at that demo patch while reviewing this JIRA.

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, 
> YARN-4265-trunk.003.patch, YARN-4265.YARN-4234.001.patch, 
> YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior to the 
> EntityFileTimelineStore proposed in YARN-3942, but cache data at cache-id 
> granularity instead of application-id granularity. Let's have this storage 
> be a standalone one, instead of updating EntityFileTimelineStore, to keep 
> the existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085868#comment-15085868
 ] 

Karthik Kambatla commented on YARN-1011:


We might be better off calling this overallocation instead of oversubscription, 
as the latter could be mistaken for oversubscription through the 
yarn.nodemanager.resource.* configs. I'll go ahead and use overallocation in 
patches such as the one for YARN-4512, unless someone expresses reservations 
here or on YARN-4512. 

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085872#comment-15085872
 ] 

Karthik Kambatla commented on YARN-1011:


BTW, if we agree on the simple policy, I believe we should be able to pull off 
a scheduler-agnostic implementation. 

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4512) Provide a knob to turn on over-allocation

2016-01-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4512:
---
Summary: Provide a knob to turn on over-allocation  (was: Provide a knob to 
turn on over-subscription)

> Provide a knob to turn on over-allocation
> -
>
> Key: YARN-4512
> URL: https://issues.apache.org/jira/browse/YARN-4512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085873#comment-15085873
 ] 

Li Lu commented on YARN-4265:
-

Sure. Linking this to YARN-4545, which works on the distributed shell 
integration. 

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, 
> YARN-4265-trunk.003.patch, YARN-4265.YARN-4234.001.patch, 
> YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior to the 
> EntityFileTimelineStore proposed in YARN-3942, but cache data at cache-id 
> granularity instead of application-id granularity. Let's have this storage 
> be a standalone one, instead of updating EntityFileTimelineStore, to keep 
> the existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2016-01-06 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2962:
---
Attachment: YARN-2962.04.patch

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-2962.01.patch, YARN-2962.04.patch, 
> YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though 
> individually they were all small.
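
One common way to bound the number of children under a single znode is to 
bucket app znodes by a hash of the application id; a rough sketch is below, 
where the bucket count and path layout are illustrative and not necessarily 
what the patch does.

{code}
// Sketch: spread app znodes across a fixed number of bucket znodes so that no
// single parent has more children than a ZK response can comfortably carry.
public class ZnodeBucketing {

  private static final int NUM_BUCKETS = 100; // illustrative value

  public static String bucketedPath(String appRootPath, String appId) {
    int bucket = (appId.hashCode() & 0x7fffffff) % NUM_BUCKETS;
    return appRootPath + "/bucket-" + bucket + "/" + appId;
  }

  public static void main(String[] args) {
    // Example path layout only.
    System.out.println(bucketedPath("/rmstore/RMAppRoot",
        "application_1448454866800_0023"));
  }
}
{code}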



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2016-01-06 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4389:
--
Attachment: 0005-YARN-4389.patch

Rebasing against the latest trunk. [~djp], could you please take a look?

> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be app specific 
> rather than a setting for whole YARN cluster
> ---
>
> Key: YARN-4389
> URL: https://issues.apache.org/jira/browse/YARN-4389
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, 
> 0003-YARN-4389.patch, 0004-YARN-4389.patch, 0005-YARN-4389.patch
>
>
> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be application 
> specific rather than a setting in cluster level, or we should't maintain 
> amBlacklistingEnabled and blacklistDisableThreshold in per rmApp level. We 
> should allow each am to override this config, i.e. via submissionContext.
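
As a rough illustration of the per-app override idea, the class below is a 
hypothetical stand-in (not the actual ApplicationSubmissionContext API): 
per-app values win when present, otherwise the cluster defaults apply.

{code}
// Hypothetical per-app AM blacklisting settings; field names are illustrative.
public class AmBlacklistingSettings {

  private final Boolean enabled;               // null means "use cluster default"
  private final Float disableFailureThreshold; // null means "use cluster default"

  public AmBlacklistingSettings(Boolean enabled, Float disableFailureThreshold) {
    this.enabled = enabled;
    this.disableFailureThreshold = disableFailureThreshold;
  }

  public boolean isEnabled(boolean clusterDefault) {
    return enabled != null ? enabled : clusterDefault;
  }

  public float getDisableFailureThreshold(float clusterDefault) {
    return disableFailureThreshold != null ? disableFailureThreshold : clusterDefault;
  }
}
{code}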



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3934) Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085897#comment-15085897
 ] 

Hadoop QA commented on YARN-3934:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 13s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 221, now 222). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 38s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 155m 59s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yar

[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085900#comment-15085900
 ] 

Sunil G commented on YARN-4371:
---

[~ozawa], could you please help check the updated approach?

> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to the "yarn application -kill" 
> command. The command should take multiple application ids at the same time. 
> Each entry should be separated by whitespace, like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3940) Application moveToQueue should check NodeLabel permission

2016-01-06 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3940:
---
Attachment: (was: 0004-YARN-3940.patch)

> Application moveToQueue should check NodeLabel permission 
> --
>
> Key: YARN-3940
> URL: https://issues.apache.org/jira/browse/YARN-3940
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-3940.patch, 0002-YARN-3940.patch, 
> 0003-YARN-3940.patch, 0004-YARN-3940.patch
>
>
> Configure the capacity scheduler 
> Configure node labels and submit an application with {{queue=A Label=X}}
> Move the application to queue {{B}}, which does not have access to label x
> {code}
> 2015-07-20 19:46:19,626 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application attempt appattempt_1437385548409_0005_01 released container 
> container_e08_1437385548409_0005_01_02 on node: host: 
> host-10-19-92-117:64318 #containers=1 available= 
> used= with event: KILL
> 2015-07-20 19:46:20,970 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1437385548409_0005_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, queue=b1 doesn't have permission to access all labels in 
> resource request. labelExpression of resource request=x. Queue labels=y
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:515)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168)
> {code}
> The same exception will be thrown until the *heartbeat timeout*
> Then the application state will be updated to *FAILED*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3940) Application moveToQueue should check NodeLabel permission

2016-01-06 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3940:
---
Attachment: 0004-YARN-3940.patch

> Application moveToQueue should check NodeLabel permission 
> --
>
> Key: YARN-3940
> URL: https://issues.apache.org/jira/browse/YARN-3940
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-3940.patch, 0002-YARN-3940.patch, 
> 0003-YARN-3940.patch, 0004-YARN-3940.patch
>
>
> Configure the capacity scheduler 
> Configure node labels and submit an application with {{queue=A Label=X}}
> Move the application to queue {{B}}, which does not have access to label x
> {code}
> 2015-07-20 19:46:19,626 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application attempt appattempt_1437385548409_0005_01 released container 
> container_e08_1437385548409_0005_01_02 on node: host: 
> host-10-19-92-117:64318 #containers=1 available= 
> used= with event: KILL
> 2015-07-20 19:46:20,970 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1437385548409_0005_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, queue=b1 doesn't have permission to access all labels in 
> resource request. labelExpression of resource request=x. Queue labels=y
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:515)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168)
> {code}
> The same exception will be thrown until the *heartbeat timeout*
> Then the application state will be updated to *FAILED*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085908#comment-15085908
 ] 

Varun Saxena commented on YARN-2902:


The whitespace result seems odd. Maybe it is because some build-related 
changes made in trunk are not in this branch.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3940) Application moveToQueue should check NodeLabel permission

2016-01-06 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085917#comment-15085917
 ] 

Bibin A Chundatt commented on YARN-3940:


Uploading a patch after fixing the checkstyle issue. Please review the latest patch.

> Application moveToQueue should check NodeLabel permission 
> --
>
> Key: YARN-3940
> URL: https://issues.apache.org/jira/browse/YARN-3940
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-3940.patch, 0002-YARN-3940.patch, 
> 0003-YARN-3940.patch, 0004-YARN-3940.patch
>
>
> Configure the capacity scheduler 
> Configure node labels and submit an application with {{queue=A Label=X}}
> Move the application to queue {{B}}, which does not have access to label x
> {code}
> 2015-07-20 19:46:19,626 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application attempt appattempt_1437385548409_0005_01 released container 
> container_e08_1437385548409_0005_01_02 on node: host: 
> host-10-19-92-117:64318 #containers=1 available= 
> used= with event: KILL
> 2015-07-20 19:46:20,970 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1437385548409_0005_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, queue=b1 doesn't have permission to access all labels in 
> resource request. labelExpression of resource request=x. Queue labels=y
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:515)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168)
> {code}
> The same exception will be thrown until the *heartbeat timeout*
> Then the application state will be updated to *FAILED*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4552) NM ResourceLocalizationService should check and initialize local filecache dir (and log dir) even if NM recover is enabled.

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085920#comment-15085920
 ] 

Hadoop QA commented on YARN-4552:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 (total was 152, now 154). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 59s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 19s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 40s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780786/YARN-4552.patch |
| JIRA Issue | YARN-4552 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 9a3178d1093d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT

[jira] [Commented] (YARN-4029) Update LogAggregationStatus to store on finish

2016-01-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085942#comment-15085942
 ] 

Sunil G commented on YARN-4029:
---

Looks fine. 

Some minor nits:
1.
{code}
+  public LogAggregationStatus getLogAggregationStatus() {
+ApplicationStateDataProtoOrBuilder p = viaProto ? proto : builder;
+return ProtoUtils.convertFromProtoFormat(p.getLogAggregationStatus());
+  }
{code}

Maybe if {{p.getLogAggregationStatus()}} is null, you could directly return null 
from this getter (a rough sketch follows after these nits).

2.
Since you are planning to set {{LogAggregationStatus.NOT_START}} as the default for 
all AppStates, I suggest setting 
{{appState.setLogAggregationStatus(logstatus);}} in the primary {{newInstance}} 
and passing this default from the other cases. Otherwise, if new parameters come 
later, it may become more confusing.
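
For the first nit, roughly something like this (a minimal sketch; 
{{hasLogAggregationStatus()}} is assumed to be the generated protobuf presence 
check for the field, everything else is taken from the snippet above):

{code}
// Sketch only: return null when the proto carries no log aggregation status,
// instead of converting a default/absent value.
public LogAggregationStatus getLogAggregationStatus() {
  ApplicationStateDataProtoOrBuilder p = viaProto ? proto : builder;
  if (!p.hasLogAggregationStatus()) {
    return null;
  }
  return ProtoUtils.convertFromProtoFormat(p.getLogAggregationStatus());
}
{code}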


> Update LogAggregationStatus to store on finish
> --
>
> Key: YARN-4029
> URL: https://issues.apache.org/jira/browse/YARN-4029
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4029.patch, 0002-YARN-4029.patch, 
> 0003-YARN-4029.patch, 0004-YARN-4029.patch, Image.jpg
>
>
> Currently the log aggregation status is not getting updated in the store. When the RM 
> is restarted, it will show NOT_START. 
> Steps to reproduce
> 
> 1. Submit a mapreduce application
> 2. Wait for completion
> 3. Once the application is completed, switch the RM
> The *Log Aggregation Status* changes
> from SUCCESS to NOT_START



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3940) Application moveToQueue should check NodeLabel permission

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085963#comment-15085963
 ] 

Hadoop QA commented on YARN-3940:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 113, now 114). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 18s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 37s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 43s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 147m 32s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  LeafQueue is incompatible with expected argument type String in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkQueuePartition(ApplicationId,
 LeafQueue)  At CapacityScheduler.java:argument type String in 
org.apache.hadoop.yarn.

[jira] [Commented] (YARN-3940) Application moveToQueue should check NodeLabel permission

2016-01-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085972#comment-15085972
 ] 

Sunil G commented on YARN-3940:
---

Thanks [~bibinchundatt]
A few comments:

1. {{ LeafQueue dest}} --> {{ LeafQueue target}} seems better.
2. destqueuelabels --> targetQueueLabels
3. 
{code}
+Set sourcequeuelabels =
+getAndCheckLeafQueue(sourceQueueName).getAccessibleNodeLabels();
..
..
if (destqueuelabels.contains(sourcequeuelabels)
{code}

I do not think a check like this is ideal. It is fine and valid, but we 
could instead check only those labels which are used by the app under movement.
Such an API can be implemented, and I think it would be better (a rough sketch 
follows after these comments).

4. {{public FiCaSchedulerApp getApplicationAttempt(ApplicationAttemptId 
applicationAttemptId) }} can be used to get the app object directly instead of two 
explicit calls.
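
For point 3, a rough sketch of what I mean (illustrative only; 
{{appLabelExpressions}} stands for whatever labels the moving application 
actually requests, and the exception type/message are placeholders):

{code}
// Sketch: validate only the labels the application uses against the target
// queue's accessible labels, rather than comparing whole label sets.
Set<String> targetQueueLabels =
    getAndCheckLeafQueue(targetQueueName).getAccessibleNodeLabels();
for (String label : appLabelExpressions) {
  if (!targetQueueLabels.contains(label)) {
    throw new YarnException("Queue " + targetQueueName
        + " cannot access label " + label + " used by the application");
  }
}
{code}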




> Application moveToQueue should check NodeLabel permission 
> --
>
> Key: YARN-3940
> URL: https://issues.apache.org/jira/browse/YARN-3940
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-3940.patch, 0002-YARN-3940.patch, 
> 0003-YARN-3940.patch, 0004-YARN-3940.patch
>
>
> Configure the capacity scheduler 
> Configure node labels and submit an application with {{queue=A Label=X}}
> Move the application to queue {{B}}, which does not have access to label x
> {code}
> 2015-07-20 19:46:19,626 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application attempt appattempt_1437385548409_0005_01 released container 
> container_e08_1437385548409_0005_01_02 on node: host: 
> host-10-19-92-117:64318 #containers=1 available= 
> used= with event: KILL
> 2015-07-20 19:46:20,970 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1437385548409_0005_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, queue=b1 doesn't have permission to access all labels in 
> resource request. labelExpression of resource request=x. Queue labels=y
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:515)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168)
> {code}
> The same exception will be thrown until the *heartbeat timeout*
> Then the application state will be updated to *FAILED*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2016-01-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085977#comment-15085977
 ] 

Varun Saxena commented on YARN-2962:


Rebased and updated the patch.
Additionally, the patch ensures that a change in the split index config does not lead 
to formatting of the state store.
The patch primarily adopts the suggestion given by [~vinodkv] above.
The storage scheme would look something like below.

{noformat}
  |--- RM_APP_ROOT
  | |- HIERARCHIES
  | ||- 1
  | ||  |- (#ApplicationId barring last character)
  | ||  |   |- (#Last character of ApplicationId)
  | ||  |   |   |- (#ApplicationAttemptIds)
  | ||  
  | ||
  | ||- 2
  | ||  |- (#ApplicationId barring last 2 characters)
  | ||  |   |- (#Last 2 characters of ApplicationId)
  | ||  |   |   |- (#ApplicationAttemptIds)
  | ||  
  | ||
  | ||- 3
  | ||  |- (#ApplicationId barring last 3 characters)
  | ||  |   |- (#Last 3 characters of ApplicationId)
  | ||  |   |   |- (#ApplicationAttemptIds)
  | ||  
  | ||
  | ||- 4
  | ||  |- (#ApplicationId barring last 4 characters)
  | ||  |   |- (#Last 4 characters of ApplicationId)
  | ||  |   |   |- (#ApplicationAttemptIds)
  | ||  
  | ||
  | |- (#ApplicationId1)
  | ||- (#ApplicationAttemptIds)
  | |
  | |- (#ApplicationId2)
  | |   |- (#ApplicationAttemptIds)
  | 
  |
{noformat}

The split index will be calculated from the end.
Apps will be stored outside the HIERARCHIES folder (i.e. directly under RMAppRoot) 
if the split index config value is 0 (i.e. no split), which is the default. This has 
been done so that users who do not want to split app nodes see no impact during an 
upgrade.

If the app node is not found in the folder for the configured split index, we 
will look into the other paths. 
At startup, we will include only those app hierarchies which have apps under them, 
plus the hierarchy for the configured split index. This precludes the need to look 
in each and every path (for every split index) when an app znode is not found in 
the path for the configured split index.

_Example:_ With no split, the appid znode will be stored as 
{{RMAppRoot/application_1352994193343_0001}}. If the value of this config is 1, 
the appid znode will be broken into two parts, application_1352994193343_000 and 
1 respectively, with the former being the parent node.
It will be stored in the path 
{{RMAppRoot/HIERARCHIES/1/application_1352994193343_000/1}}, i.e. up to 10 apps 
can be stored under the parent.
If the config were 2, it would be stored in the path 
{{RMAppRoot/HIERARCHIES/2/application_1352994193343_00/01}}, i.e. up to 100 apps 
can be stored under the parent.
Likewise, up to 1000 apps can be stored under a parent if the config is 3, and 10000 
apps if the config is 4.
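
A small illustration of how the path could be derived from the split index, 
following the layout above ({{RM_APP_ROOT}} and the inline constants here are 
placeholders, not necessarily what the patch uses):

{code}
// Sketch only: compute the app znode path for a configured split index.
String appIdStr = appId.toString();   // e.g. application_1352994193343_0001
int splitIndex = 2;                   // configured value; 0 means no split
String path;
if (splitIndex == 0) {
  path = RM_APP_ROOT + "/" + appIdStr;
} else {
  int cut = appIdStr.length() - splitIndex;
  path = RM_APP_ROOT + "/HIERARCHIES/" + splitIndex + "/"
      + appIdStr.substring(0, cut) + "/" + appIdStr.substring(cut);
}
// splitIndex = 2 gives .../HIERARCHIES/2/application_1352994193343_00/01
{code}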

Upon app removal, we remove the parent app path if no apps remain under it.

As the ZKRMStateStore methods are synchronized, I am assuming there will be 
no race when deleting the parent path above, i.e. a race between deleting the app 
node's parent and a new app being stored under the same parent. I hope that is a 
fair assumption, given that only one RM will be active at a time and only one RM 
should be up in non-HA mode. 
Do we need to take care of anything else here?
 

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-2962.01.patch, YARN-2962.04.patch, 
> YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes, even though 
> individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2016-01-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085982#comment-15085982
 ] 

Varun Saxena commented on YARN-2962:


We can decide whether we need to write a tool to migrate apps to a different 
hierarchy (this can be done in another JIRA) and whether we would want this in 
branch-2.

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-2962.01.patch, YARN-2962.04.patch, 
> YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes, even though 
> individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086037#comment-15086037
 ] 

Wangda Tan commented on YARN-4538:
--

Hi [~bibinchundatt],
Thanks for working on the patch. I looked at it, and I think this patch may 
break an existing scenario; let me explain:

In the existing workflow:
- Container increase/decrease: calls incr/decrPendingResource of QueueMetrics 
with #container=0 and resource=delta-resource. Here 
incr/decrPendingResource should not change #container, but should update 
pendingResource by the resource delta.
- Container allocation/release: calls incr/decrPendingResource of QueueMetrics 
with #container > 0 and resource=per-container-resource, and 
pendingResource should be updated by #container * resource.

So under this assumption, on container increase/decrease your current patch 
will skip updating pendingResource.

Suggested fixes:
- Choice#1: change all existing callers of incr/decrPendingResource for 
container allocation/release to make sure they are called with 
#container > 0.
- Choice#2: create a new method for container increase/decrease that changes the 
pending resource but does not update #container in QueueMetrics (a rough sketch 
follows below).

I would prefer choice#2. 
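
For choice#2, a minimal sketch of what I mean (method and gauge names are 
illustrative placeholders, not actual QueueMetrics code):

{code}
// Sketch: a resize-specific update that adjusts the pending resource gauges
// only, leaving the pending container count untouched.
public void incrPendingResourcesForResize(String user, Resource delta) {
  pendingMB.incr(delta.getMemory());
  pendingVCores.incr(delta.getVirtualCores());
  // pendingContainers is intentionally not changed here
}
{code}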

Thoughts?

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4538.patch
>
>
> Submit 2 applications to the default queue 
> Check the queue metrics for pending cores and memory
> {noformat}
> List<QueueInfo> allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output:*
> -20
> -20480



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4414) Nodemanager connection errors are retried at multiple levels

2016-01-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086061#comment-15086061
 ] 

Jason Lowe commented on YARN-4414:
--

Thanks for the patch, Chang!  I'm a bit curious about the naming convention of the 
patches.  Why .1.2 and .1.3 instead of just .2 and .3?  In the future, I'd 
recommend using the patch naming conventions as described in 
http://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch to be 
consistent with other contributors and help reduce confusion.

As for the patch, the main change looks OK to me, but I have some nits with the 
test:
- Why are we explicitly setting the NM port to 1234?  Shouldn't we inherit the 
same NM port setting from the base conf as the other connection retry tests 
already do?
- getNMProxy2 should just be getNMProxy, overloaded for the Configuration 
parameter.
- Rather than copying the entire method, getProxy() should be implemented in 
terms of getProxy(Configuration).


> Nodemanager connection errors are retried at multiple levels
> 
>
> Key: YARN-4414
> URL: https://issues.apache.org/jira/browse/YARN-4414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Jason Lowe
>Assignee: Chang Li
> Attachments: YARN-4414.1.2.patch, YARN-4414.1.2.patch, 
> YARN-4414.1.3.patch, YARN-4414.1.patch
>
>
> This is related to YARN-3238.  Ran into more scenarios where connection 
> errors are being retried at multiple levels, like NoRouteToHostException.  
> The fix for YARN-3238 was too specific, and I think we need a more general 
> solution to catch a wider array of connection errors that can occur to avoid 
> retrying them both at the RPC layer and at the NM proxy layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

2016-01-06 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3995:

Attachment: YARN-3995-feature-YARN-2928.v1.001.patch

Hi [~sjlee0], as per our discussion I have uploaded the patch. Please review.

> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> 
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3995-feature-YARN-2928.v1.001.patch
>
>
> As discussed in YARN-3045: while testing TestDistributedShell it was found 
> that a few of the container metrics events were failing because of a race 
> condition. When the AM container finishes and removes the collector for the 
> app, there is still a possibility that events published for the app by 
> the current NM and other NMs are still in the pipeline, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4547) LeafQueue#getApplications() is read-only interface, but it provides reference to caller

2016-01-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086067#comment-15086067
 ] 

Sunil G commented on YARN-4547:
---

Hi [~rohithsharma]
On many of the caller sides, we either loop through the apps or get the size. 
I think maybe we could return an unmodifiable set/list here (rough sketch below). 
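
Something along these lines, as a sketch (assuming the element type is the 
scheduler's app class, e.g. FiCaSchedulerApp, and using 
java.util.Collections#unmodifiableCollection; not the actual patch):

{code}
// Sketch: expose a read-only view so callers can iterate and call size()
// without being able to mutate the ordering policy's entities.
public Collection<FiCaSchedulerApp> getApplications() {
  return Collections.unmodifiableCollection(
      orderingPolicy.getSchedulableEntities());
}
{code}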

> LeafQueue#getApplications() is read-only interface, but it provides reference 
> to caller
> ---
>
> Key: YARN-4547
> URL: https://issues.apache.org/jira/browse/YARN-4547
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> The API below is a read-only interface, but it returns a reference to the caller. 
> This allows the caller to modify the orderingPolicy entities. If a 
> reference to the ordering policy is required, the caller can use 
> {{LeafQueue#getOrderingPolicy()#getSchedulableEntities()}}.
> The returned object should be a clone of 
> orderingPolicy.getSchedulableEntities()
> {code}
>   /**
>* Obtain (read-only) collection of active applications.
>*/
>   public Collection getApplications() {
> return orderingPolicy.getSchedulableEntities();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

2016-01-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086072#comment-15086072
 ] 

Naganarasimha G R commented on YARN-3995:
-

bq. Are you thinking of cases where the AM crashes? If the app finishes 
normally, this sequence does not happen, right?
Well, I was just having a hunch: suppose the AM finishes before its containers 
finish (the AM will take note once a container informs it through the umbilical 
protocol that it is finished, but the container may not actually be finished yet; one 
possible reason is that the Timeline client has not yet finished flushing the ATS 
events, or there may be some other cleanup reason).


> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> 
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3995-feature-YARN-2928.v1.001.patch
>
>
> As discussed in YARN-3045: while testing TestDistributedShell it was found 
> that a few of the container metrics events were failing because of a race 
> condition. When the AM container finishes and removes the collector for the 
> app, there is still a possibility that events published for the app by 
> the current NM and other NMs are still in the pipeline, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3940) Application moveToQueue should check NodeLabel permission

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086195#comment-15086195
 ] 

Hadoop QA commented on YARN-3940:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 113, now 114). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 34s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 25s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 31s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 145m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  java.util.Set is incompatible with expected argument 
type String in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkQueuePartition(ApplicationId,
 LeafQueue)  At CapacityScheduler.java:argument type String in 
org.apache.hadoop.y

[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086218#comment-15086218
 ] 

Hadoop QA commented on YARN-3995:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
56s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
53s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 58s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 59s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 24s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s 
{color} | {color:red} Patch generated 3 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 217, now 218). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 47s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 50s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 6s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s 
{color} | {col

[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086245#comment-15086245
 ] 

Hadoop QA commented on YARN-4389:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
1s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 28s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 161, now 166). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 43s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 57s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 15s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 27s {color} 
| {color:re

[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086246#comment-15086246
 ] 

Hadoop QA commented on YARN-2962:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 50s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 221, now 221). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 57s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 25s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 39s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}

[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-01-06 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086256#comment-15086256
 ] 

Kuhu Shukla commented on YARN-4311:
---

bq. Do note that with this solution, if a user does a node refresh at least 
once per node removal check interval, no nodes will ever be expunged because 
the timestamp will continually be updated and never exceed the interval.

Still need to address that. Will update patch shortly.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, 
> YARN-4311-v3.patch, YARN-4311-v4.patch
>
>
> In order to fully forget about a node, removing the node from the include and 
> exclude lists is not sufficient. The RM still lists it under decommissioned 
> nodes. The tricky part that [~jlowe] pointed out was the case when include 
> lists are not used; in that case we don't want the nodes to fall off if they 
> are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN

2016-01-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086273#comment-15086273
 ] 

Varun Saxena commented on YARN-4224:


[~sjlee0], just carrying on from today's discussion. 
I think "$", "!" and "*" might be fair choices for delimiters as well. If I am 
not wrong, they are safe as well.

I agree we can give up on making it configurable.
What if the UID contains the delimiter we choose? I guess asking the 
application (client side) to escape it with another safe character of our 
choice should be good enough.
i.e. if, say, we choose the delimiter to be ! and a flow UID of the form 
{{\!\!}} has the user component {{a!b*!c}}, then client can 
encode it by *(say).

So, a UID with a!b*!c as the user would be sent as {{cluster!a*!b**!c!flow}}. 
We should be able to parse it.
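
A minimal sketch of the escaping idea discussed above, purely for illustration 
(this is not the actual timeline reader code; the class and method names are 
hypothetical). It assumes '!' as the delimiter and '*' as the escape character, 
and, to keep parsing unambiguous, it also escapes the escape character itself, 
which is why the encoded form it produces differs slightly from the example in 
the comment:

{code}
// Hypothetical helper, not part of YARN: escape/join/split UID components
// using '!' as the delimiter and '*' as the escape character.
import java.util.ArrayList;
import java.util.List;

public class UidEscaperSketch {
  private static final char DELIMITER = '!';
  private static final char ESCAPE = '*';

  // "a!b*!c" -> "a*!b***!c" (both '!' and '*' get an escape prefix)
  static String escape(String component) {
    StringBuilder sb = new StringBuilder();
    for (char c : component.toCharArray()) {
      if (c == DELIMITER || c == ESCAPE) {
        sb.append(ESCAPE);
      }
      sb.append(c);
    }
    return sb.toString();
  }

  // Join escaped components into a single UID string.
  static String joinUid(String... components) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < components.length; i++) {
      if (i > 0) {
        sb.append(DELIMITER);
      }
      sb.append(escape(components[i]));
    }
    return sb.toString();
  }

  // Split a UID back into its original components, honouring escapes.
  static List<String> splitUid(String uid) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < uid.length(); i++) {
      char c = uid.charAt(i);
      if (c == ESCAPE && i + 1 < uid.length()) {
        current.append(uid.charAt(++i));      // keep the escaped character
      } else if (c == DELIMITER) {
        parts.add(current.toString());        // unescaped delimiter ends a part
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString());
    return parts;
  }

  public static void main(String[] args) {
    String uid = joinUid("cluster", "a!b*!c", "flow");
    System.out.println(uid);                  // cluster!a*!b***!c!flow
    System.out.println(splitUid(uid));        // [cluster, a!b*!c, flow]
  }
}
{code}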

But should we make the UID key itself configurable? This is regarding the key 
we choose to store the UID against in the info field (in the response).

Regarding entity type, [~gtCarrera9], are you fine with having it as part of 
the URL (as a mandatory param)? I would prefer to have it as part of the URL if 
there is no use case. [~sjlee0], what do you think on this?

> Support fetching entities by UID and change the REST interface to conform to 
> current REST APIs' in YARN
> ---
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch, 
> YARN-4224-feature-YARN-2928.wip.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN

2016-01-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086274#comment-15086274
 ] 

Varun Saxena commented on YARN-4224:


bq. then client can encode it by *(say).
Meant "then client can escape it by *(say)."

> Support fetching entities by UID and change the REST interface to conform to 
> current REST APIs' in YARN
> ---
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch, 
> YARN-4224-feature-YARN-2928.wip.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086325#comment-15086325
 ] 

Bikas Saha commented on YARN-1011:
--

bq. At this point what happens to the opportunistic container. It is clearly 
running at lower priority on the node and as such we are not giving the job its 
guaranteed capacity.
Yes. The difference is that the opportunistic container may not be convertible 
into a normal container because that node is still over-allocated. So at this 
point, what should be done? Should this container be terminated and run 
somewhere else as normal (because capacity is now available)? Should some other 
container be preempted on this node to make this container normal? Should the 
RM allocate a normal container and give it to the app in addition to the 
running opportunistic container in case the app can do the transfer internally?

Also, with this feature in place, should we run all containers beyond 
guaranteed capacity as opportunistic containers? This would ensure that any 
excess containers that we give to a job will not affect performance of the 
guaranteed containers of other jobs. This would also make the scheduling and 
allocation more consistent in that the guaranteed containers always run at 
normal priority and extra containers run at lower priority. The extra container 
could be extra over capacity (but without over-subscription) or extra 
over-subscription. Because of this I feel that running tasks at lower priority 
could be an independent (but related) work item.

Staying on this topic and adding configuration to it: it may make sense to 
add some way by which an application can say "don't oversubscribe nodes when 
my containers run on them". Putting cgroups or docker in this context, would 
these mechanisms support over-allocating resources like cpu or memory?

bq. When space frees up on nodes, NMs send candidate containers for promotion 
on the heartbeat.
That shouldn't be necessary, since the RM will get to know about free capacity 
and run its scheduling cycle for that node - at which point it will be able to 
take action like allocating a new container or upgrading an existing one. There 
isn't anything the NM can tell the RM (that the RM does not already know) 
except for the current utilization of the node.

Some of what I am saying emanates from prior experience with a different 
Hadoop-like system. You can read more about it here: 
http://research.microsoft.com/pubs/232978/osdi14-paper-boutin.pdf


> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086326#comment-15086326
 ] 

Bikas Saha commented on YARN-1011:
--

Some of what I am saying emanates from prior experience with a different 
Hadoop-like system. You can read more about it here: 
http://research.microsoft.com/pubs/232978/osdi14-paper-boutin.pdf

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086343#comment-15086343
 ] 

Junping Du commented on YARN-2902:
--

Thanks [~varun_saxena] for the patch against the 2.6 branch. Agreed that the 
whitespace issue is not related to this patch. I have merged it to branch-2.6.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4354) Public resource localization fails with NPE

2016-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4354:
-
Target Version/s: 2.7.2, 2.6.4  (was: 2.7.2, 2.7.3)
   Fix Version/s: 2.6.4

Given that YARN-2902 was just committed to the 2.6 branch, I have cherry-picked 
this patch into branch-2.6 also.

> Public resource localization fails with NPE
> ---
>
> Key: YARN-4354
> URL: https://issues.apache.org/jira/browse/YARN-4354
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-4354-branch-2.7.002.patch, 
> YARN-4354-unittest.patch, YARN-4354.001.patch, YARN-4354.002.patch
>
>
> I saw public localization on nodemanagers get stuck because it was constantly 
> rejecting requests to the thread pool executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently

2016-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4380:
-
Target Version/s: 2.7.3, 2.6.4  (was: 2.7.3)

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently
> 
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2016-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086402#comment-15086402
 ] 

Hudson commented on YARN-2975:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9060 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9060/])
Add YARN-2975, YARN-3893, YARN-2902 and YARN-4354 to Release 2.6.4 entry 
(junping_du: rev b6c9d3fab9c76b03abd664858f64a4ebf3c2bb20)
* hadoop-yarn-project/CHANGES.txt


> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Fix For: 2.7.0, 2.6.4
>
> Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086401#comment-15086401
 ] 

Hudson commented on YARN-2902:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9060 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9060/])
Add YARN-2975, YARN-3893, YARN-2902 and YARN-4354 to Release 2.6.4 entry 
(junping_du: rev b6c9d3fab9c76b03abd664858f64a4ebf3c2bb20)
* hadoop-yarn-project/CHANGES.txt


> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4354) Public resource localization fails with NPE

2016-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086404#comment-15086404
 ] 

Hudson commented on YARN-4354:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9060 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9060/])
Add YARN-2975, YARN-3893, YARN-2902 and YARN-4354 to Release 2.6.4 entry 
(junping_du: rev b6c9d3fab9c76b03abd664858f64a4ebf3c2bb20)
* hadoop-yarn-project/CHANGES.txt


> Public resource localization fails with NPE
> ---
>
> Key: YARN-4354
> URL: https://issues.apache.org/jira/browse/YARN-4354
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-4354-branch-2.7.002.patch, 
> YARN-4354-unittest.patch, YARN-4354.001.patch, YARN-4354.002.patch
>
>
> I saw public localization on nodemanagers get stuck because it was constantly 
> rejecting requests to the thread pool executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2016-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086403#comment-15086403
 ] 

Hudson commented on YARN-3893:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9060 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9060/])
Add YARN-2975, YARN-3893, YARN-2902 and YARN-4354 to Release 2.6.4 entry 
(junping_du: rev b6c9d3fab9c76b03abd664858f64a4ebf3c2bb20)
* hadoop-yarn-project/CHANGES.txt


> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.7.2, 2.6.4
>
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this:
> # Capacity scheduler XML is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh user group failure due to configuration
> Both RMs will continuously try to become active:
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UIs show active
> # Status shown as active for both RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently

2016-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4380:
-
Fix Version/s: 2.6.4

Per Varun's comments in YARN-2902, I have cherry-picked this patch together 
with YARN-2902 to branch-2.6.

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently
> 
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Fix For: 2.7.3, 2.6.4
>
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4550) some tests in TestContainerLanch fails on non-english locale environment

2016-01-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086423#comment-15086423
 ] 

Steve Loughran commented on YARN-4550:
--

+1

> some tests in TestContainerLanch fails on non-english locale environment
> 
>
> Key: YARN-4550
> URL: https://issues.apache.org/jira/browse/YARN-4550
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager
>Affects Versions: 2.7.1
> Environment: CentOS 7 with below locale configurations:
> {code}
> $ locale
> LANG=ja_JP.UTF-8
> LC_CTYPE="ja_JP.UTF-8"
> LC_NUMERIC="ja_JP.UTF-8"
> LC_TIME="ja_JP.UTF-8"
> LC_COLLATE="ja_JP.UTF-8"
> LC_MONETARY="ja_JP.UTF-8"
> LC_MESSAGES="ja_JP.UTF-8"
> LC_PAPER="ja_JP.UTF-8"
> LC_NAME="ja_JP.UTF-8"
> LC_ADDRESS="ja_JP.UTF-8"
> LC_TELEPHONE="ja_JP.UTF-8"
> LC_MEASUREMENT="ja_JP.UTF-8"
> LC_IDENTIFICATION="ja_JP.UTF-8"
> LC_ALL=
> {code}
>Reporter: Takashi Ohnishi
>Priority: Minor
> Attachments: YARN-4550.1.patch
>
>
> The tests listed below fail.
> * testErrorLogOnContainerExitWithMultipleFiles
> * testErrorLogOnContainerExitWithCustomPattern
> * testErrorLogOnContainerExitForCase
> * testErrorLogOnContainerExit
> * testErrorLogOnContainerExitForExt
> The failures happen in the same place.
> {code}
> java.lang.AssertionError: Should contain contents of error Log
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:633)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:602)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitWithFailure(ContainerLaunch.java:438)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:359)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.verifyTailErrorLogOnContainerExit(TestContainerLaunch.java:597)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testErrorLogOnContainerExitWithMultipleFiles(TestContainerLaunch.java:528)
> {code}
> All these tests call verifyTailErrorLogOnContainerExit, which calls the code 
> below.
> {code:title=TestContainerLaunch.java}
>  632 Assert.assertTrue("Should contain contents of error Log",
>
>  633 exitEvent.getDiagnosticInfo().contains(
>  634 INVALID_JAVA_HOME + "/bin/java: No such file or 
> directory"));
>  635   }
> {code}
> In an environment with a non-English locale, this fails because the error 
> message is returned with non-English text like below.
> {code}
> 2016-01-06 23:27:45,427 INFO  [main] 
> containermanager.BaseContainerManagerTest 
> (TestContainerLaunch.java:handle(622)) - Diagnostic Info : Container exited 
> with a non-zero exit code 127. Error files: stderr.log, stdout.
> Last 4096 bytes of stderr.log :
> /bin/bash: /no/jvm/here/bin/java: そのようなファイルやディレクトリはありません
> {code}
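
A hedged sketch of one way to make the check locale-independent; this may well 
differ from what YARN-4550.1.patch actually does. The idea is to assert only on 
the invalid java path, which the shell prints verbatim regardless of locale, 
rather than on the translated "No such file or directory" text. The parameters 
stand in for exitEvent.getDiagnosticInfo() and INVALID_JAVA_HOME from the test 
quoted above:

{code}
import org.junit.Assert;

public class LocaleIndependentAssertSketch {
  // diagnosticInfo ~ exitEvent.getDiagnosticInfo(),
  // invalidJavaHome ~ INVALID_JAVA_HOME in TestContainerLaunch.
  static void assertContainsErrorLog(String diagnosticInfo, String invalidJavaHome) {
    // Only the path is checked; the localized shell message is ignored.
    Assert.assertTrue("Should contain contents of error Log",
        diagnosticInfo.contains(invalidJavaHome + "/bin/java"));
  }
}
{code}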



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4550) some tests in TestContainerLanch fails on non-english locale environment

2016-01-06 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4550:
-
Component/s: test

> some tests in TestContainerLanch fails on non-english locale environment
> 
>
> Key: YARN-4550
> URL: https://issues.apache.org/jira/browse/YARN-4550
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager, test
>Affects Versions: 2.7.1
> Environment: CentOS 7 with below locale configurations:
> {code}
> $ locale
> LANG=ja_JP.UTF-8
> LC_CTYPE="ja_JP.UTF-8"
> LC_NUMERIC="ja_JP.UTF-8"
> LC_TIME="ja_JP.UTF-8"
> LC_COLLATE="ja_JP.UTF-8"
> LC_MONETARY="ja_JP.UTF-8"
> LC_MESSAGES="ja_JP.UTF-8"
> LC_PAPER="ja_JP.UTF-8"
> LC_NAME="ja_JP.UTF-8"
> LC_ADDRESS="ja_JP.UTF-8"
> LC_TELEPHONE="ja_JP.UTF-8"
> LC_MEASUREMENT="ja_JP.UTF-8"
> LC_IDENTIFICATION="ja_JP.UTF-8"
> LC_ALL=
> {code}
>Reporter: Takashi Ohnishi
>Priority: Minor
> Attachments: YARN-4550.1.patch
>
>
> The tests listed below fail.
> * testErrorLogOnContainerExitWithMultipleFiles
> * testErrorLogOnContainerExitWithCustomPattern
> * testErrorLogOnContainerExitForCase
> * testErrorLogOnContainerExit
> * testErrorLogOnContainerExitForExt
> The failures happen in the same place.
> {code}
> java.lang.AssertionError: Should contain contents of error Log
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:633)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch$ContainerExitHandler.handle(TestContainerLaunch.java:602)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitWithFailure(ContainerLaunch.java:438)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:359)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.verifyTailErrorLogOnContainerExit(TestContainerLaunch.java:597)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testErrorLogOnContainerExitWithMultipleFiles(TestContainerLaunch.java:528)
> {code}
> All these tests call verifyTailErrorLogOnContainerExit, which calls the code 
> below.
> {code:title=TestContainerLaunch.java}
>  632 Assert.assertTrue("Should contain contents of error Log",
>
>  633 exitEvent.getDiagnosticInfo().contains(
>  634 INVALID_JAVA_HOME + "/bin/java: No such file or 
> directory"));
>  635   }
> {code}
> In an environment with a non-English locale, this fails because the error 
> message is returned with non-English text like below.
> {code}
> 2016-01-06 23:27:45,427 INFO  [main] 
> containermanager.BaseContainerManagerTest 
> (TestContainerLaunch.java:handle(622)) - Diagnostic Info : Container exited 
> with a non-zero exit code 127. Error files: stderr.log, stdout.
> Last 4096 bytes of stderr.log :
> /bin/bash: /no/jvm/here/bin/java: そのようなファイルやディレクトリはありません
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode

2016-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3857:
-
Target Version/s: 2.6.4

> Memory leak in ResourceManager with SIMPLE mode
> ---
>
> Key: YARN-3857
> URL: https://issues.apache.org/jira/browse/YARN-3857
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0, 2.6.2
>Reporter: mujunchao
>Assignee: mujunchao
>Priority: Critical
>  Labels: patch
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-3857-1.patch, YARN-3857-2.patch, YARN-3857-3.patch, 
> YARN-3857-4.patch, hadoop-yarn-server-resourcemanager.patch
>
>
> We register the ClientTokenMasterKey to avoid the client holding an invalid 
> ClientToken after the RM restarts. In SIMPLE mode, we register the 
> Pair, but we never remove it from the HashMap, as 
> unregister only runs while in security mode, so a memory leak results. 
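
For illustration only, a hedged sketch of the leak pattern described above; the 
class and member names are invented and this is not the actual ResourceManager 
code:

{code}
// Hypothetical sketch of the bug pattern: entries are always added, but
// removal is guarded by the security check, so in SIMPLE mode the map grows
// without bound.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ClientTokenRegistrySketch {
  private final Map<String, byte[]> attemptToMasterKey = new ConcurrentHashMap<>();
  private final boolean securityEnabled;

  public ClientTokenRegistrySketch(boolean securityEnabled) {
    this.securityEnabled = securityEnabled;
  }

  public void registerAttempt(String attemptId, byte[] masterKey) {
    attemptToMasterKey.put(attemptId, masterKey);   // registered unconditionally
  }

  public void unregisterAttempt(String attemptId) {
    if (securityEnabled) {
      attemptToMasterKey.remove(attemptId);         // only removed in secure mode
    }
    // In SIMPLE mode the entry is never removed -> memory leak.
  }
}
{code}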



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2016-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2975:
-
Target Version/s: 2.7.0, 2.6.4  (was: 2.7.0)

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Fix For: 2.7.0, 2.6.4
>
> Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2016-01-06 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086476#comment-15086476
 ] 

Vrushali C commented on YARN-4238:
--

[~gtCarrera9] ,
[~varun_saxena] mentioned that he most probably recollects that the modified 
time was being used to query the in-memory store in ATS v1. Do you have any 
idea if that was for Tez? Do you know if Tez specifically needs entities back 
in some chronological order? Do you think we should email them and check? 

If that is the case, we do need the modification time class member. Then 
perhaps we can go ahead with updating it in each setter function in the 
TimelineEntity class. 
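
A minimal sketch of the "update the modified time in each setter" idea 
mentioned above. This is only an illustration under assumed names; it is not 
the actual ATSv2 TimelineEntity API, and the fields and methods here are 
hypothetical:

{code}
// Hypothetical sketch, not the real TimelineEntity: every mutator refreshes
// the modification timestamp so readers can order entities by last change.
import java.util.HashMap;
import java.util.Map;

public class TimelineEntitySketch {
  private long createdTime;
  private long modifiedTime;
  private final Map<String, Object> info = new HashMap<>();

  public void setCreatedTime(long ts) {
    this.createdTime = ts;
    touch();
  }

  public void addInfo(String key, Object value) {
    info.put(key, value);
    touch();
  }

  public long getModifiedTime() {
    return modifiedTime;
  }

  // Called from every setter so the modification time tracks the last write.
  private void touch() {
    this.modifiedTime = System.currentTimeMillis();
  }
}
{code}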


> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from RM and elsewhere we are not sending created 
> time. For instance, created time in TimelineServiceV2Publisher class and for 
> other entities in other such similar classes is not updated. We can easily 
> update created time when sending application created event. Likewise for 
> modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2016-01-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086484#comment-15086484
 ] 

Li Lu commented on YARN-4238:
-

That's an important concern. I know [~hitesh] knows a lot of the details there, 
but I am not sure he can get back in time since he is on vacation. I'll check 
with the Tez community about this. 

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from RM and elsewhere we are not sending created 
> time. For instance, created time in TimelineServiceV2Publisher class and for 
> other entities in other such similar classes is not updated. We can easily 
> update created time when sending application created event. Likewise for 
> modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2016-01-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086498#comment-15086498
 ] 

Li Lu commented on YARN-4238:
-

Seems like the only thing related is the Tez UI. [~Sreenath], any comments from 
the UI perspective? Thanks! 

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from RM and elsewhere we are not sending created 
> time. For instance, created time in TimelineServiceV2Publisher class and for 
> other entities in other such similar classes is not updated. We can easily 
> update created time when sending application created event. Likewise for 
> modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4007) Add support for different network setups when launching the docker container

2016-01-06 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086672#comment-15086672
 ] 

Sidharta Seethana commented on YARN-4007:
-

Hi [~vvasudev],

Would you mind if I take over this JIRA if you are not working on this 
currently?

thanks,
-Sidharta

> Add support for different network setups when launching the docker container
> 
>
> Key: YARN-4007
> URL: https://issues.apache.org/jira/browse/YARN-4007
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>
> We should allow users to be able to launch containers with appropriate 
> network setups. For security, we should allow admins to provide a set of 
> options that the users are allowed to use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4553) Add cgroups support for docker containers

2016-01-06 Thread Sidharta Seethana (JIRA)
Sidharta Seethana created YARN-4553:
---

 Summary: Add cgroups support for docker containers
 Key: YARN-4553
 URL: https://issues.apache.org/jira/browse/YARN-4553
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana


Currently, cgroups-based resource isolation does not work with docker 
containers under YARN. The processes in these containers are launched by the 
docker daemon and they are not children of a container-executor process. Docker 
supports a --cgroup-parent flag which can be used to point to the 
container-specific cgroups that are created by the nodemanager. This will allow 
the Nodemanager to manage cgroups (as it does today) while allowing resource 
isolation to work with docker containers. 
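
For illustration, a hedged sketch of how a launch command might carry the 
--cgroup-parent flag. The flag itself is real Docker CLI, but the class and 
method below are hypothetical and are not the actual patch or the 
NodeManager's container-runtime API:

{code}
// Hypothetical sketch: build a "docker run" command that places the container
// under the cgroup hierarchy the NodeManager already manages.
import java.util.ArrayList;
import java.util.List;

public class DockerCgroupParentSketch {
  static List<String> buildRunCommand(String image, String cgroupParent) {
    List<String> cmd = new ArrayList<>();
    cmd.add("docker");
    cmd.add("run");
    // Point the container's cgroups at the NM-created, per-container cgroup.
    cmd.add("--cgroup-parent=" + cgroupParent);
    cmd.add(image);
    return cmd;
  }

  public static void main(String[] args) {
    // The cgroup path below is a made-up example.
    System.out.println(String.join(" ",
        buildRunCommand("centos:7", "/hadoop-yarn/container_1234_0001_01_000002")));
  }
}
{code}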



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4554) ApplicationReport.getDiagnostics does not return diagnostics from individual attempts

2016-01-06 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-4554:


 Summary: ApplicationReport.getDiagnostics does not return 
diagnostics from individual attempts
 Key: YARN-4554
 URL: https://issues.apache.org/jira/browse/YARN-4554
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Siddharth Seth


For an Application with ApplicationReport.getFinalApplicationStatus=FAILED and 
ApplicationReport.getYarnApplicationState=FINISHED - 
ApplicationReport.getDiagnostics returns an empty string.

Instead I had to use ApplicationReport.getCurrentApplicationAttemptId, followed 
by getApplicationAttemptReport to get diagnostics for the attempt - which 
contained the information I had used to unregister the app.
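
A hedged sketch of the workaround described above, using the public YarnClient 
API (the helper class and method names are made up; exception handling is 
trimmed for brevity):

{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptReport;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class DiagnosticsFallbackSketch {
  // If the application-level diagnostics are empty, fall back to the
  // diagnostics of the current (last) attempt.
  static String diagnosticsFor(YarnClient client, ApplicationId appId)
      throws IOException, YarnException {
    ApplicationReport app = client.getApplicationReport(appId);
    String diag = app.getDiagnostics();
    if (diag == null || diag.isEmpty()) {
      ApplicationAttemptReport attempt =
          client.getApplicationAttemptReport(app.getCurrentApplicationAttemptId());
      diag = attempt.getDiagnostics();
    }
    return diag;
  }
}
{code}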




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4554) ApplicationReport.getDiagnostics does not return diagnostics from individual attempts

2016-01-06 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-4554:
-

Assignee: Sunil G

> ApplicationReport.getDiagnostics does not return diagnostics from individual 
> attempts
> -
>
> Key: YARN-4554
> URL: https://issues.apache.org/jira/browse/YARN-4554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Siddharth Seth
>Assignee: Sunil G
>
> For an Application with ApplicationReport.getFinalApplicationStatus=FAILED 
> and ApplicationReport.getYarnApplicationState=FINISHED - 
> ApplicationReport.getDiagnostics returns an empty string.
> Instead I had to use ApplicationReport.getCurrentApplicationAttemptId, 
> followed by getApplicationAttemptReport to get diagnostics for the attempt - 
> which contained the information I had used to unregister the app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4554) ApplicationReport.getDiagnostics does not return diagnostics from individual attempts

2016-01-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086739#comment-15086739
 ] 

Sunil G commented on YARN-4554:
---

Yes, I think this will be helpful. I will try to share a patch for the same if 
you are not planning to.

> ApplicationReport.getDiagnostics does not return diagnostics from individual 
> attempts
> -
>
> Key: YARN-4554
> URL: https://issues.apache.org/jira/browse/YARN-4554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Siddharth Seth
>Assignee: Sunil G
>
> For an Application with ApplicationReport.getFinalApplicationStatus=FAILED 
> and ApplicationReport.getYarnApplicationState=FINISHED - 
> ApplicationReport.getDiagnostics returns an empty string.
> Instead I had to use ApplicationReport.getCurrentApplicationAttemptId, 
> followed by getApplicationAttemptReport to get diagnostics for the attempt - 
> which contained the information I had used to unregister the app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4553) Add cgroups support for docker containers

2016-01-06 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-4553:

Attachment: YARN-4553.001.patch

Submitted patch to launch docker containers with --cgroup-parent where 
applicable. 

> Add cgroups support for docker containers
> -
>
> Key: YARN-4553
> URL: https://issues.apache.org/jira/browse/YARN-4553
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-4553.001.patch
>
>
> Currently, cgroups-based resource isolation does not work with docker 
> containers under YARN. The processes in these containers are launched by the 
> docker daemon and they are not children of a container-executor process. 
> Docker supports a --cgroup-parent flag which can be used to point to the 
> container-specific cgroups that are created by the nodemanager. This will 
> allow the Nodemanager to manage cgroups (as it does today) while allowing 
> resource isolation to work with docker containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086761#comment-15086761
 ] 

Rohith Sharma K S commented on YARN-4537:
-

For findbugs, the red mark is due to a missing findbugs output file. I think it 
is an issue in Hadoop QA.

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch, 0002-YARN-4537.patch, 
> 0003-YARN-4537.patch, 0003-YARN-4537.patch
>
>
> Currently, priority comparison is integrated with FifoComparator. There 
> should be a separate comparator defined for priority comparison so that, down 
> the line, if any new ordering policy wants to integrate priority, it can use a 
> compound comparator where priority takes precedence. 
> The following changes are expected to be done as part of this JIRA:
> # Pull out priority comparison from FifoComparator
> # Define a new priority comparator
> # Use a compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, FIFO
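
A minimal, hedged illustration of the compound-comparator idea described in 
this issue (not the actual YARN-4537 patch; the App fields and the assumption 
that a larger integer means higher priority are made up for the example):

{code}
// Hypothetical sketch: order by priority first, break ties in FIFO order.
import java.util.Comparator;

class App {
  final int priority;     // assumption: higher value = higher priority
  final long submitTime;  // earlier submission wins ties (FIFO)

  App(int priority, long submitTime) {
    this.priority = priority;
    this.submitTime = submitTime;
  }
}

class OrderingPolicySketch {
  // Higher priority first.
  static final Comparator<App> PRIORITY =
      Comparator.comparingInt((App a) -> a.priority).reversed();
  // FIFO: earlier submit time first.
  static final Comparator<App> FIFO =
      Comparator.comparingLong(a -> a.submitTime);
  // Compound comparator: priority is the high-preference key, FIFO breaks ties.
  static final Comparator<App> PRIORITY_THEN_FIFO = PRIORITY.thenComparing(FIFO);
}
{code}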



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2016-01-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4393:

Labels: test  (was: )

> TestResourceLocalizationService#testFailedDirsResourceRelease fails 
> intermittently
> --
>
> Key: YARN-4393
> URL: https://issues.apache.org/jira/browse/YARN-4393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: test
> Fix For: 2.7.3
>
> Attachments: YARN-4393.01.patch
>
>
> [~ozawa] pointed out this failure on YARN-4380.
> Check 
> https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773
> {noformat}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>  Time elapsed: 0.093 sec <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> eventHandler.handle(
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> Actual invocation has different arguments:
> eventHandler.handle(
> EventType: APPLICATION_INITED
> );
> -> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2016-01-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4393:

Component/s: test

> TestResourceLocalizationService#testFailedDirsResourceRelease fails 
> intermittently
> --
>
> Key: YARN-4393
> URL: https://issues.apache.org/jira/browse/YARN-4393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: test
> Fix For: 2.7.3
>
> Attachments: YARN-4393.01.patch
>
>
> [~ozawa] pointed out this failure on YARN-4380.
> Check 
> https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773
> {noformat}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>  Time elapsed: 0.093 sec <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> eventHandler.handle(
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> Actual invocation has different arguments:
> eventHandler.handle(
> EventType: APPLICATION_INITED
> );
> -> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2016-01-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086769#comment-15086769
 ] 

Rohith Sharma K S commented on YARN-4393:
-

Committing shortly

> TestResourceLocalizationService#testFailedDirsResourceRelease fails 
> intermittently
> --
>
> Key: YARN-4393
> URL: https://issues.apache.org/jira/browse/YARN-4393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: test
> Fix For: 2.7.3
>
> Attachments: YARN-4393.01.patch
>
>
> [~ozawa] pointed out this failure on YARN-4380.
> Check 
> https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773
> {noformat}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>  Time elapsed: 0.093 sec <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> eventHandler.handle(
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> Actual invocation has different arguments:
> eventHandler.handle(
> EventType: APPLICATION_INITED
> );
> -> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4553) Add cgroups support for docker containers

2016-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086779#comment-15086779
 ] 

Hadoop QA commented on YARN-4553:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s 
{color} | {color:red} Patch generated 3 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 (total was 8, now 11). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 33s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 4s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 22s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780890/YARN-4553.001.patch |
| JIRA Issue | YARN-4553 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 8828e446a8e2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchproc

[jira] [Updated] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2016-01-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4393:

Target Version/s:   (was: 2.7.3)

> TestResourceLocalizationService#testFailedDirsResourceRelease fails 
> intermittently
> --
>
> Key: YARN-4393
> URL: https://issues.apache.org/jira/browse/YARN-4393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: test
> Fix For: 2.9.0
>
> Attachments: YARN-4393.01.patch
>
>
> [~ozawa] pointed out this failure on YARN-4380.
> Check 
> https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773
> {noformat}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>  Time elapsed: 0.093 sec <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> eventHandler.handle(
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> Actual invocation has different arguments:
> eventHandler.handle(
> EventType: APPLICATION_INITED
> );
> -> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2016-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086781#comment-15086781
 ] 

Hudson commented on YARN-4393:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9062 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9062/])
YARN-4393. Fix intermittent test failure for 
TestResourceLocalizationService#testFailedDirsResourceRelease (rohithsharmaks: 
rev 791c1639ae0b351e0bf0b2ecec854dc72ab07935)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


> TestResourceLocalizationService#testFailedDirsResourceRelease fails 
> intermittently
> --
>
> Key: YARN-4393
> URL: https://issues.apache.org/jira/browse/YARN-4393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: test
> Fix For: 2.9.0
>
> Attachments: YARN-4393.01.patch
>
>
> [~ozawa] pointed out this failure on YARN-4380.
> Check 
> https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773
> {noformat}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>  Time elapsed: 0.093 sec <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> eventHandler.handle(
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> Actual invocation has different arguments:
> eventHandler.handle(
> EventType: APPLICATION_INITED
> );
> -> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4553) Add cgroups support for docker containers

2016-01-06 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-4553:

Attachment: YARN-4553.002.patch

Uploaded patch with minor checkstyle fixes. 

> Add cgroups support for docker containers
> -
>
> Key: YARN-4553
> URL: https://issues.apache.org/jira/browse/YARN-4553
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-4553.001.patch, YARN-4553.002.patch
>
>
> Currently, cgroups-based resource isolation does not work with docker 
> containers under YARN. The processes in these containers are launched by the 
> docker daemon, so they are not children of a container-executor process. 
> Docker supports a --cgroup-parent flag that can be used to point to the 
> container-specific cgroups created by the NodeManager. This lets the 
> NodeManager keep managing cgroups (as it does today) while enabling resource 
> isolation for docker containers. 
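
To make the --cgroup-parent mechanism described above concrete, here is a 
small hedged sketch (not the attached patch): it only shows how a docker run 
command could be assembled so the container's processes land under a cgroup 
the NodeManager is assumed to have created. The class name, cgroup path 
layout, and image are hypothetical placeholders.

{noformat}
// Hedged sketch, not the YARN-4553 patch: illustrates the --cgroup-parent idea only.
import java.util.ArrayList;
import java.util.List;

public class DockerCgroupParentSketch {

  /**
   * Builds a docker run command whose processes are placed under the
   * container-specific cgroup the NodeManager is assumed to have already
   * created (the path layout below is a hypothetical example).
   */
  static List<String> buildRunCommand(String containerId, String image) {
    // Assumed per-container cgroup created by the NM under each controller,
    // e.g. /sys/fs/cgroup/cpu/hadoop-yarn/<containerId>; --cgroup-parent
    // takes the path relative to the cgroup mount point.
    String cgroupParent = "/hadoop-yarn/" + containerId;

    List<String> cmd = new ArrayList<>();
    cmd.add("docker");
    cmd.add("run");
    cmd.add("--cgroup-parent=" + cgroupParent); // container processes join the NM-managed cgroup
    cmd.add("--name=" + containerId);
    cmd.add(image);
    cmd.add("sleep");
    cmd.add("3600");
    return cmd;
  }

  public static void main(String[] args) {
    System.out.println(String.join(" ",
        buildRunCommand("container_1452077000000_0001_01_000002", "centos:7")));
  }
}
{noformat}

Because the container's processes join the NM-created cgroup, the existing 
cgroup-based limits apply to them just as they do for non-docker containers, 
which is the outcome the issue description aims for.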



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

