[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2016-04-27 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261546#comment-15261546
 ] 

Xianyin Xin commented on YARN-4090:
---

Sorry for the delay, [~kasha] [~yufeigu]. Just uploaded a patch that fixes the 
three test failures above. The two failures in 
{{TestFairSchedulerPreemption}} happened because {{decResourceUsage}} was 
called twice while processing preemption (in both {{addPreemption()}} and 
{{containerCompleted()}}), and the failure in {{TestAppRunnability}} happened 
because we missed updating the queue's resource usage when moving an app.
Thanks [~yufeigu] for your info.
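
To make the single-decrement idea concrete, here is a minimal sketch (not the 
actual patch; the class and field names are hypothetical stand-ins, only the 
method names come from the comment above): {{containerCompleted()}} owns the 
usage decrement, while {{addPreemption()}} only records the container, so a 
preempted container is never subtracted twice.
{code}
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the app-side bookkeeping, not the YARN-4090 patch.
class AppUsageTracker {
  private final Set<String> preemptedContainers = new HashSet<>();
  private long usedMemoryMB; // stands in for the queue's Resource usage

  void addPreemption(String containerId) {
    // Only record the container marked for preemption; usage is NOT
    // decremented here, which avoids the double decrement seen in
    // TestFairSchedulerPreemption.
    preemptedContainers.add(containerId);
  }

  void containerCompleted(String containerId, long containerMemoryMB) {
    // The single place where usage is decremented, whether the container
    // finished normally or because it was preempted.
    usedMemoryMB -= containerMemoryMB;
    preemptedContainers.remove(containerId);
  }
}
{code}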

> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, 
> YARN-4090.001.patch, sampling1.jpg, sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2016-04-27 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-4090:
--
Attachment: YARN-4090.001.patch

> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, 
> YARN-4090.001.patch, sampling1.jpg, sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-5005) TestRMWebServices#testDumpingSchedulerLogs fails randomly

2016-04-27 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-5005:
---
Description: 
{noformat}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Appender is already 
dumping logs
at 
org.apache.hadoop.yarn.util.AdHocLogDumper.dumpLogs(AdHocLogDumper.java:65)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.dumpSchedulerLogs(RMWebServices.java:321)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.testDumpingSchedulerLogs(TestRMWebServices.java:674)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
{noformat}

First, the scheduler log dump is set to run for 1 second:
{noformat}
webSvc.dumpSchedulerLogs("1", mockHsr);
Thread.sleep(1000);
{noformat}
sleep(1000) is used to wait for its completion, but in some test runs the log 
dump is called again within that 1 second and fails.



  was:
{noformat}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Appender is already 
dumping logs
at 
org.apache.hadoop.yarn.util.AdHocLogDumper.dumpLogs(AdHocLogDumper.java:65)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.dumpSchedulerLogs(RMWebServices.java:321)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.testDumpingSchedulerLogs(TestRMWebServices.java:674)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.ja

[jira] [Created] (YARN-5005) TestRMWebServices#testDumpingSchedulerLogs fails randomly

2016-04-27 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-5005:
--

 Summary: TestRMWebServices#testDumpingSchedulerLogs fails randomly
 Key: YARN-5005
 URL: https://issues.apache.org/jira/browse/YARN-5005
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt


{noformat}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Appender is already 
dumping logs
at 
org.apache.hadoop.yarn.util.AdHocLogDumper.dumpLogs(AdHocLogDumper.java:65)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.dumpSchedulerLogs(RMWebServices.java:321)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.testDumpingSchedulerLogs(TestRMWebServices.java:674)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
{noformat}

First, the scheduler log dump is set to run for 1 second:
{noformat}
webSvc.dumpSchedulerLogs("1", mockHsr);
Thread.sleep(1000);
{noformat}
sleep(1000) is used to wait for its completion, but in some test runs the log 
dump is called again within that 1 second and fails.
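
One possible way to make the test deterministic, sketched below purely for 
illustration (this is not the actual fix), is to retry the second dump until 
the first one has released the appender instead of relying on a fixed sleep. 
The helper is assumed to live inside {{TestRMWebServices}}, next to the 
existing {{webSvc}} and {{mockHsr}} objects shown above.
{code}
// Hypothetical test helper, not the actual fix: retry until the previous
// one-second dump has released the appender.
private void dumpWhenAppenderFree(RMWebServices webSvc,
    HttpServletRequest mockHsr) throws Exception {
  long deadline = System.currentTimeMillis() + 5000; // arbitrary 5-second cap
  while (true) {
    try {
      webSvc.dumpSchedulerLogs("1", mockHsr); // succeeds once the appender is free
      return;
    } catch (YarnRuntimeException e) { // "Appender is already dumping logs"
      if (System.currentTimeMillis() > deadline) {
        throw e; // give up after the cap
      }
      Thread.sleep(100); // previous dump still running; retry shortly
    }
  }
}
{code}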





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-5005) TestRMWebServices#testDumpingSchedulerLogs fails randomly

2016-04-27 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-5005:
---
Issue Type: Test  (was: Bug)

> TestRMWebServices#testDumpingSchedulerLogs fails randomly
> -
>
> Key: YARN-5005
> URL: https://issues.apache.org/jira/browse/YARN-5005
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Bibin A Chundatt
>
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Appender is already 
> dumping logs
>   at 
> org.apache.hadoop.yarn.util.AdHocLogDumper.dumpLogs(AdHocLogDumper.java:65)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.dumpSchedulerLogs(RMWebServices.java:321)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.testDumpingSchedulerLogs(TestRMWebServices.java:674)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
>   at 
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
> {noformat}
> First, the scheduler log dump is set to run for 1 second:
> {noformat}
> webSvc.dumpSchedulerLogs("1", mockHsr);
> Thread.sleep(1000);
> {noformat}
> sleep(1000) is used to wait for its completion, but in some test runs the log 
> dump is called again within that 1 second and fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261309#comment-15261309
 ] 

Hadoop QA commented on YARN-4447:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
26s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
37s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice:
 patch generated 12 new + 5 unchanged - 10 fixed = 17 total (was 15) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 16s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.8.0_92. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 14s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 25s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801073/YARN-4447-YARN-2928.01.patch
 |
| JIRA Issue | YARN-4447 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 47289d5b4cec 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
|

[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer

2016-04-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261276#comment-15261276
 ] 

Sangjin Lee commented on YARN-4447:
---

I just restarted the Jenkins build, but it seems there is a larger build 
issue going on. We'll see.

> Provide a mechanism to represent complex filters and parse them at the REST 
> layer 
> --
>
> Key: YARN-4447
> URL: https://issues.apache.org/jira/browse/YARN-4447
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4447-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4987) Read cache concurrency issue between read and evict in EntityGroupFS timeline store

2016-04-27 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4987:

Description: To handle concurrency issues, key-value based timeline storage 
may return null on reads that are concurrent with a service stop. This is 
actually caused by a concurrency issue between cache reads and evictions. 
Specifically, if the storage is being read while it gets evicted, the storage 
reference may become null. EntityGroupFS timeline store needs to handle this 
case gracefully.   (was: To handle concurrency issues, key value based 
timeline storage may return null on reads that are concurrent to service 
stop. EntityGroupFS timeline store needs to handle this case gracefully. )
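
As a general illustration of the guard being described (this is not the 
YARN-4987 patch; the class below is a generic stand-in), eviction can take a 
write lock so it never nulls out the store underneath an in-flight reader:
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Generic sketch of protecting a cached store from concurrent eviction;
// not the EntityGroupFS timeline store code itself.
class GuardedCacheItem<T> {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private T store; // the cached timeline store

  GuardedCacheItem(T store) {
    this.store = store;
  }

  <R> R read(Function<T, R> reader) {
    lock.readLock().lock();
    try {
      // A reader that arrives after eviction sees null and can fall back
      // gracefully instead of failing mid-read.
      return store == null ? null : reader.apply(store);
    } finally {
      lock.readLock().unlock();
    }
  }

  void evict() {
    lock.writeLock().lock(); // waits for in-flight readers to finish
    try {
      store = null; // never observed mid-read by a guarded reader
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}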

> Read cache concurrency issue between read and evict in EntityGroupFS timeline 
> store 
> 
>
> Key: YARN-4987
> URL: https://issues.apache.org/jira/browse/YARN-4987
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Critical
>
> To handle concurrency issues, key value based timeline storage may return 
> null on reads that are concurrent to service stop. This is actually caused by 
> a concurrency issue between cache reads and evicts. Specifically, if the 
> storage is being read when it gets evicted, the storage may turn into null. 
> EntityGroupFS timeline store needs to handle this case gracefully. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4987) Read cache concurrency issue between read and evict in EntityGroupFS timeline store

2016-04-27 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4987:

Summary: Read cache concurrency issue between read and evict in 
EntityGroupFS timeline store   (was: EntityGroupFS timeline store needs to 
handle null storage gracefully)

> Read cache concurrency issue between read and evict in EntityGroupFS timeline 
> store 
> 
>
> Key: YARN-4987
> URL: https://issues.apache.org/jira/browse/YARN-4987
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Critical
>
> To handle concurrency issues, key value based timeline storage may return 
> null on reads that are concurrent to service stop. EntityGroupFS timeline 
> store needs to handle this case gracefully. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4987) EntityGroupFS timeline store needs to handle null storage gracefully

2016-04-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261240#comment-15261240
 ] 

Li Lu commented on YARN-4987:
-

I just noticed there is a caching/concurrency bug hidden behind this issue. The 
cache item being reclaimed may actually be being read by some other concurrent 
readers. Will fix the problem in this JIRA. 

> EntityGroupFS timeline store needs to handle null storage gracefully
> 
>
> Key: YARN-4987
> URL: https://issues.apache.org/jira/browse/YARN-4987
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Critical
>
> To handle concurrency issues, key value based timeline storage may return 
> null on reads that are concurrent to service stop. EntityGroupFS timeline 
> store needs to handle this case gracefully. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4987) EntityGroupFS timeline store needs to handle null storage gracefully

2016-04-27 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4987:

Priority: Critical  (was: Minor)

> EntityGroupFS timeline store needs to handle null storage gracefully
> 
>
> Key: YARN-4987
> URL: https://issues.apache.org/jira/browse/YARN-4987
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Critical
>
> To handle concurrency issues, key value based timeline storage may return 
> null on reads that are concurrent to service stop. EntityGroupFS timeline 
> store needs to handle this case gracefully. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-04-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261207#comment-15261207
 ] 

Wangda Tan commented on YARN-4390:
--

Oh, I think I understand what happened.
If you set natural_termination_factor to < 1, the AM cannot be reserved in some 
cases because of YARN-4280 (reserving the AM needs 2G of resources, but each 
round the preemption policy only preempts one container, which goes back to the 
original queue).

So setting it to 1 is still needed.

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64

2016-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261187#comment-15261187
 ] 

Hitesh Shah commented on YARN-4844:
---

bq. Per my understanding, changing from int to long won't affect downstream 
projects a lot; it's an error which can be caught by the compiler directly. And 
getMemory/getVCores should not be used intensively by downstream projects. For 
example, MR has only ~20 uses of getMemory()/getVCores() in non-testing code, 
which can be easily fixed.

If you are going to force downstream apps to change, I don't understand why you 
are not forcing them to do this in the first 3.0.0 release. What benefit does 
delaying it to some later 3.x.y release provide anyone? It just means that you 
have to do the production stability verification of upstream apps all over 
again. 
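
For context, here is a sketch of the kind of downstream change being debated, 
together with why int32 overflows at this scale. The long-returning accessor 
name {{getMemorySize()}} is an assumption for illustration, not something 
confirmed in this thread.
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Why int32 overflows: 10,000 nodes x 210 GB = 2,150,400,000 MB, which
// already exceeds Integer.MAX_VALUE (2,147,483,647), so totals go negative.
class ClusterMemory {
  static long totalClusterMemoryMB(Iterable<Resource> nodeResources) {
    long totalMB = 0L;                      // accumulate in 64 bits
    for (Resource r : nodeResources) {
      // before: totalMB += r.getMemory();  // int-returning, overflow-prone
      totalMB += r.getMemorySize();         // assumed long-returning accessor
    }
    return totalMB;
  }
}
{code}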

> Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
> --
>
> Key: YARN-4844
> URL: https://issues.apache.org/jira/browse/YARN-4844
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-4844.1.patch, YARN-4844.2.patch, YARN-4844.3.patch
>
>
> We use int32 for memory now; if a cluster has 10k nodes, each with 210G of 
> memory, we get a negative total cluster memory.
> Another case that overflows int32 even more easily: we add all pending 
> resources of running apps to the cluster's total pending resources. If a 
> problematic app requests too many resources (say 1M+ containers of 3G each), 
> int32 will not be enough.
> Even if we cap each app's pending request, we cannot handle the case where 
> there are many running apps, each with a capped but still significant 
> amount of pending resources.
> So we may need to upgrade the int32 memory field (possibly v-cores as well) 
> to int64 to avoid integer overflow. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64

2016-04-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4844:
-
Attachment: YARN-4844.3.patch

Attached ver.3 patch.

> Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
> --
>
> Key: YARN-4844
> URL: https://issues.apache.org/jira/browse/YARN-4844
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-4844.1.patch, YARN-4844.2.patch, YARN-4844.3.patch
>
>
> We use int32 for memory now; if a cluster has 10k nodes, each with 210G of 
> memory, we get a negative total cluster memory.
> Another case that overflows int32 even more easily: we add all pending 
> resources of running apps to the cluster's total pending resources. If a 
> problematic app requests too many resources (say 1M+ containers of 3G each), 
> int32 will not be enough.
> Even if we cap each app's pending request, we cannot handle the case where 
> there are many running apps, each with a capped but still significant 
> amount of pending resources.
> So we may need to upgrade the int32 memory field (possibly v-cores as well) 
> to int64 to avoid integer overflow. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-4913) Yarn logs should take a -out option to write to a directory

2016-04-27 Thread Ram Venkatesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ram Venkatesh reopened YARN-4913:
-

> Yarn logs should take a -out option to write to a directory
> ---
>
> Key: YARN-4913
> URL: https://issues.apache.org/jira/browse/YARN-4913
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4913.1.patch, YARN-4913.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4913) Yarn logs should take a -out option to write to a directory

2016-04-27 Thread Ram Venkatesh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261164#comment-15261164
 ] 

Ram Venkatesh commented on YARN-4913:
-

[~xgong] Here are two reasons for the -out option, both more relevant for 
large multi-GB app instances.
1. yarn logs > $targetFile produces a single file that appends all the 
individual container logs. This requires clumsy or complex parsing to split 
the files apart if you are looking for specific task or app-specific logs. 
Writing to a directory preserves the distinct files easily and also lends 
itself to archiving.
2. Redirecting through the console instead of writing directly via the local 
filesystem APIs adds overhead on some platforms like Windows.

From a supportability standpoint I think this option will be useful.

> Yarn logs should take a -out option to write to a directory
> ---
>
> Key: YARN-4913
> URL: https://issues.apache.org/jira/browse/YARN-4913
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4913.1.patch, YARN-4913.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261144#comment-15261144
 ] 

Hadoop QA commented on YARN-2888:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801107/YARN-2888.004.patch |
| JIRA Issue | YARN-2888 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11253/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Corrective mechanisms for rebalancing NM container queues
> -
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch, 
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of 
> the scheduling decisions or due to having a stale image of the system) may 
> lead to an imbalance in the waiting times of the NM container queues. This 
> can in turn have an impact in job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to less-loaded 
> ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261139#comment-15261139
 ] 

Jian He commented on YARN-5002:
---

bq. S3 credentials in the output path.
Sorry, could you clarify what you mean? What output path are you referring 
to? This only determines whether the user can view the YARN app meta info from 
the UI or command line. Also, the original app_view ACL is still in effect. 
Besides, I think user apps should not rely on the YARN queue ACL for their app 
access control in the first place. 

> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-5004) FS: queue can use more than the max resources set

2016-04-27 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-5004:
---
Affects Version/s: 2.8.0

> FS: queue can use more than the max resources set
> -
>
> Key: YARN-5004
> URL: https://issues.apache.org/jira/browse/YARN-5004
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> We found a case where a queue is using 301 vcores while the max is set to 
> 300, and the same for the memory usage. The documentation states (see the 
> Hadoop 2.7.1 FairScheduler documentation on apache):
> -+-+-
> A queue will never be assigned a container that would put its aggregate usage 
> over this limit.
> -+-+- 
> Either the documentation or the behaviour is clearly not correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-5004) FS: queue can use more than the max resources set

2016-04-27 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-5004:
---
Component/s: yarn
 fairscheduler

> FS: queue can use more than the max resources set
> -
>
> Key: YARN-5004
> URL: https://issues.apache.org/jira/browse/YARN-5004
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> We found a case where a queue is using 301 vcores while the max is set to 
> 300, and the same for the memory usage. The documentation states (see the 
> Hadoop 2.7.1 FairScheduler documentation on apache):
> -+-+-
> A queue will never be assigned a container that would put its aggregate usage 
> over this limit.
> -+-+- 
> Either the documentation or the behaviour is clearly not correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261098#comment-15261098
 ] 

Daniel Templeton commented on YARN-5002:


If I submit a mapreduce job to a secure queue that has my S3 credentials in the 
output path, I'm gonna be pretty pissed if some admin deleting a queue causes 
my credentials to be exposed.

> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261086#comment-15261086
 ] 

Wangda Tan commented on YARN-5002:
--

Echo [~jianhe]'s comment: I would also prefer the existing approach: if a queue 
is removed, the ACL to view apps from the removed queue should be *, otherwise 
the apps will disappear from every user's perspective. And this is the previous 
behavior too.

A more comprehensive approach is to record the configuration in the state-store. 
In the short term, the attached fix looks good.
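
As a self-contained toy illustration of that fallback (this is not the actual 
{{QueueACLsManager}} code; the class and map below are made up for the 
example), the check simply treats a missing queue as a wildcard view ACL:
{code}
import java.util.Map;
import java.util.Set;

// Toy model of the fallback discussed above, not the real QueueACLsManager.
class RemovedQueueAclFallback {
  // queue name -> users allowed to view apps in that queue
  private final Map<String, Set<String>> viewAclsByQueue;

  RemovedQueueAclFallback(Map<String, Set<String>> viewAclsByQueue) {
    this.viewAclsByQueue = viewAclsByQueue;
  }

  boolean canViewApp(String user, String appQueue) {
    Set<String> allowed = viewAclsByQueue.get(appQueue);
    if (allowed == null) {
      // The queue was removed after the app ran: behave as if the view ACL
      // were "*", so the app does not disappear for every user. This mirrors
      // the previous behavior and avoids failing on the missing queue.
      return true;
    }
    return allowed.contains(user) || allowed.contains("*");
  }
}
{code}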

> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-5004) FS: queue can use more than the max resources set

2016-04-27 Thread Yufei Gu (JIRA)
Yufei Gu created YARN-5004:
--

 Summary: FS: queue can use more than the max resources set
 Key: YARN-5004
 URL: https://issues.apache.org/jira/browse/YARN-5004
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yufei Gu
Assignee: Yufei Gu


We found a case where a queue is using 301 vcores while the max is set to 300, 
and the same for the memory usage. The documentation states (see the Hadoop 
2.7.1 FairScheduler documentation on apache):
-+-+-
A queue will never be assigned a container that would put its aggregate usage 
over this limit.
-+-+- 
Either the documentation or the behaviour is clearly not correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-04-27 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260985#comment-15260985
 ] 

Eric Payne commented on YARN-4390:
--

[~leftnoteasy],
{quote}
have you set following config?
{code}
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.select_based_on_reserved_containers</name>
  <value>true</value>
</property>
{code}
{quote}
Yup! :-) I double checked and that parameter is definitely set in my 
environment.

bq. 1) total_preemption_per_round should make sure that each round preempts 
enough resource to allocate one large container
Preemption per round is set to 100%.

bq. 2) before ver.7, natural_termination_factor should be set to 1 to make sure 
enough resources will be preempted.
That was it! I set the natural termination factor to 1.0 and it's working more 
in line with what I expect. I had not been setting the natural termination 
factor.

Unfortunately, when I applied YARN-4390.7.patch, I still need to set the 
natural termination factor in order to get the expected results. If I just 
leave that parameter out of my config and let it go to the default, the 
behavior is the same as in version 6 of the patch. That is, the app requesting 
larger containers can never use more than about 68% of the {{ops}} queue, and 
the app running on the preemptable queue has more than 100 containers 
preempted, only to be given back to the same app.


> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated

2016-04-27 Thread Andras Bokor (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260972#comment-15260972
 ] 

Andras Bokor commented on YARN-3573:


[~brahmareddy] Thanks for getting back to me. I meant MiniYARNCluster, not 
MiniMRYarnCluster.
Actually, what I see is that the other constructors call the deprecated 
constructor, so in the end we cannot avoid calling a deprecated method. It can 
be confusing at first. Has removing it been considered?

> MiniMRYarnCluster constructor that starts the timeline server using a boolean 
> should be marked deprecated
> -
>
> Key: YARN-3573
> URL: https://issues.apache.org/jira/browse/YARN-3573
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-3573-002.patch, YARN-3573.patch
>
>
> {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code}
> starts the timeline server using *boolean enableAHS*. It is better to have 
> the timelineserver started based on the config value.
> We should mark this constructor as deprecated to avoid its future use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260945#comment-15260945
 ] 

Hadoop QA commented on YARN-2888:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 2s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801107/YARN-2888.004.patch |
| JIRA Issue | YARN-2888 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11252/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Corrective mechanisms for rebalancing NM container queues
> -
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch, 
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of 
> the scheduling decisions or due to having a stale image of the system) may 
> lead to an imbalance in the waiting times of the NM container queues. This 
> can in turn have an impact in job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to less-loaded 
> ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260941#comment-15260941
 ] 

Hadoop QA commented on YARN-2888:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801107/YARN-2888.004.patch |
| JIRA Issue | YARN-2888 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11251/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Corrective mechanisms for rebalancing NM container queues
> -
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch, 
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of 
> the scheduling decisions or due to having a stale image of the system) may 
> lead to an imbalance in the waiting times of the NM container queues. This 
> can in turn have an impact in job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to less-loaded 
> ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260935#comment-15260935
 ] 

Hadoop QA commented on YARN-2888:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801107/YARN-2888.004.patch |
| JIRA Issue | YARN-2888 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11250/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Corrective mechanisms for rebalancing NM container queues
> -
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch, 
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of 
> the scheduling decisions or due to having a stale image of the system) may 
> lead to an imbalance in the waiting times of the NM container queues. This 
> can in turn have an impact in job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to less-loaded 
> ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260933#comment-15260933
 ] 

Hadoop QA commented on YARN-2888:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 7m 54s 
{color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801107/YARN-2888.004.patch |
| JIRA Issue | YARN-2888 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11249/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Corrective mechanisms for rebalancing NM container queues
> -
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch, 
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of 
> the scheduling decisions or due to having a stale image of the system) may 
> lead to an imbalance in the waiting times of the NM container queues. This 
> can in turn have an impact in job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to less-loaded 
> ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-04-27 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-2888:
--
Attachment: YARN-2888.004.patch

kicking off Jenkins again..

> Corrective mechanisms for rebalancing NM container queues
> -
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch, 
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of 
> the scheduling decisions or due to having a stale image of the system) may 
> lead to an imbalance in the waiting times of the NM container queues. This 
> can in turn have an impact in job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to less-loaded 
> ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4913) Yarn logs should take a -out option to write to a directory

2016-04-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260860#comment-15260860
 ] 

Xuan Gong commented on YARN-4913:
-

Yes, actually we do not need this

> Yarn logs should take a -out option to write to a directory
> ---
>
> Key: YARN-4913
> URL: https://issues.apache.org/jira/browse/YARN-4913
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4913.1.patch, YARN-4913.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4913) Yarn logs should take a -out option to write to a directory

2016-04-27 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved YARN-4913.
-
Resolution: Won't Fix

> Yarn logs should take a -out option to write to a directory
> ---
>
> Key: YARN-4913
> URL: https://issues.apache.org/jira/browse/YARN-4913
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4913.1.patch, YARN-4913.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260840#comment-15260840
 ] 

Daniel Templeton commented on YARN-5002:


Security and usability are rarely in agreement.  Unfortunately, security 
carries a bigger stick.

I like the suggestion of handling the issue above the level of access control.  
It seems to me that this issue is most appropriately handled in the recovery 
code.  If a recovered application's queue doesn't exist, do something smart 
with it there.

> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260829#comment-15260829
 ] 

Hadoop QA commented on YARN-5002:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 2s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801098/YARN-5002.3.patch |
| JIRA Issue | YARN-5002 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11248/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260821#comment-15260821
 ] 

Jian He commented on YARN-5002:
---

Yeah, it is indeed confusing: applications that were previously viewable become 
not viewable if their queue gets removed. I think the non-existent queue should 
be treated explicitly instead of embedding it in the access-control logic. The 
fact that the queue was removed probably means the apps in that queue are of 
less concern in terms of ACLs. For this patch, I think I'll still return true 
if the queue does not exist, for the sake of usability.
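
Roughly, the behavior I have in mind looks like the sketch below. This is a 
simplified, self-contained illustration only; it is not the actual YARN-5002 
patch, and the class and method names are hypothetical rather than the real 
QueueACLsManager API.
{code}
import java.util.Map;
import java.util.Set;

// Simplified sketch: treat a removed/unknown queue as accessible, i.e.
// "return true if the queue does not exist", instead of failing the
// getApplicationReport path with an NPE or an access-control error.
class SimplifiedQueueAclCheck {
  // queue name -> users allowed to view/administer apps in that queue
  private final Map<String, Set<String>> queueAcls;

  SimplifiedQueueAclCheck(Map<String, Set<String>> queueAcls) {
    this.queueAcls = queueAcls;
  }

  boolean checkAccess(String user, String queueName) {
    Set<String> allowed = queueAcls.get(queueName);
    if (allowed == null) {
      // Queue no longer exists (e.g. removed from the scheduler config
      // before an RM restart): err on the side of usability.
      return true;
    }
    return allowed.contains(user);
  }
}
{code}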

> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-5002:
--
Attachment: YARN-5002.3.patch

> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5003) Add container resource to RM audit log

2016-04-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260816#comment-15260816
 ] 

Daniel Templeton commented on YARN-5003:


I didn't do a careful review yet, but the patch looks reasonable.  I don't see 
any obvious red flags.

> Add container resource to RM audit log
> --
>
> Key: YARN-5003
> URL: https://issues.apache.org/jira/browse/YARN-5003
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Affects Versions: 3.0.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-5003.001.patch
>
>
> It would be valuable to know the resource consumed by a container in the RM 
> audit log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-5003) Add container resource to RM audit log

2016-04-27 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-5003:
-
Attachment: YARN-5003.001.patch

Attaching patch

> Add container resource to RM audit log
> --
>
> Key: YARN-5003
> URL: https://issues.apache.org/jira/browse/YARN-5003
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Affects Versions: 3.0.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-5003.001.patch
>
>
> It would be valuable to know the resource consumed by a container in the RM 
> audit log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer

2016-04-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260791#comment-15260791
 ] 

Varun Saxena commented on YARN-4447:


In point 3, I meant  "This means for an entity to match, event1 and event2 
should exist and event3 and event4 should {color:red}NOT{color} exist"

> Provide a mechanism to represent complex filters and parse them at the REST 
> layer 
> --
>
> Key: YARN-4447
> URL: https://issues.apache.org/jira/browse/YARN-4447
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4447-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-04-27 Thread Daniel Zhi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260765#comment-15260765
 ] 

Daniel Zhi commented on YARN-4676:
--

Just to clarify/repeat my understanding of the current behavior (without this 
patch), in case I misread the code: it appears to me that regardless of whether 
RM work-preserving restart is enabled, upon RM restart NodesListManager creates 
a pseudo RMNodeImpl for each excluded node and DECOMMISSIONs the node right 
away. Maybe there was an intention to resume the DECOMMISSIONING, but I don't 
see the current code actually doing that.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, 
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, 
> YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, 
> YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks 
> DECOMMISSIONING nodes status automatically and asynchronously after 
> client/admin made the graceful decommission request. It tracks 
> DECOMMISSIONING nodes status to decide when, after all running containers on 
> the node have completed, will be transitioned into DECOMMISSIONED state. 
> NodesListManager detect and handle include and exclude list changes to kick 
> out decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-04-27 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-2888:
--
Attachment: YARN-2888.003.patch

Updating patch to rebase with trunk and integrate with 
{{QueuingContainerManagerImpl}}

> Corrective mechanisms for rebalancing NM container queues
> -
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch, 
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of 
> the scheduling decisions or due to having a stale image of the system) may 
> lead to an imbalance in the waiting times of the NM container queues. This 
> can in turn have an impact in job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to less-loaded 
> ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-04-27 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260696#comment-15260696
 ] 

Varun Vasudev commented on YARN-3998:
-

[~vinodkv] - do you want to review this further or can I go ahead and commit it?

> Add retry-times to let NM re-launch container when it fails to run
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, 
> YARN-3998.06.patch, YARN-3998.07.patch, YARN-3998.08.patch, YARN-3998.09.patch
>
>
> I'd like to add a field(retry-times) in ContainerLaunchContext. When AM 
> launches containers, it could specify the value. Then NM will re-launch the 
> container 'retry-times' times when it fails to run(e.g.exit code is not 0). 
> It will save a lot of time. It avoids container localization. RM does not 
> need to re-schedule the container. And local files in container's working 
> directory will be left for re-use.(If container have downloaded some big 
> files, it does not need to re-download them when running again.) 
> We find it is useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-04-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260684#comment-15260684
 ] 

Wangda Tan commented on YARN-4734:
--

The above Docker build failure is caused by one of the packages not being accessible:
bq. http://hackage.haskell.org/packages/archive/00-index.tar.gz

Will manually retry it later.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, 
> YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, 
> YARN-4734.8.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260676#comment-15260676
 ] 

Hadoop QA commented on YARN-4734:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 2s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801087/YARN-4734.8.patch |
| JIRA Issue | YARN-4734 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11245/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, 
> YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, 
> YARN-4734.8.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260675#comment-15260675
 ] 

Hadoop QA commented on YARN-4734:
-

(!) A patch to the testing environment has been detected. 
Re-executing against the patched versions to perform further tests. 
The console is at 
https://builds.apache.org/job/PreCommit-YARN-Build/11245/console in case of 
problems.


> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, 
> YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, 
> YARN-4734.8.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer

2016-04-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260665#comment-15260665
 ] 

Varun Saxena commented on YARN-4447:


There seems to be some problem with the Jenkins machine. YARN-5002 had a 
similar QA report.

> Provide a mechanism to represent complex filters and parse them at the REST 
> layer 
> --
>
> Key: YARN-4447
> URL: https://issues.apache.org/jira/browse/YARN-4447
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4447-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer

2016-04-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260663#comment-15260663
 ] 

Varun Saxena commented on YARN-4447:


[~sjlee0], kindly review.
Apologies for the delay in updating this patch. I was away for a week and was 
busy with internal work earlier, so I could not work on it.
I will add a few more test cases to test the application-fetching flow, but 
the main code can be reviewed anyway.

I will give a small description of what has been done.
# Metric filters are of the form {{(((metric1 gt 23) AND (metric2 lt 40)) OR 
(metric5 eq 40))}}. The comparison operators supported are as follows (a small 
sketch of these matching semantics follows this list):
#* gt (greater than) / ge (greater than or equals)
#* lt (less than) / le (less than or equals)
#* eq (equals) / ne (not equals; if the key (metric/config/info) does not exist, 
this counts as a match) / ene (exists and not equals; the key must exist).
# Config and info filters are of the same form as metric filters, except that 
only the eq, ne and ene comparison operators are supported for them.
# Event filters will take the form {{(((event1,event2) AND \!(event3,event4)) 
OR event5, event6)}}. This means for an entity to match, event1 and event2 
should exist and event3 and event4 should exist. Or, event5 and event6 should 
exist. \! indicates not equals (non-existence). A not (\!) should be followed by 
an opening bracket, i.e. (
# Relation filters will take the same form as event filters. Here, instead of 
each event, we will have a type-entities expression, i.e. of the form 
{{type1:entity1:entity2}}. This part of the expression cannot contain spaces. 
Relation filters hence would be of the form {{type1:entity11:entity12 , 
type2:entity21 AND !(type3:entity31)}}. Also, the ene kind of case won't be 
supported here; if the entity type does not exist, a match won't occur.
# Metrics to retrieve and configs to retrieve will have a similar format to the 
above; however, ANDs and ORs do not make much sense here. Hence an expression 
of the form conf1,conf2 means return configs conf1 and conf2, and an expression 
of the form \!(conf1,conf2) means return all configs other than conf1 and conf2.
# Please note that metric filters are not yet supported for flow runs; they 
need to be matched locally, as summation for metrics happens in the 
coprocessor. This can be done in a separate JIRA.
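
To make the eq / ne / ene distinction above concrete, below is a small, 
self-contained sketch of the intended matching semantics for a single 
comparison against an entity's metrics. It is illustrative only and is not 
code from the attached patch; the class and method names are hypothetical.
{code}
import java.util.Map;

// Illustrative sketch only (not from YARN-4447-YARN-2928.01.patch):
// shows the eq / ne / ene semantics described above for one comparison
// evaluated against an entity's metric map.
public class FilterSemanticsDemo {
  enum CompareOp { EQ, NE, ENE, GT, GE, LT, LE }

  static boolean matches(Map<String, Long> metrics, String key,
      CompareOp op, long value) {
    Long actual = metrics.get(key);
    if (actual == null) {
      // ne treats a missing key as a match; ene (exists and not equals)
      // and all other operators require the key to exist.
      return op == CompareOp.NE;
    }
    switch (op) {
      case EQ:  return actual == value;
      case NE:
      case ENE: return actual != value;
      case GT:  return actual > value;
      case GE:  return actual >= value;
      case LT:  return actual < value;
      case LE:  return actual <= value;
      default:  return false;
    }
  }

  public static void main(String[] args) {
    Map<String, Long> metrics = Map.of("metric1", 30L, "metric2", 35L);
    // ((metric1 gt 23) AND (metric2 lt 40))  ->  true
    System.out.println(matches(metrics, "metric1", CompareOp.GT, 23)
        && matches(metrics, "metric2", CompareOp.LT, 40));
    // metric5 ne 40 matches because metric5 does not exist...
    System.out.println(matches(metrics, "metric5", CompareOp.NE, 40));   // true
    // ...while metric5 ene 40 does not, because the key must exist.
    System.out.println(matches(metrics, "metric5", CompareOp.ENE, 40));  // false
  }
}
{code}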

Now coming to the implementation, I had two options: implement the parsing 
logic in static methods, or encapsulate the logic in a class. I went with the 
latter, as it makes it easier to break the code into multiple methods without 
needing to pass several parameters to helper methods, and it makes the code 
cleaner IMO. It does mean an extra object has to be created every time, though. 
I would like to know others' thoughts on the approach.
# A new interface named {{TimelineParser}} has been added, which needs to be 
implemented for parsing the different expressions.
# Two abstract classes have been added, namely TimelineParserForCompareExpr 
(for expressions of the form explained above for metric/config/info filters) 
and TimelineParserForEqualityExpr (for expressions of the form explained above 
for event/relation filters). These classes have abstract methods, implemented 
by the concrete subclasses, for deciding what kind of filter needs to be 
constructed for the filter list, how to parse the values, and how to set the 
value, compare op, etc. on the filters.
# These abstract classes then have concrete implementations for the different 
filters: TimelineParserForNumericFilters (for metric filters), 
TimelineParserForKVFilters (for config/info filters), 
TimelineParserForExistFilters (for filters which check for existence, such as 
event filters) and TimelineParserForRelationFilters (for relation filters).
# Some code between TimelineParserForCompareExpr and 
TimelineParserForEqualityExpr is similar and could be moved to another base 
abstract class, but this might make the code confusing, so I have left it as 
it is. I would like to know others' thoughts on this.
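
For readers following along, a rough structural skeleton of the hierarchy 
described above could look like the following. Only the interface and class 
names are taken from the description above; the method names and return types 
are hypothetical placeholders, not the actual patch.
{code}
// Rough structural skeleton only; method names and types are placeholders.
interface TimelineParser {
  // Would return the parsed filter list for the expression.
  Object parse();
}

// Base for compare expressions (metric/config/info filters).
abstract class TimelineParserForCompareExpr implements TimelineParser {
  // Hooks a concrete parser fills in: which filter object to create, how to
  // parse a value token, and how to attach the value / compare op to it.
  protected abstract Object createFilter();
  protected abstract Object parseValue(String token);
  protected abstract void setValueToCurrentFilter(Object value);
  protected abstract void setCompareOpToCurrentFilter(String op);
}

// Base for equality expressions (event/relation filters).
abstract class TimelineParserForEqualityExpr implements TimelineParser {
  protected abstract Object createFilter();
  protected abstract void setValueToCurrentFilter(String value);
}
{code}
In this sketch, TimelineParserForNumericFilters and TimelineParserForKVFilters 
would extend the compare-expression base, while TimelineParserForExistFilters 
and TimelineParserForRelationFilters would extend the equality-expression base.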


> Provide a mechanism to represent complex filters and parse them at the REST 
> layer 
> --
>
> Key: YARN-4447
> URL: https://issues.apache.org/jira/browse/YARN-4447
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4447-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats

2016-04-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260659#comment-15260659
 ] 

Sunil G commented on YARN-4308:
---

Sure. I feel we can weigh in opinions from [~kasha] and [~Naganarasimha Garla] 
too. I am fine either way (documenting and commenting, or restricted warning 
logging), so it would be good if some more thoughts come in so that the best 
solution can go in.

> ContainersAggregated CPU resource utilization reports negative usage in first 
> few heartbeats
> 
>
> Key: YARN-4308
> URL: https://issues.apache.org/jira/browse/YARN-4308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4308.patch, 0002-YARN-4308.patch
>
>
> NodeManager reports ContainerAggregated CPU resource utilization as -ve value 
> in first few heartbeats cycles. I added a new debug print and received below 
> values from heartbeats.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource
>  Utilization :  CpuTrackerUsagePercent : 198.94598
> {noformat}
> Its better we send 0 as CPU usage rather than sending a negative values in 
> heartbeats eventhough its happening in only first few heartbeats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-04-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4734:
-
Attachment: YARN-4734.8.patch

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, 
> YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, 
> YARN-4734.8.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-04-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4734:
-
Attachment: (was: YARN-4734.8.patch)

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, 
> YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, 
> YARN-4734.8.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated

2016-04-27 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260649#comment-15260649
 ] 

Brahma Reddy Battula commented on YARN-3573:


 *Earlier:* 
{code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code}
 *Now:* 
{code}MiniMRYarnCluster(String testName, int noOfNMs){code}

The timeline server startup will be based on the config value instead of being 
passed as a boolean param.

Please refer to YARN-2890 for more details. Hope this helps.
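
For reference, a rough usage sketch with the remaining constructor might look 
like the following (assuming the timeline server is switched on via 
{{yarn.timeline-service.enabled}}; see YARN-2890 for the authoritative behavior):
{code}
// Rough usage sketch, not taken from the patch: enable the timeline server
// through configuration instead of the deprecated boolean constructor arg.
// Assumes org.apache.hadoop.conf.Configuration,
//         org.apache.hadoop.yarn.conf.YarnConfiguration and
//         org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster on the classpath.
Configuration conf = new Configuration();
conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);

MiniMRYarnCluster cluster = new MiniMRYarnCluster("TestWithAHS", 1);
cluster.init(conf);
cluster.start();
{code}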

> MiniMRYarnCluster constructor that starts the timeline server using a boolean 
> should be marked deprecated
> -
>
> Key: YARN-3573
> URL: https://issues.apache.org/jira/browse/YARN-3573
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-3573-002.patch, YARN-3573.patch
>
>
> {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code}
> starts the timeline server using *boolean enableAHS*. It is better to have 
> the timelineserver started based on the config value.
> We should mark this constructor as deprecated to avoid its future use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4956) findbug issue on LevelDBCacheTimelineStore

2016-04-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260647#comment-15260647
 ] 

Hudson commented on YARN-4956:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9684 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9684/])
YARN-4956. findbug issue on LevelDBCacheTimelineStore. (Zhiyuan Yang via 
(gtcarrera9: rev f16722d2ef31338a57a13e2c8d18c1c62d58bbaf)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/main/java/org/apache/hadoop/yarn/server/timeline/LevelDBCacheTimelineStore.java


> findbug issue on LevelDBCacheTimelineStore
> --
>
> Key: YARN-4956
> URL: https://issues.apache.org/jira/browse/YARN-4956
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Zhiyuan Yang
> Fix For: 2.8.0
>
> Attachments: YARN-4956-trunk.000.patch
>
>
> {code}
> Multithreaded correctness Warnings
> Code Warning IS Inconsistent synchronization of 
> org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.configuration;
>  locked 66% of time
> Bug type IS2_INCONSISTENT_SYNC (click for details) 
> In class org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore
> Field 
> org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.configuration
> Synchronized 66% of the time
> Unsynchronized access at LevelDBCacheTimelineStore.java:[line 82]
> Synchronized access at LevelDBCacheTimelineStore.java:[line 117]
> Synchronized access at LevelDBCacheTimelineStore.java:[line 122]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats

2016-04-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260645#comment-15260645
 ] 

Daniel Templeton commented on YARN-4308:


If you're not going to let the user know about persistent missed reports, then 
leave a wide trail of breadcrumbs for the person who has to debug it.  Putting 
it in the JavaDoc is a good first step.  Maybe also drop a comment into the 
code that calls the method.

> ContainersAggregated CPU resource utilization reports negative usage in first 
> few heartbeats
> 
>
> Key: YARN-4308
> URL: https://issues.apache.org/jira/browse/YARN-4308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4308.patch, 0002-YARN-4308.patch
>
>
> NodeManager reports ContainerAggregated CPU resource utilization as -ve value 
> in first few heartbeats cycles. I added a new debug print and received below 
> values from heartbeats.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource
>  Utilization :  CpuTrackerUsagePercent : 198.94598
> {noformat}
> Its better we send 0 as CPU usage rather than sending a negative values in 
> heartbeats eventhough its happening in only first few heartbeats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260644#comment-15260644
 ] 

Sunil G commented on YARN-5002:
---

Hi [~jianhe],
I have one overall doubt about this part.

{{checkAccess}} is invoked from {{forceKillApplication}} etc., and if 
{{checkAccess}} returns {{false}}, it is raised as an access control exception.

But in reality the issue was caused by a non-existent queue after restart. Even 
though it is logged, the exception seen from the client side is not accurate. 
Could this be a problem? How do you feel?


> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch, YARN-5002.2.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats

2016-04-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260617#comment-15260617
 ] 

Sunil G commented on YARN-4308:
---

Yes, the debug log is already present, my bad.

bq. Even if right now the only time a negative value comes back is on the first 
report, that doesn't mean it won't change later.

I agree with your thought. {{CpuTimeTracker}} doesn't have a protocol/standard 
defined for when to return -1, 0 or other values, so there is a chance this can 
change in the future. But I am also thinking about covering this proposed INFO 
log code from a test-case point of view: after skipping n times, we have to log 
one warning and the cycle has to continue, so this code snippet also needs to 
be covered by a test case. Is it fine if we make a note in {{CpuTimeTracker}} 
about its behavior or its expected return codes as Javadoc? I am fine either 
way, but was thinking about the real use case for now.
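
Something along the lines of the following sketch is what I mean by logging one 
warning after skipping n readings. It is illustrative only (not from the 
attached patches); the class name, threshold and logging call are hypothetical.
{code}
// Illustrative sketch only: clamp a negative CPU reading to 0 and emit a
// single warning after N consecutive skipped readings, then restart the cycle.
class CpuUsageSanitizer {
  private static final int MAX_SKIPPED_READINGS = 10;  // hypothetical threshold
  private int skippedReadings = 0;

  float sanitize(float rawCpuUsagePercent) {
    if (rawCpuUsagePercent < 0) {
      if (++skippedReadings >= MAX_SKIPPED_READINGS) {
        System.err.println("CPU usage unavailable for " + skippedReadings
            + " consecutive heartbeats; reporting 0 instead.");
        skippedReadings = 0;
      }
      return 0f;
    }
    skippedReadings = 0;
    return rawCpuUsagePercent;
  }
}
{code}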

> ContainersAggregated CPU resource utilization reports negative usage in first 
> few heartbeats
> 
>
> Key: YARN-4308
> URL: https://issues.apache.org/jira/browse/YARN-4308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4308.patch, 0002-YARN-4308.patch
>
>
> NodeManager reports ContainerAggregated CPU resource utilization as -ve value 
> in first few heartbeats cycles. I added a new debug print and received below 
> values from heartbeats.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource
>  Utilization :  CpuTrackerUsagePercent : 198.94598
> {noformat}
> Its better we send 0 as CPU usage rather than sending a negative values in 
> heartbeats eventhough its happening in only first few heartbeats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260614#comment-15260614
 ] 

Hadoop QA commented on YARN-4577:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 4s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801074/YARN-4577.5.patch |
| JIRA Issue | YARN-4577 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11244/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch, 
> YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, YARN-4577.3.patch, 
> YARN-4577.3.rebase.patch, YARN-4577.4.patch, YARN-4577.5.patch, 
> YARN-4577.poc.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> put them on the system classloader. But if multiple versions of the plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be: to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-04-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260609#comment-15260609
 ] 

Xuan Gong commented on YARN-4577:
-

[~sjlee0] Thanks for the review.

Attached a new patch to address the comments.

Unfortunately, I am not able to create a unit test for this. But I did test it 
manually.

Here is how I tested it:
1. Create a customized TestAuxService which extends AuxiliaryService.
2. Create two jar files with the same jar file name (TestAuxService.jar) 
and the same class name (TestAuxService.java).
3. Give each TestAuxService.java a different log message, something like 
"TestAuxService in NM ClassPath" and "TestAuxService in Customer ClassPath".
4. Put one TestAuxService.jar on the NM classpath, and put the other 
TestAuxService.jar on a custom classpath, such as 
"/Users/xuan/dep/TestAuxService.jar".
5. Modify several configurations in yarn-site.xml:
{code}
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,TestAuxService</value>
  <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>

<property>
  <name>yarn.nodemanager.aux-services.TestAuxService.class</name>
  <value>org.aux.TestAuxService</value>
</property>
{code}
6. Start the NM and verify the log messages in the NM logs; we can see
{code}
Test My AuxService in NM ClassPath in Service Init stage
Test My AuxService in NM ClassPath in Service Start stage
{code}
and we can verify that we loaded the TestAuxService class from the NM classpath.
7. Add one more configuration to yarn-site.xml:
{code}
<property>
  <name>yarn.nodemanager.aux-services.TestAuxService.class.classpath</name>
  <value>/Users/xuan/dep/TestAuxService.jar</value>
</property>
{code}
8. Start the NM and check the log messages in the NM log; we can find
{code}
Test My AuxService in Customer ClassPath in Service Init stage
Test My AuxService in Customer ClassPath in Service Start stage
{code}
and we can verify that if we set the custom classpath, we load TestAuxService 
from the custom classpath instead of the NM classpath (see the sketch below).
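
As a rough, standalone illustration of the classloader isolation this JIRA is 
after (not the actual YARN-4577 patch; the jar path and class name are simply 
the ones from the manual test above), loading the service class from a 
dedicated jar with its own classloader might look like:
{code}
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative sketch only. A plain URLClassLoader delegates to its parent
// first, so a real implementation would need child-first semantics (or a
// carefully chosen parent) for the custom jar to win over the NM classpath.
public final class IsolatedAuxServiceLoader {
  public static void main(String[] args) throws Exception {
    URL[] serviceJar = { new URL("file:///Users/xuan/dep/TestAuxService.jar") };
    try (URLClassLoader isolated =
             new URLClassLoader(serviceJar, ClassLoader.getSystemClassLoader())) {
      // Load and instantiate the service class through the dedicated loader.
      Class<?> clazz = Class.forName("org.aux.TestAuxService", true, isolated);
      Object service = clazz.getDeclaredConstructor().newInstance();
      System.out.println("Loaded " + clazz.getName() + " via "
          + service.getClass().getClassLoader());
    }
  }
}
{code}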

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch, 
> YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, YARN-4577.3.patch, 
> YARN-4577.3.rebase.patch, YARN-4577.4.patch, YARN-4577.5.patch, 
> YARN-4577.poc.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> put them on the system classloader. But if multiple versions of the plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be: to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260607#comment-15260607
 ] 

Hadoop QA commented on YARN-5002:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 4m 53s 
{color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801079/YARN-5002.2.patch |
| JIRA Issue | YARN-5002 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11243/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch, YARN-5002.2.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260591#comment-15260591
 ] 

Hadoop QA commented on YARN-4447:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 3m 18s 
{color} | {color:red} Docker failed to build yetus/hadoop:0ca8df7. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801073/YARN-4447-YARN-2928.01.patch
 |
| JIRA Issue | YARN-4447 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11240/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Provide a mechanism to represent complex filters and parse them at the REST 
> layer 
> --
>
> Key: YARN-4447
> URL: https://issues.apache.org/jira/browse/YARN-4447
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4447-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260589#comment-15260589
 ] 

Hadoop QA commented on YARN-4577:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 2s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801074/YARN-4577.5.patch |
| JIRA Issue | YARN-4577 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11242/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch, 
> YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, YARN-4577.3.patch, 
> YARN-4577.3.rebase.patch, YARN-4577.4.patch, YARN-4577.5.patch, 
> YARN-4577.poc.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> put them on the system classloader. But if multiple versions of the plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be: to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260588#comment-15260588
 ] 

Hadoop QA commented on YARN-4734:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} YARN-4734 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801075/YARN-4734.8.patch |
| JIRA Issue | YARN-4734 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11241/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, 
> YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, 
> YARN-4734.8.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4956) findbug issue on LevelDBCacheTimelineStore

2016-04-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260585#comment-15260585
 ] 

Li Lu commented on YARN-4956:
-

No concerns raised. I'll commit shortly. 

> findbug issue on LevelDBCacheTimelineStore
> --
>
> Key: YARN-4956
> URL: https://issues.apache.org/jira/browse/YARN-4956
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Zhiyuan Yang
> Attachments: YARN-4956-trunk.000.patch
>
>
> {code}
> Multithreaded correctness Warnings
> Code Warning IS Inconsistent synchronization of 
> org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.configuration;
>  locked 66% of time
> Bug type IS2_INCONSISTENT_SYNC (click for details) 
> In class org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore
> Field 
> org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.configuration
> Synchronized 66% of the time
> Unsynchronized access at LevelDBCacheTimelineStore.java:[line 82]
> Synchronized access at LevelDBCacheTimelineStore.java:[line 117]
> Synchronized access at LevelDBCacheTimelineStore.java:[line 122]
> {code}
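
For context, the usual remedy for an IS2_INCONSISTENT_SYNC warning is to make 
every access to the flagged field go through the same lock. A minimal generic 
sketch of that pattern (not the actual YARN-4956 patch; 'configuration' here is 
just a stand-in for the flagged field) looks like:
{code}
/** Illustrative sketch only; 'configuration' stands in for the flagged field. */
public class SynchronizedConfigHolder {
  private Object configuration;

  public synchronized void setConfiguration(Object conf) {
    // Write under the intrinsic lock...
    this.configuration = conf;
  }

  public synchronized Object getConfiguration() {
    // ...and read under the same lock, so all accesses are consistently synchronized.
    return configuration;
  }
}
{code}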



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-5002:
--
Attachment: YARN-5002.2.patch

> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch, YARN-5002.2.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260574#comment-15260574
 ] 

Jian He commented on YARN-5002:
---

[~templedf], thanks for the review.
bq. You can't hard-code the capacity scheduler into the RM
Yeah, I missed this; a unit test even failed because of it. Fixed it.
bq. From a security perspective it's better to deny access to an app if we 
can't find the queue.
I don't have a strong opinion on this. The problem with denying access is that 
these apps will never be viewable. I changed it anyway.
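
To make the deny-on-missing-queue choice concrete, here is a minimal 
self-contained sketch (all names below are invented for illustration; this is 
not the actual YARN-5002 patch) of guarding the ACL check against a queue that 
can no longer be resolved:
{code}
import java.util.Map;

/** Illustrative sketch only; all names are invented for this example. */
final class QueueAclChecker {

  interface QueueAcls {
    boolean hasAccess(String user);
  }

  private final Map<String, QueueAcls> queuesByName;

  QueueAclChecker(Map<String, QueueAcls> queuesByName) {
    this.queuesByName = queuesByName;
  }

  /** Deny access, rather than throw an NPE, when the queue cannot be found. */
  boolean checkAccess(String user, String queueName) {
    QueueAcls queue = queuesByName.get(queueName);
    if (queue == null) {
      // An INFO/DEBUG log here would leave the paper trail suggested in the review.
      return false;
    }
    return queue.hasAccess(user);
  }
}
{code}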

> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4905) Improve Yarn log Command line option to show log metadata

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260571#comment-15260571
 ] 

Hadoop QA commented on YARN-4905:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 11m 54s 
{color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801072/YARN-4905.5.patch |
| JIRA Issue | YARN-4905 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11239/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Improve Yarn log Command line option to show log metadata
> -
>
> Key: YARN-4905
> URL: https://issues.apache.org/jira/browse/YARN-4905
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4905.1.patch, YARN-4905.2.patch, YARN-4905.3.patch, 
> YARN-4905.4.patch, YARN-4905.5.patch
>
>
> Improve the Yarn log commandline to have "ls" command which can list 
> containers for which we have logs, list files within each container, along 
> with file size



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-04-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4734:
-
Attachment: YARN-4734.8.patch

Attached ver.8 patch, rebased to latest trunk, merged LICENSE.txt.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, 
> YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, 
> YARN-4734.8.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-04-27 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4577:

Attachment: YARN-4577.5.patch

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch, 
> YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, YARN-4577.3.patch, 
> YARN-4577.3.rebase.patch, YARN-4577.4.patch, YARN-4577.5.patch, 
> YARN-4577.poc.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> put them on the system classloader. But if multiple versions of the plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be: to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats

2016-04-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260558#comment-15260558
 ] 

Daniel Templeton commented on YARN-4308:


There's already a debug log on a miss in the patch.

Even if right now the only time a negative value comes back is on the first 
report, that doesn't mean it won't change later.  My spider sense says that the 
prospect of the reports going away permanently with no indication as to why is 
bad.  We're talking about futures, though, so I'm willing to accept your 
assertion that this change can't possibly create a customer support case, but I 
reserve the right to an I-told-you-so later if it does.

> ContainersAggregated CPU resource utilization reports negative usage in first 
> few heartbeats
> 
>
> Key: YARN-4308
> URL: https://issues.apache.org/jira/browse/YARN-4308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4308.patch, 0002-YARN-4308.patch
>
>
> NodeManager reports ContainerAggregated CPU resource utilization as a negative 
> value in the first few heartbeat cycles. I added a new debug print and received 
> the below values from heartbeats.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource
>  Utilization :  CpuTrackerUsagePercent : 198.94598
> {noformat}
> It is better to send 0 as CPU usage rather than sending negative values in 
> heartbeats, even though this happens only in the first few heartbeats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer

2016-04-27 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4447:
---
Attachment: YARN-4447-YARN-2928.01.patch

> Provide a mechanism to represent complex filters and parse them at the REST 
> layer 
> --
>
> Key: YARN-4447
> URL: https://issues.apache.org/jira/browse/YARN-4447
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4447-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats

2016-04-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260533#comment-15260533
 ] 

Sunil G commented on YARN-4308:
---

Thanks [~templedf] for sharing your thoughts.
I have checked the possibilities of getting negative values from 
{{CpuTimeTracker}}. As I see it, we can get a negative value only the first 
time, and I was not seeing other cases. In that case, considering the skipping 
happens only once, do we need an INFO log there? I think I can add a debug log 
if that code is hit. But I am not very sure whether we need a log after "n" 
hits, because it may be hit only the first time. Could you please correct me if 
I missed something.


> ContainersAggregated CPU resource utilization reports negative usage in first 
> few heartbeats
> 
>
> Key: YARN-4308
> URL: https://issues.apache.org/jira/browse/YARN-4308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4308.patch, 0002-YARN-4308.patch
>
>
> NodeManager reports ContainerAggregated CPU resource utilization as a negative 
> value in the first few heartbeat cycles. I added a new debug print and received 
> the below values from heartbeats.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource
>  Utilization :  CpuTrackerUsagePercent : 198.94598
> {noformat}
> It is better to send 0 as CPU usage rather than sending negative values in 
> heartbeats, even though this happens only in the first few heartbeats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4905) Improve Yarn log Command line option to show log metadata

2016-04-27 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4905:

Attachment: YARN-4905.5.patch

rebase the patch

> Improve Yarn log Command line option to show log metadata
> -
>
> Key: YARN-4905
> URL: https://issues.apache.org/jira/browse/YARN-4905
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4905.1.patch, YARN-4905.2.patch, YARN-4905.3.patch, 
> YARN-4905.4.patch, YARN-4905.5.patch
>
>
> Improve the Yarn log commandline to have "ls" command which can list 
> containers for which we have logs, list files within each container, along 
> with file size



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4807) MockAM#waitForState sleep duration is too long

2016-04-27 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260496#comment-15260496
 ] 

Yufei Gu commented on YARN-4807:


Thanks a lot for the review, [~templedf] and [~kasha]. 
Thanks a lot for committing, [~kasha]. 

> MockAM#waitForState sleep duration is too long
> --
>
> Key: YARN-4807
> URL: https://issues.apache.org/jira/browse/YARN-4807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
> Fix For: 2.9.0
>
> Attachments: YARN-4807.001.patch, YARN-4807.002.patch, 
> YARN-4807.003.patch, YARN-4807.004.patch, YARN-4807.005.patch, 
> YARN-4807.006.patch, YARN-4807.007.patch, YARN-4807.008.patch, 
> YARN-4807.009.patch, YARN-4807.010.patch, YARN-4807.011.patch, 
> YARN-4807.012.patch, YARN-4807.013.patch, YARN-4807.014.patch, 
> YARN-4807.015.patch
>
>
> MockAM#waitForState sleep duration (500 ms) is too long. Also, there is 
> significant duplication with MockRM#waitForState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4807) MockAM#waitForState sleep duration is too long

2016-04-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260477#comment-15260477
 ] 

Hudson commented on YARN-4807:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9682 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9682/])
YARN-4807. MockAM#waitForState sleep duration is too long. (Yufei Gu via 
kasha: rev 185c3d4de1ac4cf10cc1aa00f367b3880b80)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestWorkPreservingRMRestartForNodeLabel.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestContainerResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerResizing.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestSignalContainer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestNodesListManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java


> MockAM#waitForState sleep duration is too long
> --
>
> Key: YARN-4807
> URL: https://issues.apache.org/jira/browse/YARN-4807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
> Fix For: 2.9.0
>
> Attachments: YARN-4807.001.patch, YARN-4807.002.pa

[jira] [Created] (YARN-5003) Add container resource to RM audit log

2016-04-27 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-5003:


 Summary: Add container resource to RM audit log
 Key: YARN-5003
 URL: https://issues.apache.org/jira/browse/YARN-5003
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 3.0.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts


It would be valuable to know the resource consumed by a container in the RM 
audit log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-04-27 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260468#comment-15260468
 ] 

Junping Du commented on YARN-4676:
--

bq. If RM work-preserving restart is not enabled, it should be okay to 
decommission a node right away. 
Agree, but that is not today's behavior without this patch. After this patch, 
the decommissioning nodes will lose the timeout until all running applications 
on them have finished.

bq. If work-preserving restart is enabled and a node is decommissioned with a 
timeout, it would be nice to store when the decommission has been called and 
the timeout in the state-store. Note that, in an HA setup, the two RMs could 
have a clock skew. Since that work is non-trivial, I am open to doing it in a 
follow-up JIRA.
I really have concerns about putting everything into the state-store. I think we 
should try to avoid storing unnecessary info as much as possible - just like 
what we do when the RM recovers applications/nodes for RM restart, isn't it? An 
additional store/recovery operation for each NM's decommissioning timeout value 
sounds too heavyweight. 
Actually, I was more interested in Daniel's idea above of combining the 
client-side tracking and the RM-side tracking, so that we could track the 
timeout on the client side in case we lose it on the RM side. However, I need to 
check more to form some more concrete ideas.
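
To make the state-store part of this discussion concrete, here is a small 
hypothetical sketch (names invented; not part of any YARN-4676 patch) of 
recording when a graceful decommission was requested together with its timeout, 
so the remaining time can be recomputed later, e.g. after an RM restart; it does 
not attempt to handle the clock skew between HA RMs mentioned above:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative only: tracking per-node graceful-decommission timeouts. */
public final class DecommissionTimeoutTracker {

  /** What a recovery-friendly record in the state-store might contain. */
  static final class Record {
    final long startTimeMs;   // wall-clock time when decommission was requested
    final long timeoutSecs;   // admin-supplied timeout

    Record(long startTimeMs, long timeoutSecs) {
      this.startTimeMs = startTimeMs;
      this.timeoutSecs = timeoutSecs;
    }
  }

  private final ConcurrentMap<String, Record> records = new ConcurrentHashMap<>();

  void onGracefulDecommission(String nodeId, long timeoutSecs) {
    records.put(nodeId, new Record(System.currentTimeMillis(), timeoutSecs));
    // A real implementation would also persist this record to the RM state-store.
  }

  /** Remaining seconds before the node should be forcibly DECOMMISSIONED. */
  long remainingSecs(String nodeId, long nowMs) {
    Record r = records.get(nodeId);
    if (r == null) {
      return 0; // unknown node: nothing to wait for
    }
    long elapsedSecs = (nowMs - r.startTimeMs) / 1000;
    // Note: after a failover, 'nowMs' comes from the other RM's clock, so the
    // clock skew discussed above is not handled here.
    return Math.max(0, r.timeoutSecs - elapsedSecs);
  }
}
{code}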

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, 
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, 
> YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, 
> YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks 
> DECOMMISSIONING nodes' status automatically and asynchronously after the 
> client/admin makes the graceful decommission request. It tracks 
> DECOMMISSIONING nodes' status to decide when, after all running containers on 
> the node have completed, the node will be transitioned into the DECOMMISSIONED 
> state. NodesListManager detects and handles include and exclude list changes to 
> kick off decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4804) [Umbrella] Improve test run duration

2016-04-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260465#comment-15260465
 ] 

Karthik Kambatla commented on YARN-4804:


We shaved off 10 minutes through YARN-4805 and YARN-4807. 

> [Umbrella] Improve test run duration
> 
>
> Key: YARN-4804
> URL: https://issues.apache.org/jira/browse/YARN-4804
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>
> Our tests take a long time to run. e.g. the RM tests take 67 minutes. Given 
> our precommit builds run our tests against two Java versions, this issue is 
> exacerbated. 
> Filing this umbrella JIRA to address this. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4807) MockAM#waitForState sleep duration is too long

2016-04-27 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4807:
---
Labels:   (was: newbie)

> MockAM#waitForState sleep duration is too long
> --
>
> Key: YARN-4807
> URL: https://issues.apache.org/jira/browse/YARN-4807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
> Fix For: 2.9.0
>
> Attachments: YARN-4807.001.patch, YARN-4807.002.patch, 
> YARN-4807.003.patch, YARN-4807.004.patch, YARN-4807.005.patch, 
> YARN-4807.006.patch, YARN-4807.007.patch, YARN-4807.008.patch, 
> YARN-4807.009.patch, YARN-4807.010.patch, YARN-4807.011.patch, 
> YARN-4807.012.patch, YARN-4807.013.patch, YARN-4807.014.patch, 
> YARN-4807.015.patch
>
>
> MockAM#waitForState sleep duration (500 ms) is too long. Also, there is 
> significant duplication with MockRM#waitForState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4595) Add support for configurable read-only mounts

2016-04-27 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260454#comment-15260454
 ] 

Varun Vasudev commented on YARN-4595:
-

[~aw] - do Billie's changes address your concerns?

> Add support for configurable read-only mounts
> -
>
> Key: YARN-4595
> URL: https://issues.apache.org/jira/browse/YARN-4595
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-4595.1.patch, YARN-4595.2.patch, YARN-4595.3.patch, 
> YARN-4595.4.patch, YARN-4595.5.patch
>
>
> Mounting files or directories from the host is one way of passing 
> configuration and other information into a docker container.  We could allow 
> the user to set a list of mounts in the environment of ContainerLaunchContext 
> (e.g. /dir1:/targetdir1,/dir2:/targetdir2).  These would be mounted read-only 
> to the specified target locations.
> Due to permissions and user concerns, for this ticket we will require the 
> mounts to be resources that are in the distributed cache.
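
As a small illustration of the mount-list format mentioned in the description 
(a sketch only, not code from any YARN-4595 patch), parsing 
"/dir1:/targetdir1,/dir2:/targetdir2" into source/target pairs could look like:
{code}
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: parse "/dir1:/targetdir1,/dir2:/targetdir2" into pairs. */
public final class ReadOnlyMountParser {

  static final class Mount {
    final String source;
    final String target;

    Mount(String source, String target) {
      this.source = source;
      this.target = target;
    }
  }

  static List<Mount> parse(String spec) {
    List<Mount> mounts = new ArrayList<>();
    for (String entry : spec.split(",")) {
      String[] parts = entry.split(":");
      if (parts.length != 2) {
        throw new IllegalArgumentException("Bad mount entry: " + entry);
      }
      mounts.add(new Mount(parts[0], parts[1]));
    }
    return mounts;
  }

  public static void main(String[] args) {
    for (Mount m : parse("/dir1:/targetdir1,/dir2:/targetdir2")) {
      System.out.println(m.source + " -> " + m.target + " (read-only)");
    }
  }
}
{code}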



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4807) MockAM#waitForState sleep duration is too long

2016-04-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260452#comment-15260452
 ] 

Karthik Kambatla commented on YARN-4807:


+1, checking this in. 

> MockAM#waitForState sleep duration is too long
> --
>
> Key: YARN-4807
> URL: https://issues.apache.org/jira/browse/YARN-4807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
>  Labels: newbie
> Attachments: YARN-4807.001.patch, YARN-4807.002.patch, 
> YARN-4807.003.patch, YARN-4807.004.patch, YARN-4807.005.patch, 
> YARN-4807.006.patch, YARN-4807.007.patch, YARN-4807.008.patch, 
> YARN-4807.009.patch, YARN-4807.010.patch, YARN-4807.011.patch, 
> YARN-4807.012.patch, YARN-4807.013.patch, YARN-4807.014.patch, 
> YARN-4807.015.patch
>
>
> MockAM#waitForState sleep duration (500 ms) is too long. Also, there is 
> significant duplication with MockRM#waitForState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-04-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260441#comment-15260441
 ] 

Karthik Kambatla commented on YARN-4676:


Haven't looked at the code itself, but looked at recent discussion around RM 
restart and [~rkanter] filled me in on some of the details. 

If RM work-preserving restart is not enabled, it should be okay to decommission 
a node right away. If work-preserving restart is enabled and a node is 
decommissioned with a timeout, it would be nice to store *when* the 
decommission has been called and the timeout in the state-store. Note that, in 
an HA setup, the two RMs could have a clock skew. Since that work is 
non-trivial, I am open to doing it in a follow-up JIRA. 

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, 
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, 
> YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, 
> YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks 
> DECOMMISSIONING nodes' status automatically and asynchronously after the 
> client/admin makes the graceful decommission request. It tracks 
> DECOMMISSIONING nodes' status to decide when, after all running containers on 
> the node have completed, the node will be transitioned into the DECOMMISSIONED 
> state. NodesListManager detects and handles include and exclude list changes to 
> kick off decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE

2016-04-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260430#comment-15260430
 ] 

Daniel Templeton commented on YARN-5002:


Thanks for posting the patch, [~jianhe].  I have a few concerns:

# You can't hard-code the capacity scheduler into the RM.  The code has to use 
whatever scheduler is selected.
# From a security perspective it's better to deny access to an app if we can't 
find the queue.  We should probably log an INFO or DEBUG level message when 
that happens so that there's a paper trail.
# This:

{code}
final ApplicationReport[] report = { null };
user2.doAs(new PrivilegedAction<ApplicationReport>() {
  @Override
  public ApplicationReport run() {
    try {
      report[0] = rm2.getApplicationReport(app1.getApplicationId());
    } catch (Exception e) {
      e.printStackTrace();
    }
    return report[0];
  }
});
{code}

seems a bit convoluted.  How about just:

{code}
ApplicationReport report =
    user2.doAs(new PrivilegedExceptionAction<ApplicationReport>() {
      @Override
      public ApplicationReport run() throws Exception {
        return rm2.getApplicationReport(app1.getApplicationId());
      }
    });
{code}

> getApplicationReport call may raise NPE
> ---
>
> Key: YARN-5002
> URL: https://issues.apache.org/jira/browse/YARN-5002
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-5002.1.patch
>
>
> getApplicationReport call may raise NPE
> {code}
> Exception in thread "main" java.lang.NullPointerException: 
> java.lang.NullPointerException
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760)
>  
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682)
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268)
>  org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264)
>  java.security.AccessController.doPrivileged(Native Method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708)
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262)
>  sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  com.sun.proxy.$Proxy18.getApplications(Unknown Source)
>  
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479)
>  
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135)
>  org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167)
>  org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294)
>  org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553)
>  org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  org.apache.hadoop.mapred.JobClient.main(JobClien

[jira] [Commented] (YARN-4994) Use MiniYARNCluster with try-with-resources in tests

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260397#comment-15260397
 ] 

Hadoop QA commented on YARN-4994:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
43s {color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 13m 21s 
{color} | {color:red} root-jdk1.8.0_92 with JDK v1.8.0_92 generated 3 new + 736 
unchanged - 3 fixed = 739 total (was 739) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
30s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 23m 52s 
{color} | {color:red} root-jdk1.7.0_95 with JDK v1.7.0_95 generated 3 new + 733 
unchanged - 3 fixed = 736 total (was 736) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 32s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_92. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 22s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_92. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 55s {color} 
| {color:red} hadoop-mapreduce-client-app in the patch failed with JDK 
v1.8.0_92. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color

[jira] [Commented] (YARN-4595) Add support for configurable read-only mounts

2016-04-27 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260353#comment-15260353
 ] 

Billie Rinaldi commented on YARN-4595:
--

Yes, I believe that is correct.  Thanks for the review, [~vvasudev].

> Add support for configurable read-only mounts
> -
>
> Key: YARN-4595
> URL: https://issues.apache.org/jira/browse/YARN-4595
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-4595.1.patch, YARN-4595.2.patch, YARN-4595.3.patch, 
> YARN-4595.4.patch, YARN-4595.5.patch
>
>
> Mounting files or directories from the host is one way of passing 
> configuration and other information into a docker container.  We could allow 
> the user to set a list of mounts in the environment of ContainerLaunchContext 
> (e.g. /dir1:/targetdir1,/dir2:/targetdir2).  These would be mounted read-only 
> to the specified target locations.
> Due to permissions and user concerns, for this ticket we will require the 
> mounts to be resources that are in the distributed cache.
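For illustration only, a minimal sketch of how a client might request such mounts 
through the launch-context environment. The environment key name below is 
hypothetical (the real key is defined by the patch), and the mount sources are 
assumed to be entries already localized from the distributed cache:

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;

// Localized resources; per the description, mount sources must come from these
// distributed-cache entries.
Map<String, LocalResource> localResources = new HashMap<>();

// Hypothetical environment key carrying the "/source:/target" mount list from the
// description; the actual key name is defined by the YARN-4595 patch.
Map<String, String> env = new HashMap<>();
env.put("EXAMPLE_READONLY_MOUNTS", "/dir1:/targetdir1,/dir2:/targetdir2");

List<String> commands = Collections.singletonList("sleep 60");

ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
    localResources, env, commands,
    null /* service data */, null /* tokens */, null /* ACLs */);
{code}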



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated

2016-04-27 Thread Andras Bokor (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260263#comment-15260263
 ] 

Andras Bokor commented on YARN-3573:


Could anybody help me with this?

> MiniMRYarnCluster constructor that starts the timeline server using a boolean 
> should be marked deprecated
> -
>
> Key: YARN-3573
> URL: https://issues.apache.org/jira/browse/YARN-3573
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-3573-002.patch, YARN-3573.patch
>
>
> {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code}
> starts the timeline server using *boolean enableAHS*. It would be better to have 
> the timeline server started based on the config value.
> We should mark this constructor as deprecated to avoid its future use.
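For illustration, a minimal caller-side sketch (not taken from the attached 
patches), assuming the cluster honors yarn.timeline-service.enabled as the 
description suggests:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Enable the timeline server through configuration instead of the boolean argument.
Configuration conf = new Configuration();
conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);

// Instead of the deprecated: new MiniMRYarnCluster("TestCluster", 1, true);
MiniMRYarnCluster cluster = new MiniMRYarnCluster("TestCluster", 1);
cluster.init(conf);
cluster.start();
// ... run the test ...
cluster.stop();
{code}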



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4994) Use MiniYARNCluster with try-with-resources in tests

2016-04-27 Thread Andras Bokor (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Bokor updated YARN-4994:
---
Attachment: HDFS-10287.02.patch

[~templedf] Thanks a lot for reviewing my patch.
I updated the patch according to your recommendations. A couple of notes on the 
second point: my IDE's settings were not in sync with the Apache conventions, so 
I set the indent to 2 and the continuation indent to 4.
[^HDFS-10287.02.patch]

> Use MiniYARNCluster with try-with-resources in tests
> 
>
> Key: YARN-4994
> URL: https://issues.apache.org/jira/browse/YARN-4994
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Andras Bokor
>Assignee: Andras Bokor
>Priority: Trivial
> Fix For: 2.7.0
>
> Attachments: HDFS-10287.01.patch, HDFS-10287.02.patch
>
>
> In tests, MiniYARNCluster is used with the following pattern: create a 
> MiniYARNCluster instance in a try block and close it in a finally block.
> [Try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html]
>  has been preferred over this pattern since Java 7.
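For reference, a minimal sketch of the two patterns (the test name and node counts 
are illustrative, not from the patch, and the snippet assumes a test method that 
declares throws Exception). MiniYARNCluster is Closeable through the Service 
interface, so try-with-resources stops it automatically:

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

// Old pattern: create the cluster, then close it in a finally block.
MiniYARNCluster cluster = new MiniYARNCluster("TestCluster", 1, 1, 1);
try {
  cluster.init(new YarnConfiguration());
  cluster.start();
  // ... exercise the cluster ...
} finally {
  cluster.stop();
}

// Preferred since Java 7: try-with-resources closes (stops) the cluster automatically.
try (MiniYARNCluster trCluster = new MiniYARNCluster("TestCluster", 1, 1, 1)) {
  trCluster.init(new YarnConfiguration());
  trCluster.start();
  // ... exercise the cluster ...
}
{code}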



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4122) Add support for GPU as a resource

2016-04-27 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260021#comment-15260021
 ] 

Jun Gong commented on YARN-4122:


{quote}
From the SLURM lists, it looks like prior to CUDA 7, the environment variable 
was not working correctly:
https://devtalk.nvidia.com/default/topic/512869/cuda-accessing-all-devices-even-those-which-are-blacklisted/?offset=2
{quote}
We are using CUDA 7.5 now. As far as I remember, we did not come across this problem.

{quote}
This design will probably also have to adjust for the work being done in 
YARN-4726.
{quote}
Is there any plan for YARN to support GPUs? It will be easier to support them 
based on YARN-3926. Allocating GPUs on the NM will be a little complex because we 
need to take the GPUs' topological structure into consideration for better 
performance.

{quote}
In the doc you say that YARN is currently providing you GPU isolation. How are 
you making that work?
{quote}
We use cgroups for the hard limit. '*docker run --device=...*' does the same job, 
so we do not need to set up cgroups ourselves.

> Add support for GPU as a resource
> -
>
> Key: YARN-4122
> URL: https://issues.apache.org/jira/browse/YARN-4122
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: GPUAsAResourceDesign.pdf
>
>
> Use [cgroups 
> devcies|https://www.kernel.org/doc/Documentation/cgroups/devices.txt] to 
> isolate GPUs for containers. For docker containers, we could use 'docker run 
> --device=...'.
> Reference: [SLURM Resources isolation through 
> cgroups|http://slurm.schedmd.com/slurm_ug_2011/SLURM_UserGroup2011_cgroups.pdf].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4966) Improve yarn logs to fetch container logs without specifying nodeId

2016-04-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259945#comment-15259945
 ] 

Hudson commented on YARN-4966:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9679 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9679/])
YARN-4966. Improve yarn logs to fetch container logs without specifying 
(vvasudev: rev 66b07d83740a2ec3e6bfb2bfd064863bae37a1b5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java


> Improve yarn logs to fetch container logs without specifying nodeId
> ---
>
> Key: YARN-4966
> URL: https://issues.apache.org/jira/browse/YARN-4966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.9.0
>
> Attachments: YARN-4966.1.patch, YARN-4966.2.patch, YARN-4966.3.patch, 
> YARN-4966.4.patch
>
>
> Currently, for a finished application, we can get the container logs without 
> specifying the node id, but we need to enable 
> yarn.timeline-service.generic-application-history.enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4966) Improve yarn logs to fetch container logs without specifying nodeId

2016-04-27 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4966:

Summary: Improve yarn logs to fetch container logs without specifying 
nodeId  (was: More improvement to get Container logs without specify nodeId)

> Improve yarn logs to fetch container logs without specifying nodeId
> ---
>
> Key: YARN-4966
> URL: https://issues.apache.org/jira/browse/YARN-4966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4966.1.patch, YARN-4966.2.patch, YARN-4966.3.patch, 
> YARN-4966.4.patch
>
>
> Currently, for a finished application, we can get the container logs without 
> specifying the node id, but we need to enable 
> yarn.timeline-service.generic-application-history.enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4595) Add support for configurable read-only mounts

2016-04-27 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259917#comment-15259917
 ] 

Varun Vasudev edited comment on YARN-4595 at 4/27/16 10:14 AM:
---

Thanks for the explanation [~billie.rinaldi]. Just to summarize - the latest patch
# only allows users to mount files and directories from the localized resources 
into the docker container
# in the case of files, it does not allow symbolic links to be mounted into the 
docker container
# in the case of directories, even if they contain symbolic links pointing to 
directories outside the YARN local directories, the targets of those symlinks are 
not mounted into the container, so there is no access violation to take care of.

Is my understanding correct?


was (Author: vvasudev):
Thanks for the explanation [~billie.rinaldi]. 

> Add support for configurable read-only mounts
> -
>
> Key: YARN-4595
> URL: https://issues.apache.org/jira/browse/YARN-4595
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-4595.1.patch, YARN-4595.2.patch, YARN-4595.3.patch, 
> YARN-4595.4.patch, YARN-4595.5.patch
>
>
> Mounting files or directories from the host is one way of passing 
> configuration and other information into a docker container.  We could allow 
> the user to set a list of mounts in the environment of ContainerLaunchContext 
> (e.g. /dir1:/targetdir1,/dir2:/targetdir2).  These would be mounted read-only 
> to the specified target locations.
> Due to permissions and user concerns, for this ticket we will require the 
> mounts to be resources that are in the distributed cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4595) Add support for configurable read-only mounts

2016-04-27 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259917#comment-15259917
 ] 

Varun Vasudev commented on YARN-4595:
-

Thanks for the explanation [~billie.rinaldi]. 

> Add support for configurable read-only mounts
> -
>
> Key: YARN-4595
> URL: https://issues.apache.org/jira/browse/YARN-4595
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-4595.1.patch, YARN-4595.2.patch, YARN-4595.3.patch, 
> YARN-4595.4.patch, YARN-4595.5.patch
>
>
> Mounting files or directories from the host is one way of passing 
> configuration and other information into a docker container.  We could allow 
> the user to set a list of mounts in the environment of ContainerLaunchContext 
> (e.g. /dir1:/targetdir1,/dir2:/targetdir2).  These would be mounted read-only 
> to the specified target locations.
> Due to permissions and user concerns, for this ticket we will require the 
> mounts to be resources that are in the distributed cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)