[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261546#comment-15261546 ] Xianyin Xin commented on YARN-4090: --- Sorry for the delay, [~kasha] [~yufeigu]. I just uploaded a patch which fixes the three test failures above. The two failures in {{TestFairSchedulerPreemption}} occurred because {{decResourceUsage}} was called twice when processing preemption (in both {{addPreemption()}} and {{containerCompleted()}}), and the failure in {{TestAppRunnability}} occurred because we missed updating the queue's resource usage when moving an app. Thanks [~yufeigu] for your info. > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: Xianyin Xin > Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, > YARN-4090.001.patch, sampling1.jpg, sampling2.jpg > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated YARN-4090: -- Attachment: YARN-4090.001.patch > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: Xianyin Xin > Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, > YARN-4090.001.patch, sampling1.jpg, sampling2.jpg > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
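The general idea behind YARN-4090, avoiding a full Collections.sort() of the child queues on every scheduling round, can be sketched as follows. This is only an illustration, not the attached patch; the {{SortedChildren}} class and its method names are hypothetical.
{code}
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch (not the YARN-4090 patch): keep child queues in a sorted
// structure and re-position one child when its usage changes, instead of
// calling Collections.sort() over all children on every scheduling round.
class SortedChildren<Q> {
  private final TreeSet<Q> sorted;

  SortedChildren(Comparator<Q> policyComparator) {
    this.sorted = new TreeSet<>(policyComparator);
  }

  void add(Q queue) {
    sorted.add(queue);
  }

  // O(log n) re-insertion of one child instead of an O(n log n) full sort.
  void onUsageChanged(Q queue, Runnable applyChange) {
    sorted.remove(queue);   // remove under the old ordering key
    applyChange.run();      // mutate the queue's resource usage
    sorted.add(queue);      // re-insert under the new key
  }

  Iterable<Q> inSchedulingOrder() {
    return sorted;
  }
}
{code}
The test failures discussed above fit this model: the ordering stays correct only if every usage change goes through exactly one update path, which is why the double {{decResourceUsage}} call and the missing update on app move both caused problems.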
[jira] [Updated] (YARN-5005) TestRMWebServices#testDumpingSchedulerLogs fails randomly
[ https://issues.apache.org/jira/browse/YARN-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-5005: --- Description: {noformat} org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Appender is already dumping logs at org.apache.hadoop.yarn.util.AdHocLogDumper.dumpLogs(AdHocLogDumper.java:65) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.dumpSchedulerLogs(RMWebServices.java:321) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.testDumpingSchedulerLogs(TestRMWebServices.java:674) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192) {noformat} The first log dump is set to dump logs for 1 second: {noformat} webSvc.dumpSchedulerLogs("1", mockHsr); Thread.sleep(1000); {noformat} sleep(1000) is used to wait for completion, but randomly during the test run the log dump is called again within 1 second.
was: {noformat} org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Appender is already dumping logs at org.apache.hadoop.yarn.util.AdHocLogDumper.dumpLogs(AdHocLogDumper.java:65) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.dumpSchedulerLogs(RMWebServices.java:321) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.testDumpingSchedulerLogs(TestRMWebServices.java:674) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.ja
[jira] [Created] (YARN-5005) TestRMWebServices#testDumpingSchedulerLogs fails randomly
Bibin A Chundatt created YARN-5005: -- Summary: TestRMWebServices#testDumpingSchedulerLogs fails randomly Key: YARN-5005 URL: https://issues.apache.org/jira/browse/YARN-5005 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt {noformat} org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Appender is already dumping logs at org.apache.hadoop.yarn.util.AdHocLogDumper.dumpLogs(AdHocLogDumper.java:65) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.dumpSchedulerLogs(RMWebServices.java:321) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.testDumpingSchedulerLogs(TestRMWebServices.java:674) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192) {noformat} The first log dump is set to dump logs for 1 second: {noformat} webSvc.dumpSchedulerLogs("1", mockHsr); Thread.sleep(1000); {noformat} And sleep(1000) is used to wait for completion, but randomly during the test run the log dump is called again within 1 second. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-5005) TestRMWebServices#testDumpingSchedulerLogs fails randomly
[ https://issues.apache.org/jira/browse/YARN-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-5005: --- Issue Type: Test (was: Bug) > TestRMWebServices#testDumpingSchedulerLogs fails randomly > - > > Key: YARN-5005 > URL: https://issues.apache.org/jira/browse/YARN-5005 > Project: Hadoop YARN > Issue Type: Test >Reporter: Bibin A Chundatt > > {noformat} > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Appender is already > dumping logs > at > org.apache.hadoop.yarn.util.AdHocLogDumper.dumpLogs(AdHocLogDumper.java:65) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.dumpSchedulerLogs(RMWebServices.java:321) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.testDumpingSchedulerLogs(TestRMWebServices.java:674) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86) > at > org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192) > {noformat} > First dumpLog is set to dump logs for 1 sec > {noformat} > webSvc.dumpSchedulerLogs("1", mockHsr); > Thread.sleep(1000); > {noformat} > And sleep(1000) is used wait for completion and randomly during testcase run > the log dump is called again with in 1 sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
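One way to make {{testDumpingSchedulerLogs}} deterministic is to retry the second dump until the previous one-second dump window has actually expired, rather than relying on a single {{Thread.sleep(1000)}}. The sketch below is only an illustration of that approach, not the committed fix; the helper name, 10-second deadline, and 100 ms poll interval are arbitrary choices.
{code}
import javax.servlet.http.HttpServletRequest;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;
import org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices;

// Sketch only: keep retrying until the previous ad-hoc dump has finished and
// the new request is accepted, instead of assuming it finished after 1 second.
final class DumpRetryHelper {
  static void dumpSchedulerLogsWithRetry(RMWebServices webSvc,
      HttpServletRequest mockHsr) throws Exception {
    long deadline = System.currentTimeMillis() + 10_000;
    while (true) {
      try {
        webSvc.dumpSchedulerLogs("1", mockHsr); // same call the test makes
        return;                                 // accepted: previous dump finished
      } catch (YarnRuntimeException e) {
        if (System.currentTimeMillis() > deadline) {
          throw e;                              // still "already dumping" after 10s
        }
        Thread.sleep(100);                      // previous 1s dump still active
      }
    }
  }
}
{code}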
[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer
[ https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261309#comment-15261309 ] Hadoop QA commented on YARN-4447: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 26s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} YARN-2928 passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} YARN-2928 passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice: patch generated 12 new + 5 unchanged - 10 fixed = 17 total (was 15) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 16s {color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed with JDK v1.8.0_92. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 14s {color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 24m 25s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801073/YARN-4447-YARN-2928.01.patch | | JIRA Issue | YARN-4447 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 47289d5b4cec 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | |
[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer
[ https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261276#comment-15261276 ] Sangjin Lee commented on YARN-4447: --- I just restarted the jenkins build, but it seems that there is a larger build issue going on. We'll see. > Provide a mechanism to represent complex filters and parse them at the REST > layer > -- > > Key: YARN-4447 > URL: https://issues.apache.org/jira/browse/YARN-4447 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4447-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4987) Read cache concurrency issue between read and evict in EntityGroupFS timeline store
[ https://issues.apache.org/jira/browse/YARN-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4987: Description: To handle concurrency issues, key value based timeline storage may return null on reads that are concurrent to service stop. This is actually caused by a concurrency issue between cache reads and evicts. Specifically, if the storage is being read when it gets evicted, the storage may turn into null. EntityGroupFS timeline store needs to handle this case gracefully. (was: To handle concurrency issues, key value based timeline storage may return null on reads that are concurrent to service stop. EntityGroupFS timeline store needs to handle this case gracefully. ) > Read cache concurrency issue between read and evict in EntityGroupFS timeline > store > > > Key: YARN-4987 > URL: https://issues.apache.org/jira/browse/YARN-4987 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Critical > > To handle concurrency issues, key value based timeline storage may return > null on reads that are concurrent to service stop. This is actually caused by > a concurrency issue between cache reads and evicts. Specifically, if the > storage is being read when it gets evicted, the storage may turn into null. > EntityGroupFS timeline store needs to handle this case gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4987) Read cache concurrency issue between read and evict in EntityGroupFS timeline store
[ https://issues.apache.org/jira/browse/YARN-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4987: Summary: Read cache concurrency issue between read and evict in EntityGroupFS timeline store (was: EntityGroupFS timeline store needs to handle null storage gracefully) > Read cache concurrency issue between read and evict in EntityGroupFS timeline > store > > > Key: YARN-4987 > URL: https://issues.apache.org/jira/browse/YARN-4987 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Critical > > To handle concurrency issues, key value based timeline storage may return > null on reads that are concurrent to service stop. EntityGroupFS timeline > store needs to handle this case gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4987) EntityGroupFS timeline store needs to handle null storage gracefully
[ https://issues.apache.org/jira/browse/YARN-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261240#comment-15261240 ] Li Lu commented on YARN-4987: - I just noticed there is a caching/concurrency bug hidden behind this issue. The cache item being reclaimed may actually be being read by some other concurrent readers. Will fix the problem in this JIRA. > EntityGroupFS timeline store needs to handle null storage gracefully > > > Key: YARN-4987 > URL: https://issues.apache.org/jira/browse/YARN-4987 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Critical > > To handle concurrency issues, key value based timeline storage may return > null on reads that are concurrent to service stop. EntityGroupFS timeline > store needs to handle this case gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
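The race can be avoided by making eviction conditional on there being no in-flight readers. The sketch below is a hypothetical illustration of such a reference-count guard, not the actual EntityGroupFS cache code; the class and method names are invented.
{code}
// Hypothetical illustration of the read/evict race fix discussed above (not
// the actual EntityGroupFS patch): the evictor must not null out the backing
// storage while a reader still holds it, so guard it with a reference count.
class CachedStorageRef<T> {
  private T storage;          // the cached timeline storage for one entity group
  private int readers;        // number of in-flight reads

  CachedStorageRef(T storage) {
    this.storage = storage;
  }

  // Readers pin the storage before use; a null return means "already evicted"
  // and must be handled gracefully by the caller.
  synchronized T acquire() {
    if (storage == null) {
      return null;
    }
    readers++;
    return storage;
  }

  synchronized void release() {
    readers--;
  }

  // Evict only when no reader is in flight; otherwise skip this eviction round.
  synchronized boolean tryEvict() {
    if (readers > 0) {
      return false;
    }
    storage = null;
    return true;
  }
}
{code}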
[jira] [Updated] (YARN-4987) EntityGroupFS timeline store needs to handle null storage gracefully
[ https://issues.apache.org/jira/browse/YARN-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4987: Priority: Critical (was: Minor) > EntityGroupFS timeline store needs to handle null storage gracefully > > > Key: YARN-4987 > URL: https://issues.apache.org/jira/browse/YARN-4987 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Critical > > To handle concurrency issues, key value based timeline storage may return > null on reads that are concurrent to service stop. EntityGroupFS timeline > store needs to handle this case gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261207#comment-15261207 ] Wangda Tan commented on YARN-4390: -- Oh, I think I understand what happened. If you set natural_termination_factor to < 1, the AM cannot be reserved in some cases because of YARN-4280. (Reserving the AM needs 2G of resources, but each time the preemption policy only preempts one container, which goes back to the original queue.) So setting it to 1 is still needed. > Do surgical preemption based on reserved container in CapacityScheduler > --- > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, > YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, > YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, > YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261187#comment-15261187 ] Hitesh Shah commented on YARN-4844: --- bq. Per my understanding, changing from int to long won't affect downstream project a lot, it's an error which can be captured by compiler directly. And getMemory/getVCores should not be used intensively by downstream project. For example, MR uses only ~20 times of getMemory()/VCores for non-testing code. Which can be easily fixed. If you are going to force downstream apps to change, I don't understand why you are not forcing them to do this in the first 3.0.0 release. What benefit does delaying it to some later 3.x.y release provide anyone? It just means that you have to do the production stability verification of upstream apps all over again. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch, YARN-4844.3.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4844: - Attachment: YARN-4844.3.patch Attached ver.3 patch. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch, YARN-4844.3.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
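The 10k-node example in the description is easy to verify: 10,000 nodes at 210 GB each is about 2.15 billion MB, which exceeds Integer.MAX_VALUE (2,147,483,647). The short program below is purely illustrative, not YARN code.
{code}
// Illustration of the overflow described in YARN-4844: 10,000 nodes with
// 210 GB each, tracked in MB the way the scheduler does.
public class ResourceOverflowDemo {
  public static void main(String[] args) {
    int nodes = 10_000;
    int memoryPerNodeMb = 210 * 1024;                  // 215,040 MB per node
    int totalMb = nodes * memoryPerNodeMb;             // overflows int32
    long totalMbLong = (long) nodes * memoryPerNodeMb; // fits comfortably in int64
    System.out.println(totalMb);                       // -2144567296 (negative cluster memory)
    System.out.println(totalMbLong);                   // 2150400000
  }
}
{code}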
[jira] [Reopened] (YARN-4913) Yarn logs should take a -out option to write to a directory
[ https://issues.apache.org/jira/browse/YARN-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ram Venkatesh reopened YARN-4913: - > Yarn logs should take a -out option to write to a directory > --- > > Key: YARN-4913 > URL: https://issues.apache.org/jira/browse/YARN-4913 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4913.1.patch, YARN-4913.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4913) Yarn logs should take a -out option to write to a directory
[ https://issues.apache.org/jira/browse/YARN-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261164#comment-15261164 ] Ram Venkatesh commented on YARN-4913: - [~xgong] Here are two reasons for the -out option, both are more relevant for large multi-GB app instances. 1. yarn logs > $targetFile produces a single file that appends all the individual container logs. This requires (clumsy | complex) parsing to split the files apart if you are looking for specific task or app-specific logs. Writing to a directory will preserve the distinct files easily and also lend itself to archiving. 2. redirecting through the console instead of writing directly to the local filesystem APIs adds additional overhead on some platforms like Windows. From a supportability standpoint I think this option will be useful. > Yarn logs should take a -out option to write to a directory > --- > > Key: YARN-4913 > URL: https://issues.apache.org/jira/browse/YARN-4913 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4913.1.patch, YARN-4913.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues
[ https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261144#comment-15261144 ] Hadoop QA commented on YARN-2888: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801107/YARN-2888.004.patch | | JIRA Issue | YARN-2888 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11253/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Corrective mechanisms for rebalancing NM container queues > - > > Key: YARN-2888 > URL: https://issues.apache.org/jira/browse/YARN-2888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2888-yarn-2877.001.patch, > YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch > > > Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of > the scheduling decisions or due to having a stale image of the system) may > lead to an imbalance in the waiting times of the NM container queues. This > can in turn have an impact in job execution times and cluster utilization. > To this end, we introduce corrective mechanisms that may remove (whenever > needed) container requests from overloaded queues, adding them to less-loaded > ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261139#comment-15261139 ] Jian He commented on YARN-5002: --- bq. S3 credentials in the output path. sorry, could you clarify what you mean? what output path are you referring to ? This is to say whether the user can view the YARN app meta info from the UI or command line. Also, the original app_view acl is still taking effect. Also, I think user app should not rely on the yarn queue acl for their app access control in the first place. > getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > > 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > > org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-5004) FS: queue can use more than the max resources set
[ https://issues.apache.org/jira/browse/YARN-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-5004: --- Affects Version/s: 2.8.0 > FS: queue can use more than the max resources set > - > > Key: YARN-5004 > URL: https://issues.apache.org/jira/browse/YARN-5004 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Affects Versions: 2.8.0 >Reporter: Yufei Gu >Assignee: Yufei Gu > > We found a case that the queue is using 301 vcores while the max is set to > 300. The same for the memory usage. The documentation states (see hadoop > 2.7.1 FairScheduler documentation on apache): > -+-+- > A queue will never be assigned a container that would put its aggregate usage > over this limit. > -+-+- > This is clearly not correct in the documentation or the behaviour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-5004) FS: queue can use more than the max resources set
[ https://issues.apache.org/jira/browse/YARN-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-5004: --- Component/s: yarn fairscheduler > FS: queue can use more than the max resources set > - > > Key: YARN-5004 > URL: https://issues.apache.org/jira/browse/YARN-5004 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Reporter: Yufei Gu >Assignee: Yufei Gu > > We found a case that the queue is using 301 vcores while the max is set to > 300. The same for the memory usage. The documentation states (see hadoop > 2.7.1 FairScheduler documentation on apache): > -+-+- > A queue will never be assigned a container that would put its aggregate usage > over this limit. > -+-+- > This is clearly not correct in the documentation or the behaviour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261098#comment-15261098 ] Daniel Templeton commented on YARN-5002: If I submit a mapreduce job to a secure queue that has my S3 credentials in the output path, I'm gonna be pretty pissed if some admin deleting a queue causes my credentials to be exposed. > getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > > org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > 
org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261086#comment-15261086 ] Wangda Tan commented on YARN-5002: -- Echo [~jianhe]'s comment: I would also prefer the existing approach: if a queue is removed, the ACL to view apps from the removed queue should be *, otherwise the apps will disappear from every user's perspective. And this is the previous behavior too. A more comprehensive approach is to record the configuration to the state-store. In the short term, the attached fix looks good. > getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > >
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > > org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
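The approach being agreed on (treat the view ACL of a removed queue as {{*}} instead of hitting an NPE) can be sketched as below. This is a hypothetical illustration, not the attached patch; the {{Queue}} and {{Scheduler}} interfaces are simplified stand-ins for the real scheduler types.
{code}
// Hypothetical illustration of the null-safe check discussed above (not the
// attached YARN-5002 patch): if the app's queue has been removed from the
// scheduler, allow the view instead of dereferencing a null queue and hitting
// an NPE in QueueACLsManager.checkAccess().
class QueueViewAccessSketch {
  interface Queue {
    boolean hasViewAccess(String user);   // simplified stand-in for the QueueACL check
  }

  interface Scheduler {
    Queue getQueue(String queueName);     // returns null if the queue was removed
  }

  static boolean checkViewAccess(Scheduler scheduler, String queueName, String user) {
    Queue queue = scheduler.getQueue(queueName);
    if (queue == null) {
      // Queue no longer exists (e.g. removed by a configuration refresh).
      // Treat the queue view ACL as "*" so the app does not vanish from every
      // user's listing; the application-level view ACL still applies elsewhere.
      return true;
    }
    return queue.hasViewAccess(user);
  }
}
{code}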
[jira] [Created] (YARN-5004) FS: queue can use more than the max resources set
Yufei Gu created YARN-5004: -- Summary: FS: queue can use more than the max resources set Key: YARN-5004 URL: https://issues.apache.org/jira/browse/YARN-5004 Project: Hadoop YARN Issue Type: Bug Reporter: Yufei Gu Assignee: Yufei Gu We found a case where the queue is using 301 vcores while the max is set to 300. The same applies to the memory usage. The documentation states (see the Hadoop 2.7.1 FairScheduler documentation on apache): -+-+- A queue will never be assigned a container that would put its aggregate usage over this limit. -+-+- Clearly, either the documentation or the behaviour is incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
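One plausible way a 300-vcore cap can end up at 301 vcores is if the limit is checked against the queue's current usage before adding the candidate container rather than after. The snippet below is a hypothetical illustration of that difference, not FairScheduler source.
{code}
// Hypothetical illustration (not FairScheduler code) of how a queue capped at
// 300 vcores can reach 301: checking the cap against current usage lets a
// multi-vcore container slip through, while checking usage + container does not.
class MaxResourceCheckSketch {
  // Loose check: 299 < 300, so a 2-vcore container is assigned and usage becomes 301.
  static boolean canAssignLoose(int usedVcores, int maxVcores) {
    return usedVcores < maxVcores;
  }

  // Strict check: 299 + 2 > 300, so the container is rejected.
  static boolean canAssignStrict(int usedVcores, int containerVcores, int maxVcores) {
    return usedVcores + containerVcores <= maxVcores;
  }
}
{code}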
[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260985#comment-15260985 ] Eric Payne commented on YARN-4390: -- [~leftnoteasy], {quote} have you set following config? {code} yarn.resourcemanager.monitor.capacity.preemption.select_based_on_reserved_containers true {code} {quote} Yup! :-) I double checked and that parameter is definitely set in my environment. bq. 1) total_preemption_per_round should make sure that, each round needs preempt enough resource to allocate one large container preemption per round is set to 100% bq. 2) before ver.7, natural_termination_factor should set to 1 to make sure enough resources will be preempted. That was it! I set natural termination factor to 1.0 and it's working more in line with what I expect. I was not setting natural termination factor. Unfortunately, when I applied YARN-4390.7.patch, I still need to set the natural termination factor in order to get the expected results. If I just leave that parameter out of my config and let it go to the default, the behavior is the same as in version 6 of the patch. That is, the app requesting larger containers can never use more than about 68% of the {{ops}} queue, and the app running on the preemptable queue has more than 100 containers preempted, only to be given back to the same app. > Do surgical preemption based on reserved container in CapacityScheduler > --- > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, > YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, > YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, > YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
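For reference, the preemption settings discussed in this thread can be expressed as a {{Configuration}} sketch; values mirror the experiment above, the first key is quoted from this thread, and the other two are the standard CapacityScheduler preemption properties.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch of the preemption settings discussed in this thread, expressed via
// Configuration rather than yarn-site.xml. Values mirror the experiment above.
public class PreemptionConfigSketch {
  public static Configuration preemptionSettings() {
    Configuration conf = new Configuration();
    conf.setBoolean(
        "yarn.resourcemanager.monitor.capacity.preemption.select_based_on_reserved_containers",
        true);
    conf.setFloat(
        "yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor",
        1.0f);
    conf.setFloat(
        "yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round",
        1.0f);
    return conf;
  }
}
{code}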
[jira] [Commented] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated
[ https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260972#comment-15260972 ] Andras Bokor commented on YARN-3573: [~brahmareddy] Thanks for getting back to me. I meant MiniYARNCluster, not MiniMRYarnCluster. Actually, what I see is that the other constructors call the deprecated constructor, so ultimately we cannot avoid calling a deprecated method. It can be confusing at first. Is removing it being considered? > MiniMRYarnCluster constructor that starts the timeline server using a boolean > should be marked deprecated > - > > Key: YARN-3573 > URL: https://issues.apache.org/jira/browse/YARN-3573 > Project: Hadoop YARN > Issue Type: Test > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: YARN-3573-002.patch, YARN-3573.patch > > > {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code} > starts the timeline server using *boolean enableAHS*. It is better to have > the timelineserver started based on the config value. > We should mark this constructor as deprecated to avoid its future use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues
[ https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260945#comment-15260945 ] Hadoop QA commented on YARN-2888: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 2s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801107/YARN-2888.004.patch | | JIRA Issue | YARN-2888 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11252/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Corrective mechanisms for rebalancing NM container queues > - > > Key: YARN-2888 > URL: https://issues.apache.org/jira/browse/YARN-2888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2888-yarn-2877.001.patch, > YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch > > > Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of > the scheduling decisions or due to having a stale image of the system) may > lead to an imbalance in the waiting times of the NM container queues. This > can in turn have an impact in job execution times and cluster utilization. > To this end, we introduce corrective mechanisms that may remove (whenever > needed) container requests from overloaded queues, adding them to less-loaded > ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues
[ https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260941#comment-15260941 ] Hadoop QA commented on YARN-2888: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801107/YARN-2888.004.patch | | JIRA Issue | YARN-2888 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11251/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Corrective mechanisms for rebalancing NM container queues > - > > Key: YARN-2888 > URL: https://issues.apache.org/jira/browse/YARN-2888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2888-yarn-2877.001.patch, > YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch > > > Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of > the scheduling decisions or due to having a stale image of the system) may > lead to an imbalance in the waiting times of the NM container queues. This > can in turn have an impact in job execution times and cluster utilization. > To this end, we introduce corrective mechanisms that may remove (whenever > needed) container requests from overloaded queues, adding them to less-loaded > ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues
[ https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260935#comment-15260935 ] Hadoop QA commented on YARN-2888: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801107/YARN-2888.004.patch | | JIRA Issue | YARN-2888 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11250/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Corrective mechanisms for rebalancing NM container queues > - > > Key: YARN-2888 > URL: https://issues.apache.org/jira/browse/YARN-2888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2888-yarn-2877.001.patch, > YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch > > > Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of > the scheduling decisions or due to having a stale image of the system) may > lead to an imbalance in the waiting times of the NM container queues. This > can in turn have an impact in job execution times and cluster utilization. > To this end, we introduce corrective mechanisms that may remove (whenever > needed) container requests from overloaded queues, adding them to less-loaded > ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues
[ https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260933#comment-15260933 ] Hadoop QA commented on YARN-2888: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 7m 54s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801107/YARN-2888.004.patch | | JIRA Issue | YARN-2888 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11249/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Corrective mechanisms for rebalancing NM container queues > - > > Key: YARN-2888 > URL: https://issues.apache.org/jira/browse/YARN-2888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2888-yarn-2877.001.patch, > YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch > > > Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of > the scheduling decisions or due to having a stale image of the system) may > lead to an imbalance in the waiting times of the NM container queues. This > can in turn have an impact in job execution times and cluster utilization. > To this end, we introduce corrective mechanisms that may remove (whenever > needed) container requests from overloaded queues, adding them to less-loaded > ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2888) Corrective mechanisms for rebalancing NM container queues
[ https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-2888: -- Attachment: YARN-2888.004.patch kicking off Jenkins again.. > Corrective mechanisms for rebalancing NM container queues > - > > Key: YARN-2888 > URL: https://issues.apache.org/jira/browse/YARN-2888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2888-yarn-2877.001.patch, > YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch > > > Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of > the scheduling decisions or due to having a stale image of the system) may > lead to an imbalance in the waiting times of the NM container queues. This > can in turn have an impact in job execution times and cluster utilization. > To this end, we introduce corrective mechanisms that may remove (whenever > needed) container requests from overloaded queues, adding them to less-loaded > ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4913) Yarn logs should take a -out option to write to a directory
[ https://issues.apache.org/jira/browse/YARN-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260860#comment-15260860 ] Xuan Gong commented on YARN-4913: - Yes, actually we do not need this > Yarn logs should take a -out option to write to a directory > --- > > Key: YARN-4913 > URL: https://issues.apache.org/jira/browse/YARN-4913 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4913.1.patch, YARN-4913.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4913) Yarn logs should take a -out option to write to a directory
[ https://issues.apache.org/jira/browse/YARN-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved YARN-4913. - Resolution: Won't Fix > Yarn logs should take a -out option to write to a directory > --- > > Key: YARN-4913 > URL: https://issues.apache.org/jira/browse/YARN-4913 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4913.1.patch, YARN-4913.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260840#comment-15260840 ] Daniel Templeton commented on YARN-5002: Security and usability are rarely in agreement. Unfortunately, security carries a bigger stick. I like the suggestion of handling the issue above the level of access control. It seems to me that this issue is most appropriately handled in the recovery code. If a recovered application's queue doesn't exist, do something smart with it there. > getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > 
> org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260829#comment-15260829 ] Hadoop QA commented on YARN-5002: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 2s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801098/YARN-5002.3.patch | | JIRA Issue | YARN-5002 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11248/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 
java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > > org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260821#comment-15260821 ] Jian He commented on YARN-5002: --- yeah, it is indeed confusing, earlier viewable applications becomes not viewable if the queue gets removed. I think the non-existing queue should be treated explicitly instead of imbedding in the logic of access control. The fact that the queue is removed probably means the apps in that queue is of less concern in terms of ACLs. For this patch, I think I'll still return true if queue does not exist for the sake of usability. > getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > > 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > > org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
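Jian He's suggestion above (return true when the application's queue no longer exists, rather than dereferencing a null queue) can be illustrated with the following self-contained sketch. The types and names are hypothetical stand-ins, not the actual QueueACLsManager code or the YARN-5002 patch.
{code}
// Hypothetical sketch of the null-queue guard discussed above; the types are
// made up for illustration and do not claim to match the real patch.
import java.util.HashMap;
import java.util.Map;

public class QueueAclCheckSketch {

  /** Minimal stand-in for a scheduler queue that can answer ACL questions. */
  interface Queue {
    boolean hasAccess(String user);
  }

  private final Map<String, Queue> queues = new HashMap<String, Queue>();

  /**
   * Returns true when the queue is missing (e.g. it was removed before an RM
   * restart) instead of dereferencing null, which is what caused the NPE.
   */
  boolean checkAccess(String user, String queueName) {
    Queue queue = queues.get(queueName);
    if (queue == null) {
      // Removed queue: allow access for usability, as proposed above.
      return true;
    }
    return queue.hasAccess(user);
  }

  public static void main(String[] args) {
    QueueAclCheckSketch sketch = new QueueAclCheckSketch();
    // No queue named "deleted" is registered, yet the check degrades
    // gracefully instead of throwing a NullPointerException.
    System.out.println(sketch.checkAccess("alice", "deleted")); // true
  }
}
{code}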
[jira] [Updated] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5002: -- Attachment: YARN-5002.3.patch > getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch, YARN-5002.2.patch, YARN-5002.3.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > > org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > 
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-5003) Add container resource to RM audit log
[ https://issues.apache.org/jira/browse/YARN-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260816#comment-15260816 ] Daniel Templeton commented on YARN-5003: I didn't do a careful review yet, but the patch looks reasonable. I don't see any obvious red flags. > Add container resource to RM audit log > -- > > Key: YARN-5003 > URL: https://issues.apache.org/jira/browse/YARN-5003 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, scheduler >Affects Versions: 3.0.0 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-5003.001.patch > > > It would be valuable to know the resource consumed by a container in the RM > audit log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-5003) Add container resource to RM audit log
[ https://issues.apache.org/jira/browse/YARN-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-5003: - Attachment: YARN-5003.001.patch Attaching patch > Add container resource to RM audit log > -- > > Key: YARN-5003 > URL: https://issues.apache.org/jira/browse/YARN-5003 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, scheduler >Affects Versions: 3.0.0 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-5003.001.patch > > > It would be valuable to know the resource consumed by a container in the RM > audit log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer
[ https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260791#comment-15260791 ] Varun Saxena commented on YARN-4447: In point 3, I meant "This means for an entity to match, event1 and event2 should exist and event3 and event4 should {color:red}NOT{color} exist" > Provide a mechanism to represent complex filters and parse them at the REST > layer > -- > > Key: YARN-4447 > URL: https://issues.apache.org/jira/browse/YARN-4447 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4447-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260765#comment-15260765 ] Daniel Zhi commented on YARN-4676: -- Just to clarify/repeat my understanding of the current behavior (without this patch), in case I misread the code: it appears to me that regardless of whether RM work-preserving restart is enabled or not, upon RM restart, NodesListManager creates a pseudo RMNodeImpl for each excluded node and DECOMMISSIONs the node right away. Maybe there was an intention to resume the DECOMMISSIONING, but I don't see the current code actually doing that. > Automatic and Asynchronous Decommissioning Nodes Status Tracking > > > Key: YARN-4676 > URL: https://issues.apache.org/jira/browse/YARN-4676 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Zhi >Assignee: Daniel Zhi > Labels: features > Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, > YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, > YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, > YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch > > > DecommissioningNodeWatcher inside ResourceTrackingService tracks > DECOMMISSIONING nodes status automatically and asynchronously after > client/admin made the graceful decommission request. It tracks > DECOMMISSIONING nodes status to decide when, after all running containers on > the node have completed, will be transitioned into DECOMMISSIONED state. > NodesListManager detect and handle include and exclude list changes to kick > out decommission or recommission as necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2888) Corrective mechanisms for rebalancing NM container queues
[ https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-2888: -- Attachment: YARN-2888.003.patch Updating patch to rebase with trunk and integrate with {{QueuingContainerManagerImpl}} > Corrective mechanisms for rebalancing NM container queues > - > > Key: YARN-2888 > URL: https://issues.apache.org/jira/browse/YARN-2888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2888-yarn-2877.001.patch, > YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch > > > Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of > the scheduling decisions or due to having a stale image of the system) may > lead to an imbalance in the waiting times of the NM container queues. This > can in turn have an impact in job execution times and cluster utilization. > To this end, we introduce corrective mechanisms that may remove (whenever > needed) container requests from overloaded queues, adding them to less-loaded > ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260696#comment-15260696 ] Varun Vasudev commented on YARN-3998: - [~vinodkv] - do you want to review this further or can I go ahead and commit it? > Add retry-times to let NM re-launch container when it fails to run > -- > > Key: YARN-3998 > URL: https://issues.apache.org/jira/browse/YARN-3998 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3998.01.patch, YARN-3998.02.patch, > YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, > YARN-3998.06.patch, YARN-3998.07.patch, YARN-3998.08.patch, YARN-3998.09.patch > > > I'd like to add a field(retry-times) in ContainerLaunchContext. When AM > launches containers, it could specify the value. Then NM will re-launch the > container 'retry-times' times when it fails to run(e.g.exit code is not 0). > It will save a lot of time. It avoids container localization. RM does not > need to re-schedule the container. And local files in container's working > directory will be left for re-use.(If container have downloaded some big > files, it does not need to re-download them when running again.) > We find it is useful in systems like Storm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260684#comment-15260684 ] Wangda Tan commented on YARN-4734: -- The above Docker build failure is caused by one of the packages not being accessible: bq. http://hackage.haskell.org/packages/archive/00-index.tar.gz I will manually retry it later. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, > YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, > YARN-4734.8.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260676#comment-15260676 ] Hadoop QA commented on YARN-4734: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 2s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801087/YARN-4734.8.patch | | JIRA Issue | YARN-4734 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11245/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, > YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, > YARN-4734.8.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260675#comment-15260675 ] Hadoop QA commented on YARN-4734: - (!) A patch to the testing environment has been detected. Re-executing against the patched versions to perform further tests. The console is at https://builds.apache.org/job/PreCommit-YARN-Build/11245/console in case of problems. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, > YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, > YARN-4734.8.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer
[ https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260665#comment-15260665 ] Varun Saxena commented on YARN-4447: There seems to be some problem with Jenkins machine. YARN-5002 had similar QA report. > Provide a mechanism to represent complex filters and parse them at the REST > layer > -- > > Key: YARN-4447 > URL: https://issues.apache.org/jira/browse/YARN-4447 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4447-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer
[ https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260663#comment-15260663 ] Varun Saxena commented on YARN-4447: [~sjlee0], kindly review. Apologies for the delay in updating this patch. I was away for a week and was busy with internal work earlier, so I could not work on it. I will add a few more test cases to test the application fetching flow, but the main code can be reviewed anyway. A small description of what has been done:
# Metric filters are of the form {{(((metric1 gt 23) AND (metric2 lt 40)) OR (metric5 eq 40))}}. The comparison operators supported are as follows:
#* gt (greater than) / ge (greater than or equals)
#* lt (less than) / le (less than or equals)
#* eq (equals) / ne (not equals; if the key (metric/config/info) does not exist, this would mean a match) / ene (exists and not equals; the key must exist).
# Config and info filters are of the same form as metric filters, except that only the eq, ne and ene comparison operators are supported for them.
# Event filters will take the form {{(((event1,event2) AND \!(event3,event4)) OR event5, event6)}}. This means that for an entity to match, event1 and event2 should exist and event3 and event4 should NOT exist; or, event5 and event6 should exist. ! indicates not equals (non-existence). A not (\!) should be followed by an opening bracket, i.e. (
# Relation filters will take the same form as event filters. Here, instead of each event, we will have a type-entities expression, i.e. of the form {{type1:entity1:entity2}}. This part of the expression cannot contain spaces. Relation filters hence would be of the form {{type1:entity11:entity12 , type2:entity21 AND !(type3:entity31)}}. Also, the ene kind of case won't be supported here; if the entity type does not exist, a match won't occur.
# Metrics to retrieve and configs to retrieve will have a similar format to the above. However, ANDs and ORs do not make much sense here. Hence an expression of the form conf1,conf2 means return configs conf1 and conf2, and an expression of the form \!(conf1,conf2) means return all configs other than conf1 and conf2.
# Please note that metric filters are not yet supported for flow runs. They need to be matched locally, as the summation for metrics happens in the coprocessor. This can be done in a separate JIRA.
Now coming to the implementation, I had 2 options: implement the parsing logic in static methods, or encapsulate this logic in a class. I went with the latter, as it makes it easier to break the code into multiple methods without a need to pass several parameters to helper methods, and it makes the code cleaner IMO. This would mean that an extra object has to be created every time, though. I would like to know the thoughts of others on the approach.
# A new interface named {{TimelineParser}} has been added, which needs to be implemented for parsing different expressions.
# There are 2 abstract classes added, namely TimelineParserForCompareExpr (for expressions of the form explained above for metric/config/info filters) and TimelineParserForEqualityExpr (for expressions of the form explained above for event/relation filters). These classes have abstract methods, which will be implemented by concrete implementations, for deciding what kind of filter needs to be constructed for the filter list, how to parse the values, and how to set the value, compare op, etc. on the filters.
# These abstract classes then have concrete implementations for different filters. These include TimelineParserForNumericFilters (for metric filters), TimelineParserForKVFilters (for config/info filters), TimelineParserForExistFilters (used for filters which check for existence, such as event filters) and TimelineParserForRelationFilters (for relation filters).
# Some code between TimelineParserForCompareExpr and TimelineParserForEqualityExpr is similar. It could be moved to another base abstract class, but this might make the code confusing, so I have left it as it is. I would like to know the thoughts of others on this.
> Provide a mechanism to represent complex filters and parse them at the REST > layer > -- > > Key: YARN-4447 > URL: https://issues.apache.org/jira/browse/YARN-4447 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4447-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
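For quick reference, the expression forms described in the comment above could be composed into a REST query roughly as follows. The query parameter names used here (metricfilters, eventfilters, relatesto) are assumptions made for the example and are not confirmed by the patch.
{code}
// Illustrative only: building the filter expressions described above as URL
// query parameters. Parameter names are assumptions for this example.
import java.net.URLEncoder;

public class TimelineFilterQuerySketch {
  public static void main(String[] args) throws Exception {
    String metricFilters =
        "(((metric1 gt 23) AND (metric2 lt 40)) OR (metric5 eq 40))";
    String eventFilters =
        "(((event1,event2) AND !(event3,event4)) OR event5,event6)";
    String relationFilters =
        "type1:entity11:entity12,type2:entity21 AND !(type3:entity31)";

    String query = "?metricfilters=" + URLEncoder.encode(metricFilters, "UTF-8")
        + "&eventfilters=" + URLEncoder.encode(eventFilters, "UTF-8")
        + "&relatesto=" + URLEncoder.encode(relationFilters, "UTF-8");
    System.out.println(query);
  }
}
{code}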
[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats
[ https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260659#comment-15260659 ] Sunil G commented on YARN-4308: --- Sure. I feel we can weigh in opinions from [~kasha] and [~Naganarasimha Garla] too. I am fine either way (documenting and commenting, or restricted warning logging), so it would be good if some more thoughts come in so that the best solution can go in. > ContainersAggregated CPU resource utilization reports negative usage in first > few heartbeats > > > Key: YARN-4308 > URL: https://issues.apache.org/jira/browse/YARN-4308 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4308.patch, 0002-YARN-4308.patch > > > NodeManager reports ContainerAggregated CPU resource utilization as -ve value > in first few heartbeats cycles. I added a new debug print and received below > values from heartbeats. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource > Utilization : CpuTrackerUsagePercent : 198.94598 > {noformat} > Its better we send 0 as CPU usage rather than sending a negative values in > heartbeats eventhough its happening in only first few heartbeats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4734: - Attachment: YARN-4734.8.patch > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, > YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, > YARN-4734.8.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4734: - Attachment: (was: YARN-4734.8.patch) > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, > YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, > YARN-4734.8.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated
[ https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260649#comment-15260649 ] Brahma Reddy Battula commented on YARN-3573: *Earlier:* {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code} *Now:* {code}MiniMRYarnCluster(String testName, int noOfNMs){code} The timeline server startup will be based on the config value instead of being passed as a boolean param. Please refer to YARN-2890 for more details. Hope this helps. > MiniMRYarnCluster constructor that starts the timeline server using a boolean > should be marked deprecated > - > > Key: YARN-3573 > URL: https://issues.apache.org/jira/browse/YARN-3573 > Project: Hadoop YARN > Issue Type: Test > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: YARN-3573-002.patch, YARN-3573.patch > > > {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code} > starts the timeline server using *boolean enableAHS*. It is better to have > the timelineserver started based on the config value. > We should mark this constructor as deprecated to avoid its future use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
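As a usage sketch of the change described above, a test now enables the timeline server through configuration before constructing the cluster, instead of passing the deprecated boolean argument. This assumes the config value being referred to is yarn.timeline-service.enabled (YarnConfiguration.TIMELINE_SERVICE_ENABLED); see YARN-2890 for the authoritative details.
{code}
// Sketch: enable the timeline server via configuration rather than the
// deprecated boolean constructor argument. Assumes the relevant property is
// yarn.timeline-service.enabled.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineMiniClusterSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);

    // Old, deprecated style: new MiniMRYarnCluster("test", 1, true);
    MiniMRYarnCluster cluster = new MiniMRYarnCluster("test", 1);
    cluster.init(conf);
    cluster.start();
    cluster.stop();
  }
}
{code}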
[jira] [Commented] (YARN-4956) findbug issue on LevelDBCacheTimelineStore
[ https://issues.apache.org/jira/browse/YARN-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260647#comment-15260647 ] Hudson commented on YARN-4956: -- FAILURE: Integrated in Hadoop-trunk-Commit #9684 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9684/]) YARN-4956. findbug issue on LevelDBCacheTimelineStore. (Zhiyuan Yang via (gtcarrera9: rev f16722d2ef31338a57a13e2c8d18c1c62d58bbaf) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/main/java/org/apache/hadoop/yarn/server/timeline/LevelDBCacheTimelineStore.java > findbug issue on LevelDBCacheTimelineStore > -- > > Key: YARN-4956 > URL: https://issues.apache.org/jira/browse/YARN-4956 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Zhiyuan Yang > Fix For: 2.8.0 > > Attachments: YARN-4956-trunk.000.patch > > > {code} > Multithreaded correctness Warnings > Code Warning IS Inconsistent synchronization of > org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.configuration; > locked 66% of time > Bug type IS2_INCONSISTENT_SYNC (click for details) > In class org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore > Field > org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.configuration > Synchronized 66% of the time > Unsynchronized access at LevelDBCacheTimelineStore.java:[line 82] > Synchronized access at LevelDBCacheTimelineStore.java:[line 117] > Synchronized access at LevelDBCacheTimelineStore.java:[line 122] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
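For context on the IS2_INCONSISTENT_SYNC warning quoted above: findbugs flags a field that is read or written both with and without holding a lock. The generic remedy is to route every access through the same monitor, roughly as in this hypothetical sketch (not the actual LevelDBCacheTimelineStore change):
{code}
// Generic, hypothetical sketch of fixing inconsistent synchronization: every
// read and write of the flagged field goes through the same monitor.
public class ConfigHolderSketch {
  private Object configuration; // stand-in for the flagged field

  public synchronized void setConfiguration(Object conf) {
    this.configuration = conf; // synchronized write
  }

  public synchronized Object getConfiguration() {
    return configuration; // synchronized read
  }
}
{code}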
[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats
[ https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260645#comment-15260645 ] Daniel Templeton commented on YARN-4308: If you're not going to let the user know about persistent missed reports, then leave a wide trail of breadcrumbs for the person who has to debug it. Putting it in the JavaDoc is a good first step. Maybe also drop a comment into the code that calls the method. > ContainersAggregated CPU resource utilization reports negative usage in first > few heartbeats > > > Key: YARN-4308 > URL: https://issues.apache.org/jira/browse/YARN-4308 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4308.patch, 0002-YARN-4308.patch > > > NodeManager reports ContainerAggregated CPU resource utilization as -ve value > in first few heartbeats cycles. I added a new debug print and received below > values from heartbeats. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource > Utilization : CpuTrackerUsagePercent : 198.94598 > {noformat} > Its better we send 0 as CPU usage rather than sending a negative values in > heartbeats eventhough its happening in only first few heartbeats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260644#comment-15260644 ] Sunil G commented on YARN-5002: --- Hi [~jianhe] I have one doubt here overall about this part. {{checkAccess}} is invoked from {{forceKillApplication}} etc. And if {{checkAccess}} returns {{false}}, it is been raised as access control exception. But in reality issue was because of a non-existent queue after restart. Eventhough its logged, from client side exception seems like not accurate. Could this be a problem, how do you feel? > getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch, YARN-5002.2.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > > 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > > org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats
[ https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260617#comment-15260617 ] Sunil G commented on YARN-4308: --- Yes, the debug log is already present, my bad. bq. Even if right now the only time a negative value comes back is on the first report, that doesn't mean it won't change later. I agree with your thought. {{CpuTimeTracker}} doesn't have a protocol/standard defined for when to return -1, 0 or other values, so there is a chance that this can change in the future too. But I am also thinking about covering this proposed INFO log code from a test case point of view: after skipping n times, we have to log one warning and this cycle has to continue, so this code snippet also needs to be covered via a test case. Is it fine if we make a note in {{CpuTimeTracker}} about its behavior or its expected return codes as Javadoc? I am fine either way, but was thinking about the real use case for now. > ContainersAggregated CPU resource utilization reports negative usage in first > few heartbeats > > > Key: YARN-4308 > URL: https://issues.apache.org/jira/browse/YARN-4308 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4308.patch, 0002-YARN-4308.patch > > > NodeManager reports ContainerAggregated CPU resource utilization as -ve value > in first few heartbeats cycles. I added a new debug print and received below > values from heartbeats. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource > Utilization : CpuTrackerUsagePercent : 198.94598 > {noformat} > Its better we send 0 as CPU usage rather than sending a negative values in > heartbeats eventhough its happening in only first few heartbeats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
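A minimal sketch of the clamping idea from the issue description (report 0 instead of a negative value while the tracker has too few samples); the names here are hypothetical and the actual change in ContainersMonitorImpl may differ:
{code}
// Hypothetical sketch: clamp the CPU usage reported in the first heartbeats.
// A tracker may legitimately return a negative value (e.g. -1) before it has
// enough samples to compute a delta; report 0 in that case instead.
public class CpuUsageSketch {
  static float sanitizeCpuUsagePercent(float trackerUsagePercent) {
    if (trackerUsagePercent < 0) {
      // Not enough samples yet; avoid publishing a negative utilization.
      return 0f;
    }
    return trackerUsagePercent;
  }

  public static void main(String[] args) {
    System.out.println(sanitizeCpuUsagePercent(-1.0f));      // 0.0
    System.out.println(sanitizeCpuUsagePercent(198.94598f)); // 198.94598
  }
}
{code}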
[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file
[ https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260614#comment-15260614 ] Hadoop QA commented on YARN-4577: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 4s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801074/YARN-4577.5.patch | | JIRA Issue | YARN-4577 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11244/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Enable aux services to have their own custom classpath/jar file > --- > > Key: YARN-4577 > URL: https://issues.apache.org/jira/browse/YARN-4577 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4577.1.patch, YARN-4577.2.patch, > YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, YARN-4577.3.patch, > YARN-4577.3.rebase.patch, YARN-4577.4.patch, YARN-4577.5.patch, > YARN-4577.poc.patch > > > Right now, users have to add their jars to the NM classpath directly, thus > put them on the system classloader. But if multiple versions of the plugin > are present on the classpath, there is no control over which version actually > gets loaded. Or if there are any conflicts between the dependencies > introduced by the auxiliary service and the NM itself, they can break the NM, > the auxiliary service, or both. > The solution could be: to instantiate aux services using a classloader that > is different from the system classloader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file
[ https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260609#comment-15260609 ] Xuan Gong commented on YARN-4577: - [~sjlee0] Thanks for the review. Attached a new patch to address the comments. Unfortunately, I am not able to create a unit test for this, but I did test it manually. Here is how I tested it:
1. Create a customized TestAuxService which extends AuxiliaryService.
2. Create two jar files which have the same jar file name (TestAuxService.jar) and the same class name (TestAuxService.java).
3. Each TestAuxService.java has a different log message, something like "TestAuxService in NM ClassPath" and "TestAuxService in Customer ClassPath".
4. Put one TestAuxService.jar onto the NM classpath, and put the other TestAuxService.jar onto the custom classpath, such as "/Users/xuan/dep/TestAuxService.jar".
5. Modify several configurations in yarn-site.xml:
{code}
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,TestAuxService</value>
  <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services.TestAuxService.class</name>
  <value>org.aux.TestAuxService</value>
</property>
{code}
6. Start the NM and verify the log messages in the NM logs. We can see
{code}
Test My AuxService in NM ClassPath in Service Init stage
Test My AuxService in NM ClassPath in Service Start stage
{code}
so we can verify that we load the TestAuxService class from the NM classpath.
7. Add one more configuration to yarn-site.xml:
{code}
<property>
  <name>yarn.nodemanager.aux-services.TestAuxService.class.classpath</name>
  <value>/Users/xuan/dep/TestAuxService.jar</value>
</property>
{code}
8. Start the NM and check the log messages in the NM log. We can find
{code}
Test My AuxService in Customer ClassPath in Service Init stage
Test My AuxService in Customer ClassPath in Service Start stage
{code}
so we can verify that if we set the custom classpath, we load TestAuxService from the custom classpath instead of the NM classpath.
> Enable aux services to have their own custom classpath/jar file > --- > > Key: YARN-4577 > URL: https://issues.apache.org/jira/browse/YARN-4577 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4577.1.patch, YARN-4577.2.patch, > YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, YARN-4577.3.patch, > YARN-4577.3.rebase.patch, YARN-4577.4.patch, YARN-4577.5.patch, > YARN-4577.poc.patch > > > Right now, users have to add their jars to the NM classpath directly, thus > put them on the system classloader. But if multiple versions of the plugin > are present on the classpath, there is no control over which version actually > gets loaded. Or if there are any conflicts between the dependencies > introduced by the auxiliary service and the NM itself, they can break the NM, > the auxiliary service, or both. > The solution could be: to instantiate aux services using a classloader that > is different from the system classloader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
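The underlying mechanism this JIRA asks for (instantiating an aux service through a classloader other than the system classloader) can be sketched with a plain URLClassLoader. This is only an illustration: a plain URLClassLoader delegates parent-first, so shadowing a class that is also on the NM classpath would need a child-first or filtering classloader in the real patch.
{code}
// Illustration only: load an aux service implementation from a custom jar via
// a dedicated classloader instead of the NM (system) classpath. The jar path
// and class name mirror the manual test described above.
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class AuxServiceLoaderSketch {
  public static void main(String[] args) throws Exception {
    File jar = new File("/Users/xuan/dep/TestAuxService.jar");

    // Child classloader backed by the custom jar, with the application
    // classloader as parent for shared dependencies.
    URLClassLoader auxLoader = new URLClassLoader(
        new URL[] { jar.toURI().toURL() },
        AuxServiceLoaderSketch.class.getClassLoader());

    // Load and instantiate the service through the custom classloader.
    Class<?> clazz = Class.forName("org.aux.TestAuxService", true, auxLoader);
    Object service = clazz.getDeclaredConstructor().newInstance();
    System.out.println("Loaded " + service.getClass().getName() + " via "
        + service.getClass().getClassLoader());
  }
}
{code}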
[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260607#comment-15260607 ] Hadoop QA commented on YARN-5002: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 4m 53s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801079/YARN-5002.2.patch | | JIRA Issue | YARN-5002 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11243/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch, YARN-5002.2.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 
java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > > org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer
[ https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260591#comment-15260591 ] Hadoop QA commented on YARN-4447: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 3m 18s {color} | {color:red} Docker failed to build yetus/hadoop:0ca8df7. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801073/YARN-4447-YARN-2928.01.patch | | JIRA Issue | YARN-4447 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11240/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Provide a mechanism to represent complex filters and parse them at the REST > layer > -- > > Key: YARN-4447 > URL: https://issues.apache.org/jira/browse/YARN-4447 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4447-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file
[ https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260589#comment-15260589 ] Hadoop QA commented on YARN-4577: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 2s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801074/YARN-4577.5.patch | | JIRA Issue | YARN-4577 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11242/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Enable aux services to have their own custom classpath/jar file > --- > > Key: YARN-4577 > URL: https://issues.apache.org/jira/browse/YARN-4577 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4577.1.patch, YARN-4577.2.patch, > YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, YARN-4577.3.patch, > YARN-4577.3.rebase.patch, YARN-4577.4.patch, YARN-4577.5.patch, > YARN-4577.poc.patch > > > Right now, users have to add their jars to the NM classpath directly, thus > put them on the system classloader. But if multiple versions of the plugin > are present on the classpath, there is no control over which version actually > gets loaded. Or if there are any conflicts between the dependencies > introduced by the auxiliary service and the NM itself, they can break the NM, > the auxiliary service, or both. > The solution could be: to instantiate aux services using a classloader that > is different from the system classloader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260588#comment-15260588 ] Hadoop QA commented on YARN-4734: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} YARN-4734 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801075/YARN-4734.8.patch | | JIRA Issue | YARN-4734 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11241/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, > YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, > YARN-4734.8.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4956) findbug issue on LevelDBCacheTimelineStore
[ https://issues.apache.org/jira/browse/YARN-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260585#comment-15260585 ] Li Lu commented on YARN-4956: - No concerns raised. I'll commit shortly. > findbug issue on LevelDBCacheTimelineStore > -- > > Key: YARN-4956 > URL: https://issues.apache.org/jira/browse/YARN-4956 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Zhiyuan Yang > Attachments: YARN-4956-trunk.000.patch > > > {code} > Multithreaded correctness Warnings > Code Warning IS Inconsistent synchronization of > org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.configuration; > locked 66% of time > Bug type IS2_INCONSISTENT_SYNC (click for details) > In class org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore > Field > org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.configuration > Synchronized 66% of the time > Unsynchronized access at LevelDBCacheTimelineStore.java:[line 82] > Synchronized access at LevelDBCacheTimelineStore.java:[line 117] > Synchronized access at LevelDBCacheTimelineStore.java:[line 122] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
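As background on the warning quoted above: IS2_INCONSISTENT_SYNC fires when a field is accessed both with and without holding the object lock. The usual fix is to make every access go through the same lock (or to make the field safely publishable); below is a generic sketch of the lock-everywhere option, with illustrative class and method names rather than the actual YARN-4956 change:
{code}
import org.apache.hadoop.conf.Configuration;

/** Illustrative only: keep the lock discipline on a shared field uniform. */
class CacheStoreSyncSketch {
  // The flagged field was read unsynchronized on one path and accessed under
  // the lock on two others; funnel all accesses through these methods.
  private Configuration configuration;

  synchronized Configuration getConfiguration() {
    return configuration;
  }

  synchronized void setConfiguration(Configuration conf) {
    this.configuration = conf;
  }
}
{code}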
[jira] [Updated] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5002: -- Attachment: YARN-5002.2.patch > getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch, YARN-5002.2.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > > org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > 
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260574#comment-15260574 ] Jian He commented on YARN-5002: --- [~templedf], thanks for the review bq. You can't hard-code the capacity scheduler into the RM yeah, missed this, even unit test failed because of this. fixed it. bq. From a security perspective it's better to deny access to an app if we can't find the queue. I don't have strong opinion on this. Problem with denying access is that these apps will never be able to be viewed. changed it anyway. > getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > > 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > > org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
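To make the behavior agreed on above concrete (deny access, rather than throw an NPE, when the application's queue can no longer be resolved, and leave a paper trail), here is a hedged sketch; the QueueView interface and the method shape are invented for illustration and do not mirror the actual QueueACLsManager code in the patch:
{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.QueueACL;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Illustrative null-safe ACL check, not the YARN-5002 patch itself. */
class QueueAclCheckSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(QueueAclCheckSketch.class);

  /** Hypothetical minimal view of a scheduler queue. */
  interface QueueView {
    boolean hasAccess(QueueACL acl, UserGroupInformation user);
  }

  boolean checkAccess(UserGroupInformation caller, QueueACL acl,
      QueueView queue, String queueName) {
    if (queue == null) {
      // Queue was removed or never existed (e.g. app recovered after a
      // scheduler config change): deny instead of dereferencing null.
      LOG.info("Queue {} not found while checking {} for {}; denying access",
          queueName, acl, caller.getShortUserName());
      return false;
    }
    return queue.hasAccess(acl, caller);
  }
}
{code}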
[jira] [Commented] (YARN-4905) Improve Yarn log Command line option to show log metadata
[ https://issues.apache.org/jira/browse/YARN-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260571#comment-15260571 ] Hadoop QA commented on YARN-4905: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 11m 54s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801072/YARN-4905.5.patch | | JIRA Issue | YARN-4905 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11239/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Improve Yarn log Command line option to show log metadata > - > > Key: YARN-4905 > URL: https://issues.apache.org/jira/browse/YARN-4905 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4905.1.patch, YARN-4905.2.patch, YARN-4905.3.patch, > YARN-4905.4.patch, YARN-4905.5.patch > > > Improve the Yarn log commandline to have "ls" command which can list > containers for which we have logs, list files within each container, along > with file size -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4734: - Attachment: YARN-4734.8.patch Attached ver.8 patch, rebased to latest trunk, merged LICENSE.txt. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, > YARN-4734.4.patch, YARN-4734.5.patch, YARN-4734.6.patch, YARN-4734.7.patch, > YARN-4734.8.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4577) Enable aux services to have their own custom classpath/jar file
[ https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4577: Attachment: YARN-4577.5.patch > Enable aux services to have their own custom classpath/jar file > --- > > Key: YARN-4577 > URL: https://issues.apache.org/jira/browse/YARN-4577 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4577.1.patch, YARN-4577.2.patch, > YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, YARN-4577.3.patch, > YARN-4577.3.rebase.patch, YARN-4577.4.patch, YARN-4577.5.patch, > YARN-4577.poc.patch > > > Right now, users have to add their jars to the NM classpath directly, thus > put them on the system classloader. But if multiple versions of the plugin > are present on the classpath, there is no control over which version actually > gets loaded. Or if there are any conflicts between the dependencies > introduced by the auxiliary service and the NM itself, they can break the NM, > the auxiliary service, or both. > The solution could be: to instantiate aux services using a classloader that > is different from the system classloader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats
[ https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260558#comment-15260558 ] Daniel Templeton commented on YARN-4308: There's already a debug log on a miss in the patch. Even if right now the only time a negative value comes back is on the first report, that doesn't mean it won't change later. My spider sense says there is a real risk of the reports going away permanently with no sign as to why. We're talking about futures, though, so I'm willing to accept your assertion that this change can't possibly create a customer support case, but I reserve the right to an I-told-you-so later if it does. > ContainersAggregated CPU resource utilization reports negative usage in first > few heartbeats > > > Key: YARN-4308 > URL: https://issues.apache.org/jira/browse/YARN-4308 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4308.patch, 0002-YARN-4308.patch > > > NodeManager reports ContainerAggregated CPU resource utilization as -ve value > in first few heartbeats cycles. I added a new debug print and received below > values from heartbeats. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource > Utilization : CpuTrackerUsagePercent : 198.94598 > {noformat} > Its better we send 0 as CPU usage rather than sending a negative values in > heartbeats eventhough its happening in only first few heartbeats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4447) Provide a mechanism to represent complex filters and parse them at the REST layer
[ https://issues.apache.org/jira/browse/YARN-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4447: --- Attachment: YARN-4447-YARN-2928.01.patch > Provide a mechanism to represent complex filters and parse them at the REST > layer > -- > > Key: YARN-4447 > URL: https://issues.apache.org/jira/browse/YARN-4447 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4447-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats
[ https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260533#comment-15260533 ] Sunil G commented on YARN-4308: --- Thanks [~templedf] for sharing your thoughts. I have checked the possibilities of getting -ve values from {{CpuTimeTracker}}. As I see it, we can get a negative value only the first time, and I did not see other cases. In that case, considering the skipping happens only once, do we need an INFO log there? I think I can add a debug log if that code is hit. But I am not very sure whether we need a log after "n" hits, because it may hit only the first time. Could you please correct me if I missed something. > ContainersAggregated CPU resource utilization reports negative usage in first > few heartbeats > > > Key: YARN-4308 > URL: https://issues.apache.org/jira/browse/YARN-4308 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4308.patch, 0002-YARN-4308.patch > > > NodeManager reports ContainerAggregated CPU resource utilization as -ve value > in first few heartbeats cycles. I added a new debug print and received below > values from heartbeats. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource > Utilization : CpuTrackerUsagePercent : 198.94598 > {noformat} > Its better we send 0 as CPU usage rather than sending a negative values in > heartbeats eventhough its happening in only first few heartbeats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
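The change being discussed boils down to clamping the tracker's first, not-yet-defined reading before it is reported on the heartbeat. A tiny self-contained sketch of that idea (the class and method are illustrative; this is not the actual ContainersMonitorImpl code):
{code}
/** Illustrative only: never report a negative CPU utilization. */
final class CpuUsageClampSketch {
  static float clampCpuUsagePercent(float rawCpuUsagePercent) {
    // Rate trackers like CpuTimeTracker need two samples before a usage
    // percentage is defined, so the first heartbeat can see -1.0; report 0
    // instead so aggregated container utilization never goes negative.
    return rawCpuUsagePercent < 0f ? 0f : rawCpuUsagePercent;
  }

  public static void main(String[] args) {
    System.out.println(clampCpuUsagePercent(-1.0f));      // prints 0.0
    System.out.println(clampCpuUsagePercent(198.94598f)); // passed through
  }
}
{code}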
[jira] [Updated] (YARN-4905) Improve Yarn log Command line option to show log metadata
[ https://issues.apache.org/jira/browse/YARN-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4905: Attachment: YARN-4905.5.patch rebase the patch > Improve Yarn log Command line option to show log metadata > - > > Key: YARN-4905 > URL: https://issues.apache.org/jira/browse/YARN-4905 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4905.1.patch, YARN-4905.2.patch, YARN-4905.3.patch, > YARN-4905.4.patch, YARN-4905.5.patch > > > Improve the Yarn log commandline to have "ls" command which can list > containers for which we have logs, list files within each container, along > with file size -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4807) MockAM#waitForState sleep duration is too long
[ https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260496#comment-15260496 ] Yufei Gu commented on YARN-4807: Thanks a lot for the review, [~templedf] and [~kasha]. Thanks a lot for committing, [~kasha]. > MockAM#waitForState sleep duration is too long > -- > > Key: YARN-4807 > URL: https://issues.apache.org/jira/browse/YARN-4807 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Yufei Gu > Fix For: 2.9.0 > > Attachments: YARN-4807.001.patch, YARN-4807.002.patch, > YARN-4807.003.patch, YARN-4807.004.patch, YARN-4807.005.patch, > YARN-4807.006.patch, YARN-4807.007.patch, YARN-4807.008.patch, > YARN-4807.009.patch, YARN-4807.010.patch, YARN-4807.011.patch, > YARN-4807.012.patch, YARN-4807.013.patch, YARN-4807.014.patch, > YARN-4807.015.patch > > > MockAM#waitForState sleep duration (500 ms) is too long. Also, there is > significant duplication with MockRM#waitForState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
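For context on what replacing the fixed 500 ms sleep and the duplicated MockAM/MockRM wait logic amounts to, a generic poll-until-true helper is sketched below; the 50 ms interval and the names are illustrative, not the committed change:
{code}
import java.util.function.Supplier;

/** Generic polling helper: short interval, hard timeout, last-chance check. */
final class WaitForStateSketch {
  static boolean waitFor(Supplier<Boolean> condition, long intervalMs,
      long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (condition.get()) {
        return true;               // expected state reached
      }
      Thread.sleep(intervalMs);    // e.g. 50 ms instead of a fixed 500 ms
    }
    return condition.get();        // one final check at the deadline
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    // Toy condition that flips to true after ~200 ms.
    boolean met = waitFor(() -> System.currentTimeMillis() - start > 200,
        50, 5000);
    System.out.println("condition met: " + met);
  }
}
{code}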
[jira] [Commented] (YARN-4807) MockAM#waitForState sleep duration is too long
[ https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260477#comment-15260477 ] Hudson commented on YARN-4807: -- FAILURE: Integrated in Hadoop-trunk-Commit #9682 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9682/]) YARN-4807. MockAM#waitForState sleep duration is too long. (Yufei Gu via (kasha: rev 185c3d4de1ac4cf10cc1aa00f367b3880b80) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestWorkPreservingRMRestartForNodeLabel.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestContainerResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerResizing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestSignalContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestNodesListManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java > MockAM#waitForState sleep duration is too long > -- > > Key: YARN-4807 > URL: https://issues.apache.org/jira/browse/YARN-4807 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Yufei Gu > Fix For: 2.9.0 > > Attachments: YARN-4807.001.patch, YARN-4807.002.pa
[jira] [Created] (YARN-5003) Add container resource to RM audit log
Nathan Roberts created YARN-5003: Summary: Add container resource to RM audit log Key: YARN-5003 URL: https://issues.apache.org/jira/browse/YARN-5003 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 3.0.0 Reporter: Nathan Roberts Assignee: Nathan Roberts It would be valuable to know the resource consumed by a container in the RM audit log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260468#comment-15260468 ] Junping Du commented on YARN-4676: -- bq. If RM work-preserving restart is not enabled, it should be okay to decommission a node right away. Agree. But that is not today's behavior without this patch. After this patch, decommissioning nodes lose the timeout and are not decommissioned until all applications running on them have finished. bq. If work-preserving restart is enabled and a node is decommissioned with a timeout, it would be nice to store when the decommission has been called and the timeout in the state-store. Note that, in an HA setup, the two RMs could have a clock skew. Since that work is non-trivial, I am open to doing it in a follow-up JIRA. I really have concerns about putting everything into the state-store. I think we should avoid storing unnecessary info as much as possible - just like what we do when the RM recovers applications/nodes for RM restart. Isn't that right? An additional store/recovery operation for each NM's decommissioning timeout value sounds too heavyweight. Actually, I was more interested in Daniel's idea above of combining the client-side tracking and the RM-side tracking, so that we could track the timeout on the client side in case we lose it on the RM side. However, I need to check more to come up with some more concrete ideas. > Automatic and Asynchronous Decommissioning Nodes Status Tracking > > > Key: YARN-4676 > URL: https://issues.apache.org/jira/browse/YARN-4676 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Zhi >Assignee: Daniel Zhi > Labels: features > Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, > YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, > YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, > YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch > > > DecommissioningNodeWatcher inside ResourceTrackingService tracks > DECOMMISSIONING nodes status automatically and asynchronously after > client/admin made the graceful decommission request. It tracks > DECOMMISSIONING nodes status to decide when, after all running containers on > the node have completed, will be transitioned into DECOMMISSIONED state. > NodesListManager detect and handle include and exclude list changes to kick > out decommission or recommission as necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
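One way to read the state-store debate above is that what actually has to survive an RM restart or failover is the remaining decommission budget, not an absolute wall-clock deadline, which also sidesteps the clock-skew concern between two RMs. A purely illustrative sketch of that bookkeeping (names and structure are hypothetical, not from the YARN-4676 patch):
{code}
import java.util.concurrent.TimeUnit;

/** Illustrative timeout bookkeeping for one DECOMMISSIONING node. */
final class DecommissionTimeoutSketch {
  private final long timeoutMs;       // graceful timeout requested by admin
  private final long startedAtNanos;  // monotonic clock, local to this RM

  DecommissionTimeoutSketch(long timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.startedAtNanos = System.nanoTime();
  }

  /** Budget left on the RM that is currently tracking the node. */
  long remainingMs() {
    long elapsedMs =
        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startedAtNanos);
    return Math.max(0L, timeoutMs - elapsedMs);
  }

  /**
   * What a hand-off (state-store checkpoint or client-side re-issue) could
   * carry: the remaining budget, so the next tracker restarts its own local
   * clock instead of comparing absolute timestamps across machines.
   */
  long remainingBudgetToHandOff() {
    return remainingMs();
  }
}
{code}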
[jira] [Commented] (YARN-4804) [Umbrella] Improve test run duration
[ https://issues.apache.org/jira/browse/YARN-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260465#comment-15260465 ] Karthik Kambatla commented on YARN-4804: We shaved off 10 minutes through YARN-4805 and YARN-4807. > [Umbrella] Improve test run duration > > > Key: YARN-4804 > URL: https://issues.apache.org/jira/browse/YARN-4804 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla > > Our tests take a long time to run. e.g. the RM tests take 67 minutes. Given > our precommit builds run our tests against two Java versions, this issue is > exacerbated. > Filing this umbrella JIRA to address this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4807) MockAM#waitForState sleep duration is too long
[ https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4807: --- Labels: (was: newbie) > MockAM#waitForState sleep duration is too long > -- > > Key: YARN-4807 > URL: https://issues.apache.org/jira/browse/YARN-4807 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Yufei Gu > Fix For: 2.9.0 > > Attachments: YARN-4807.001.patch, YARN-4807.002.patch, > YARN-4807.003.patch, YARN-4807.004.patch, YARN-4807.005.patch, > YARN-4807.006.patch, YARN-4807.007.patch, YARN-4807.008.patch, > YARN-4807.009.patch, YARN-4807.010.patch, YARN-4807.011.patch, > YARN-4807.012.patch, YARN-4807.013.patch, YARN-4807.014.patch, > YARN-4807.015.patch > > > MockAM#waitForState sleep duration (500 ms) is too long. Also, there is > significant duplication with MockRM#waitForState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4595) Add support for configurable read-only mounts
[ https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260454#comment-15260454 ] Varun Vasudev commented on YARN-4595: - [~aw] - do Billie's changes address your concerns? > Add support for configurable read-only mounts > - > > Key: YARN-4595 > URL: https://issues.apache.org/jira/browse/YARN-4595 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-4595.1.patch, YARN-4595.2.patch, YARN-4595.3.patch, > YARN-4595.4.patch, YARN-4595.5.patch > > > Mounting files or directories from the host is one way of passing > configuration and other information into a docker container. We could allow > the user to set a list of mounts in the environment of ContainerLaunchContext > (e.g. /dir1:/targetdir1,/dir2:/targetdir2). These would be mounted read-only > to the specified target locations. > Due to permissions and user concerns, for this ticket we will require the > mounts to be resources that are in the distributed cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4807) MockAM#waitForState sleep duration is too long
[ https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260452#comment-15260452 ] Karthik Kambatla commented on YARN-4807: +1, checking this in. > MockAM#waitForState sleep duration is too long > -- > > Key: YARN-4807 > URL: https://issues.apache.org/jira/browse/YARN-4807 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Yufei Gu > Labels: newbie > Attachments: YARN-4807.001.patch, YARN-4807.002.patch, > YARN-4807.003.patch, YARN-4807.004.patch, YARN-4807.005.patch, > YARN-4807.006.patch, YARN-4807.007.patch, YARN-4807.008.patch, > YARN-4807.009.patch, YARN-4807.010.patch, YARN-4807.011.patch, > YARN-4807.012.patch, YARN-4807.013.patch, YARN-4807.014.patch, > YARN-4807.015.patch > > > MockAM#waitForState sleep duration (500 ms) is too long. Also, there is > significant duplication with MockRM#waitForState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260441#comment-15260441 ] Karthik Kambatla commented on YARN-4676: Haven't looked at the code itself, but looked at recent discussion around RM restart and [~rkanter] filled me in on some of the details. If RM work-preserving restart is not enabled, it should be okay to decommission a node right away. If work-preserving restart is enabled and a node is decommissioned with a timeout, it would be nice to store *when* the decommission has been called and the timeout in the state-store. Note that, in an HA setup, the two RMs could have a clock skew. Since that work is non-trivial, I am open to doing it in a follow-up JIRA. > Automatic and Asynchronous Decommissioning Nodes Status Tracking > > > Key: YARN-4676 > URL: https://issues.apache.org/jira/browse/YARN-4676 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Zhi >Assignee: Daniel Zhi > Labels: features > Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, > YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, > YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, > YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch > > > DecommissioningNodeWatcher inside ResourceTrackingService tracks > DECOMMISSIONING nodes status automatically and asynchronously after > client/admin made the graceful decommission request. It tracks > DECOMMISSIONING nodes status to decide when, after all running containers on > the node have completed, will be transitioned into DECOMMISSIONED state. > NodesListManager detect and handle include and exclude list changes to kick > out decommission or recommission as necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-5002) getApplicationReport call may raise NPE
[ https://issues.apache.org/jira/browse/YARN-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260430#comment-15260430 ] Daniel Templeton commented on YARN-5002: Thanks for posting the patch, [~jianhe]. I have a few concerns:
# You can't hard-code the capacity scheduler into the RM. The code has to use whatever scheduler is selected.
# From a security perspective it's better to deny access to an app if we can't find the queue. We should probably log an INFO or DEBUG level message when that happens so that there's a paper trail.
# This:
{code}
final ApplicationReport[] report = { null };
user2.doAs(new PrivilegedAction<ApplicationReport>() {
  @Override
  public ApplicationReport run() {
    try {
      report[0] = rm2.getApplicationReport(app1.getApplicationId());
    } catch (Exception e) {
      e.printStackTrace();
    }
    return report[0];
  }
});
{code}
seems a bit convoluted. How about just:
{code}
ApplicationReport report = user2.doAs(new PrivilegedExceptionAction<ApplicationReport>() {
  @Override
  public ApplicationReport run() throws Exception {
    return rm2.getApplicationReport(app1.getApplicationId());
  }
});
{code}
> getApplicationReport call may raise NPE > --- > > Key: YARN-5002 > URL: https://issues.apache.org/jira/browse/YARN-5002 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-5002.1.patch > > > getApplicationReport call may raise NPE > {code} > Exception in thread "main" java.lang.NullPointerException: > java.lang.NullPointerException > > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:57) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkAccess(ClientRMService.java:279) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:760) > > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:682) > > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234) > > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2268) > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2264) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1708) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2262) > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:498) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > com.sun.proxy.$Proxy18.getApplications(Unknown Source) > > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > > org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:135) > org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:167) > org.apache.hadoop.mapreduce.Cluster.getAllJobStatuses(Cluster.java:294) > org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:553) > org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:338) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > org.apache.hadoop.mapred.JobClient.main(JobClien
[jira] [Commented] (YARN-4994) Use MiniYARNCluster with try-with-resources in tests
[ https://issues.apache.org/jira/browse/YARN-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260397#comment-15260397 ] Hadoop QA commented on YARN-4994: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 7 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 23s {color} | {color:green} trunk passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 5s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s {color} | {color:green} trunk passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 43s {color} | {color:green} the patch passed with JDK v1.8.0_92 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 13m 21s {color} | {color:red} root-jdk1.8.0_92 with JDK v1.8.0_92 generated 3 new + 736 unchanged - 3 fixed = 739 total (was 739) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 30s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 23m 52s {color} | {color:red} root-jdk1.7.0_95 with JDK v1.7.0_95 generated 3 new + 733 unchanged - 3 fixed = 736 total (was 736) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} 
mvnsite {color} | {color:green} 1m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s {color} | {color:green} the patch passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 32s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_92. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 22s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_92. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 55s {color} | {color:red} hadoop-mapreduce-client-app in the patch failed with JDK v1.8.0_92. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color
[jira] [Commented] (YARN-4595) Add support for configurable read-only mounts
[ https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260353#comment-15260353 ] Billie Rinaldi commented on YARN-4595: -- Yes, I believe that is correct. Thanks for the review, [~vvasudev]. > Add support for configurable read-only mounts > - > > Key: YARN-4595 > URL: https://issues.apache.org/jira/browse/YARN-4595 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-4595.1.patch, YARN-4595.2.patch, YARN-4595.3.patch, > YARN-4595.4.patch, YARN-4595.5.patch > > > Mounting files or directories from the host is one way of passing > configuration and other information into a docker container. We could allow > the user to set a list of mounts in the environment of ContainerLaunchContext > (e.g. /dir1:/targetdir1,/dir2:/targetdir2). These would be mounted read-only > to the specified target locations. > Due to permissions and user concerns, for this ticket we will require the > mounts to be resources that are in the distributed cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated
[ https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260263#comment-15260263 ] Andras Bokor commented on YARN-3573: Is there anybody who can help me? > MiniMRYarnCluster constructor that starts the timeline server using a boolean > should be marked deprecated > - > > Key: YARN-3573 > URL: https://issues.apache.org/jira/browse/YARN-3573 > Project: Hadoop YARN > Issue Type: Test > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: YARN-3573-002.patch, YARN-3573.patch > > > {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code} > starts the timeline server using *boolean enableAHS*. It is better to have > the timelineserver started based on the config value. > We should mark this constructor as deprecated to avoid its future use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4994) Use MiniYARNCluster with try-with-resources in tests
[ https://issues.apache.org/jira/browse/YARN-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Bokor updated YARN-4994: --- Attachment: HDFS-10287.02.patch [~templedf] Thanks a lot for reviewing my patch. I updated the patch according to your recommendations. Some notes on the second point: my IDE's settings were not in sync with the Apache conventions, so I set the indent to 2 and the continuation indent to 4. [^HDFS-10287.02.patch] > Use MiniYARNCluster with try-with-resources in tests > > > Key: YARN-4994 > URL: https://issues.apache.org/jira/browse/YARN-4994 > Project: Hadoop YARN > Issue Type: Improvement > Components: test >Affects Versions: 2.7.0 >Reporter: Andras Bokor >Assignee: Andras Bokor >Priority: Trivial > Fix For: 2.7.0 > > Attachments: HDFS-10287.01.patch, HDFS-10287.02.patch > > > In tests MiniYARNCluster is used with the following pattern: > In try-catch block create a MiniYARNCluster instance and in finally block > close it. > [Try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html] > is preferred since Java7 instead of the pattern above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
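For reference, the before/after pattern the JIRA description talks about looks roughly like the sketch below. The cluster name and node counts are arbitrary; the try-with-resources form relies on MiniYARNCluster being Closeable through the Hadoop Service interface, with close() stopping the service:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

class MiniYarnClusterClosingSketch {
  void oldPattern() {
    // Pre-Java-7 style: create in try, stop in finally.
    MiniYARNCluster cluster = null;
    try {
      cluster = new MiniYARNCluster("sketch", 1, 1, 1);
      cluster.init(new YarnConfiguration());
      cluster.start();
      // ... test body ...
    } finally {
      if (cluster != null) {
        cluster.stop();
      }
    }
  }

  void tryWithResources() throws Exception {
    // Preferred: the cluster is closed (stopped) automatically on exit.
    try (MiniYARNCluster cluster = new MiniYARNCluster("sketch", 1, 1, 1)) {
      cluster.init(new YarnConfiguration());
      cluster.start();
      // ... test body ...
    }
  }
}
{code}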
[jira] [Commented] (YARN-4122) Add support for GPU as a resource
[ https://issues.apache.org/jira/browse/YARN-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260021#comment-15260021 ] Jun Gong commented on YARN-4122: {quote} From the SLURM lists, it looks like prior to CUDA 7, the environment variable was not working correctly: https://devtalk.nvidia.com/default/topic/512869/cuda-accessing-all-devices-even-those-which-are-blacklisted/?offset=2 {quote} We are using CUDA 7.5 now, and as far as I remember we did not come across this problem. {quote} This design will probably also have to adjust for the work being done in YARN-4726. {quote} Is there any plan for YARN to support GPUs? It will be easier to support them based on YARN-3926. Allocating GPUs on the NM will be a little complex because we need to take the GPU topology into consideration for better performance. {quote} In the doc you say that YARN is currently providing you GPU isolation. How are you making that work? {quote} We use cgroups for hard limits. '*docker run --device=...*' does the same job, so we do not need to set up cgroups ourselves. > Add support for GPU as a resource > - > > Key: YARN-4122 > URL: https://issues.apache.org/jira/browse/YARN-4122 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: GPUAsAResourceDesign.pdf > > > Use [cgroups > devices|https://www.kernel.org/doc/Documentation/cgroups/devices.txt] to > isolate GPUs for containers. For docker containers, we could use 'docker run > --device=...'. > Reference: [SLURM Resources isolation through > cgroups|http://slurm.schedmd.com/slurm_ug_2011/SLURM_UserGroup2011_cgroups.pdf]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
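To make the '*docker run --device=...*' point above concrete, here is a purely hypothetical sketch of NM-side glue that turns allocated GPU indexes into docker device flags; none of these class or method names exist in YARN, and the real integration is what the attached design doc and the YARN-3926/YARN-4726 discussions cover.
{code:java}
// Hypothetical illustration only: maps assigned GPU indexes to docker
// --device flags. Docker then sets up the devices cgroup whitelist for the
// container, so the NM does not have to write cgroup entries itself.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GpuDockerArgsSketch {

  /** Builds --device arguments for the GPUs assigned to one container. */
  static List<String> buildDeviceArgs(List<Integer> assignedGpuIndexes) {
    List<String> args = new ArrayList<String>();
    for (int gpu : assignedGpuIndexes) {
      // Each flag exposes exactly one GPU device node to the container.
      args.add("--device=/dev/nvidia" + gpu);
    }
    return args;
  }

  public static void main(String[] args) {
    // e.g. GPUs 0 and 2 assigned -> [--device=/dev/nvidia0, --device=/dev/nvidia2]
    System.out.println(buildDeviceArgs(Arrays.asList(0, 2)));
  }
}
{code}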
[jira] [Commented] (YARN-4966) Improve yarn logs to fetch container logs without specifying nodeId
[ https://issues.apache.org/jira/browse/YARN-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259945#comment-15259945 ] Hudson commented on YARN-4966: -- FAILURE: Integrated in Hadoop-trunk-Commit #9679 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9679/]) YARN-4966. Improve yarn logs to fetch container logs without specifying (vvasudev: rev 66b07d83740a2ec3e6bfb2bfd064863bae37a1b5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java > Improve yarn logs to fetch container logs without specifying nodeId > --- > > Key: YARN-4966 > URL: https://issues.apache.org/jira/browse/YARN-4966 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.9.0 > > Attachments: YARN-4966.1.patch, YARN-4966.2.patch, YARN-4966.3.patch, > YARN-4966.4.patch > > > Currently, for a finished application, we can get the container logs > without specifying the node id, but we need to enable > yarn.timeline-service.generic-application-history.enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
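For reference, a hedged sketch of exercising this behavior programmatically: it enables the generic application history flag quoted in the description and drives the logs CLI for a finished application without passing a node id. The application id is a placeholder, and the option names should be checked against the committed patch.
{code:java}
// Sketch only: fetch logs for a finished application without a nodeId,
// with the generic application history flag from the description enabled.
// The application id is a placeholder for illustration.
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.yarn.client.cli.LogsCLI;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FetchContainerLogsSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setBoolean(
        "yarn.timeline-service.generic-application-history.enabled", true);
    // Roughly equivalent to: yarn logs -applicationId application_1462000000000_0001
    int rc = ToolRunner.run(conf, new LogsCLI(),
        new String[] {"-applicationId", "application_1462000000000_0001"});
    System.exit(rc);
  }
}
{code}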
[jira] [Updated] (YARN-4966) Improve yarn logs to fetch container logs without specifying nodeId
[ https://issues.apache.org/jira/browse/YARN-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4966: Summary: Improve yarn logs to fetch container logs without specifying nodeId (was: More improvement to get Container logs without specify nodeId) > Improve yarn logs to fetch container logs without specifying nodeId > --- > > Key: YARN-4966 > URL: https://issues.apache.org/jira/browse/YARN-4966 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4966.1.patch, YARN-4966.2.patch, YARN-4966.3.patch, > YARN-4966.4.patch > > > Currently, for a finished application, we can get the container logs > without specifying the node id, but we need to enable > yarn.timeline-service.generic-application-history.enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4595) Add support for configurable read-only mounts
[ https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259917#comment-15259917 ] Varun Vasudev edited comment on YARN-4595 at 4/27/16 10:14 AM: --- Thanks for the explanation [~billie.rinaldi]. Just to summarize, the latest patch # allows users to mount only files and directories from the localized resources into the docker container # in the case of files, does not allow symbolic links to be mounted into the docker container # in the case of directories, even if they contain symbolic links pointing to directories outside the YARN local directories, the symlink targets are not mounted into the container, so there is no access violation to take care of. Is my understanding correct? was (Author: vvasudev): Thanks for the explanation [~billie.rinaldi]. > Add support for configurable read-only mounts > - > > Key: YARN-4595 > URL: https://issues.apache.org/jira/browse/YARN-4595 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-4595.1.patch, YARN-4595.2.patch, YARN-4595.3.patch, > YARN-4595.4.patch, YARN-4595.5.patch > > > Mounting files or directories from the host is one way of passing > configuration and other information into a docker container. We could allow > the user to set a list of mounts in the environment of ContainerLaunchContext > (e.g. /dir1:/targetdir1,/dir2:/targetdir2). These would be mounted read-only > to the specified target locations. > Due to permissions and user concerns, for this ticket we will require the > mounts to be resources that are in the distributed cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4595) Add support for configurable read-only mounts
[ https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259917#comment-15259917 ] Varun Vasudev commented on YARN-4595: - Thanks for the explanation [~billie.rinaldi]. > Add support for configurable read-only mounts > - > > Key: YARN-4595 > URL: https://issues.apache.org/jira/browse/YARN-4595 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-4595.1.patch, YARN-4595.2.patch, YARN-4595.3.patch, > YARN-4595.4.patch, YARN-4595.5.patch > > > Mounting files or directories from the host is one way of passing > configuration and other information into a docker container. We could allow > the user to set a list of mounts in the environment of ContainerLaunchContext > (e.g. /dir1:/targetdir1,/dir2:/targetdir2). These would be mounted read-only > to the specified target locations. > Due to permissions and user concerns, for this ticket we will require the > mounts to be resources that are in the distributed cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)