[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483563#comment-16483563 ] Oleksandr Shevchenko commented on YARN-7913: Thank you [~wilfreds] for your comment. I created a separate JIRA ticket, YARN-7998, to address the problem related to the ACL configuration changes. > Improve error handling when application recovery fails with exception > - > > Key: YARN-7913 > URL: https://issues.apache.org/jira/browse/YARN-7913 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: YARN-7913.000.poc.patch > > > There are edge cases when the application recovery fails with an exception. > Example failure scenario: > * setup: a queue is a leaf queue in the primary RM's config and the same > queue is a parent queue in the secondary RM's config. > * When failover happens with this setup, the recovery will fail for > applications on this queue, and an APP_REJECTED event will be dispatched to > the async dispatcher. On the same thread (that handles the recovery), a > NullPointerException is thrown when the applicationAttempt is tried to be > recovered > (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494). > I don't see a good way to avoid the NPE in this scenario, because when the > NPE occurs the APP_REJECTED has not been processed yet, and we don't know > that the application recovery failed. > Currently the first exception will abort the recovery, and if there are X > applications, there will be ~X passive -> active RM transition attempts - the > passive -> active RM transition will only succeed when the last APP_REJECTED > event is processed on the async dispatcher thread. > _The point of this ticket is to improve the error handling and reduce the > number of passive -> active RM transition attempts (solving the above > described failure scenario isn't in scope)._ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
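For illustration, a minimal sketch of the error-handling direction this ticket describes: recover each stored application independently so a single failure (such as the leaf-vs-parent queue mismatch above) is logged and turned into a rejection instead of aborting the whole passive -> active transition. The helper names recoverApplication() and markAppRejected() are illustrative assumptions, not the actual RMAppManager/FairScheduler code.

{code:java}
// Hedged sketch only: one bad application should not abort recovery for all
// remaining applications during the passive -> active transition.
for (ApplicationStateData appState : rmState.getApplicationState().values()) {
  ApplicationId appId =
      appState.getApplicationSubmissionContext().getApplicationId();
  try {
    recoverApplication(appState, rmState);      // hypothetical helper
  } catch (Exception e) {
    LOG.error("Failed to recover application " + appId
        + ", rejecting it instead of failing the RM transition", e);
    markAppRejected(appId, e.getMessage());     // hypothetical helper
  }
}
{code}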
[jira] [Commented] (YARN-8248) Job hangs when a job requests a resource that its queue does not have
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483546#comment-16483546 ] Szilard Nemeth commented on YARN-8248: -- Thanks [~haibochen] for the reviews! > Job hangs when a job requests a resource that its queue does not have > - > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, > YARN-8248-006.patch, YARN-8248-007.patch, YARN-8248-008.patch, > YARN-8248-009.patch, YARN-8248-010.patch, YARN-8248-011.patch, > YARN-8248-012.patch, YARN-8248-013.patch, YARN-8248-014.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. > > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > > 1 mb,0vcores > 9 mb,0vcores > 50 > -1.0f > 2.0 > fair > > {code} > Diagnostic message from the web UI: > {code:java} > Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. (Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
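The fair-scheduler.xml excerpt quoted above lost its XML tags in transit. Based on the values that survived, it most likely read roughly as follows; the element names are inferred from the standard fair-scheduler schema and some digits may have been truncated, so treat this as a reconstruction rather than the exact original:

{code:xml}
<queue name="sample_queue">
  <minResources>1 mb,0vcores</minResources>
  <maxResources>9 mb,0vcores</maxResources>
  <maxRunningApps>50</maxRunningApps>
  <maxAMShare>-1.0f</maxAMShare>
  <weight>2.0</weight>
  <schedulingPolicy>fair</schedulingPolicy>
</queue>
{code}

The behaviour the description asks for - rejecting at submission time instead of leaving the job pending forever - can be sketched as below. Resources.fitsIn is an existing YARN utility; rejectApplicationWithMessage and the surrounding variables are stand-ins for whatever rejection path the scheduler uses, not the committed patch:

{code:java}
// Hedged sketch: if the AM's ask can never fit inside the queue's configured
// maximum share, fail the submission immediately.
Resource ask = amResourceRequest.getCapability();
Resource queueMax = queue.getMaxShare();
if (!Resources.fitsIn(ask, queueMax)) {
  rejectApplicationWithMessage(applicationId,   // hypothetical helper
      "Resource request " + ask + " exceeds maximum share " + queueMax
          + " of queue " + queue.getName());
  return;
}
{code}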
[jira] [Commented] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition
[ https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483530#comment-16483530 ] Wilfred Spiegelenburg commented on YARN-4677: - This is the exception thrown when the issue happens {code:java} FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:892) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1089) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:709) at java.lang.Thread.run(Thread.java:748) {code} The NPE is not caught and that triggers the uncaught exception handler which then triggers the exit. The code is a custom code base patched with a number of things, this is the code snippet that reflects the specific codebase: {code:java} 891 if (nm.getState() == NodeState.DECOMMISSIONING) { 892 this.rmContext 893 .getDispatcher() 894 .getEventHandler() 895 .handle( 896new RMNodeResourceUpdateEvent(nm.getNodeID(), ResourceOption 897 .newInstance(getSchedulerNode(nm.getNodeID()) 898 .getUsedResource(), 0))); 899 } {code} > RMNodeResourceUpdateEvent update from scheduler can lead to race condition > -- > > Key: YARN-4677 > URL: https://issues.apache.org/jira/browse/YARN-4677 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Brook Zhou >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-4677-branch-2.001.patch, > YARN-4677-branch-2.002.patch, YARN-4677.01.patch > > > When a node is in decommissioning state, there is time window between > completedContainer() and RMNodeResourceUpdateEvent get handled in > scheduler.nodeUpdate (YARN-3223). > So if a scheduling effort happens within this window, the new container could > still get allocated on this node. Even worse case is if scheduling effort > happen after RMNodeResourceUpdateEvent sent out but before it is propagated > to SchedulerNode - then the total resource is lower than used resource and > available resource is a negative value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
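A minimal sketch of the defensive check implied by the stack trace above: look the SchedulerNode up once and skip the resource-update event if the node has already been removed from the scheduler. Variable names follow the quoted snippet; this is only an illustration of the guard, not necessarily the committed fix:

{code:java}
// Hedged sketch: avoid the NPE by confirming the node still exists before
// building the RMNodeResourceUpdateEvent.
if (nm.getState() == NodeState.DECOMMISSIONING) {
  FSSchedulerNode node = getSchedulerNode(nm.getNodeID());
  if (node == null) {
    LOG.warn("Skipping resource update for " + nm.getNodeID()
        + ": node already removed from the scheduler");
  } else {
    this.rmContext.getDispatcher().getEventHandler().handle(
        new RMNodeResourceUpdateEvent(nm.getNodeID(),
            ResourceOption.newInstance(node.getUsedResource(), 0)));
  }
}
{code}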
[jira] [Updated] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiandan Yang updated YARN-8320: Attachment: CPU-isolation-for-latency-sensitive-services-v1.pdf > Add support CPU isolation for latency-sensitive (LS) service > - > > Key: YARN-8320 > URL: https://issues.apache.org/jira/browse/YARN-8320 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Jiandan Yang >Priority: Major > Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, > YARN-8320.001.patch > > > Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and > “cpu.shares” to isolate cpu resource. However, > * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; > no support for differentiated latency > * Request latency of services running on container may be frequent shake > when all containers share cpus, and latency-sensitive services can not afford > in our production environment. > So we need more finer cpu isolation. > My co-workers and I propose a solution using cgroup cpuset to binds > containers to different processors, this is inspired by the isolation > technique in [Borg > system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf]. > Later I will upload a detailed design doc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiandan Yang updated YARN-8320: Attachment: (was: CPU-isolation-for-latency-sensitive-services-v1.pdf) > Add support CPU isolation for latency-sensitive (LS) service > - > > Key: YARN-8320 > URL: https://issues.apache.org/jira/browse/YARN-8320 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Jiandan Yang >Priority: Major > Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, > YARN-8320.001.patch > > > Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and > “cpu.shares” to isolate cpu resource. However, > * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; > no support for differentiated latency > * Request latency of services running on container may be frequent shake > when all containers share cpus, and latency-sensitive services can not afford > in our production environment. > So we need more finer cpu isolation. > My co-workers and I propose a solution using cgroup cpuset to binds > containers to different processors, this is inspired by the isolation > technique in [Borg > system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf]. > Later I will upload a detailed design doc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
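For reference, a minimal sketch of what binding a container to dedicated cores via the cpuset controller looks like at the cgroup level. The mount point, hierarchy name, core range and variable names are illustrative assumptions only; this is not the proposed patch, and real NodeManager code would go through CGroupsHandler rather than raw file writes:

{code:java}
// Hedged sketch: pin a container's tasks to cores 4-7 on NUMA node 0 by
// writing the cpuset cgroup files directly.
Path cg = Paths.get("/sys/fs/cgroup/cpuset/hadoop-yarn", containerId.toString());
Files.createDirectories(cg);
Files.write(cg.resolve("cpuset.cpus"), "4-7".getBytes(StandardCharsets.UTF_8));
Files.write(cg.resolve("cpuset.mems"), "0".getBytes(StandardCharsets.UTF_8));
Files.write(cg.resolve("tasks"),
    String.valueOf(containerPid).getBytes(StandardCharsets.UTF_8));
{code}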
[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483490#comment-16483490 ] Wilfred Spiegelenburg commented on YARN-7913: - Just ran into this issue and stumbled onto this jira. First point: I do not see how simply continuing on really solves the issue. We are recovering running applications when this issue occurs; failed or finished applications take a shortcut and are processed differently (YARN-7139 fixed restoring failed and finished applications into the proper queues). Do we really want to fail recovery of a running application just because the scheduler configuration is out of sync? We do not allow the removal of a queue that is not empty, and we handle other configuration changes gracefully; that is all covered in YARN-8191. Dumping a running application and forgetting about it is not the correct behaviour. I think we should handle this gracefully and properly restore the running application, taking the ACLs and queues into account as [~oshevchenko] described. YARN-2308 handles this for the capacity scheduler by requiring that all queues from the previous state still exist in the system; we could do something like that, or something more dynamic. I worked with Yufei a while back on a possible solution that fits in with the dynamic way we work with placement rules, and I am happy to attach a proof of concept to this jira. > Improve error handling when application recovery fails with exception > - > > Key: YARN-7913 > URL: https://issues.apache.org/jira/browse/YARN-7913 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: YARN-7913.000.poc.patch > > > There are edge cases when the application recovery fails with an exception. > Example failure scenario: > * setup: a queue is a leaf queue in the primary RM's config and the same > queue is a parent queue in the secondary RM's config. > * When failover happens with this setup, the recovery will fail for > applications on this queue, and an APP_REJECTED event will be dispatched to > the async dispatcher. On the same thread (that handles the recovery), a > NullPointerException is thrown when the applicationAttempt is tried to be > recovered > (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494). > I don't see a good way to avoid the NPE in this scenario, because when the > NPE occurs the APP_REJECTED has not been processed yet, and we don't know > that the application recovery failed. > Currently the first exception will abort the recovery, and if there are X > applications, there will be ~X passive -> active RM transition attempts - the > passive -> active RM transition will only succeed when the last APP_REJECTED > event is processed on the async dispatcher thread. > _The point of this ticket is to improve the error handling and reduce the > number of passive -> active RM transition attempts (solving the above > described failure scenario isn't in scope)._ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
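To make the "restore gracefully" idea concrete, one possible shape is re-resolving the application's queue through the placement rules during recovery instead of failing when the stored queue no longer matches. The sketch below is only an illustration of that idea, not the proof of concept mentioned in the comment; assignAppToQueue stands in for whatever placement-rule evaluation the FairScheduler applies to new submissions:

{code:java}
// Hedged sketch: if the queue recorded in the state store is no longer a
// leaf queue under the new configuration, fall back to the placement policy
// (or the default queue) so the running application can still be restored.
FSLeafQueue queue = queueMgr.getLeafQueue(recoveredQueueName, false);
if (queue == null) {
  queue = assignAppToQueue(user, applicationId);   // hypothetical helper
  LOG.warn("Queue " + recoveredQueueName + " is not a leaf queue after "
      + "failover; recovering application " + applicationId + " into "
      + queue.getName());
}
{code}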
[jira] [Comment Edited] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment
[ https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483464#comment-16483464 ] Yongjun Zhang edited comment on YARN-8108 at 5/22/18 4:15 AM: -- Hi [~eyang] and [~kbadani], Would you please confirm that this jira is a blocker for 3.0.3 release? If so, can we expedite the work here since we are in the process of preparing 3.0.3 release? Thanks a lot. was (Author: yzhangal): Hi Guys, Would you please confirm that this jira is a blocker for 3.0.3 release? If so, can we expedite the work here since we are in the process of preparing 3.0.3 release? Thanks a lot. > RM metrics rest API throws GSSException in kerberized environment > - > > Key: YARN-8108 > URL: https://issues.apache.org/jira/browse/YARN-8108 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Kshitij Badani >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-8108.001.patch > > > Test is trying to pull up metrics data from SHS after kiniting as 'test_user' > It is throwing GSSException as follows > {code:java} > b2b460b80713|RUNNING: curl --silent -k -X GET -D > /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : > http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15 > 07:15:48,757|INFO|MainThread|machine.py:194 - > run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0 > 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - > getMetricsJsonData()|metrics: > > > > Error 403 GSSException: Failure unspecified at GSS-API level > (Mechanism level: Request is a replay (34)) > > HTTP ERROR 403 > Problem accessing /proxy/application_1518674952153_0070/metrics/json. > Reason: > GSSException: Failure unspecified at GSS-API level (Mechanism level: > Request is a replay (34)) > > > {code} > Rootcausing : proxyserver on RM can't be supported for Kerberos enabled > cluster because AuthenticationFilter is applied twice in Hadoop code (once in > httpServer2 for RM, and another instance from AmFilterInitializer for proxy > server). This will require code changes to hadoop-yarn-server-web-proxy > project -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment
[ https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483464#comment-16483464 ] Yongjun Zhang commented on YARN-8108: - Hi Guys, Would you please confirm that this jira is a blocker for 3.0.3 release? If so, can we expedite the work here since we are in the process of preparing 3.0.3 release? Thanks a lot. > RM metrics rest API throws GSSException in kerberized environment > - > > Key: YARN-8108 > URL: https://issues.apache.org/jira/browse/YARN-8108 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Kshitij Badani >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-8108.001.patch > > > Test is trying to pull up metrics data from SHS after kiniting as 'test_user' > It is throwing GSSException as follows > {code:java} > b2b460b80713|RUNNING: curl --silent -k -X GET -D > /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : > http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15 > 07:15:48,757|INFO|MainThread|machine.py:194 - > run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0 > 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - > getMetricsJsonData()|metrics: > > > > Error 403 GSSException: Failure unspecified at GSS-API level > (Mechanism level: Request is a replay (34)) > > HTTP ERROR 403 > Problem accessing /proxy/application_1518674952153_0070/metrics/json. > Reason: > GSSException: Failure unspecified at GSS-API level (Mechanism level: > Request is a replay (34)) > > > {code} > Rootcausing : proxyserver on RM can't be supported for Kerberos enabled > cluster because AuthenticationFilter is applied twice in Hadoop code (once in > httpServer2 for RM, and another instance from AmFilterInitializer for proxy > server). This will require code changes to hadoop-yarn-server-web-proxy > project -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483402#comment-16483402 ] genericqa commented on YARN-8292: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 15s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 42s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 23s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 12 new + 97 unchanged - 0 fixed = 109 total (was 97) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 29s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 21s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 4s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}158m 10s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.monitor.capacity.TestPreemptionForQueueWithPriorities | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8292 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924438/YARN-8292.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7acd2ecf598c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess
[jira] [Commented] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483394#comment-16483394 ] Anbang Hu commented on YARN-8327: - +1 on [^YARN-8327.v2.patch]. > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > -- > > Key: YARN-8327 > URL: https://issues.apache.org/jira/browse/YARN-8327 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8327.v1.patch, YARN-8327.v2.patch, > image-2018-05-18-16-52-08-250.png, image-2018-05-21-09-05-49-550.png > > > TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of > the line separator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
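The usual shape of this kind of fix is to build the expected string with the platform line separator rather than a hard-coded \n. A hedged sketch of that pattern; the variable names are placeholders and this is not necessarily what the v2 patch does:

{code:java}
// Hedged sketch: on Windows the aggregated log content ends lines with \r\n,
// so the expected value should use the platform separator.
String expectedContent = "Hello " + logMessage + System.lineSeparator();
Assert.assertEquals(expectedContent, readContainerLogLine);
{code}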
[jira] [Commented] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
[ https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483206#comment-16483206 ] genericqa commented on YARN-8336: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 48s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 27s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 48s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 14s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 28m 25s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}106m 28s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | | Nullcheck of webServiceClient at line 59 of value previously dereferenced in org.apache.hadoop.yarn.webapp.util.YarnWebServiceUtils.getNodeInfoFromRMWebService(Configuration, String) At YarnWebServiceUtils.java:59 of value previously dereferenced in org.apache.hadoop.yarn.webapp.util.YarnWebServiceUtils.getNodeInfoFromRMWebService(Con
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483185#comment-16483185 ] Wangda Tan commented on YARN-8292: -- Rebased to latest trunk (006) > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch, > YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, > YARN-8292.006.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8292: - Attachment: YARN-8292.006.patch > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch, > YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, > YARN-8292.006.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483182#comment-16483182 ] genericqa commented on YARN-8334: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 27m 50s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} YARN-7402 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 24s{color} | {color:green} YARN-7402 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} YARN-7402 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} YARN-7402 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} YARN-7402 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 33s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 27s{color} | {color:green} YARN-7402 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} YARN-7402 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 36s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s{color} | {color:green} hadoop-yarn-server-globalpolicygenerator in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 75m 3s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator | | | Nullcheck of client at line 56 of value previously dereferenced in org.apache.hadoop.yarn.server.globalpolicygenerator.GPGUtils.invokeRMWebService(Configuration, String, String, Class) At GPGUtils.java:56 of value previously dereferenced in org.apache.hadoop.yarn.server.globalpolicygenerator.GPGUtils.invokeRMWebService(Configuration, String, String, Class) At GPGUtils.java:[line 56] | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-8334 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924405/YARN-8334-YARN-7402.v1.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d40cad36d959 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483181#comment-16483181 ] Wangda Tan commented on YARN-8292: -- Thanks [~eepayne], I just checked both, For the infra queue preemption behavior: bq. For example, if gpu is the extended resource, but no apps are currently using gpu in the queue, no intra-queue preemption will take place. I think you're correct, the change I propose is: {code} if (conservativeDRF) { // When we want to do less aggressive preemption, we don't want to // preempt from any resource type if after preemption it becomes 0 or // negative. // For example: // - to-obtain = <30, 20, 0>, container <20, 20, 0> => allowed // - to-obtain = <30, 20, 0>, container <10, 10, 1> => disallowed // - to-obtain = <20, 30, 1>, container <20, 30, 1> => allowed // - to-obtain = <10, 20, 1>, container <11, 11, 0> = disallowed. doPreempt = Resources.lessThan(rc, clusterResource, Resources .componentwiseMin(toObtainAfterPreemption, Resources.none()), Resources.componentwiseMin(toObtainByPartition, Resources.none())); {code} However, this causes many (more than 20) infra queue preemption test cases failure. Since the logic (ver.005 patch) is not a regression. Can we address this in a separate JIRA if we cannot come with some simple solution? For: bq. I don't think this is necessary. .. Actually this is required after the change. TLDR; We now deduct unassigned (005) while doing calculation, but the previous logic doesn't. The previous logic deduct it after each iteration: {code} Resources.addTo(wQassigned, wQdone); } Resources.subtractFrom(unassigned, wQassigned); {code} In our logic, we need the up-to-date unassigned to cap the {{wQavail}}, so we deduct it with the calculation. {code} // Make sure it is not beyond unassigned wQavail = Resources.componentwiseMin(wQavail, unassigned); {code} > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch, > YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483164#comment-16483164 ] genericqa commented on YARN-8327: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 41s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 13s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 57m 2s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8327 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924426/YARN-8327.v2.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 380119518e8b 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 0b4c44b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20813/testReport/ | | Max. process+thread count | 407 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20813/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > --
[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483159#comment-16483159 ] Botong Huang commented on YARN-8334: Thanks [~giovanni.fumarola] for the patch. Calling destroy() explicitly seems weird. Is it possible to avoid that? Otherwise LGTM. > Fix potential connection leak in GPGUtils > - > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
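For context, the leak-free pattern with the Jersey 1.x client used elsewhere in YARN looks roughly like the sketch below; webAddress, path and returnType are placeholders. Whether the explicit destroy() can be avoided, for example by reusing a single long-lived Client, is exactly the open question in the comment above:

{code:java}
// Hedged sketch: always close the response and destroy the per-call client
// in a finally block so neither is leaked when the web service call throws.
Client client = Client.create();
ClientResponse response = null;
try {
  response = client.resource(webAddress).path(path)
      .accept(MediaType.APPLICATION_XML).get(ClientResponse.class);
  return response.getEntity(returnType);
} finally {
  if (response != null) {
    response.close();
  }
  client.destroy();
}
{code}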
[jira] [Commented] (YARN-8329) Docker client configuration can still be set incorrectly
[ https://issues.apache.org/jira/browse/YARN-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483140#comment-16483140 ] Jason Lowe commented on YARN-8329: -- Thanks for the patch! I'm wondering why the code is filtering the tokens by kind into a separate credentials only to pass that to another method which will also filter the tokens by the same kind. Can't we just iterate the original credentials directly, ignoring tokens that aren't the appropriate kind? I'm not seeing why the copy is necessary. Eliminating the copy would also eliminate the need to do a token identifier decode to construct an alias. > Docker client configuration can still be set incorrectly > > > Key: YARN-8329 > URL: https://issues.apache.org/jira/browse/YARN-8329 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8329.001.patch > > > YARN-7996 implemented a fix to avoid writing an empty Docker client > configuration file, but there are still cases where the {{docker --config}} > argument is set in error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
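The simplification suggested above - iterating the original credentials and skipping tokens of other kinds, with no intermediate Credentials copy and no alias reconstruction - could look roughly like this. This is a hedged sketch, not the posted patch; writeDockerCredential is a hypothetical helper:

{code:java}
// Hedged sketch: walk the credentials directly and only act on Docker client
// credential tokens.
for (Token<? extends TokenIdentifier> token : credentials.getAllTokens()) {
  if (!DockerCredentialTokenIdentifier.KIND.equals(token.getKind())) {
    continue;
  }
  // handle the Docker credential token (e.g. write it to the client config)
  writeDockerCredential(token);   // hypothetical helper
}
{code}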
[jira] [Commented] (YARN-7960) Add no-new-privileges flag to docker run
[ https://issues.apache.org/jira/browse/YARN-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483138#comment-16483138 ] Eric Yang commented on YARN-7960: - +1 looks good to me. > Add no-new-privileges flag to docker run > > > Key: YARN-7960 > URL: https://issues.apache.org/jira/browse/YARN-7960 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-7960.001.patch, YARN-7960.002.patch > > > Minimally, this should be used for unprivileged containers. It's a cheap way > to add an extra layer of security to the docker model. For privileged > containers, it might be appropriate to omit this flag > https://github.com/moby/moby/pull/20727 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
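The moby pull request linked in the description adds the runtime flag shown below. A hedged sketch of the conditional wiring; the argument list and the noNewPrivilegesEnabled flag are illustrative stand-ins, not the DockerRunCommand or container-executor API:

{code:java}
// Hedged sketch: add the flag for unprivileged containers only; the
// description notes privileged containers may need to omit it.
List<String> dockerRunArgs = new ArrayList<>();
if (!privileged && noNewPrivilegesEnabled) {
  dockerRunArgs.add("--security-opt");
  dockerRunArgs.add("no-new-privileges");
}
{code}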
[jira] [Commented] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483119#comment-16483119 ] genericqa commented on YARN-8041: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 47s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 5s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 56s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 54 new + 18 unchanged - 0 fixed = 72 total (was 18) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 48s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 7s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 0s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 21s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}144m 45s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8041 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924401/YARN-8041.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 75b548cc4895 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build t
[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483116#comment-16483116 ] Eric Yang commented on YARN-8326: - [~hlhu...@us.ibm.com] Does the same log entries show up? > Yarn 3.0 seems runs slower than Yarn 2.6 > > > Key: YARN-8326 > URL: https://issues.apache.org/jira/browse/YARN-8326 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 > Environment: This is the yarn-site.xml for 3.0. > > > > hadoop.registry.dns.bind-port > 5353 > > > hadoop.registry.dns.domain-name > hwx.site > > > hadoop.registry.dns.enabled > true > > > hadoop.registry.dns.zone-mask > 255.255.255.0 > > > hadoop.registry.dns.zone-subnet > 172.17.0.0 > > > manage.include.files > false > > > yarn.acl.enable > false > > > yarn.admin.acl > yarn > > > yarn.client.nodemanager-connect.max-wait-ms > 6 > > > yarn.client.nodemanager-connect.retry-interval-ms > 1 > > > yarn.http.policy > HTTP_ONLY > > > yarn.log-aggregation-enable > false > > > yarn.log-aggregation.retain-seconds > 2592000 > > > yarn.log.server.url > > [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs] > > > yarn.log.server.web-service.url > > [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory] > > > yarn.node-labels.enabled > false > > > yarn.node-labels.fs-store.retry-policy-spec > 2000, 500 > > > yarn.node-labels.fs-store.root-dir > /system/yarn/node-labels > > > yarn.nodemanager.address > 0.0.0.0:45454 > > > yarn.nodemanager.admin-env > MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX > > > yarn.nodemanager.aux-services > mapreduce_shuffle,spark2_shuffle,timeline_collector > > > yarn.nodemanager.aux-services.mapreduce_shuffle.class > org.apache.hadoop.mapred.ShuffleHandler > > > yarn.nodemanager.aux-services.spark2_shuffle.class > org.apache.spark.network.yarn.YarnShuffleService > > > yarn.nodemanager.aux-services.spark2_shuffle.classpath > /usr/spark2/aux/* > > > yarn.nodemanager.aux-services.spark_shuffle.class > org.apache.spark.network.yarn.YarnShuffleService > > > yarn.nodemanager.aux-services.timeline_collector.class > > org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService > > > yarn.nodemanager.bind-host > 0.0.0.0 > > > yarn.nodemanager.container-executor.class > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor > > > yarn.nodemanager.container-metrics.unregister-delay-ms > 6 > > > yarn.nodemanager.container-monitor.interval-ms > 3000 > > > yarn.nodemanager.delete.debug-delay-sec > 0 > > > > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage > 90 > > > yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb > 1000 > > > yarn.nodemanager.disk-health-checker.min-healthy-disks > 0.25 > > > yarn.nodemanager.health-checker.interval-ms > 135000 > > > yarn.nodemanager.health-checker.script.timeout-ms > 6 > > > > yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage > false > > > yarn.nodemanager.linux-container-executor.group > hadoop > > > > yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users > false > > > yarn.nodemanager.local-dirs > /hadoop/yarn/local > > > yarn.nodemanager.log-aggregation.compression-type > gz > > > yarn.nodemanager.log-aggregation.debug-enabled > false > > > yarn.nodemanager.log-aggregation.num-log-files-per-app > 30 > > > > yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds > 3600 > > > yarn.nodemanager.log-dirs > 
/hadoop/yarn/log > > > yarn.nodemanager.log.retain-seconds > 604800 > > > yarn.nodemanager.pmem-check-enabled > false > > > yarn.nodemanager.recovery.dir > /var/log/hadoop-yarn/nodemanager/recovery-state > > > yarn.nodemanager.recovery.enabled > true > > > yarn.nodemanager.recovery.supervised > true > > > yarn.nodemanager.remote-app-log-dir > /app-logs > > > yarn.nodemanager.remote-app-log-dir-suffix > logs > > > yarn.nodemanager.resource-plugins > > > > yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices > auto > > > yarn.nodemanager.resource-plugins.gpu.docker-plugin > nvidia-docker-v1 > > > yarn.nodemanager.resource-plugins.gpu.docker-plugin.nvidiadocker- > v1.endpoint > [http://localhost:3476/v1.0/docker/cli] > > > > yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables > > > > yarn.nodemanager.resource.cpu-vcores > 6 > > > yarn.nodemanager.resource.memory-mb > 12288 > > > yarn.nodemanager.resource.percentage-ph
[jira] [Commented] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483099#comment-16483099 ] Giovanni Matteo Fumarola commented on YARN-8327: [~huanbang1993] thanks for the review. Attached v2. > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > -- > > Key: YARN-8327 > URL: https://issues.apache.org/jira/browse/YARN-8327 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8327.v1.patch, YARN-8327.v2.patch, > image-2018-05-18-16-52-08-250.png, image-2018-05-21-09-05-49-550.png > > > TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of > the line separator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8327: --- Attachment: YARN-8327.v2.patch > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > -- > > Key: YARN-8327 > URL: https://issues.apache.org/jira/browse/YARN-8327 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8327.v1.patch, YARN-8327.v2.patch, > image-2018-05-18-16-52-08-250.png, image-2018-05-21-09-05-49-550.png > > > TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of > the line separator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
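For context on the line-separator failure above: a test that hard-codes "\n" in its expected output breaks on Windows, where the separator is "\r\n". Below is a minimal sketch of the usual remedy, building the expectation from System.lineSeparator(); this is an assumption about the general fix, not necessarily what the YARN-8327 patch does, and the class name is illustrative.
{code:java}
/**
 * Minimal sketch (assumption, not necessarily the YARN-8327 fix): build the
 * expected test string from the platform line separator instead of "\n".
 */
public class LineSeparatorSketch {
  public static void main(String[] args) {
    String expected = "log line 1" + System.lineSeparator() + "log line 2";
    // Prints true on Linux ("\n") but false on Windows ("\r\n"), which is
    // exactly why a hard-coded "\n" expectation fails there.
    System.out.println(expected.equals("log line 1\nlog line 2"));
  }
}
{code}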
[jira] [Updated] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8173: --- Attachment: YARN-8327.v2.patch > [Router] Implement missing FederationClientInterceptor#getApplications() > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8173: --- Attachment: (was: YARN-8327.v2.patch) > [Router] Implement missing FederationClientInterceptor#getApplications() > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483087#comment-16483087 ] Jason Lowe commented on YARN-8206: -- Thanks for updating the patch! +1 lgtm. I'll commit this tomorrow if there are no objections. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch, YARN-8206.010.patch, YARN-8206.011.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
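To make the SIGKILL/SIGTERM distinction from the YARN-8206 description concrete, here is a hypothetical sketch that maps SIGKILL to docker kill (immediate) and SIGTERM to docker stop (graceful, with Docker's own grace period before escalation). The Signal enum and commandFor helper are illustrative names only, not the attached patch.
{code:java}
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not the YARN-8206 patch): build different docker CLI
 * invocations for SIGKILL and SIGTERM instead of treating both as a stop.
 */
public class DockerSignalSketch {
  enum Signal { KILL, TERM }

  static List<String> commandFor(Signal signal, String containerId) {
    if (signal == Signal.KILL) {
      // "docker kill" delivers SIGKILL immediately, so resources are reclaimed right away.
      return Arrays.asList("docker", "kill", "--signal=KILL", containerId);
    }
    // "docker stop" sends SIGTERM first and only escalates to SIGKILL after a grace period.
    return Arrays.asList("docker", "stop", containerId);
  }

  public static void main(String[] args) {
    System.out.println(commandFor(Signal.KILL, "container_e01_1526931492976_0007_01_000001"));
  }
}
{code}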
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483057#comment-16483057 ] Eric Yang commented on YARN-8259: - System administrator can reserve one cpu core for node manager and all the docker inspect call are counted toward saturating one cpu core, but not more. Exact accounting is not available today, but I usually recommend customers to do this to avoid system overload. At a glance of yarn code base, I only found one instance of code that is reading /proc/[pid]/ from node manager. This is located in CGroupsResourceCalculator.java. Hence, hidepid is not working by implementation. This can be addressed in other JIRAs to make this proper. I am +0 on this patch. > Revisit liveliness checks for Docker containers > --- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8259.001.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used
[ https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483046#comment-16483046 ] Eric Payne commented on YARN-8179: -- Committed to trunk, branch-3.1, and branch-3.0. Thanks for the good work, [~kyungwan nam] > Preemption does not happen due to natural_termination_factor when DRF is used > - > > Key: YARN-8179 > URL: https://issues.apache.org/jira/browse/YARN-8179 > Project: Hadoop YARN > Issue Type: Bug >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Fix For: 3.2.0, 3.1.1, 3.0.3 > > Attachments: YARN-8179.001.patch, YARN-8179.002.patch, > YARN-8179.003.patch > > > cluster > * DominantResourceCalculator > * QueueA : 50 (capacity) ~ 100 (max capacity) > * QueueB : 50 (capacity) ~ 50 (max capacity) > all of resources have been allocated to QueueA. (all Vcores are allocated to > QueueA) > if App1 is submitted to QueueB, over-utilized QueueA should be preempted. > but, I’ve met the problem, which preemption does not happen. it caused that > App1 AM can not allocated. > when App1 is submitted, pending resources for asking App1 AM would be > > so, Vcores which need to be preempted from QueueB should be 1. > but, it can be 0 due to natural_termination_factor (default is 0.2) > we should guarantee that resources not to be 0 even though applying > natural_termination_factor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483041#comment-16483041 ] Eric Badger commented on YARN-8259: --- Also, I have tested the current patch for correctness. So, if we decide to go with the current implementation, I am +1 on the patch. > Revisit liveliness checks for Docker containers > --- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8259.001.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483038#comment-16483038 ] Eric Badger commented on YARN-8259: --- bq. If hidepid option is used by system administrator, yarn user might not have rights to check if /proc/[pid] exists. This might be a concern, but there is a workaround to allow for the admin to whitelist the NM user https://linux-audit.com/linux-system-hardening-adding-hidepid-to-proc/ bq. Also, the reacquistion code runs signalContainer once per second until the application finishes, this resulted in many docker inspect and container-executor calls, which are expensive operations. This worries me the most. Especially on nodes where there are lots of containers running concurrently, this could be pretty devastating for rolling upgrades. I'm not sure I have a strong opinion one way or another on retries vs. /proc for correctness, but I am worried about overloading the docker daemon with a large amount of inspect/ps calls. > Revisit liveliness checks for Docker containers > --- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8259.001.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
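As a rough illustration of the /proc-based alternative discussed in this thread (an assumption about the general approach, not the attached patch): a liveliness probe can test whether /proc/<pid> exists instead of delivering a null signal, with the caveat raised above that a hidepid mount can hide the directory from a non-owning user.
{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Minimal sketch (assumption, not the YARN-8259 patch): liveliness check via
 * /proc/<pid> existence rather than sending signal 0 to the process.
 */
public class ProcLivelinessCheck {
  static boolean isAlive(int pid) {
    // On Linux, /proc/<pid> exists only while the process is running.
    // With hidepid set on /proc, a non-owning user may not see the directory.
    return Files.isDirectory(Paths.get("/proc", Integer.toString(pid)));
  }

  public static void main(String[] args) {
    System.out.println("pid 1 alive? " + isAlive(1));
  }
}
{code}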
[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used
[ https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483035#comment-16483035 ] Hudson commented on YARN-8179: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14247 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14247/]) YARN-8179: Preemption does not happen due to natural_termination_factor (ericp: rev 0b4c44bdeef62945b592d5761666ad026b629c0b) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyInterQueueWithDRF.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/PreemptableResourceCalculator.java > Preemption does not happen due to natural_termination_factor when DRF is used > - > > Key: YARN-8179 > URL: https://issues.apache.org/jira/browse/YARN-8179 > Project: Hadoop YARN > Issue Type: Bug >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Attachments: YARN-8179.001.patch, YARN-8179.002.patch, > YARN-8179.003.patch > > > cluster > * DominantResourceCalculator > * QueueA : 50 (capacity) ~ 100 (max capacity) > * QueueB : 50 (capacity) ~ 50 (max capacity) > all of resources have been allocated to QueueA. (all Vcores are allocated to > QueueA) > if App1 is submitted to QueueB, over-utilized QueueA should be preempted. > but, I’ve met the problem, which preemption does not happen. it caused that > App1 AM can not allocated. > when App1 is submitted, pending resources for asking App1 AM would be > > so, Vcores which need to be preempted from QueueB should be 1. > but, it can be 0 due to natural_termination_factor (default is 0.2) > we should guarantee that resources not to be 0 even though applying > natural_termination_factor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
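The failure described in YARN-8179 is essentially a rounding problem: scaling a 1-vcore pending ask by the default natural_termination_factor of 0.2 truncates to 0, so nothing is ever preempted and the AM stays pending. A minimal sketch of the guarantee being asked for, assuming a simple round-up rather than the exact committed change:
{code:java}
/**
 * Minimal sketch (assumption, not the committed YARN-8179 change): never let
 * a non-zero pending ask round down to zero after applying the factor.
 */
public class NaturalTerminationFactorSketch {
  static int vcoresToPreempt(int pendingVcores, double naturalTerminationFactor) {
    if (pendingVcores <= 0) {
      return 0;
    }
    // 1 vcore * 0.2 would truncate to 0 and leave the AM pending forever,
    // so round up instead of down.
    return (int) Math.ceil(pendingVcores * naturalTerminationFactor);
  }

  public static void main(String[] args) {
    System.out.println(vcoresToPreempt(1, 0.2)); // prints 1, not 0
  }
}
{code}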
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483032#comment-16483032 ] Jason Lowe commented on YARN-8259: -- I do agree with Shane that there are already subsystems that currently rely on /proc to function properly, e.g.: container resource monitoring. Hiding pids will break those subsystems. > Revisit liveliness checks for Docker containers > --- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8259.001.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483029#comment-16483029 ] Jason Lowe commented on YARN-8259: -- Ah comment race with [~eyang], I'll defer until his concerns are addressed. > Revisit liveliness checks for Docker containers > --- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8259.001.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483028#comment-16483028 ] Jason Lowe commented on YARN-8259: -- Thanks for the patch! +1 lgtm. I'll commit this tomorrow if there are no objections. > Revisit liveliness checks for Docker containers > --- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8259.001.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483027#comment-16483027 ] Hsin-Liang Huang commented on YARN-8326: HI Eric, I tried the suggestion and changed the setting. The result on running {color:#14892c}time hadoop jar /usr/hdp/3.0.0.0-829/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.0.0.3.0.0.0-829.jar Client -classpath simple-yarn-app-1.1.0.jar -cmd "java com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 8"{color} is 20s, 15s and 15s (I ran it 3 times). It didn't get better if it's not worse. (It was 14, 15 seconds before). > Yarn 3.0 seems runs slower than Yarn 2.6 > > > Key: YARN-8326 > URL: https://issues.apache.org/jira/browse/YARN-8326 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 > Environment: This is the yarn-site.xml for 3.0. > > > > hadoop.registry.dns.bind-port > 5353 > > > hadoop.registry.dns.domain-name > hwx.site > > > hadoop.registry.dns.enabled > true > > > hadoop.registry.dns.zone-mask > 255.255.255.0 > > > hadoop.registry.dns.zone-subnet > 172.17.0.0 > > > manage.include.files > false > > > yarn.acl.enable > false > > > yarn.admin.acl > yarn > > > yarn.client.nodemanager-connect.max-wait-ms > 6 > > > yarn.client.nodemanager-connect.retry-interval-ms > 1 > > > yarn.http.policy > HTTP_ONLY > > > yarn.log-aggregation-enable > false > > > yarn.log-aggregation.retain-seconds > 2592000 > > > yarn.log.server.url > > [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs] > > > yarn.log.server.web-service.url > > [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory] > > > yarn.node-labels.enabled > false > > > yarn.node-labels.fs-store.retry-policy-spec > 2000, 500 > > > yarn.node-labels.fs-store.root-dir > /system/yarn/node-labels > > > yarn.nodemanager.address > 0.0.0.0:45454 > > > yarn.nodemanager.admin-env > MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX > > > yarn.nodemanager.aux-services > mapreduce_shuffle,spark2_shuffle,timeline_collector > > > yarn.nodemanager.aux-services.mapreduce_shuffle.class > org.apache.hadoop.mapred.ShuffleHandler > > > yarn.nodemanager.aux-services.spark2_shuffle.class > org.apache.spark.network.yarn.YarnShuffleService > > > yarn.nodemanager.aux-services.spark2_shuffle.classpath > /usr/spark2/aux/* > > > yarn.nodemanager.aux-services.spark_shuffle.class > org.apache.spark.network.yarn.YarnShuffleService > > > yarn.nodemanager.aux-services.timeline_collector.class > > org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService > > > yarn.nodemanager.bind-host > 0.0.0.0 > > > yarn.nodemanager.container-executor.class > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor > > > yarn.nodemanager.container-metrics.unregister-delay-ms > 6 > > > yarn.nodemanager.container-monitor.interval-ms > 3000 > > > yarn.nodemanager.delete.debug-delay-sec > 0 > > > > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage > 90 > > > yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb > 1000 > > > yarn.nodemanager.disk-health-checker.min-healthy-disks > 0.25 > > > yarn.nodemanager.health-checker.interval-ms > 135000 > > > yarn.nodemanager.health-checker.script.timeout-ms > 6 > > > > yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage > false > > > yarn.nodemanager.linux-container-executor.group > hadoop > > > > 
yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users > false > > > yarn.nodemanager.local-dirs > /hadoop/yarn/local > > > yarn.nodemanager.log-aggregation.compression-type > gz > > > yarn.nodemanager.log-aggregation.debug-enabled > false > > > yarn.nodemanager.log-aggregation.num-log-files-per-app > 30 > > > > yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds > 3600 > > > yarn.nodemanager.log-dirs > /hadoop/yarn/log > > > yarn.nodemanager.log.retain-seconds > 604800 > > > yarn.nodemanager.pmem-check-enabled > false > > > yarn.nodemanager.recovery.dir > /var/log/hadoop-yarn/nodemanager/recovery-state > > > yarn.nodemanager.recovery.enabled > true > > > yarn.nodemanager.recovery.supervised > true > > > yarn.nodemanager.remote-app-log-dir > /app-logs > > > yarn.nodemanager.remote-app-log-dir-suffix > logs > > > yarn.nodemanager.resource-plugins > > > > yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices > auto > > > yarn.nodemanager.resource
[jira] [Commented] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483021#comment-16483021 ] Yiran Wu commented on YARN-8173: Thanks [~giovanni.fumarola]. I looked at the [YARN-7010|https://issues.apache.org/jira/browse/YARN-7010] logic; that function is more powerful. I'll modify the logic based on [YARN-7010|https://issues.apache.org/jira/browse/YARN-7010]. > [Router] Implement missing FederationClientInterceptor#getApplications() > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
[ https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8336: --- Description: Missing ClientResponse.close and Client.destroy can lead to a connection leak. > Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils > - > > Key: YARN-8336 > URL: https://issues.apache.org/jira/browse/YARN-8336 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8336.v1.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
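For readers unfamiliar with the leak pattern named in the YARN-8336 description: with the Jersey client APIs it refers to (ClientResponse.close and Client.destroy), each response holds a pooled connection until it is closed and the client holds resources until it is destroyed. A minimal sketch of the close/destroy discipline, as an assumption rather than the attached patch:
{code:java}
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;

/**
 * Minimal sketch (assumption, not the YARN-8336 patch): always release the
 * ClientResponse and destroy the Client so pooled connections are returned.
 */
public class WebServiceCallSketch {
  static String fetch(String url) {
    Client client = Client.create();
    ClientResponse response = null;
    try {
      response = client.resource(url).get(ClientResponse.class);
      return response.getEntity(String.class);
    } finally {
      if (response != null) {
        response.close();   // releases the underlying pooled connection
      }
      client.destroy();     // tears down the client and its resources
    }
  }
}
{code}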
[jira] [Updated] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
[ https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8336: --- Attachment: YARN-8336.v1.patch > Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils > - > > Key: YARN-8336 > URL: https://issues.apache.org/jira/browse/YARN-8336 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8336.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
Giovanni Matteo Fumarola created YARN-8336: -- Summary: Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils Key: YARN-8336 URL: https://issues.apache.org/jira/browse/YARN-8336 Project: Hadoop YARN Issue Type: Bug Reporter: Giovanni Matteo Fumarola Assignee: Giovanni Matteo Fumarola -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482993#comment-16482993 ] Giovanni Matteo Fumarola commented on YARN-8173: {{getApplications}} is a bit more complicated than the logic you implemented. Please take a look at YARN-7010 for the full context of the logic behind {{getApps}}. > [Router] Implement missing FederationClientInterceptor#getApplications() > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
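As a rough sketch of why getApplications() is more involved for the Router than a single-RM call (this is an assumption about the general shape, not the YARN-7010/YARN-8173 logic): reports returned by multiple sub-cluster RMs have to be merged and de-duplicated per ApplicationId before they are handed back to the client.
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;

/**
 * Minimal sketch (assumption, not the actual FederationClientInterceptor
 * logic): merge ApplicationReports from several sub-clusters, one per app id.
 */
public class MergeApplicationsSketch {
  static List<ApplicationReport> merge(List<List<ApplicationReport>> perSubCluster) {
    Map<ApplicationId, ApplicationReport> merged = new HashMap<>();
    for (List<ApplicationReport> reports : perSubCluster) {
      for (ApplicationReport report : reports) {
        // Keep the first report seen per application; a real interceptor would
        // also reconcile duplicates (e.g. home vs. secondary sub-cluster state).
        merged.putIfAbsent(report.getApplicationId(), report);
      }
    }
    return new ArrayList<>(merged.values());
  }
}
{code}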
[jira] [Comment Edited] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482992#comment-16482992 ] Eric Yang edited comment on YARN-8259 at 5/21/18 8:15 PM: -- If I am not mistaken, DockerContainerRuntime is running as part of node manager. If hidepid option is used by system administrator, yarn user might not have rights to check if /proc/[pid] exists. We might need to create a LCE operation to perform the check, if we are going with the suggested pid file check path. I prefer the docker inspect command path with retry logic. In a non-blocking IO system, it is hard to avoid coding logic for retries. The investment will pay off in the long run, when each retry value is defined and optimized to make the system reliable and robust. was (Author: eyang): If I am not mistaken, DockerContainerRuntime is running as part of node manager. If hidepid option is used by system administrator, yarn user might not have rights to check if /proc/[pid] exists. We might need to create a LCE operation to perform the check, if we are going with the suggested pid file check path. I still prefers the docker inspect command path with retry logic. In a non-blocking IO system, it is hard to avoid coding logic for retries. The investment will pay off in the long run, when each retry value is defined and optimized to make the system reliable and robust. > Revisit liveliness checks for Docker containers > --- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8259.001.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482992#comment-16482992 ] Eric Yang commented on YARN-8259: - If I am not mistaken, DockerContainerRuntime is running as part of node manager. If hidepid option is used by system administrator, yarn user might not have rights to check if /proc/[pid] exists. We might need to create a LCE operation to perform the check, if we are going with the suggested pid file check path. I still prefers the docker inspect command path with retry logic. In a non-blocking IO system, it is hard to avoid coding logic for retries. The investment will pay off in the long run, when each retry value is defined and optimized to make the system reliable and robust. > Revisit liveliness checks for Docker containers > --- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8259.001.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8173) [Router] FederationClientInterceptor#
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8173: --- Summary: [Router] FederationClientInterceptor# (was: oozie dependent ClientAPI Implement ) > [Router] FederationClientInterceptor# > - > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8173: --- Summary: [Router] Implement missing FederationClientInterceptor#getApplications() (was: [Router] FederationClientInterceptor#) > [Router] Implement missing FederationClientInterceptor#getApplications() > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8173) oozie dependent ClientAPI Implement
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482986#comment-16482986 ] Giovanni Matteo Fumarola commented on YARN-8173: Moved under YARN-7402 and updated the title. > oozie dependent ClientAPI Implement > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8173) oozie dependent ClientAPI Implement
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8173: --- Issue Type: Sub-task (was: Task) Parent: YARN-7402 > oozie dependent ClientAPI Implement > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8173) oozie dependent ClientAPI Implement
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiran Wu reassigned YARN-8173: -- Assignee: Yiran Wu > oozie dependent ClientAPI Implement > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8173) oozie dependent ClientAPI Implement
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482978#comment-16482978 ] Yiran Wu commented on YARN-8173: [~giovanni.fumarola], do you mind taking a look at this patch? > oozie dependent ClientAPI Implement > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482967#comment-16482967 ] Giovanni Matteo Fumarola commented on YARN-8334: [~botong] please take a look. > Fix potential connection leak in GPGUtils > - > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8334) Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8334: --- Description: Missing ClientResponse.close and Client.destroy can lead to a connection leak. > Fix potential connection leak in GPGUtils > - > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8334) Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8334: --- Attachment: YARN-8334-YARN-7402.v1.patch > Fix potential connection leak in GPGUtils > - > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8335) Privileged docker containers' jobSubmitDir does not get successfully cleaned up
Eric Badger created YARN-8335: - Summary: Privileged docker containers' jobSubmitDir does not get successfully cleaned up Key: YARN-8335 URL: https://issues.apache.org/jira/browse/YARN-8335 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Badger The jobSubmitDir directory is owned by root and is being cleaned up as the submitting user, which appears to be why it is failing to clean up. {noformat} 2018-05-21 19:46:15,124 WARN [DeletionService #0] privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell execution returned exit code: 255. Privileged Execution Operation Stderr: Stdout: main : command provided 3 main : run as user is ebadger main : requested yarn user is ebadger failed to unlink /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01/jobSubmitDir/job.split: Permission denied failed to unlink /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01/jobSubmitDir/job.splitmetainfo: Permission denied failed to rmdir jobSubmitDir: Directory not empty Error while deleting /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01: 39 (Directory not empty) Full command array for failed execution: [/hadoop-3.2.0-SNAPSHOT/bin/container-executor, ebadger, ebadger, 3, /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01] 2018-05-21 19:46:15,124 ERROR [DeletionService #0] nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(848)) - DeleteAsUser for /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01 returned with exit code: 255 org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=255: at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:206) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:844) at org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.FileDeletionTask.run(FileDeletionTask.java:135) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: ExitCodeException exitCode=255: at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009) at org.apache.hadoop.util.Shell.run(Shell.java:902) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) ... 
10 more {noformat} {noformat} [foo@bar hadoop]$ ls -l /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01/ total 4 drwxr-sr-x 2 root users 4096 May 21 19:45 jobSubmitDir {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8334) Fix potential connection leak in GPGUtils
Giovanni Matteo Fumarola created YARN-8334: -- Summary: Fix potential connection leak in GPGUtils Key: YARN-8334 URL: https://issues.apache.org/jira/browse/YARN-8334 Project: Hadoop YARN Issue Type: Sub-task Reporter: Giovanni Matteo Fumarola Assignee: Giovanni Matteo Fumarola -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482955#comment-16482955 ] Yiran Wu edited comment on YARN-8041 at 5/21/18 7:51 PM: - [~giovanni.fumarola] Ok, Thanks tell me this and review code. The previous Patch has a little problem. I'll fix it. was (Author: yiran): Ok, Thanks tell me this and review code. The previous Patch has a little problem. I'll fix it. > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Minor > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch, YARN-8041.004.patch > > > This Jira tracks the implementation of some missing REST invocation in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition
[ https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482962#comment-16482962 ] Kanwaljeet Sachdev commented on YARN-4677: -- [~wilfreds], thanks for the patch and the context on it. The diffs look good. Adding a little more description noting that an NPE could occur because the heartbeat message might arrive after the node is decommissioned, along with the stack trace, would give the full context and be beneficial to have in the Jira. > RMNodeResourceUpdateEvent update from scheduler can lead to race condition > -- > > Key: YARN-4677 > URL: https://issues.apache.org/jira/browse/YARN-4677 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Brook Zhou >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-4677-branch-2.001.patch, > YARN-4677-branch-2.002.patch, YARN-4677.01.patch > > > When a node is in decommissioning state, there is time window between > completedContainer() and RMNodeResourceUpdateEvent get handled in > scheduler.nodeUpdate (YARN-3223). > So if a scheduling effort happens within this window, the new container could > still get allocated on this node. Even worse case is if scheduling effort > happen after RMNodeResourceUpdateEvent sent out but before it is propagated > to SchedulerNode - then the total resource is lower than used resource and > available resource is a negative value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482955#comment-16482955 ] Yiran Wu commented on YARN-8041: OK, thanks for telling me this and for reviewing the code. The previous patch has a small problem; I'll fix it. > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Minor > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch, YARN-8041.004.patch > > > This Jira tracks the implementation of some missing REST invocation in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482950#comment-16482950 ] Giovanni Matteo Fumarola commented on YARN-8041: [~yiran] thanks for the patch. You should not change the status of the Jira every time you submit a patch. The system will pick up the latest patch automatically. > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Minor > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch, YARN-8041.004.patch > > > This Jira tracks the implementation of some missing REST invocation in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiran Wu updated YARN-8041: --- Attachment: YARN-8041.004.patch > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Minor > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch, YARN-8041.004.patch > > > This Jira tracks the implementation of some missing REST invocation in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiran Wu updated YARN-8041: --- Attachment: (was: YARN-8041.004.patch) > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Minor > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > This Jira tracks the implementation of some missing REST invocation in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482912#comment-16482912 ] genericqa commented on YARN-8041: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 1s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 46s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 54 new + 18 unchanged - 0 fixed = 72 total (was 18) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 14s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 50s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 21s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}138m 48s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8041 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924375/YARN-8041.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 1e706175c57b 4.4.
[jira] [Comment Edited] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482905#comment-16482905 ] Eric Yang edited comment on YARN-8326 at 5/21/18 6:57 PM: -- This appears to be introduced by YARN-5662 by turning on container monitor from false to true. This feature is used by opportunistic container scheduling and pre-emption to gather statics of the containers to make scheduling decisions. You can disable this feature by: {code} yarn.nodemanager.container-monitor.enabled false {code} Or reduce the stats collection time from 3 seconds to 300 milliseconds (use more system resources, but faster scheduling): {code} yarn.nodemanager.container-monitor.interval-ms 300 {code} Timer optimization might be possible to the work done in YARN-2883. The queuing and scheduling of containers is based on monitoring thread information. If it takes several seconds to wait for information to become available before next container is scheduled, then it can introduced artificial delay to rapidly launching containers. The timer value can not be smaller than certain value otherwise monitoring/container forking both will tax cpu resources too much. If your workloads take less time than container scheduling/launch, then you might need to revisit how to decrease the containers to launch, and increase the work to run in containers. [~hlhu...@us.ibm.com] Can you confirm that those settings changes the benchmark result? was (Author: eyang): This appears to be introduced by YARN-5662 by turning on container monitor from false to true. This feature is used by opportunistic container scheduling and pre-emption to gather statics of the containers to make scheduling decisions. You can disable this feature by: {code} yarn.nodemanager.container-monitor.enabled false {code} Or reduce the stats collection time from 3 seconds to 300 milliseconds (use more system resources, but faster scheduling): {code} yarn.nodemanager.container-monitor.interval-ms 300 {code} Timer optimization might be possible to the work done in YARN-2883. The queuing and scheduling of containers is based on monitoring thread information. If it takes several seconds to wait for information to become available before next container is scheduled, then it can introduced artificial delay to rapidly launching containers. The timer value can not be smaller than certain value otherwise monitoring/container forking both will tax cpu resources too much much. If your workloads take less time than container scheduling/launch, then you might need to revisit how to decrease the containers to launch, and increase the work to run in containers. [~hlhu...@us.ibm.com] Can you confirm that those settings changes the benchmark result? > Yarn 3.0 seems runs slower than Yarn 2.6 > > > Key: YARN-8326 > URL: https://issues.apache.org/jira/browse/YARN-8326 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 > Environment: This is the yarn-site.xml for 3.0. 
> > > > hadoop.registry.dns.bind-port > 5353 > > > hadoop.registry.dns.domain-name > hwx.site > > > hadoop.registry.dns.enabled > true > > > hadoop.registry.dns.zone-mask > 255.255.255.0 > > > hadoop.registry.dns.zone-subnet > 172.17.0.0 > > > manage.include.files > false > > > yarn.acl.enable > false > > > yarn.admin.acl > yarn > > > yarn.client.nodemanager-connect.max-wait-ms > 6 > > > yarn.client.nodemanager-connect.retry-interval-ms > 1 > > > yarn.http.policy > HTTP_ONLY > > > yarn.log-aggregation-enable > false > > > yarn.log-aggregation.retain-seconds > 2592000 > > > yarn.log.server.url > > [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs] > > > yarn.log.server.web-service.url > > [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory] > > > yarn.node-labels.enabled > false > > > yarn.node-labels.fs-store.retry-policy-spec > 2000, 500 > > > yarn.node-labels.fs-store.root-dir > /system/yarn/node-labels > > > yarn.nodemanager.address > 0.0.0.0:45454 > > > yarn.nodemanager.admin-env > MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX > > > yarn.nodemanager.aux-services > mapreduce_shuffle,spark2_shuffle,timeline_collector > > > yarn.nodemanager.aux-services.mapreduce_shuffle.class > org.apache.hadoop.mapred.ShuffleHandler > > > yarn.nodemanager.aux-services.spark2_shuffle.class > org.apache.spark.network.yarn.YarnShuffleService > > > yarn.nodemanager.aux-services.spark2_shuffle.classpath > /usr/spark2/aux/* > > > yarn.nodemanager
[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482905#comment-16482905 ] Eric Yang commented on YARN-8326: - This appears to be introduced by YARN-5662 by turning on container monitor from false to true. This feature is used by opportunistic container scheduling and pre-emption to gather statics of the containers to make scheduling decisions. You can disable this feature by: {code} yarn.nodemanager.container-monitor.enabled false {code} Or reduce the stats collection time from 3 seconds to 300 milliseconds (use more system resources, but faster scheduling): {code} yarn.nodemanager.container-monitor.interval-ms 300 {code} Timer optimization might be possible to the work done in YARN-2883. The queuing and scheduling of containers is based on monitoring thread information. If it takes several seconds to wait for information to become available before next container is scheduled, then it can introduced artificial delay to rapidly launching containers. The timer value can not be smaller than certain value otherwise monitoring/container forking both will tax cpu resources too much much. If your workloads take less time than container scheduling/launch, then you might need to revisit how to decrease the containers to launch, and increase the work to run in containers. [~hlhu...@us.ibm.com] Can you confirm that those settings changes the benchmark result? > Yarn 3.0 seems runs slower than Yarn 2.6 > > > Key: YARN-8326 > URL: https://issues.apache.org/jira/browse/YARN-8326 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 > Environment: This is the yarn-site.xml for 3.0. > > > > hadoop.registry.dns.bind-port > 5353 > > > hadoop.registry.dns.domain-name > hwx.site > > > hadoop.registry.dns.enabled > true > > > hadoop.registry.dns.zone-mask > 255.255.255.0 > > > hadoop.registry.dns.zone-subnet > 172.17.0.0 > > > manage.include.files > false > > > yarn.acl.enable > false > > > yarn.admin.acl > yarn > > > yarn.client.nodemanager-connect.max-wait-ms > 6 > > > yarn.client.nodemanager-connect.retry-interval-ms > 1 > > > yarn.http.policy > HTTP_ONLY > > > yarn.log-aggregation-enable > false > > > yarn.log-aggregation.retain-seconds > 2592000 > > > yarn.log.server.url > > [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs] > > > yarn.log.server.web-service.url > > [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory] > > > yarn.node-labels.enabled > false > > > yarn.node-labels.fs-store.retry-policy-spec > 2000, 500 > > > yarn.node-labels.fs-store.root-dir > /system/yarn/node-labels > > > yarn.nodemanager.address > 0.0.0.0:45454 > > > yarn.nodemanager.admin-env > MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX > > > yarn.nodemanager.aux-services > mapreduce_shuffle,spark2_shuffle,timeline_collector > > > yarn.nodemanager.aux-services.mapreduce_shuffle.class > org.apache.hadoop.mapred.ShuffleHandler > > > yarn.nodemanager.aux-services.spark2_shuffle.class > org.apache.spark.network.yarn.YarnShuffleService > > > yarn.nodemanager.aux-services.spark2_shuffle.classpath > /usr/spark2/aux/* > > > yarn.nodemanager.aux-services.spark_shuffle.class > org.apache.spark.network.yarn.YarnShuffleService > > > yarn.nodemanager.aux-services.timeline_collector.class > > org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService > > > yarn.nodemanager.bind-host > 0.0.0.0 > > > 
yarn.nodemanager.container-executor.class > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor > > > yarn.nodemanager.container-metrics.unregister-delay-ms > 6 > > > yarn.nodemanager.container-monitor.interval-ms > 3000 > > > yarn.nodemanager.delete.debug-delay-sec > 0 > > > > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage > 90 > > > yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb > 1000 > > > yarn.nodemanager.disk-health-checker.min-healthy-disks > 0.25 > > > yarn.nodemanager.health-checker.interval-ms > 135000 > > > yarn.nodemanager.health-checker.script.timeout-ms > 6 > > > > yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage > false > > > yarn.nodemanager.linux-container-executor.group > hadoop > > > > yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users > false > > > yarn.nodemanager.local-dirs > /hadoop/yarn/local > > > yarn.nodemanager.log-aggregation.compression-type > gz > > > yarn.nod
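For readers trying Eric Yang's suggestion above: a minimal sketch of applying the two settings programmatically. The property names are taken verbatim from the comment; the bare YarnConfiguration harness and the printed check are illustrative only, and the two settings are alternatives rather than something to apply together.
{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerMonitorTuning {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();

    // Alternative 1: disable the container monitor that YARN-5662 turned on by default.
    conf.setBoolean("yarn.nodemanager.container-monitor.enabled", false);

    // Alternative 2: keep the monitor but poll every 300 ms instead of the 3000 ms
    // default (more CPU spent on monitoring, faster scheduling feedback).
    conf.setLong("yarn.nodemanager.container-monitor.interval-ms", 300L);

    System.out.println(conf.get("yarn.nodemanager.container-monitor.interval-ms"));
  }
}
{code}
The same two properties can equally be set in yarn-site.xml on each NodeManager, which is what the comment describes.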
[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used
[ https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482855#comment-16482855 ] Eric Payne commented on YARN-8179: -- {quote}Can we please change the {{newInstance}} call to {{Resources.none()}}? This will accommodate extensible resources. {quote} Nope. Sorry. Forget about that previous comment. It looks like \{{Resource.newInstance(0, 0)}} already covers extensible resources. I will commit this today or tomorrow. > Preemption does not happen due to natural_termination_factor when DRF is used > - > > Key: YARN-8179 > URL: https://issues.apache.org/jira/browse/YARN-8179 > Project: Hadoop YARN > Issue Type: Bug >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Attachments: YARN-8179.001.patch, YARN-8179.002.patch, > YARN-8179.003.patch > > > cluster > * DominantResourceCalculator > * QueueA : 50 (capacity) ~ 100 (max capacity) > * QueueB : 50 (capacity) ~ 50 (max capacity) > all of resources have been allocated to QueueA. (all Vcores are allocated to > QueueA) > if App1 is submitted to QueueB, over-utilized QueueA should be preempted. > but, I’ve met the problem, which preemption does not happen. it caused that > App1 AM can not allocated. > when App1 is submitted, pending resources for asking App1 AM would be > > so, Vcores which need to be preempted from QueueB should be 1. > but, it can be 0 due to natural_termination_factor (default is 0.2) > we should guarantee that resources not to be 0 even though applying > natural_termination_factor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used
[ https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482841#comment-16482841 ] Eric Payne edited comment on YARN-8179 at 5/21/18 6:22 PM: --- [~kyungwan nam]-, I am really sorry for the long delay, and I'm also very sorry that I do have one more request, even though I had previously approved the latest patch.- {code:java|title=PreemptableResourceCalculator#calculateResToObtainByPartitionForLeafQueues} if (Resources.greaterThan(rc, clusterResource, resToObtain, Resource.newInstance(0, 0))) {code} -Can we please change the {{newInstance}} call to {{Resources.none()}}? This will accommodate extensible resources.- was (Author: eepayne): [~kyungwan nam], I am really sorry for the long delay, and I'm also very sorry that I do have one more request, even though I had previously approved the latest patch. {code:title=PreemptableResourceCalculator#calculateResToObtainByPartitionForLeafQueues} if (Resources.greaterThan(rc, clusterResource, resToObtain, Resource.newInstance(0, 0))) {code} Can we please change the {{newInstance}} call to {{Resources.none()}}? This will accommodate extensible resources. > Preemption does not happen due to natural_termination_factor when DRF is used > - > > Key: YARN-8179 > URL: https://issues.apache.org/jira/browse/YARN-8179 > Project: Hadoop YARN > Issue Type: Bug >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Attachments: YARN-8179.001.patch, YARN-8179.002.patch, > YARN-8179.003.patch > > > cluster > * DominantResourceCalculator > * QueueA : 50 (capacity) ~ 100 (max capacity) > * QueueB : 50 (capacity) ~ 50 (max capacity) > all of resources have been allocated to QueueA. (all Vcores are allocated to > QueueA) > if App1 is submitted to QueueB, over-utilized QueueA should be preempted. > but, I’ve met the problem, which preemption does not happen. it caused that > App1 AM can not allocated. > when App1 is submitted, pending resources for asking App1 AM would be > > so, Vcores which need to be preempted from QueueB should be 1. > but, it can be 0 due to natural_termination_factor (default is 0.2) > we should guarantee that resources not to be 0 even though applying > natural_termination_factor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used
[ https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482841#comment-16482841 ] Eric Payne commented on YARN-8179: -- [~kyungwan nam], I am really sorry for the long delay, and I'm also very sorry that I do have one more request, even though I had previously approved the latest patch. {code:title=PreemptableResourceCalculator#calculateResToObtainByPartitionForLeafQueues} if (Resources.greaterThan(rc, clusterResource, resToObtain, Resource.newInstance(0, 0))) {code} Can we please change the {{newInstance}} call to {{Resources.none()}}? This will accommodate extensible resources. > Preemption does not happen due to natural_termination_factor when DRF is used > - > > Key: YARN-8179 > URL: https://issues.apache.org/jira/browse/YARN-8179 > Project: Hadoop YARN > Issue Type: Bug >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Attachments: YARN-8179.001.patch, YARN-8179.002.patch, > YARN-8179.003.patch > > > cluster > * DominantResourceCalculator > * QueueA : 50 (capacity) ~ 100 (max capacity) > * QueueB : 50 (capacity) ~ 50 (max capacity) > all of resources have been allocated to QueueA. (all Vcores are allocated to > QueueA) > if App1 is submitted to QueueB, over-utilized QueueA should be preempted. > but, I’ve met the problem, which preemption does not happen. it caused that > App1 AM can not allocated. > when App1 is submitted, pending resources for asking App1 AM would be > > so, Vcores which need to be preempted from QueueB should be 1. > but, it can be 0 due to natural_termination_factor (default is 0.2) > we should guarantee that resources not to be 0 even though applying > natural_termination_factor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
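As a side note on the request above (withdrawn in the later comment in this digest because Resource.newInstance(0, 0) already covers extensible resources), the two forms can be compared in isolation. This is a hedged sketch with made-up resource sizes, not the patch itself.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class ResToObtainCheck {
  public static void main(String[] args) {
    ResourceCalculator rc = new DominantResourceCalculator();
    Resource clusterResource = Resource.newInstance(30 * 1024, 18); // illustrative
    Resource resToObtain = Resource.newInstance(1024, 1);           // illustrative

    // Form quoted in the comment: compare against a freshly built zero resource.
    boolean a = Resources.greaterThan(rc, clusterResource, resToObtain,
        Resource.newInstance(0, 0));

    // Suggested (and later withdrawn) form: the shared zero constant.
    boolean b = Resources.greaterThan(rc, clusterResource, resToObtain,
        Resources.none());

    System.out.println(a + " " + b); // both print true for these numbers
  }
}
{code}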
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482809#comment-16482809 ] genericqa commented on YARN-8206: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 43s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 41s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 56s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 71m 19s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8206 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924373/YARN-8206.011.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux bbcb780999a2 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f48fec8 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20811/testReport/ | | Max. process+thread count | 435 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20811/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Sending a kill does not immediately kill docker
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482729#comment-16482729 ] genericqa commented on YARN-8259: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 32s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 18s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8259 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924356/YARN-8259.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d3ca0d4182cb 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f48fec8 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20808/testReport/ | | Max. process+thread count | 301 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20808/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Revisit liveliness checks for Docker container
[jira] [Commented] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482706#comment-16482706 ] Anbang Hu commented on YARN-8327: - Thanks [~giovanni.fumarola] for providing more information. Would you consider changing
{code:java}
+ numChars + ("\n").length() + (System.lineSeparator()
+ "End of LogType:stdout" + System.lineSeparator()).length();
{code}
to
{code:java}
+ numChars + ("\n").length() + ("End of LogType:stdout"
+ System.lineSeparator() + System.lineSeparator()).length();
{code}
This seems more aligned with the output message. > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > -- > > Key: YARN-8327 > URL: https://issues.apache.org/jira/browse/YARN-8327 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png, > image-2018-05-21-09-05-49-550.png > > > TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of > the line separator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
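The underlying issue, explained further down this thread, is that the test builds part of its expected string with a literal \n while the log writer emits System.lineSeparator(), which is two characters on Windows, so the trailer length has to be derived from lineSeparator() itself. A standalone sketch of that computation (numChars and the class name are made up for illustration):
{code:java}
public class ExpectedLogLength {
  public static void main(String[] args) {
    int numChars = 4096;                  // illustrative log payload size
    String sep = System.lineSeparator();  // "\n" on Linux, "\r\n" on Windows

    // Payload, the explicit "\n" the test writes, then the trailer the reader
    // appends: "End of LogType:stdout" followed by two line separators.
    int expectedLength =
        numChars + "\n".length() + ("End of LogType:stdout" + sep + sep).length();

    System.out.println(expectedLength);
  }
}
{code}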
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiran Wu updated YARN-8041: --- Attachment: YARN-8041.004.patch > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Minor > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch, YARN-8041.004.patch > > > This Jira tracks the implementation of some missing REST invocation in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482694#comment-16482694 ] Eric Badger commented on YARN-8206: --- Thanks for the review, [~jlowe]! Patch 011 addresses your comments. At this point {{isContainerRequestedAsPrivileged()}} is a really simple gadget function that might not be necessary. Makes it easier to change the implementation of the function, though. If you'd like me to just inline this in the caller code, I can do that. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch, YARN-8206.010.patch, YARN-8206.011.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
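To make the SIGKILL/SIGTERM distinction from the quoted description concrete, here is a rough sketch of routing the two signals differently. handleContainerKill and handleContainerStop are hypothetical stand-ins (the snippet in the description only shows handleContainerStop), and the docker commands in the comments are just the intuitive mapping, not necessarily what the patch does.
{code:java}
import org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.Signal;

public class DockerSignalRouting {

  // Hypothetical helpers; names and behavior are illustrative only.
  static void handleContainerKill(String containerId) {
    System.out.println("docker kill " + containerId);  // immediate, no grace period
  }

  static void handleContainerStop(String containerId) {
    System.out.println("docker stop " + containerId);  // graceful shutdown path
  }

  static void route(Signal signal, String containerId) {
    if (Signal.KILL.equals(signal)) {
      handleContainerKill(containerId);   // reclaim resources right away
    } else if (Signal.TERM.equals(signal)) {
      handleContainerStop(containerId);   // give the task a chance to exit cleanly
    }
  }

  public static void main(String[] args) {
    route(Signal.KILL, "container_e01_0001_01_000002");
  }
}
{code}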
[jira] [Commented] (YARN-8256) Pluggable provider for node membership management
[ https://issues.apache.org/jira/browse/YARN-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482695#comment-16482695 ] Dagang Wei commented on YARN-8256: -- Friendly ping. > Pluggable provider for node membership management > - > > Key: YARN-8256 > URL: https://issues.apache.org/jira/browse/YARN-8256 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.8.3, 3.0.2 >Reporter: Dagang Wei >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > h1. Background > HDFS-7541 introduced a pluggable provider framework for node membership > management, which gives HDFS the flexibility to have different ways to manage > node membership for different needs. > [org.apache.hadoop.hdfs.server.blockmanagement.HostConfigManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java] > is the class which provides the abstraction. Currently, there are 2 > implementations in the HDFS codebase: > 1) > [org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java] > which uses 2 config files which are defined by the properties dfs.hosts and > dfs.hosts.exclude. > 2) > [org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java] > which uses a single JSON file defined by the property dfs.hosts. > dfs.namenode.hosts.provider.classname is the property determining which > implementation is used > h1. Problem > YARN should be consistent with HDFS in terms of pluggable provider for node > membership management. The absence of it makes YARN impossible to have other > config sources, e.g., ZooKeeper, database, other config file formats, etc. > h1. Proposed solution > [org.apache.hadoop.yarn.server.resourcemanager.NodesListManager|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java] > is the class for managing YARN node membership today. It uses > [HostsFileReader|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java] > to read config files specified by the property > yarn.resourcemanager.nodes.include-path for nodes to include and > yarn.resourcemanager.nodes.nodes.exclude-path for nodes to exclude. > The proposed solution is to > 1) introduce a new interface {color:#008000}HostsConfigManager{color} which > provides the abstraction for node membership management. Update > {color:#008000}NodeListManager{color} to depend on > {color:#008000}HostsConfigManager{color} instead of > {color:#008000}HostsFileReader{color}. Then create a wrapper class for > {color:#008000}HostsFileReader{color} which implements the interface. > 2) introduce a new config property > {color:#008000}yarn.resourcemanager.hosts-config.manager.class{color} for > specifying the implementation class. Set the default value to the wrapper > class of {color:#008000}HostsFileReader{color} for backward compatibility > between new code and old config. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
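A rough sketch of what the proposed abstraction could look like. The interface name and the selection property come from the proposal above; the method names are assumptions loosely modeled on HostsFileReader and would be settled by an actual patch.
{code:java}
import java.io.IOException;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch of the proposed abstraction; method names are illustrative only.
public interface HostsConfigManager {

  /** Wire up whatever backing source (files, ZooKeeper, a database, ...) is configured. */
  void init(Configuration conf) throws IOException;

  /** Hosts allowed to register with the ResourceManager. */
  Set<String> getIncludedHosts();

  /** Hosts that must be excluded / decommissioned. */
  Set<String> getExcludedHosts();

  /** Re-read the backing source, e.g. on a refreshNodes admin call. */
  void refresh() throws IOException;
}
{code}
The default implementation would wrap the existing HostsFileReader and be selected through the proposed yarn.resourcemanager.hosts-config.manager.class property, mirroring how dfs.namenode.hosts.provider.classname works on the HDFS side.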
[jira] [Updated] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8206: -- Attachment: YARN-8206.011.patch > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch, YARN-8206.010.patch, YARN-8206.011.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records
[ https://issues.apache.org/jira/browse/YARN-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8333: Description: For scaling stateless containers, it would be great to support DNS round robin for fault tolerance and load balancing. The current DNS record format for RegistryDNS is [container-instance].[application-name].[username].[domain]. For example: {code} appcatalog-0.appname.hbase.ycluster. IN A 123.123.123.120 appcatalog-1.appname.hbase.ycluster. IN A 123.123.123.121 appcatalog-2.appname.hbase.ycluster. IN A 123.123.123.122 appcatalog-3.appname.hbase.ycluster. IN A 123.123.123.123 {code} It would be nice to add multi-A record that contains all IP addresses of the same component in addition to the instance based records. For example: {code} appcatalog.appname.hbase.ycluster. IN A 123.123.123.120 appcatalog.appname.hbase.ycluster. IN A 123.123.123.121 appcatalog.appname.hbase.ycluster. IN A 123.123.123.122 appcatalog.appname.hbase.ycluster. IN A 123.123.123.123 {code} > Load balance YARN services using RegistryDNS multiple A records > --- > > Key: YARN-8333 > URL: https://issues.apache.org/jira/browse/YARN-8333 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Priority: Major > > For scaling stateless containers, it would be great to support DNS round > robin for fault tolerance and load balancing. The current DNS record format > for RegistryDNS is > [container-instance].[application-name].[username].[domain]. For example: > {code} > appcatalog-0.appname.hbase.ycluster. IN A 123.123.123.120 > appcatalog-1.appname.hbase.ycluster. IN A 123.123.123.121 > appcatalog-2.appname.hbase.ycluster. IN A 123.123.123.122 > appcatalog-3.appname.hbase.ycluster. IN A 123.123.123.123 > {code} > It would be nice to add multi-A record that contains all IP addresses of the > same component in addition to the instance based records. For example: > {code} > appcatalog.appname.hbase.ycluster. IN A 123.123.123.120 > appcatalog.appname.hbase.ycluster. IN A 123.123.123.121 > appcatalog.appname.hbase.ycluster. IN A 123.123.123.122 > appcatalog.appname.hbase.ycluster. IN A 123.123.123.123 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482689#comment-16482689 ] Eric Payne commented on YARN-8292: -- Thanks [~leftnoteasy] for your work on this issue. - I don't think this is necessary. {code:title=AbstractPreemptableResourceCalculator#computeFixpointAllocation} Resource dupUnassignedForTheRound = Resources.clone(unassigned); {code} - I'm concerned about checking for {{any resource <= 0}} before preempting for intra-queue preemption. When extended resources are used, won't this prevent any preemption in a queue where none of the apps used the extended resource? {code:title=CapacitySchedulerPreemptionUtils#tryPreemptContainerAndDeductResToObtain} if (conservativeDRF) { doPreempt = !Resources.isAnyMajorResourceZeroOrNegative(rc, toObtainByPartition); } else{ {code} For example, if gpu is the extended resource, but no apps are currently using gpu in the queue, no intra-queue preemption will take place. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch, > YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
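To illustrate the extended-resource concern with a toy example (this stands in for the patch's isAnyMajorResourceZeroOrNegative check, whose exact semantics are defined by the patch, not here): a queue whose applications never request gpu will carry gpu=0 in its to-obtain vector, so a blanket any-resource-is-zero veto would block preemption that is otherwise justified.
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class ConservativeDrfIllustration {

  // Stand-in for a zero-or-negative scan across every resource type in the vector.
  static boolean anyZeroOrNegative(Map<String, Long> vector) {
    return vector.values().stream().anyMatch(v -> v <= 0);
  }

  public static void main(String[] args) {
    // toObtainByPartition for a queue that needs memory and vcores but no gpu.
    Map<String, Long> toObtain = new LinkedHashMap<>();
    toObtain.put("memory-mb", 1024L);
    toObtain.put("vcores", 1L);
    toObtain.put("gpu", 0L);  // gpu is configured cluster-wide but unused by these apps

    // With the conservative check, doPreempt = !anyZeroOrNegative(toObtain) -> false,
    // i.e. no intra-queue preemption even though memory and vcores are genuinely needed.
    System.out.println("doPreempt = " + !anyZeroOrNegative(toObtain));
  }
}
{code}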
[jira] [Created] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records
Eric Yang created YARN-8333: --- Summary: Load balance YARN services using RegistryDNS multiple A records Key: YARN-8333 URL: https://issues.apache.org/jira/browse/YARN-8333 Project: Hadoop YARN Issue Type: Improvement Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Eric Yang -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482664#comment-16482664 ] Giovanni Matteo Fumarola commented on YARN-8327: Thanks [~huanbang1993] for the feedback. I copied the wrong picture. !image-2018-05-21-09-05-49-550.png! In Linux I can see: xx*\n*End of LogType:stdout*\n\n* My initial statement about \r\n\n was incorrect. The problem is about the \n before the "End of LogType". > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > -- > > Key: YARN-8327 > URL: https://issues.apache.org/jira/browse/YARN-8327 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png, > image-2018-05-21-09-05-49-550.png > > > TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of > the line separator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8327: --- Attachment: image-2018-05-21-09-05-49-550.png > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > -- > > Key: YARN-8327 > URL: https://issues.apache.org/jira/browse/YARN-8327 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png, > image-2018-05-21-09-05-49-550.png > > > TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of > the line separator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8327: --- Attachment: (was: TestAggregatedLogFormat.png) > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > -- > > Key: YARN-8327 > URL: https://issues.apache.org/jira/browse/YARN-8327 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png > > > TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of > the line separator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481367#comment-16481367 ] Giovanni Matteo Fumarola edited comment on YARN-8327 at 5/21/18 4:03 PM: - Before the patch: !TestAggregatedLogFormat.png! After the patch: !image-2018-05-18-16-52-08-250.png! I validated in Linux as well. The length of System.lineSeparator() in Windows is equal to 2 while in Linux is 1. In the logline, there are 2 consecutive line separators that in Windows are converted to \r\n\n. Due to this, I changed every *\n* but one. was (Author: giovanni.fumarola): Before the patch: !file:///C:/Users/gifuma/AppData/Local/Temp/msohtmlclip1/01/clip_image001.png|width=486,height=111! After the patch: !image-2018-05-18-16-52-08-250.png! I validated in Linux as well. The length of System.lineSeparator() in Windows is equal to 2 while in Linux is 1. In the logline, there are 2 consecutive line separators that in Windows are converted to \r\n\n. Due to this, I changed every *\n* but one. > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > -- > > Key: YARN-8327 > URL: https://issues.apache.org/jira/browse/YARN-8327 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: TestAggregatedLogFormat.png, YARN-8327.v1.patch, > image-2018-05-18-16-52-08-250.png > > > TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of > the line separator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8327: --- Attachment: TestAggregatedLogFormat.png > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > -- > > Key: YARN-8327 > URL: https://issues.apache.org/jira/browse/YARN-8327 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: TestAggregatedLogFormat.png, YARN-8327.v1.patch, > image-2018-05-18-16-52-08-250.png > > > TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of > the line separator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481367#comment-16481367 ] Giovanni Matteo Fumarola edited comment on YARN-8327 at 5/21/18 4:02 PM: - Before the patch: !file:///C:/Users/gifuma/AppData/Local/Temp/msohtmlclip1/01/clip_image001.png|width=486,height=111! After the patch: !image-2018-05-18-16-52-08-250.png! I validated in Linux as well. The length of System.lineSeparator() in Windows is equal to 2 while in Linux is 1. In the logline, there are 2 consecutive line separators that in Windows are converted to \r\n\n. Due to this, I changed every *\n* but one. was (Author: giovanni.fumarola): Before the patch: !image-2018-05-18-16-51-47-324.png! After the patch: !image-2018-05-18-16-52-08-250.png! I validated in Linux as well. The length of System.lineSeparator() in Windows is equal to 2 while in Linux is 1. In the logline, there are 2 consecutive line separators that in Windows are converted to \r\n\n. Due to this, I changed every *\n* but one. > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > -- > > Key: YARN-8327 > URL: https://issues.apache.org/jira/browse/YARN-8327 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png > > > TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of > the line separator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
[ https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8327: --- Attachment: (was: image-2018-05-18-16-51-47-324.png) > Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows > -- > > Key: YARN-8327 > URL: https://issues.apache.org/jira/browse/YARN-8327 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png > > > TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of > the line separator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-4353) Provide short circuit user group mapping for NM/AM
[ https://issues.apache.org/jira/browse/YARN-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton resolved YARN-4353. Resolution: Won't Fix Hadoop Flags: Reviewed I'm fine with closing this out. I added {{NullGroupsMapping}} in order to resolve this JIRA, but I never felt confident enough to pull the trigger. > Provide short circuit user group mapping for NM/AM > -- > > Key: YARN-4353 > URL: https://issues.apache.org/jira/browse/YARN-4353 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Attachments: YARN-4353.prelim.patch > > > When the NM launches an AM, the {{ContainerLocalizer}} gets the current user > from {{UserGroupInformation}}, which triggers user group mapping, even though > the user groups are never accessed. If secure LDAP is configured for group > mapping, then there are some additional complications created by the > unnecessary group resolution. Additionally, it adds unnecessary latency to > the container launch time. > To address the issue, before getting the current user, the > {{ContainerLocalizer}} should configure {{UserGroupInformation}} with a null > group mapping service that quickly and quietly returns an empty group list > for all users. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
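For reference, a group-mapping provider of the kind described there is only a few lines against the GroupMappingServiceProvider interface. This is an illustrative sketch, not Hadoop's shipped NullGroupsMapping, and newer Hadoop versions may have additional interface methods to override.
{code:java}
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.security.GroupMappingServiceProvider;

// Every lookup returns an empty group list, so no LDAP or OS call is ever made.
public class EmptyGroupsMapping implements GroupMappingServiceProvider {

  @Override
  public List<String> getGroups(String user) throws IOException {
    return Collections.emptyList();
  }

  @Override
  public void cacheGroupsRefresh() throws IOException {
    // nothing to refresh
  }

  @Override
  public void cacheGroupsAdd(List<String> groups) throws IOException {
    // nothing to cache
  }
}
{code}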
[jira] [Commented] (YARN-8328) NonAggregatingLogHandler needlessly waits upon shutdown if delayed deletion is scheduled
[ https://issues.apache.org/jira/browse/YARN-8328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482643#comment-16482643 ] Jason Lowe commented on YARN-8328: -- This is making a lot of unit tests take a lot longer than they should. NonAggregatingLogHandler is keeping applications alive as long as the logs are around, and the NM tries to wait for applications to finish cleaning up when it tears down. That leads to long delays in unit tests that use the minicluster. My initial reaction was the minicluster should automatically set the log deletion to 0, but I can see the desire to keep container logs around for test debugging. What if there was a way to tell the non-aggregating log handler to _not_ delete logs? Applications would then complete quickly rather than linger as they are doing today. I propose the minicluster ask for this by default, as IMHO it's the responsibility of the unit test running the minicluster to perform cleanup of the test directories (including the minicluster local and log dirs), and that would leave the option to the minicluster user whether to keep logs for debugging or clean them up with all the other test data. Thoughts? > NonAggregatingLogHandler needlessly waits upon shutdown if delayed deletion > is scheduled > > > Key: YARN-8328 > URL: https://issues.apache.org/jira/browse/YARN-8328 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Eagles >Priority: Major > Attachments: YARN-8328.001.patch > > > This happens frequently in the MiniYarnCluster setup where a job completes > and then the cluster is shut down. Often the jobs are scheduled to delay > deletion of the log files so that container logs are available for debugging. > A scheduled log deletion has a default value of 3 hours. When the > NonAggregatingLogHandler is stopped it waits 10 seconds for all delayed > scheduled log deletions to occur and then shuts down. > A test developer has to make a trade-off to either 1) set the log deletion to > 0 to have the tests shut down quickly but get no container logs to debug or 2) > keep the log deletion at 3 hours and incur the 10 second timeout for each > test suite. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
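For the first half of the trade-off described in the issue, the 3 hour delay corresponds to the NodeManager log retention window, so a test that prefers fast shutdown over retained container logs can zero it before starting the cluster. A sketch, assuming yarn.nodemanager.log.retain-seconds is the knob in play (the standard retention property when log aggregation is disabled):
{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FastShutdownTestSetup {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();

    // Log aggregation off means NonAggregatingLogHandler handles container logs...
    conf.setBoolean("yarn.log-aggregation-enable", false);

    // ...and a zero retention window lets completed applications clean up (and the
    // NM stop) immediately, at the cost of losing the logs for debugging.
    conf.setLong("yarn.nodemanager.log.retain-seconds", 0L);

    // Hand conf to the MiniYARNCluster being started by the test.
  }
}
{code}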
[jira] [Commented] (YARN-8259) Revisit liveliness checks for privileged Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482640#comment-16482640 ] Shane Kumpf commented on YARN-8259: --- Attaching a patch that addresses the issue. Unfortunately, I found a couple of problems with using the docker API/CLI that made it necessary to go a different direction. The liveliness check will fail during a docker daemon restart with live restore enabled. We would need to add retries to handle this case, but this is brittle. Also, the reacquisition code runs {{signalContainer}} once per second until the application finishes, which resulted in many {{docker inspect}} and {{container-executor}} calls, which are expensive operations. After testing the suggested approaches, I went with checking for the existence of /proc/<pid>. The obvious con of using /proc is portability, but I'm not sure this is really an issue given existing features of the docker runtime. The patch moves both privileged and non-privileged Docker container liveliness checks to use /proc/<pid>, saving the c-e call. Finally, the logging around signalling failures needs work, but this impacts all runtimes under LCE, so I'd prefer to address it in a separate issue. I'll open an issue on this. > Revisit liveliness checks for privileged Docker containers > -- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8259.001.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
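A bare-bones illustration of the /proc-based direction described in that comment: the container counts as alive while its /proc/<pid> entry exists, with procfs portability being the acknowledged limitation. The helper below is a sketch, not the patch's code.
{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;

public class ProcLivenessCheck {

  // True while the kernel still exposes the process under /proc.
  static boolean isAlive(int pid) {
    return Files.isDirectory(Paths.get("/proc", Integer.toString(pid)));
  }

  public static void main(String[] args) {
    int pid = Integer.parseInt(args[0]);
    System.out.println("pid " + pid + (isAlive(pid) ? " is alive" : " has exited"));
  }
}
{code}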
[jira] [Updated] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-8259: -- Summary: Revisit liveliness checks for Docker containers (was: Revisit liveliness checks for privileged Docker containers) > Revisit liveliness checks for Docker containers > --- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8259.001.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8259) Revisit liveliness checks for privileged Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-8259: -- Attachment: YARN-8259.001.patch > Revisit liveliness checks for privileged Docker containers > -- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8259.001.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8248) Job hangs when a job requests a resource that its queue does not have
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482627#comment-16482627 ] Hudson commented on YARN-8248: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14244 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14244/]) YARN-8248. Job hangs when a job requests a resource that its queue does (haibochen: rev f48fec83d0f2d1a781a141ad7216463c5526321f) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java > Job hangs when a job requests a resource that its queue does not have > - > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, > YARN-8248-006.patch, YARN-8248-007.patch, YARN-8248-008.patch, > YARN-8248-009.patch, YARN-8248-010.patch, YARN-8248-011.patch, > YARN-8248-012.patch, YARN-8248-013.patch, YARN-8248-014.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. > > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > > 1 mb,0vcores > 9 mb,0vcores > 50 > -1.0f > 2.0 > fair > > {code} > Diagnostic message from the web UI: > {code:java} > Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. (Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482584#comment-16482584 ] Jason Lowe commented on YARN-8206: -- Thanks for updating the patch! Unfortunately the patch no longer applies after YARN-8141 and needs to be updated. allowPrivilegedContainerExecution will now log a warning every time a privileged container is not requested, which it did not do before. I'm wondering if a warning log is really ever needed here. It's already logging when a privileged container is requested, so if that log does not appear we can infer that the env variable was not set properly. Nit: isContainerRequestedAsPrivileged can be simplified with Boolean.parseBoolean, which handles null directly. Nit: Since handleContainerKill is being refactored, it would be good to improve the debug logging to leverage the SLF4J API so it doesn't have to do a debug check, i.e. leveraging the {} positional parameter syntax so the message does not have to be built even if the log message will be discarded. > Sending a kill does not immediately kill docker containers > -- > > Key: YARN-8206 > URL: https://issues.apache.org/jira/browse/YARN-8206 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8206.001.patch, YARN-8206.002.patch, > YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, > YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, > YARN-8206.009.patch, YARN-8206.010.patch > > > {noformat} > if (ContainerExecutor.Signal.KILL.equals(signal) > || ContainerExecutor.Signal.TERM.equals(signal)) { > handleContainerStop(containerId, env); > {noformat} > Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent > for docker containers. However, they should actually be separate. When YARN > sends a SIGKILL to a process, it means for it to die immediately and not sit > around waiting for anything. This ensures an immediate reclamation of > resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task > might not handle the signal correctly, and will then end up as a failed task > instead of a killed task. This is especially bad for preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
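To make the two nits concrete, here is a small illustrative sketch; the environment-variable key and method bodies are placeholders and not the actual DockerLinuxContainerRuntime code.

{code:java}
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Illustrative sketch of the review nits; names are placeholders, not the real runtime code. */
public class ReviewNitsSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ReviewNitsSketch.class);
  // Placeholder for the env variable key the Docker runtime checks for privileged requests.
  private static final String RUN_PRIVILEGED_ENV =
      "YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER";

  // Nit 1: Boolean.parseBoolean(null) returns false, so no explicit null check is needed.
  static boolean isContainerRequestedAsPrivileged(Map<String, String> env) {
    return Boolean.parseBoolean(env.get(RUN_PRIVILEGED_ENV));
  }

  // Nit 2: SLF4J {} positional parameters defer message construction, so an
  // explicit LOG.isDebugEnabled() guard is unnecessary for simple statements.
  static void handleContainerKill(String containerId, String signal) {
    LOG.debug("Sending signal {} to container {}", signal, containerId);
    // ... actual signalling logic would follow here ...
  }
}
{code}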
[jira] [Commented] (YARN-8248) Job hangs when a job requests a resource that its queue does not have
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482576#comment-16482576 ] Haibo Chen commented on YARN-8248: -- I'm okay with not fixing this one. But it is indeed a convention to cap static variables, IIRC. +1 checking it in. > Job hangs when a job requests a resource that its queue does not have > - > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, > YARN-8248-006.patch, YARN-8248-007.patch, YARN-8248-008.patch, > YARN-8248-009.patch, YARN-8248-010.patch, YARN-8248-011.patch, > YARN-8248-012.patch, YARN-8248-013.patch, YARN-8248-014.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. > > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > > 1 mb,0vcores > 9 mb,0vcores > 50 > -1.0f > 2.0 > fair > > {code} > Diagnostic message from the web UI: > {code:java} > Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. (Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
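The convention mentioned above is the usual Java one: static final constants are written in UPPER_SNAKE_CASE. A one-field illustration follows (the field name is made up, not taken from the patch).

{code:java}
public class NamingConventionExample {
  // Conventional: constants (static final fields) use UPPER_SNAKE_CASE.
  private static final int MAX_APP_ATTEMPTS = 2;

  // Unconventional, and what a review comment like the one above typically flags:
  // private static final int maxAppAttempts = 2;
}
{code}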
[jira] [Commented] (YARN-7960) Add no-new-privileges flag to docker run
[ https://issues.apache.org/jira/browse/YARN-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482548#comment-16482548 ] Eric Badger commented on YARN-7960: --- Test doesn't fail for me locally and is in RM code, so it's unrelated. > Add no-new-privileges flag to docker run > > > Key: YARN-7960 > URL: https://issues.apache.org/jira/browse/YARN-7960 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-7960.001.patch, YARN-7960.002.patch > > > Minimally, this should be used for unprivileged containers. It's a cheap way > to add an extra layer of security to the docker model. For privileged > containers, it might be appropriate to omit this flag > https://github.com/moby/moby/pull/20727 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
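For reference, the flag in question is docker run's --security-opt no-new-privileges, which prevents processes in the container from gaining additional privileges (for example via setuid binaries). The sketch below only illustrates where the flag would be appended when assembling the run command; it is an assumption-laden example and not Hadoop's actual DockerRunCommand API.

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; not Hadoop's DockerRunCommand/container-executor integration. */
public class DockerRunArgsSketch {
  public static List<String> buildRunArgs(String image, boolean privileged) {
    List<String> args = new ArrayList<>();
    args.add("docker");
    args.add("run");
    if (privileged) {
      args.add("--privileged");
    } else {
      // Cheap extra hardening for unprivileged containers.
      args.add("--security-opt");
      args.add("no-new-privileges");
    }
    args.add(image);
    return args;
  }

  public static void main(String[] args) {
    // Prints: docker run --security-opt no-new-privileges centos:7
    System.out.println(String.join(" ", buildRunArgs("centos:7", false)));
  }
}
{code}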
[jira] [Commented] (YARN-8103) Add CLI interface to query node attributes
[ https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482526#comment-16482526 ] genericqa commented on YARN-8103: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} YARN-3409 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 29s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 31s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 34s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 8m 11s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 43s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 43s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 0s{color} | {color:green} YARN-3409 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 28m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 4s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 32s{color} | {color:orange} root: The patch generated 13 new + 484 unchanged - 27 fixed = 497 total (was 511) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 8m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 25s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 15s{color} | {color:green} There were no new shelldocs issues. 
{color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 2 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 30s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 53s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 25s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 53s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 18s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} uni
[jira] [Commented] (YARN-8248) Job hangs when a job requests a resource that its queue does not have
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482427#comment-16482427 ] genericqa commented on YARN-8248: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 38s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 257 unchanged - 1 fixed = 258 total (was 258) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}128m 26s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8248 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924323/YARN-8248-014.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux b9cdbcec529c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a23ff8d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20807/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20807/testReport/ | | Max. process+thread count | 819 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resou
[jira] [Updated] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiandan Yang updated YARN-8320: Attachment: YARN-8320.001.patch > Add support CPU isolation for latency-sensitive (LS) service > - > > Key: YARN-8320 > URL: https://issues.apache.org/jira/browse/YARN-8320 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Jiandan Yang >Priority: Major > Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, > YARN-8320.001.patch > > > Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and > “cpu.shares” to isolate CPU resources. However, > * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; > it has no support for differentiated latency. > * Request latency of services running in containers may fluctuate frequently > when all containers share CPUs, which latency-sensitive services cannot afford > in our production environment. > So we need finer-grained CPU isolation. > My co-workers and I propose a solution that uses cgroup cpuset to bind > containers to different processors; this is inspired by the isolation > technique in the [Borg > system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf]. > Later I will upload a detailed design doc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
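For readers who want to see the mechanism behind the proposal, binding a container to specific processors with cgroup v1 cpuset amounts to creating a child cgroup, assigning it CPUs and memory nodes, and moving the container's process into it. The sketch below shows only that raw file interface; it is illustrative, assumes the conventional v1 mount point and a hadoop-yarn hierarchy name, and is not the attached patch.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative cgroup v1 cpuset sketch; paths, names and values are assumptions. */
public class CpusetSketch {
  public static void pinToCpus(String containerId, String cpus, String pid) throws IOException {
    Path group = Paths.get("/sys/fs/cgroup/cpuset/hadoop-yarn", containerId);
    Files.createDirectories(group);
    // Both cpuset.cpus and cpuset.mems must be populated before tasks can be attached.
    Files.write(group.resolve("cpuset.cpus"), cpus.getBytes(StandardCharsets.UTF_8));
    Files.write(group.resolve("cpuset.mems"), "0".getBytes(StandardCharsets.UTF_8));
    // Writing the PID into "tasks" binds that process (and its children) to the chosen CPUs.
    Files.write(group.resolve("tasks"), pid.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    // e.g. dedicate CPUs 2-3 to a latency-sensitive container (requires suitable privileges).
    pinToCpus("container_e01_1526371234567_0001_01_000002", "2-3", "12345");
  }
}
{code}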
[jira] [Updated] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiandan Yang updated YARN-8320: Attachment: (was: YARN-8320.001.patch) > Add support CPU isolation for latency-sensitive (LS) service > - > > Key: YARN-8320 > URL: https://issues.apache.org/jira/browse/YARN-8320 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Jiandan Yang >Priority: Major > Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, > YARN-8320.001.patch > > > Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and > “cpu.shares” to isolate CPU resources. However, > * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; > it has no support for differentiated latency. > * Request latency of services running in containers may fluctuate frequently > when all containers share CPUs, which latency-sensitive services cannot afford > in our production environment. > So we need finer-grained CPU isolation. > My co-workers and I propose a solution that uses cgroup cpuset to bind > containers to different processors; this is inspired by the isolation > technique in the [Borg > system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf]. > Later I will upload a detailed design doc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482336#comment-16482336 ] Jiandan Yang commented on YARN-8320: - upload v1 patch to initiate discussion > Add support CPU isolation for latency-sensitive (LS) service > - > > Key: YARN-8320 > URL: https://issues.apache.org/jira/browse/YARN-8320 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Jiandan Yang >Priority: Major > Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, > YARN-8320.001.patch > > > Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and > “cpu.shares” to isolate CPU resources. However, > * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; > it has no support for differentiated latency. > * Request latency of services running in containers may fluctuate frequently > when all containers share CPUs, which latency-sensitive services cannot afford > in our production environment. > So we need finer-grained CPU isolation. > My co-workers and I propose a solution that uses cgroup cpuset to bind > containers to different processors; this is inspired by the isolation > technique in the [Borg > system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf]. > Later I will upload a detailed design doc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiandan Yang updated YARN-8320: Attachment: YARN-8320.001.patch > Add support CPU isolation for latency-sensitive (LS) service > - > > Key: YARN-8320 > URL: https://issues.apache.org/jira/browse/YARN-8320 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Jiandan Yang >Priority: Major > Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, > YARN-8320.001.patch > > > Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and > “cpu.shares” to isolate CPU resources. However, > * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; > it has no support for differentiated latency. > * Request latency of services running in containers may fluctuate frequently > when all containers share CPUs, which latency-sensitive services cannot afford > in our production environment. > So we need finer-grained CPU isolation. > My co-workers and I propose a solution that uses cgroup cpuset to bind > containers to different processors; this is inspired by the isolation > technique in the [Borg > system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf]. > Later I will upload a detailed design doc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8248) Job hangs when a job requests a resource that its queue does not have
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8248: - Attachment: YARN-8248-014.patch > Job hangs when a job requests a resource that its queue does not have > - > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, > YARN-8248-006.patch, YARN-8248-007.patch, YARN-8248-008.patch, > YARN-8248-009.patch, YARN-8248-010.patch, YARN-8248-011.patch, > YARN-8248-012.patch, YARN-8248-013.patch, YARN-8248-014.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. > > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > > 1 mb,0vcores > 9 mb,0vcores > 50 > -1.0f > 2.0 > fair > > {code} > Diagnostic message from the web UI: > {code:java} > Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. (Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8248) Job hangs when a job requests a resource that its queue does not have
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482328#comment-16482328 ] genericqa commented on YARN-8248: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 38s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 35s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 257 unchanged - 1 fixed = 259 total (was 258) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 67m 30s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}125m 45s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8248 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924314/YARN-8248-013.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ea2c3aae2b4c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a23ff8d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20804/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20804/testReport/ | | Max. process+thread count | 870 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resou
[jira] [Updated] (YARN-8103) Add CLI interface to query node attributes
[ https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8103: --- Attachment: YARN-8103-YARN-3409.002.patch > Add CLI interface to query node attributes > --- > > Key: YARN-8103 > URL: https://issues.apache.org/jira/browse/YARN-8103 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8103-YARN-3409.001.patch, > YARN-8103-YARN-3409.002.patch, YARN-8103-YARN-3409.WIP.patch > > > YARN-8100 will add an API interface for querying the attributes. This issue adds a CLI interface > for querying the node attributes of each node and listing all attributes in the > cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org