[jira] [Commented] (YARN-8505) AMLimit and userAMLimit check should be skipped for unmanaged AM
[ https://issues.apache.org/jira/browse/YARN-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541157#comment-16541157 ] Bibin A Chundatt commented on YARN-8505: [~leftnoteasy] +1 for adding a new configuration limiting the number of RUNNING applications per queue. I was having the same idea too. > AMLimit and userAMLimit check should be skipped for unmanaged AM > > > Key: YARN-8505 > URL: https://issues.apache.org/jira/browse/YARN-8505 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0, 2.9.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8505.001.patch > > > AMLimit and userAMLimit check in LeafQueue#activateApplications should be > skipped for unmanaged AM whose resource is not taken from the YARN cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
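To illustrate the check being discussed, here is a rough Python model of the LeafQueue#activateApplications flow with the proposed skip for unmanaged AMs. This is only a sketch, not the actual Java patch; all names and the dict shape are made up for illustration:

```python
# Python model of the activation loop discussed above (hypothetical names).
# Unmanaged AMs run outside the YARN cluster, so their AM resource is
# neither checked against nor charged to the queue's AM limit.

def activate_applications(pending_apps, am_limit, am_used):
    """Activate pending apps; only managed AMs count toward am_limit."""
    activated = []
    for app in pending_apps:
        if app["unmanaged"]:
            activated.append(app)  # skip AMLimit/userAMLimit checks entirely
            continue
        if am_used + app["am_resource"] > am_limit:
            break                  # a managed AM would exceed the queue AM limit
        am_used += app["am_resource"]
        activated.append(app)
    return activated, am_used
```

With this model, an unmanaged AM is activated even when the queue's AM headroom is exhausted, which is the behavior the patch argues for.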
[jira] [Commented] (YARN-8480) Add boolean option for resources
[ https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541156#comment-16541156 ] Wangda Tan commented on YARN-8480: -- [~cheersyang], What [~templedf] / [~snemeth] proposed is to make the resource behave like a label. For example, a node can report that it has resource: memory=2048,vcore=3,has_java=true, and an AM can request resource with ...has_java=true. Allocating a container on the node will not update the node's available resource; as long as the node has enough memory and vcores, it can allocate >1 containers with the has_java=true resource request. This is an alternative way to represent node labels using resource types. > Add boolean option for resources > > > Key: YARN-8480 > URL: https://issues.apache.org/jira/browse/YARN-8480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Daniel Templeton >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8480.001.patch, YARN-8480.002.patch > > > Make it possible to define a resource with a boolean value.
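The "boolean resource as label" idea above can be sketched in a few lines: boolean resources are matched like labels but never deducted, while countable resources are deducted as usual. A hedged Python model (not the actual patch; function and key names are illustrative):

```python
# Model of label-like boolean resources: matched on allocation, never consumed.

def can_allocate(node_avail, request):
    """True if the node satisfies every countable and boolean resource asked."""
    for name, value in request.items():
        avail = node_avail.get(name, False if isinstance(value, bool) else 0)
        if isinstance(value, bool):
            if value and avail is not True:
                return False        # label-like: must be present on the node
        elif avail < value:
            return False            # countable: not enough left
    return True

def allocate(node_avail, request):
    """Deduct countable resources only; booleans stay available."""
    assert can_allocate(node_avail, request)
    for name, value in request.items():
        if not isinstance(value, bool):
            node_avail[name] -= value
    return node_avail
```

This captures Wangda's point: allocating a has_java=true container leaves has_java available, so the node can host more than one such container as long as memory and vcores hold out.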
[jira] [Commented] (YARN-8505) AMLimit and userAMLimit check should be skipped for unmanaged AM
[ https://issues.apache.org/jira/browse/YARN-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541155#comment-16541155 ] Wangda Tan commented on YARN-8505: -- [~bibinchundatt] / [~Tao Yang] / [~cheersyang], I would prefer to add a new limit to control #maximum-concurrently-activated-apps within a queue and to skip updating AMLimit when an unmanaged AM is launched.
[jira] [Commented] (YARN-8511) When AM releases a container, RM removes allocation tags before it is released by NM
[ https://issues.apache.org/jira/browse/YARN-8511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541153#comment-16541153 ] Wangda Tan commented on YARN-8511: -- [~cheersyang], Thanks for reporting and working on this issue. This is a valid issue, and we have seen it in other places as well. For example, when using exclusively-used resource types like GPU, we could allocate a container to a node before the previous container completes. Memory has the same issue. I'm not sure if your patch works, since {{SchedulerNode#releaseContainer}} could be invoked in scenarios like when an AM releases a container by invoking the allocate call, or when an app attempt finishes. The scheduler could still place a new container on a node before the old one is terminated by the NM. Instead, I think we should have some hook to handle such an event inside {{AbstractYarnScheduler#nodeUpdate}}. However, we still have two issues: 1) If we deduct resources after the actual container finishes, it is possible that the scheduler application attempt has already finished. In that case, the scheduler is not able to deduct resources. (The scheduler relies on SchedulerApplicationAttempt to locate the RMContainer.) I'm not sure if it impacts allocation tags or not. 2) It is also possible that the NM spends too much time terminating containers; in our docker-in-docker setup, we observed the OS taking several minutes to terminate a container, and the NM could report a container as DONE before it is actually terminated. (Another bug here.) YARN-8508 is caused by this issue.
> When AM releases a container, RM removes allocation tags before it is > released by NM > > > Key: YARN-8511 > URL: https://issues.apache.org/jira/browse/YARN-8511 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8511.001.patch, YARN-8511.002.patch > > > User leverages PC with allocation tags to avoid port conflicts between apps, > we found sometimes they still get port conflicts. This is a similar issue > like YARN-4148. Because RM immediately removes allocation tags once > AM#allocate asks to release a container, however container on NM has some > delay until it actually gets killed and released the port. We should let RM > remove allocation tags AFTER NM confirms the containers are released.
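The fix direction discussed above (defer tag removal until the NM-confirmed completion arrives in nodeUpdate) can be modeled compactly. A hedged Python sketch, with illustrative class and method names rather than the actual Hadoop ones:

```python
# Model: keep allocation tags until the NM heartbeat confirms the container
# is really gone, instead of dropping them at AM release time.

class AllocationTagsManager:
    def __init__(self):
        self.tags = {}                      # container_id -> set of tags

    def add(self, container_id, tags):
        self.tags[container_id] = set(tags)

    def remove(self, container_id):
        self.tags.pop(container_id, None)

class Scheduler:
    def __init__(self, tags_manager):
        self.tags_manager = tags_manager
        self.pending_release = set()

    def am_release(self, container_id):
        # AM asked to release: mark it, but do NOT drop tags yet --
        # the container may still hold its port on the NM.
        self.pending_release.add(container_id)

    def node_update(self, completed_container_ids):
        # NM heartbeat confirms these containers have actually terminated.
        for cid in completed_container_ids:
            self.pending_release.discard(cid)
            self.tags_manager.remove(cid)
```

In this model a new anti-affinity placement still sees the old container's tag between AM release and NM confirmation, which is exactly what avoids the port-conflict window described in the issue.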
[jira] [Commented] (YARN-7064) Use cgroup to get container resource utilization
[ https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541092#comment-16541092 ] yewei.huang commented on YARN-7064: --- Thanks [~miklos.szeg...@cloudera.com] for such a nice feature! As a newbie to YARN, I'm a little confused about why we chose to add up the (user + sys) time from cpuacct.stat rather than use cpuacct.usage when getting total CPU usage. From the [kernel doc|https://www.kernel.org/doc/Documentation/cgroup-v1/cpuacct.txt]: cpuacct.usage gives the CPU time (in nanoseconds) obtained by this group which is essentially the CPU time obtained by all the tasks in the system. cpuacct.stat lists a few statistics which further divide the CPU time (in USER_HZ unit) obtained by the cgroup into user and system times. It also mentions the cpuacct controller uses the percpu_counter interface to collect user and system times. This has two side effects: * It is theoretically possible to see wrong values for user and system times. This is because percpu_counter_read() on 32bit systems isn't safe against concurrent writes. * It is possible to see slightly outdated values for user and system times due to the batch processing nature of percpu_counter. It seems much safer to use cpuacct.usage? > Use cgroup to get container resource utilization > > > Key: YARN-7064 > URL: https://issues.apache.org/jira/browse/YARN-7064 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Major > Fix For: 3.1.0 > > Attachments: YARN-7064.000.patch, YARN-7064.001.patch, > YARN-7064.002.patch, YARN-7064.003.patch, YARN-7064.004.patch, > YARN-7064.005.patch, YARN-7064.007.patch, YARN-7064.008.patch, > YARN-7064.009.patch, YARN-7064.010.patch, YARN-7064.011.patch, > YARN-7064.012.patch, YARN-7064.013.patch, YARN-7064.014.patch > > > This is an addendum to YARN-6668.
What happens is that that jira always wants > to rebase patches against YARN-1011 instead of trunk.
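The comment above compares two cgroup v1 accounting files: cpuacct.usage is already in nanoseconds, while cpuacct.stat reports user/system ticks in USER_HZ units. A small Python sketch of the conversion involved; this only models the arithmetic, it is not Hadoop's Java implementation, and it assumes USER_HZ is 100 (real code should query `getconf CLK_TCK`):

```python
# Convert cgroup v1 CPU accounting values to nanoseconds.

USER_HZ = 100  # ticks per second; typical Linux value, see `getconf CLK_TCK`

def cpu_ns_from_cpuacct_stat(stat_text):
    """Sum user+system ticks from cpuacct.stat and convert to nanoseconds."""
    total_ticks = 0
    for line in stat_text.splitlines():
        field, value = line.split()
        if field in ("user", "system"):
            total_ticks += int(value)
    return total_ticks * (1_000_000_000 // USER_HZ)

def cpu_ns_from_cpuacct_usage(usage_text):
    """cpuacct.usage is already a single nanosecond counter."""
    return int(usage_text.strip())
```

The stat-based path loses sub-tick precision and inherits the percpu_counter caveats quoted above, which is the heart of the commenter's question.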
[jira] [Comment Edited] (YARN-7064) Use cgroup to get container resource utilization
[ https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541092#comment-16541092 ] yewei.huang edited comment on YARN-7064 at 7/12/18 3:56 AM: Thanks [~miklos.szeg...@cloudera.com] for such a nice feature! As a newbie to YARN, I'm a little confused about why we chose to add up the (user + sys) time from cpuacct.stat rather than use cpuacct.usage when getting total CPU usage. From the [kernel doc|https://www.kernel.org/doc/Documentation/cgroup-v1/cpuacct.txt]: cpuacct.usage gives the CPU time (in nanoseconds) obtained by this group which is essentially the CPU time obtained by all the tasks in the system. cpuacct.stat lists a few statistics which further divide the CPU time (in USER_HZ unit) obtained by the cgroup into user and system times. It also mentions the cpuacct controller uses the percpu_counter interface to collect user and system times. This has two side effects: * It is theoretically possible to see wrong values for user and system times. This is because percpu_counter_read() on 32bit systems isn't safe against concurrent writes. * It is possible to see slightly outdated values for user and system times due to the batch processing nature of percpu_counter. It seems much safer to use cpuacct.usage? was (Author: windwizard): {{Thanks [~miklos.szeg...@cloudera.com] for such nice feature!!!}} {{As a newBee for yarn, I've got a little confusion on why we choose to add up (user + sys) time in cpuacct.stat rather than use cpuacct.usage when try to get total cpu usage ? }} {{From the [kernel doc|https://www.kernel.org/doc/Documentation/cgroup-v1/cpuacct.txt]}} {{cpuacct.usage }}gives the CPU time (in nanoseconds) obtained by this group which is essentially the CPU time obtained by all the tasks in the system. cpuacct.stat lists a few statistics which further divide the CPU time (in USER_HZ unit) obtained by the cgroup into user and system times.
And it also mentioned cpuacct controller uses percpu_counter interface to collect user and system times. This has two side effects: * It is theoretically possible to see wrong values for user and system times. This is because percpu_counter_read() on 32bit systems isn't safe against concurrent writes. * It is possible to see slightly outdated values for user and system times due to the batch processing nature of percpu_counter. seems much safer to use cpuacct.usage?
[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541087#comment-16541087 ] Weiwei Yang commented on YARN-4104: --- I like this idea too, but it seems this one has become obsolete :( > dryrun of schedule for diagnostic and tenant's complain > --- > > Key: YARN-4104 > URL: https://issues.apache.org/jira/browse/YARN-4104 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > > We have more than 1 thousand queues and several hundreds of tenants in a busy > cluster. We get a lot of complaints/questions from owners/operators of queues > about "Why can't my queue/app get resources for a long while?" > It's really hard to answer such questions. > So we added a diagnostic REST endpoint > "/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the sorted > list of its children according to its SchedulingPolicy.getComparator(). > All scheduling parameters of the children are also displayed, such as > minShare, usage, demand, weight, priority etc. > Usually we just call "/ws/v1/cluster/schedule/dryrun/root", and the result > answers most of the questions by itself. > I feel it's really useful for multi-tenant clusters, and hope it could be > merged into the mainline.
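To illustrate what such a dryrun endpoint could return, here is a rough Python model of sorting a parent's children the way a fair-share comparator might: the queue most underserved relative to its weight comes first. This is only an illustrative sketch under assumed semantics, not the actual patch's comparator:

```python
# Model of a "dryrun" ordering: children sorted by usage-to-weight ratio
# (lower ratio = more underserved = scheduled first), tie-broken by priority.

def dryrun(children):
    """Return child queues in the order a fair-share scheduler might pick them."""
    def starvation_key(q):
        return (q["usage"] / max(q["weight"], 1e-9), -q["priority"])
    return sorted(children, key=starvation_key)
```

Dumping this ordering (plus each child's minShare, usage, demand, weight, and priority) is exactly the kind of self-explaining diagnostic the issue describes.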
[jira] [Resolved] (YARN-4192) Add YARN metric logging periodically to a seperate file
[ https://issues.apache.org/jira/browse/YARN-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved YARN-4192. --- Resolution: Won't Fix I am closing this now; as [~aw] mentioned, this should already be handled by metrics2. Please reopen it if you think otherwise, [~nijel]. Thanks. > Add YARN metric logging periodically to a seperate file > --- > > Key: YARN-4192 > URL: https://issues.apache.org/jira/browse/YARN-4192 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: nijel >Assignee: nijel >Priority: Minor > > HDFS-8880 added a framework for logging metrics in a given interval. > This can be added to YARN as well. > Any thoughts?
[jira] [Commented] (YARN-8511) When AM releases a container, RM removes allocation tags before it is released by NM
[ https://issues.apache.org/jira/browse/YARN-8511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541083#comment-16541083 ] Weiwei Yang commented on YARN-8511: --- [~leftnoteasy], [~asuresh], [~kkaranasos], could you please help to review this. Thanks.
[jira] [Commented] (YARN-8505) AMLimit and userAMLimit check should be skipped for unmanaged AM
[ https://issues.apache.org/jira/browse/YARN-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541078#comment-16541078 ] Tao Yang commented on YARN-8505: [~bibinchundatt], Thanks for your reply. Yes, this issue will change the old behavior. We just encountered a case where only specified partitions (a/b) have NMs (none in the default partition) in a cluster. Our users want to submit an unmanaged AM which will request resources from partition a or b, but it won't run because of the resource limitation in the default partition. Users think that an unmanaged AM's resource isn't taken from YARN, so it should not be limited by these resource checks, and they feel puzzled by this limitation. Is the current limitation reasonable? I think we need a discussion about that. cc: [~leftnoteasy], [~cheersyang], [~sunilg]
[jira] [Commented] (YARN-8480) Add boolean option for resources
[ https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541039#comment-16541039 ] Weiwei Yang commented on YARN-8480: --- Hi [~templedf] / [~snemeth], Could you please elaborate on the use case for this boolean-type resource? What is the difference from a countable resource whose value can only be 0 or 1? Thanks
[jira] [Commented] (YARN-7639) Queue Management scheduling edit policy class needs to be configured dynamically
[ https://issues.apache.org/jira/browse/YARN-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540973#comment-16540973 ] genericqa commented on YARN-7639: - (x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 26s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 24m 27s | trunk passed |
| +1 | compile | 0m 39s | trunk passed |
| +1 | checkstyle | 0m 13s | trunk passed |
| +1 | mvnsite | 0m 42s | trunk passed |
| +1 | shadedclient | 10m 44s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 7s | trunk passed |
| +1 | javadoc | 0m 25s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 0m 37s | the patch passed |
| +1 | javac | 0m 37s | the patch passed |
| +1 | checkstyle | 0m 20s | the patch passed |
| +1 | mvnsite | 0m 42s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 55s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 18s | the patch passed |
| +1 | javadoc | 0m 23s | the patch passed |
|| Other Tests ||
| -1 | unit | 71m 35s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 125m 45s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7639 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931237/YARN-7639.2.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a450989a6731 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 632aca5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/21218/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21218/testReport/ |
| Max. process+thread count | 884 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader
[ https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540967#comment-16540967 ] Vrushali C commented on YARN-8270: -- A couple more questions: - Perhaps a bit dumb, but what is the 10 in all the initializers for registry.newQuantiles? - Does asyncPutEntitiesTotalCount.incr(incCount) need to be synchronized? Similarly for asyncPutEntitiesLatency.add(durationMs)? Same question for the sync entities calls. > Adding JMX Metrics for Timeline Collector and Reader > > > Key: YARN-8270 > URL: https://issues.apache.org/jira/browse/YARN-8270 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2, timelineserver >Reporter: Sushil Ks >Assignee: Sushil Ks >Priority: Major > Attachments: YARN-8270.001.patch > > > This Jira is for emitting JMX Metrics for ATS v2 Timeline Collector and > Timeline Reader. For the Timeline Collector it tries to capture > success, failure and latencies for *putEntities* and *putEntitiesAsync* from > *TimelineCollectorWebService*, and all the APIs' success, failure and > latencies for fetching TimelineEntities from *TimelineReaderWebServices*. > This would actually help in monitoring and measuring performance for ATSv2 at > scale.
[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader
[ https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540965#comment-16540965 ] Vrushali C commented on YARN-8270: -- Hi [~haibochen] Gentle ping for any further thoughts on Sushil's patch. [~Sushil-K-S] I had a couple of things: - I see Time.monotonicNow() in some places and System.nanoTime() in others. For example, TimelineReaderWebServices has the start time as one and the end time as the other. Not sure if that was intentional. Perhaps we can use only one in all places; I think Time.monotonicNow may be better. - In TimelineCollectorWebService#putEntities, I believe we are looking to measure the entire function call time? Or are we looking to measure just the putEntitiesAsync call at line 177 at https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorWebService.java#L177 or the putEntities call at https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorWebService.java#L180 ? - Also, calculating the end time in getElapsedTimeMs may not be an accurate measurement of the elapsed time. The end time should be obtained in the finally block and passed in as an argument to getElapsedTimeMs.
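The measurement pattern suggested in the last point above (one monotonic clock for both timestamps, end time taken in the finally block so failures are timed too) can be sketched like this. A Python model of the Java metrics flow; the class and counter names are made up, not the patch's:

```python
# Model of success/failure/latency metrics around a put call, with the end
# timestamp captured in `finally` from the same monotonic clock as the start.
import time

class PutEntitiesMetrics:
    def __init__(self):
        self.success = 0
        self.failure = 0
        self.latencies_ms = []

def timed_put(metrics, put_fn, entities):
    start = time.monotonic()
    try:
        result = put_fn(entities)
        metrics.success += 1
        return result
    except Exception:
        metrics.failure += 1
        raise
    finally:
        # End time computed here so both success and failure paths are timed.
        metrics.latencies_ms.append((time.monotonic() - start) * 1000.0)
```

Using a single monotonic source for both endpoints avoids the mixed Time.monotonicNow()/System.nanoTime() pitfall called out in the review.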
[jira] [Commented] (YARN-7129) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540963#comment-16540963 ] genericqa commented on YARN-7129: - (x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 27s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 16 new or modified test files. |
|| trunk Compile Tests ||
| 0 | mvndep | 1m 46s | Maven dependency ordering for branch |
| +1 | mvninstall | 24m 16s | trunk passed |
| +1 | compile | 27m 1s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 1m 29s | trunk passed |
| +1 | shadedclient | 10m 25s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications |
| +1 | findbugs | 0m 0s | trunk passed |
| +1 | javadoc | 1m 5s | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 1m 24s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 24s | the patch passed |
| +1 | compile | 26m 36s | the patch passed |
| +1 | javac | 26m 36s | the patch passed |
| +1 | checkstyle | 0m 30s | the patch passed |
| +1 | mvnsite | 6m 14s | the patch passed |
| -1 | shellcheck | 0m 0s | The patch generated 8 new + 0 unchanged - 0 fixed = 8 total (was 0) |
| -0 | shelldocs | 0m 34s | The patch generated 158 new + 400 unchanged - 0 fixed = 558 total (was 400) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 14s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 9m 52s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 1s | Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-docker |
| +1 | findbugs | 0m 46s | the patch passed |
| +1 | javadoc | 2m 6s | the patch passed |
|| Other Tests ||
| +1 | unit | 0m 27s | hadoop-project in the patch passed. |
| -1 | unit | 29m 13s |
[jira] [Created] (YARN-8520) Document best practice for user management
Eric Yang created YARN-8520: --- Summary: Document best practice for user management Key: YARN-8520 URL: https://issues.apache.org/jira/browse/YARN-8520 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, yarn Reporter: Eric Yang Assignee: Eric Yang Docker containers must have usernames and groups consistent with the host operating system when external mount points are exposed to the docker container. This prevents malicious or unauthorized impersonation. This task is to document the best practice for ensuring that user and group membership is consistent across docker containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540899#comment-16540899 ] Chen Qingcha commented on YARN-7481: There are some projects that depend on the GPU support in 2.7.2 and are going to move to 2.9.0, so I need to do some bug fixing for the move. In the long term, when these projects plan to move to 3.1+, we do need to merge this into the community. > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, > hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPU as > countable resource. > However, GPU placement is also very important to deep learning job for better > efficiency. > For example, a 2-GPU job runs on gpu {0,1} could be faster than run on gpu > {0, 7}, if GPU 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add the support to Hadoop 2.7.2 to enable GPU locality scheduling, which > support fine-grained GPU placement. > A 64-bits bitmap is added to yarn Resource, which indicates both GPU usage > and locality information in a node (up to 64 GPUs per node). '1' means > available and '0' otherwise in the corresponding position of the bit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
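The 64-bit bitmap encoding described in the quoted issue can be sketched as follows. This is a hypothetical illustration only: the class name, the four-GPUs-per-PCI-E-switch grouping, and both helper methods are assumptions for exposition, not code from the attached patches.

```java
// Sketch of the 64-bit GPU availability bitmap described in YARN-7481:
// bit i set to 1 means GPU i is free on the node (up to 64 GPUs per node).
// The grouping of four GPUs per PCI-E switch is an assumed topology.
public class GpuBitmap {
    static final int SWITCH_GROUP_SIZE = 4; // assumed GPUs per PCI-E switch

    // True when two GPU indices fall under the same (assumed) PCI-E switch,
    // e.g. {0,1} share a switch while {0,7} do not.
    public static boolean sameSwitch(int gpuA, int gpuB) {
        return gpuA / SWITCH_GROUP_SIZE == gpuB / SWITCH_GROUP_SIZE;
    }

    // Count free GPUs in the switch group that contains gpuIndex, so the
    // scheduler can prefer placements whose GPUs all share one switch.
    public static int freeInSameSwitch(long bitmap, int gpuIndex) {
        int group = gpuIndex / SWITCH_GROUP_SIZE;
        long mask = ((1L << SWITCH_GROUP_SIZE) - 1) << (group * SWITCH_GROUP_SIZE);
        return Long.bitCount(bitmap & mask);
    }
}
```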
[jira] [Commented] (YARN-8480) Add boolean option for resources
[ https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540897#comment-16540897 ] Wangda Tan commented on YARN-8480: -- btw, [~templedf], I know you ran into some trouble supporting the node partition concept in FS (YARN-2497). Node attributes should be much easier to support because there's no resource sharing required under node attributes; everything comes with FCFS. Basically you should only need to add an if check before deciding to allocate container X on node Y. > Add boolean option for resources > > > Key: YARN-8480 > URL: https://issues.apache.org/jira/browse/YARN-8480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Daniel Templeton >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8480.001.patch, YARN-8480.002.patch > > > Make it possible to define a resource with a boolean value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
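The "if check" suggested in the comment above could look roughly like this. Class and method names are hypothetical, not actual YARN scheduler APIs; the sketch only illustrates that attribute matching is a plain predicate with no resource accounting involved.

```java
import java.util.Map;

// Hypothetical node-attribute eligibility check: a node is eligible for a
// request only when every requested attribute matches what the node reports
// (e.g. has_java=true). Rejecting a node updates no available resources.
public class NodeAttributeCheck {
    public static boolean nodeMatches(Map<String, String> nodeAttrs,
                                      Map<String, String> requested) {
        for (Map.Entry<String, String> e : requested.entrySet()) {
            if (!e.getValue().equals(nodeAttrs.get(e.getKey()))) {
                return false; // skip this node; no accounting needed
            }
        }
        return true; // attributes match; proceed with normal FCFS allocation
    }
}
```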
[jira] [Commented] (YARN-8480) Add boolean option for resources
[ https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540895#comment-16540895 ] Wangda Tan commented on YARN-8480: -- [~templedf], If this only changed Fair Scheduler, I would be fine with that. However, this touches Resource/ResourceInformation/RMAdminCLI/ResourceCalculator/CapacityScheduler implementation, etc. If you could take a closer look at the YARN-3409 APIs, it should not be hard at all. It should definitely be cheaper than adding a new resource type. > Add boolean option for resources > > > Key: YARN-8480 > URL: https://issues.apache.org/jira/browse/YARN-8480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Daniel Templeton >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8480.001.patch, YARN-8480.002.patch > > > Make it possible to define a resource with a boolean value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8480) Add boolean option for resources
[ https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540885#comment-16540885 ] Daniel Templeton commented on YARN-8480: I don't disagree that in terms of general use cases there's overlap between boolean resources and node attributes. From the perspective of fair scheduler, however, there's a big difference. Boolean resources are a simple mechanism for a pure label resource that is a minor extension of existing capabilities. Node labels or node attributes will be a major integration effort. While much of this work touches the common code, what we're proposing here is not intended as a general solution. Capacity scheduler is where the work for node attributes is happening, and we don't propose to change that or muddy those waters. This JIRA is intended only to provide a pure label resource for fair scheduler that isn't as heavy. > Add boolean option for resources > > > Key: YARN-8480 > URL: https://issues.apache.org/jira/browse/YARN-8480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Daniel Templeton >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8480.001.patch, YARN-8480.002.patch > > > Make it possible to define a resource with a boolean value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration
[ https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540879#comment-16540879 ] Wangda Tan commented on YARN-7974: -- [~oliverhuh...@gmail.com], [~jhung], Thanks for updating the patch; in general the patch looks good. Several minor comments: 1) {code} public abstract void updateTrackingUrl(String trackingUrl); {code} Should have a default (maybe empty) implementation. Given AMRMClient/Async are all public/stable APIs, we don't want builds to break if any app extends these classes. 2) The updateTrackingUrl should be marked as public/unstable. 3) Should we explicitly compare the content of the new tracking url with the old one? Right now we only check != null, which may not be enough. {code} 1824 // Update tracking url if changed and save it to state store 1825 String newTrackingUrl = statusUpdateEvent.getTrackingUrl(); 1826 if (newTrackingUrl != null) { {code} > Allow updating application tracking url after registration > -- > > Key: YARN-7974 > URL: https://issues.apache.org/jira/browse/YARN-7974 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-7974.001.patch, YARN-7974.002.patch, > YARN-7974.003.patch, YARN-7974.004.patch, YARN-7974.005.patch > > > Normally an application's tracking url is set on AM registration. We have a > use case for updating the tracking url after registration (e.g. the UI is > hosted on one of the containers). > Approach is for AM to update tracking url on heartbeat to RM, and add related > API in AMRMClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
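Review point 3 above can be sketched as a small predicate: persist only when the URL is non-null and actually differs from the stored value. The class and method names here are hypothetical illustrations, not the RM code from the patch.

```java
import java.util.Objects;

// Sketch of the suggested tracking-URL comparison: a bare != null check
// would re-save an unchanged URL to the state store on every heartbeat,
// so also compare the new value against the stored one.
public class TrackingUrlUpdate {
    public static boolean shouldUpdate(String oldUrl, String newUrl) {
        return newUrl != null && !Objects.equals(newUrl, oldUrl);
    }
}
```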
[jira] [Updated] (YARN-7639) Queue Management scheduling edit policy class needs to be configured dynamically
[ https://issues.apache.org/jira/browse/YARN-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-7639: --- Attachment: YARN-7639.2.patch > Queue Management scheduling edit policy class needs to be configured > dynamically > > > Key: YARN-7639 > URL: https://issues.apache.org/jira/browse/YARN-7639 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-7639.1.patch, YARN-7639.2.patch > > > This needs to be configured dynamically i.e added to the list of current > policies configured under > yarn.resourcemanager.scheduler.monitor.policies > whenever auto leaf queue creation is enabled for a parent queue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7639) Queue Management scheduling edit policy class needs to be configured dynamically
[ https://issues.apache.org/jira/browse/YARN-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540867#comment-16540867 ] Suma Shivaprasad commented on YARN-7639: Fixed UT failure > Queue Management scheduling edit policy class needs to be configured > dynamically > > > Key: YARN-7639 > URL: https://issues.apache.org/jira/browse/YARN-7639 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-7639.1.patch, YARN-7639.2.patch > > > This needs to be configured dynamically i.e added to the list of current > policies configured under > yarn.resourcemanager.scheduler.monitor.policies > whenever auto leaf queue creation is enabled for a parent queue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-7481: - Fix Version/s: (was: 2.7.2) > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, > hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPU as > countable resource. > However, GPU placement is also very important to deep learning job for better > efficiency. > For example, a 2-GPU job runs on gpu {0,1} could be faster than run on gpu > {0, 7}, if GPU 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add the support to Hadoop 2.7.2 to enable GPU locality scheduling, which > support fine-grained GPU placement. > A 64-bits bitmap is added to yarn Resource, which indicates both GPU usage > and locality information in a node (up to 64 GPUs per node). '1' means > available and '0' otherwise in the corresponding position of the bit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8480) Add boolean option for resources
[ https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540810#comment-16540810 ] Wangda Tan commented on YARN-8480: -- [~snemeth]/[~templedf], to me we should move this to node attributes: we already have most things ready under the YARN-3409 branch, and I don't think we should duplicate the same things in this JIRA. Things added to the Resource class do not necessarily have to be countable (definition of countable according to the design doc of YARN-3926): {code} When we speak of countable resources, we refer to resourcetypes where the allocation and release of resources is a simple subtraction and addition operation. {code} I'm supportive of adding new resource types like set, range, or hierarchical (like the disk resource isolation mentioned by [~cheersyang]). Pure label resource types should go to node attributes / node partitions. > Add boolean option for resources > > > Key: YARN-8480 > URL: https://issues.apache.org/jira/browse/YARN-8480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Daniel Templeton >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8480.001.patch, YARN-8480.002.patch > > > Make it possible to define a resource with a boolean value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7129) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7129: Attachment: YARN-7129.005.patch > Application Catalog for YARN applications > - > > Key: YARN-7129 > URL: https://issues.apache.org/jira/browse/YARN-7129 > Project: Hadoop YARN > Issue Type: New Feature > Components: applications >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf, YARN-7129.001.patch, > YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, > YARN-7129.005.patch > > > YARN native services provides web services API to improve usability of > application deployment on Hadoop using collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves usability of YARN for > manage the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540759#comment-16540759 ] Suma Shivaprasad commented on YARN-8501: [~snemeth] Patch LGTM,. Can you add some UTs for testing various query filter params to getApps? > Reduce complexity of RMWebServices' getApps method > -- > > Key: YARN-8501 > URL: https://issues.apache.org/jira/browse/YARN-8501 > Project: Hadoop YARN > Issue Type: Improvement > Components: restapi >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8501.001.patch, YARN-8501.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3611) Support Docker Containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540749#comment-16540749 ] Shane Kumpf commented on YARN-3611: --- +1000 :) Great effort everyone. I'm excited for what has been achieved and where this support is going. > Support Docker Containers In LinuxContainerExecutor > --- > > Key: YARN-3611 > URL: https://issues.apache.org/jira/browse/YARN-3611 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana >Priority: Major > Labels: Docker > > Support Docker Containers In LinuxContainerExecutor > LinuxContainerExecutor provides useful functionality today with respect to > localization, cgroups based resource management and isolation for CPU, > network, disk etc. as well as security with a well-defined mechanism to > execute privileged operations using the container-executor utility. Bringing > docker support to LinuxContainerExecutor lets us use all of this > functionality when running docker containers under YARN, while not requiring > users and admins to configure and use a different ContainerExecutor. > There are several aspects here that need to be worked through : > * Mechanism(s) to let clients request docker-specific functionality - we > could initially implement this via environment variables without impacting > the client API. 
> * Security - both docker daemon as well as application > * Docker image localization > * Running a docker container via container-executor as a specified user > * “Isolate” the docker container in terms of CPU/network/disk/etc > * Communicating with and/or signaling the running container (ensure correct > pid handling) > * Figure out workarounds for certain performance-sensitive scenarios like > HDFS short-circuit reads > * All of these need to be achieved without changing the current behavior of > LinuxContainerExecutor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8403) Nodemanager logs failed to download file with INFO level
[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8403: Issue Type: Sub-task (was: Improvement) Parent: YARN-8472 > Nodemanager logs failed to download file with INFO level > > > Key: YARN-8403 > URL: https://issues.apache.org/jira/browse/YARN-8403 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8403.001.patch, YARN-8403.002.patch, > YARN-8403.003.patch, YARN-8403.png > > > Some of the container execution related stack traces are printing in INFO or > WARN level. > {code} > 2018-06-06 03:10:40,077 INFO localizer.ResourceLocalizationService > (ResourceLocalizationService.java:writeCredentials(1312)) - Writing > credentials to the nmPrivate file > /grid/0/hadoop/yarn/local/nmPrivate/container_e02_1528246317583_0048_01_01.tokens > 2018-06-06 03:10:40,087 INFO localizer.ResourceLocalizationService > (ResourceLocalizationService.java:run(975)) - Failed to download resource { { > hdfs://mycluster.example.com:8020/user/hrt_qa/Streaming/InputDir, > 1528254452720, FILE, null > },pending,[(container_e02_1528246317583_0048_01_01)],6074418082915225,DOWNLOADING} > org.apache.hadoop.yarn.exceptions.YarnException: Download and unpack failed > at > org.apache.hadoop.yarn.util.FSDownload.downloadAndUnpack(FSDownload.java:306) > at > org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:283) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:409) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:66) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.FileNotFoundException: > /grid/0/hadoop/yarn/local/filecache/28_tmp/InputDir/input1.txt (Permission > denied) > at java.io.FileOutputStream.open0(Native Method) > at java.io.FileOutputStream.open(FileOutputStream.java:270) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at > org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:236) > at > org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:219) > at > org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:318) > at > org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:307) > at > org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:338) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:401) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:464) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:408) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:399) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:381) > at > org.apache.hadoop.yarn.util.FSDownload.downloadAndUnpack(FSDownload.java:298) > ... 
9 more > {code} > {code} > 2018-06-06 03:10:41,547 WARN privileged.PrivilegedOperationExecutor > (PrivilegedOperationExecutor.java:executePrivilegedOperation(182)) - > IOException executing command: > java.io.InterruptedIOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:1012) > at org.apache.hadoop.util.Shell.run(Shell.java:902) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402) > at >
[jira] [Resolved] (YARN-6576) Improve Diagonstic by moving Error stack trace from NM to slider AM
[ https://issues.apache.org/jira/browse/YARN-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang resolved YARN-6576. - Resolution: Duplicate Assignee: Eric Yang This is duplicate of YARN-8403. > Improve Diagonstic by moving Error stack trace from NM to slider AM > --- > > Key: YARN-6576 > URL: https://issues.apache.org/jira/browse/YARN-6576 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > Labels: Docker > > Slider Master diagonstics should improve to show root cause of App failures > for issues like missing docker image. > Currently, Slider Master log does not show proper error message to debug such > failure. User have to access Nodemanager logs to find out root cause of such > issues where container failed to start. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3611) Support Docker Containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540740#comment-16540740 ] Eric Yang commented on YARN-3611: - We have completed 88 out of 105 tasks in the first milestone, and moved only 17 tasks to phase 2 for incremental improvement. This was a great community effort. Thank you to everyone who contributed to this JIRA. :) > Support Docker Containers In LinuxContainerExecutor > --- > > Key: YARN-3611 > URL: https://issues.apache.org/jira/browse/YARN-3611 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana >Priority: Major > Labels: Docker > > Support Docker Containers In LinuxContainerExecutor > LinuxContainerExecutor provides useful functionality today with respect to > localization, cgroups based resource management and isolation for CPU, > network, disk etc. as well as security with a well-defined mechanism to > execute privileged operations using the container-executor utility. Bringing > docker support to LinuxContainerExecutor lets us use all of this > functionality when running docker containers under YARN, while not requiring > users and admins to configure and use a different ContainerExecutor. > There are several aspects here that need to be worked through : > * Mechanism(s) to let clients request docker-specific functionality - we > could initially implement this via environment variables without impacting > the client API. 
> * Security - both docker daemon as well as application > * Docker image localization > * Running a docker container via container-executor as a specified user > * “Isolate” the docker container in terms of CPU/network/disk/etc > * Communicating with and/or signaling the running container (ensure correct > pid handling) > * Figure out workarounds for certain performance-sensitive scenarios like > HDFS short-circuit reads > * All of these need to be achieved without changing the current behavior of > LinuxContainerExecutor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-3611) Support Docker Containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger resolved YARN-3611. --- Resolution: Fixed We have closed out Phase 1 of Docker container support and are now moving onto Phase 2. Phase 2 will be tracked by YARN-8472, so please file any new JIRAs under that umbrella (with the 'Docker' label). > Support Docker Containers In LinuxContainerExecutor > --- > > Key: YARN-3611 > URL: https://issues.apache.org/jira/browse/YARN-3611 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana >Priority: Major > Labels: Docker > > Support Docker Containers In LinuxContainerExecutor > LinuxContainerExecutor provides useful functionality today with respect to > localization, cgroups based resource management and isolation for CPU, > network, disk etc. as well as security with a well-defined mechanism to > execute privileged operations using the container-executor utility. Bringing > docker support to LinuxContainerExecutor lets us use all of this > functionality when running docker containers under YARN, while not requiring > users and admins to configure and use a different ContainerExecutor. > There are several aspects here that need to be worked through : > * Mechanism(s) to let clients request docker-specific functionality - we > could initially implement this via environment variables without impacting > the client API. 
> * Security - both docker daemon as well as application > * Docker image localization > * Running a docker container via container-executor as a specified user > * “Isolate” the docker container in terms of CPU/network/disk/etc > * Communicating with and/or signaling the running container (ensure correct > pid handling) > * Figure out workarounds for certain performance-sensitive scenarios like > HDFS short-circuit reads > * All of these need to be achieved without changing the current behavior of > LinuxContainerExecutor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8472) YARN Container Phase 2
[ https://issues.apache.org/jira/browse/YARN-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8472: -- Description: In YARN-3611, we have implemented basic Docker container support for YARN. This story is the next phase to improve container usability. Several area for improvements are: # Software defined network support # Interactive shell to container # User management sss/nscd integration # Runc/containerd support # Metrics/Logs integration with Timeline service v2 # Docker container profiles # Docker cgroup management was: In YARN-3611, we have implemented basic Docker container support for YARN. This story is the next phase to improve container usability. Several area for improvements are: # Software defined network support # Interactive shell to container # User management sss/nscd integration # Runc/containerd support # Metrics/Logs integration with Timeline service v2 > YARN Container Phase 2 > -- > > Key: YARN-8472 > URL: https://issues.apache.org/jira/browse/YARN-8472 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Eric Yang >Priority: Major > > In YARN-3611, we have implemented basic Docker container support for YARN. > This story is the next phase to improve container usability. > Several area for improvements are: > # Software defined network support > # Interactive shell to container > # User management sss/nscd integration > # Runc/containerd support > # Metrics/Logs integration with Timeline service v2 > # Docker container profiles > # Docker cgroup management -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8518) test-container-executor test_is_empty() is broken
[ https://issues.apache.org/jira/browse/YARN-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reassigned YARN-8518: - Assignee: Jim Brennan > test-container-executor test_is_empty() is broken > - > > Key: YARN-8518 > URL: https://issues.apache.org/jira/browse/YARN-8518 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > > A new test was recently added to test-container-executor.c that has some > problems. > It is attempting to mkdir() a hard-coded path: > /tmp/2938rf2983hcqnw8ud/emptydir > This fails because the base directory is not there. These directories are > not being cleaned up either. > It should be using TEST_ROOT. > I don't know what Jira this change was made under - the git commit from July > 9 2018 does not reference a Jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8241) MRAppMaster fails when using UID:GID pair within docker container
[ https://issues.apache.org/jira/browse/YARN-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger resolved YARN-8241. --- Resolution: Won't Fix We do not plan to support running containers in which a user lookup cannot be performed. This is to be considered a base requirement for a container to run. Some sort of user lookup mechanism such as nscd, sssd, /etc/passwd, or something else needs to be implemented so that the UID:GID pair can be linked to a user and group > MRAppMaster fails when using UID:GID pair within docker container > - > > Key: YARN-8241 > URL: https://issues.apache.org/jira/browse/YARN-8241 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Priority: Major > Labels: Docker > > As mentioned in [this > comment|https://issues.apache.org/jira/browse/YARN-4266?focusedCommentId=16063931=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16063931], > the MRAppMaster fails for docker containers if there is no additional user > lookup strategy (e.g. bind-mounting /var/run/nscd or /etc/passwd). We need a > better solution so that users can still run even if they are not known inside > of the container by name -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
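The lookup requirement described in the resolution above amounts to resolving a UID against passwd-format entries, whichever mechanism supplies them (nscd, sssd, or a bind-mounted /etc/passwd). A minimal illustrative sketch with hypothetical names, not container-executor code:

```java
import java.util.List;

// Resolve a UID to a username from passwd-format lines
// ("name:passwd:uid:gid:gecos:home:shell"). A null result is the
// unresolvable-UID failure mode the resolution above refers to.
public class UserLookupCheck {
    public static String lookup(List<String> passwdLines, int uid) {
        for (String line : passwdLines) {
            String[] f = line.split(":");
            if (f.length >= 3 && f[2].equals(Integer.toString(uid))) {
                return f[0]; // username field
            }
        }
        return null; // UID unknown inside the container
    }
}
```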
[jira] [Commented] (YARN-6495) check docker container's exit code when writing to cgroup task files
[ https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540722#comment-16540722 ] genericqa commented on YARN-6495: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} YARN-6495 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-6495 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918965/YARN-6495.002.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21216/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > check docker container's exit code when writing to cgroup task files > > > Key: YARN-6495 > URL: https://issues.apache.org/jira/browse/YARN-6495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jaeboo Jeong >Assignee: Jaeboo Jeong >Priority: Major > Labels: Docker > Attachments: YARN-6495.001.patch, YARN-6495.002.patch > > > If I execute simple command like date on docker container, the application > failed to complete successfully. > for example, > {code} > $ yarn jar > $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar > -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker -shell_command "date" -jar > $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar > -num_containers 1 -timeout 360 > … > 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished > unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. 
Breaking monitoring > loop > 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to > complete successfully > {code} > The error log is like below. > {code} > ... > Failed to write pid to file > /cgroup_parent/cpu/hadoop-yarn/container_/tasks - No such process > ... > {code} > When writing pid to cgroup tasks, container-executor doesn’t check docker > container’s status. > If the container finished very quickly, we can’t write pid to cgroup tasks, > and that is not a problem. > So container-executor needs to check docker container’s exit code during > writing pid to cgroup tasks.
[jira] [Updated] (YARN-7904) Privileged, trusted containers need all of their bind-mounted directories to be read-only
[ https://issues.apache.org/jira/browse/YARN-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7904: -- Parent Issue: YARN-8472 (was: YARN-3611) > Privileged, trusted containers need all of their bind-mounted directories to > be read-only > - > > Key: YARN-7904 > URL: https://issues.apache.org/jira/browse/YARN-7904 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Priority: Major > Labels: Docker > > Since they will be running as some other user than themselves, the NM likely > won't be able to clean up after them because of permissions issues. So, to > prevent this, we should make these directories read-only. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5168) Add port mapping handling when docker container use bridge network
[ https://issues.apache.org/jira/browse/YARN-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5168: -- Parent Issue: YARN-8472 (was: YARN-3611) > Add port mapping handling when docker container use bridge network > -- > > Key: YARN-5168 > URL: https://issues.apache.org/jira/browse/YARN-5168 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jun Gong >Priority: Major > Labels: Docker > > YARN-4007 addresses different network setups when launching the docker > container. We need to support port mapping when the docker container uses bridge > network. > The following problems are what we faced: > 1. Add "-P" to map the docker container's exposed ports automatically. > 2. Add "-p" to let the user specify specific ports to map. > 3. Add service registry support for the bridge network case, so apps could find > each other. It could be done outside of YARN, however it might be more convenient > to support it natively in YARN.
[jira] [Resolved] (YARN-7197) Add support for a volume blacklist for docker containers
[ https://issues.apache.org/jira/browse/YARN-7197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger resolved YARN-7197. --- Resolution: Won't Fix There are currently no plans to attempt to implement this feature. Feel free to re-open if you would like to work on this > Add support for a volume blacklist for docker containers > > > Key: YARN-7197 > URL: https://issues.apache.org/jira/browse/YARN-7197 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-7197.001.patch, YARN-7197.002.patch, > YARN-7197.003.patch, YARN-7197.004.patch, YARN-7197.005.patch > > > Docker supports bind mounting host directories into containers. Work is > underway to allow admins to configure a whitelist of volume mounts. While > this is a much needed and useful feature, it opens the door for > misconfiguration that may lead to users being able to compromise or crash the > system. > One example would be allowing users to mount /run from a host running > systemd, and then running systemd in that container, rendering the host > mostly unusable. > This issue is to add support for a default blacklist. The default blacklist > would be where we put files and directories that, if mounted into a container, > are likely to have negative consequences. Users are encouraged not to remove > items from the default blacklist, but may do so if necessary.
[jira] [Commented] (YARN-7639) Queue Management scheduling edit policy class needs to be configured dynamically
[ https://issues.apache.org/jira/browse/YARN-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540717#comment-16540717 ] genericqa commented on YARN-7639: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 4s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 56s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}129m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoQueueCreation | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-7639 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931210/YARN-7639.1.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux cb9c6aee71b8 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 632aca5 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/21215/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21215/testReport/ | | Max. process+thread count | 942 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Updated] (YARN-5670) Add support for Docker image clean up
[ https://issues.apache.org/jira/browse/YARN-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5670: -- Parent: YARN-8472 (was: YARN-3611) > Add support for Docker image clean up > - > > Key: YARN-5670 > URL: https://issues.apache.org/jira/browse/YARN-5670 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Zhankun Tang >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > > Regarding to Docker image localization, we also need a way to clean up the > old/stale Docker image to save storage space. We may extend deletion service > to utilize "docker rm" to do this. > This is related to YARN-3854 and may depend on its implementation. Please > refer to YARN-3854 for Docker image localization details. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6495) check docker container's exit code when writing to cgroup task files
[ https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6495: -- Parent: YARN-8472 (was: YARN-3611) > check docker container's exit code when writing to cgroup task files > > > Key: YARN-6495 > URL: https://issues.apache.org/jira/browse/YARN-6495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jaeboo Jeong >Assignee: Jaeboo Jeong >Priority: Major > Labels: Docker > Attachments: YARN-6495.001.patch, YARN-6495.002.patch > > > If I execute simple command like date on docker container, the application > failed to complete successfully. > for example, > {code} > $ yarn jar > $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar > -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker -shell_command "date" -jar > $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar > -num_containers 1 -timeout 360 > … > 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished > unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring > loop > 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to > complete successfully > {code} > The error log is like below. > {code} > ... > Failed to write pid to file > /cgroup_parent/cpu/hadoop-yarn/container_/tasks - No such process > ... > {code} > When writing pid to cgroup tasks, container-executor doesn’t check docker > container’s status. > If the container finished very quickly, we can’t write pid to cgroup tasks, > and it is not problem. > So container-executor needs to check docker container’s exit code during > writing pid to cgroup tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8376) Separate white list for docker.trusted.registries and docker.privileged-container.registries
[ https://issues.apache.org/jira/browse/YARN-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8376: -- Parent: YARN-8472 (was: YARN-3611) > Separate white list for docker.trusted.registries and > docker.privileged-container.registries > > > Key: YARN-8376 > URL: https://issues.apache.org/jira/browse/YARN-8376 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > Labels: docker > > In the ideal world, it would be possible to have separate white lists for > docker registries depending on the security requirement for each type of docker > image: > 1. Registries from which we can run non-privileged containers without mounts > 2. Registries from which we can run non-privileged containers with mounts > 3. Registries from which we can run privileged or non-privileged containers > with mounts > In the current implementation, there are only type 1 and type 2 or 3. It > would be nice to define a separate white list to differentiate between 2 > and 3.
[jira] [Updated] (YARN-3854) Add localization support for docker images
[ https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-3854: -- Parent: YARN-8472 (was: YARN-3611) > Add localization support for docker images > -- > > Key: YARN-3854 > URL: https://issues.apache.org/jira/browse/YARN-3854 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-3854-branch-2.8.001.patch, > YARN-3854_Localization_support_for_Docker_image_v1.pdf, > YARN-3854_Localization_support_for_Docker_image_v2.pdf, > YARN-3854_Localization_support_for_Docker_image_v3.pdf > > > We need the ability to localize docker images when those images aren't > already available locally. There are various approaches that could be used > here with different trade-offs/issues : image archives on HDFS + docker load > , docker pull during the localization phase or (automatic) docker pull > during the run/launch phase. > We also need the ability to clean-up old/stale, unused images. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6456: -- Parent: YARN-8472 (was: YARN-3611) > Allow administrators to set a single ContainerRuntime for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > Labels: Docker > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. > Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7848: -- Parent: YARN-8472 (was: YARN-3611) > Force removal of docker containers that do not get removed on first try > --- > > Key: YARN-7848 > URL: https://issues.apache.org/jira/browse/YARN-7848 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Priority: Major > Labels: Docker > > After the addition of YARN-5366, containers will get removed after a certain > debug delay. However, this is a one-time effort. If the removal fails for > whatever reason, the container will persist. We need to add a mechanism for a > forced removal of those containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8263) DockerClient still touches hadoop.tmp.dir
[ https://issues.apache.org/jira/browse/YARN-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8263: -- Parent: YARN-8472 (was: YARN-3611) > DockerClient still touches hadoop.tmp.dir > - > > Key: YARN-8263 > URL: https://issues.apache.org/jira/browse/YARN-8263 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.1 >Reporter: Jason Lowe >Priority: Minor > Labels: Docker > > The DockerClient constructor fails if hadoop.tmp.dir is not set and proceeds > to create a directory there. After YARN-8064 there's no longer a need to > touch the temporary directory.
[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540716#comment-16540716 ] genericqa commented on YARN-8501: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 52s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 6s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 19s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}128m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8501 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931209/YARN-8501.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2f92ee28d778 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 632aca5 | | maven
[jira] [Updated] (YARN-8287) Update documentation and yarn-default related to the Docker runtime
[ https://issues.apache.org/jira/browse/YARN-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8287: -- Parent: YARN-8472 (was: YARN-3611) > Update documentation and yarn-default related to the Docker runtime > --- > > Key: YARN-8287 > URL: https://issues.apache.org/jira/browse/YARN-8287 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Priority: Minor > Labels: Docker > > There are a few typos and omissions in the documentation and yarn-default wrt > running Docker containers on YARN. Below is what I noticed, but a more > thorough review is still needed: > * docker.allowed.volume-drivers is not documented > * None of the GPU or FPGA related items are in the Docker docs. > * "To run without any capabilites," - typo in yarn-default.xml > * remove from yarn-default.xml > * yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed missing from > docs > * yarn.nodemanager.runtime.linux.docker.stop.grace-period missing from docs > * The user remapping features are missing from the docs, we should > explicitly call this out. > * The privileged container section could use a bit of rework to outline the > risks of the feature. > * Is it time to remove the security warnings? The community has made many > improvements since that warning was added. > * "path within the contatiner" typo -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8518) test-container-executor test_is_empty() is broken
[ https://issues.apache.org/jira/browse/YARN-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540604#comment-16540604 ] Robert Kanter commented on YARN-8518: - That would be great [~Jim_Brennan], thanks! > test-container-executor test_is_empty() is broken > - > > Key: YARN-8518 > URL: https://issues.apache.org/jira/browse/YARN-8518 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Priority: Major > > A new test was recently added to test-container-executor.c that has some > problems. > It is attempting to mkdir() a hard-coded path: > /tmp/2938rf2983hcqnw8ud/emptydir > This fails because the base directory is not there. These directories are > not being cleaned up either. > It should be using TEST_ROOT. > I don't know what Jira this change was made under - the git commit from July > 9 2018 does not reference a Jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8519) Yarn UI2 : Changes to depict Auto Created leaf Queues/Managed Queues differently from other queues
Suma Shivaprasad created YARN-8519: -- Summary: Yarn UI2 : Changes to depict Auto Created leaf Queues/Managed Queues differently from other queues Key: YARN-8519 URL: https://issues.apache.org/jira/browse/YARN-8519 Project: Hadoop YARN Issue Type: Sub-task Reporter: Suma Shivaprasad YARN-7420 covers changes to depict auto created leaf queues in a separate color notation, but this was done in the old Yarn UI; similar changes need to be incorporated in the new YARN UI to depict Managed Parent queues/Auto-Created leaf queues separately.
[jira] [Commented] (YARN-8518) test-container-executor test_is_empty() is broken
[ https://issues.apache.org/jira/browse/YARN-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540597#comment-16540597 ] Jim Brennan commented on YARN-8518: --- [~rkanter], [~szegedim], let me know if you would like me to put up a patch for this. > test-container-executor test_is_empty() is broken > - > > Key: YARN-8518 > URL: https://issues.apache.org/jira/browse/YARN-8518 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Priority: Major > > A new test was recently added to test-container-executor.c that has some > problems. > It is attempting to mkdir() a hard-coded path: > /tmp/2938rf2983hcqnw8ud/emptydir > This fails because the base directory is not there. These directories are > not being cleaned up either. > It should be using TEST_ROOT. > I don't know what Jira this change was made under - the git commit from July > 9 2018 does not reference a Jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8518) test-container-executor test_is_empty() is broken
Jim Brennan created YARN-8518: - Summary: test-container-executor test_is_empty() is broken Key: YARN-8518 URL: https://issues.apache.org/jira/browse/YARN-8518 Project: Hadoop YARN Issue Type: Bug Reporter: Jim Brennan A new test was recently added to test-container-executor.c that has some problems. It is attempting to mkdir() a hard-coded path: /tmp/2938rf2983hcqnw8ud/emptydir This fails because the base directory is not there. These directories are not being cleaned up either. It should be using TEST_ROOT. I don't know what Jira this change was made under - the git commit from July 9 2018 does not reference a Jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7639) Queue Management scheduling edit policy class needs to be configured dynamically
[ https://issues.apache.org/jira/browse/YARN-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-7639: --- Attachment: YARN-7639.1.patch > Queue Management scheduling edit policy class needs to be configured > dynamically > > > Key: YARN-7639 > URL: https://issues.apache.org/jira/browse/YARN-7639 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-7639.1.patch > > > This needs to be configured dynamically, i.e. added to the list of current > policies configured under > yarn.resourcemanager.scheduler.monitor.policies > whenever auto leaf queue creation is enabled for a parent queue.
[jira] [Updated] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8501: - Attachment: YARN-8501.002.patch > Reduce complexity of RMWebServices' getApps method > -- > > Key: YARN-8501 > URL: https://issues.apache.org/jira/browse/YARN-8501 > Project: Hadoop YARN > Issue Type: Improvement > Components: restapi >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8501.001.patch, YARN-8501.002.patch
[jira] [Assigned] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach reassigned YARN-8517: Assignee: Antal Bálint Steinbach > getContainer and getContainers ResourceManager REST API methods are not > documented > -- > > Key: YARN-8517 > URL: https://issues.apache.org/jira/browse/YARN-8517 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Antal Bálint Steinbach >Priority: Major > Labels: newbie, newbie++ > > Looking at the documentation here: > https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html > I cannot find documentation for 2 RM REST endpoints: > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} > I suppose they are not intentionally undocumented.
[jira] [Commented] (YARN-8434) Update federation documentation of Nodemanager configurations
[ https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540506#comment-16540506 ] Íñigo Goiri commented on YARN-8434: --- The parts that were hard to figure out were: * How to set up the scheduler. We needed to set up the regular scheduler address for the RMs but then everybody else had to point to the AMRMProxy. Not sure if there's a point in clarifying this. * Having to set up HADOOP_CLIENT_CONF. Right now, this doesn't work without it. Something needs to be said about this. Is there a JIRA? > Update federation documentation of Nodemanager configurations > - > > Key: YARN-8434 > URL: https://issues.apache.org/jira/browse/YARN-8434 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: YARN-8434.001.patch, YARN-8434.002.patch, > YARN-8434.003.patch > > > FederationRMFailoverProxyProvider doesn't handle connecting to active RM.
[jira] [Commented] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540501#comment-16540501 ] Wangda Tan commented on YARN-7481: -- [~qinc...@microsoft.com], I saw you have kept updating patches over the last several months. Given that the proposed approach conflicts with the community's existing solution, are there any plans to merge this with the community solutions? > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Fix For: 2.7.2 > > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, > hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPU as a > countable resource. > However, GPU placement is also very important to deep learning jobs for better > efficiency. > For example, a 2-GPU job running on GPUs {0,1} could be faster than on GPUs > {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which > supports fine-grained GPU placement. > A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage > and locality information on a node (up to 64 GPUs per node). '1' means > available and '0' otherwise in the corresponding bit position.
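The 64-bit bitmap described in YARN-7481 can be sketched in C as follows. The availability check mirrors the description (bit i set means GPU i is free); the same-switch rule below is an assumption chosen only to reproduce the email's {0,1}-vs-{0,7} example (4 GPUs per PCI-E switch), not YARN-7481's actual topology model.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of node_mask is 1 if GPU i is available on the node (up to 64 GPUs). */
static int request_fits(uint64_t node_mask, uint64_t request_mask) {
  /* Satisfiable only if every requested GPU bit is currently available. */
  return (node_mask & request_mask) == request_mask;
}

/* Assumed locality rule for illustration: GPUs 0-3 share one PCI-E switch,
 * GPUs 4-7 another, and so on. GPUs {0,1} are "close"; {0,7} are not. */
static int same_pcie_switch(int gpu_a, int gpu_b) {
  return (gpu_a / 4) == (gpu_b / 4);
}
```

A locality-aware allocator would then prefer request masks whose set bits all pass same_pcie_switch(), falling back to any mask that merely passes request_fits().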
[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8517: - Description: Looking at the documentation here: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html I cannot find documentation for 2 RM REST endpoints: - /apps/\{appid\}/appattempts/\{appattemptid\}/containers - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} I suppose they are not intentionally undocumented. was: Looking at the documentation here: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html I cannot find documentation for 2 RM REST endpoints: - /apps/\{appid\}/appattempts/{appattemptid}/containers - /apps/{appid}/appattempts/{appattemptid}/containers/{containerid} I suppose they are not intentionally undocumented.
[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8517: - Description: Looking at the documentation here: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html I cannot find documentation for 2 RM REST endpoints: - /apps/\{appid\}/appattempts/\{appattemptid\}/containers - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} I suppose they are not intentionally undocumented. was: Looking at the documentation here: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html I cannot find documentation for 2 RM REST endpoints: - /apps/\{appid\}/appattempts/\{appattemptid\}/containers - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} I suppose they are not intentionally undocumented.
[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8517: - Description: Looking at the documentation here: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html I cannot find documentation for 2 RM REST endpoints: - /apps/\{appid\}/appattempts/{appattemptid}/containers - /apps/{appid}/appattempts/{appattemptid}/containers/{containerid} I suppose they are not intentionally undocumented. was: Looking at the documentation here: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html I cannot find documentation for 2 RM REST endpoints: - /apps/{appid}/appattempts/{appattemptid}/containers - /apps/{appid}/appattempts/{appattemptid}/containers/{containerid} I suppose they are not intentionally undocumented.
[jira] [Created] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
Szilard Nemeth created YARN-8517: Summary: getContainer and getContainers ResourceManager REST API methods are not documented Key: YARN-8517 URL: https://issues.apache.org/jira/browse/YARN-8517 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Szilard Nemeth Looking at the documentation here: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html I cannot find documentation for 2 RM REST endpoints: - /apps/{appid}/appattempts/{appattemptid}/containers - /apps/{appid}/appattempts/{appattemptid}/containers/{containerid} I suppose they are not intentionally undocumented.
[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540485#comment-16540485 ] genericqa commented on YARN-8501: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 13s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 55s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}143m 3s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsCustomResourceTypes | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8501 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931182/YARN-8501.001.patch | | Optional Tests | asflicense compile javac javadoc
[jira] [Commented] (YARN-8434) Update federation documentation of Nodemanager configurations
[ https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540428#comment-16540428 ] Subru Krishnan commented on YARN-8434: -- Thanks [~bibinchundatt] for understanding/verifying! +1 from my side on latest patch (v3). [~elgoiri], do you have any other documentation fixes before this goes in?
[jira] [Commented] (YARN-8504) Incorrect sTarget column causing DataTable warning on RM application and scheduler web page and application history server webpage
[ https://issues.apache.org/jira/browse/YARN-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540342#comment-16540342 ] Íñigo Goiri commented on YARN-8504: --- Yes, same thing for NodesPage; very easy to break. Ideally, we should even check that the type is correct; for example, if the table defines column X as sortable, we should check that it does so. However, the latter is a little too complex. At least a basic check that the numbers match should be there. > Incorrect sTarget column causing DataTable warning on RM application and > scheduler web page and application history server webpage > -- > > Key: YARN-8504 > URL: https://issues.apache.org/jira/browse/YARN-8504 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: tianjuan >Assignee: tianjuan >Priority: Major > Attachments: YARN-8504.001.patch, image-2018-07-09-12-09-21-401.png, > image-2018-07-09-12-12-18-131.png > > > On a cluster built from latest trunk, clicking 'State' at the application history > webpage gives the following warning > *DataTables warning (table id = 'apps'): Requested unknown parameter '9' > from the data source for row 0* > and clicking 'Cluster' at the RM applications page gives the following warning > !image-2018-07-09-12-09-21-401.png!
[jira] [Updated] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8501: - Attachment: YARN-8501.001.patch
[jira] [Commented] (YARN-8434) Update federation documentation of Nodemanager configurations
[ https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540236#comment-16540236 ] genericqa commented on YARN-8434: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 35m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 48m 25s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8434 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931174/YARN-8434.003.patch | | Optional Tests | asflicense mvnsite | | uname | Linux 85e173e58a38 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2ae13d4 | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 397 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21212/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (YARN-8434) Update federation documentation of Nodemanager configurations
[ https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540162#comment-16540162 ] Bibin A Chundatt commented on YARN-8434: Updated the JIRA description and added a patch removing the yarn.client.failover-proxy-provider configuration from the NMs section.
[jira] [Updated] (YARN-8434) Update federation documentation of Nodemanager configurations
[ https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8434: --- Attachment: YARN-8434.003.patch
[jira] [Updated] (YARN-8434) Update federation documentation of Nodemanager configurations
[ https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8434: --- Summary: Update federation documentation of Nodemanager configurations (was: Nodemanager not registering to active RM in federation)
[jira] [Updated] (YARN-8434) Update federation documentation of Nodemanager configurations
[ https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8434: --- Priority: Minor (was: Blocker)
[jira] [Commented] (YARN-8491) TestServiceCLI#testEnableFastLaunch fail when umask is 077
[ https://issues.apache.org/jira/browse/YARN-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539925#comment-16539925 ] Hudson commented on YARN-8491: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14556 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14556/]) YARN-8491. TestServiceCLI#testEnableFastLaunch fail when umask is 077. (bibinchundatt: rev 52e1bc8539ce769f47743d8b2d318a54c3887ba0) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/client/TestServiceCLI.java > TestServiceCLI#testEnableFastLaunch fail when umask is 077 > -- > > Key: YARN-8491 > URL: https://issues.apache.org/jira/browse/YARN-8491 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: K G Bakthavachalam >Assignee: K G Bakthavachalam >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8491.001.patch > > > UT failure in TestServiceCLI#testEnableFastLaunch due to a permission issue, > when permission is given only to the owner and not to group and > others (global).
[jira] [Resolved] (YARN-7031) Support distributed node attributes
[ https://issues.apache.org/jira/browse/YARN-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt resolved YARN-7031. Resolution: Implemented Fix Version/s: YARN-3409 Closing this since it's already handled > Support distributed node attributes > --- > > Key: YARN-7031 > URL: https://issues.apache.org/jira/browse/YARN-7031 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Fix For: YARN-3409 > > Attachments: Distributed node attributes v1.pdf, > YARN-7031-YARN-3409.001.patch > > > Allow nodemanagers to push their attributes to RM
[jira] [Updated] (YARN-8491) Fix TestServiceCLI#testEnableFastLaunch fail when umask is 077
[ https://issues.apache.org/jira/browse/YARN-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8491: --- Summary: Fix TestServiceCLI#testEnableFastLaunch fail when umask is 077 (was: UT failure in TestServiceCLI#testEnableFastLaunch when umask is set to 077)
[jira] [Updated] (YARN-8491) TestServiceCLI#testEnableFastLaunch fail when umask is 077
[ https://issues.apache.org/jira/browse/YARN-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8491: --- Summary: TestServiceCLI#testEnableFastLaunch fail when umask is 077 (was: Fix TestServiceCLI#testEnableFastLaunch fail when umask is 077) > TestServiceCLI#testEnableFastLaunch fail when umask is 077 > -- > > Key: YARN-8491 > URL: https://issues.apache.org/jira/browse/YARN-8491 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: bakthavachalam >Assignee: bakthavachalam >Priority: Major > Attachments: YARN-8491.001.patch > > > UT failure in TestServiceCLI#testEnableFastLaunch due to permission issue > ,when permission is given only to the owner and not to group and > others(global). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8504) Incorrect sTarget column causing DataTable warning on RM application and scheduler web page and application history server webpage
[ https://issues.apache.org/jira/browse/YARN-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539783#comment-16539783 ] tianjuan commented on YARN-8504: This is caused by YARN-7088, which adds a new column. There are other places where the column counts are maintained by hand, for example NodesPage.java. Maybe unified UTs are needed. > Incorrect sTarget column causing DataTable warning on RM application and > scheduler web page and application history server webpage > -- > > Key: YARN-8504 > URL: https://issues.apache.org/jira/browse/YARN-8504 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: tianjuan >Assignee: tianjuan >Priority: Major > Attachments: YARN-8504.001.patch, image-2018-07-09-12-09-21-401.png, > image-2018-07-09-12-12-18-131.png > > > On a cluster built from latest trunk, clicking 'State' on the application history > webpage gives the following warning > *DataTables warning (table id = 'apps'): Requested unknown parameter '9' > from the data source for row 0* > and clicking 'Cluster' on the RM applications page gives the following warning > !image-2018-07-09-12-09-21-401.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
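The class of bug discussed here — a hand-maintained column index drifting out of sync with the rendered header after a column is added — can be guarded with a simple consistency check. A hedged sketch (names are illustrative, not the actual web-UI code):

```java
// Sketch of a consistency check for hand-maintained DataTables column
// indices: a sTarget index pointing past the actual number of rendered
// columns triggers exactly the "Requested unknown parameter" warning
// quoted above. Names are illustrative.
public class ColumnCheck {
    // Returns true when every target index refers to an existing column.
    static boolean targetsValid(int[] sTargets, int columnCount) {
        for (int t : sTargets) {
            if (t < 0 || t >= columnCount) {
                return false;
            }
        }
        return true;
    }
}
```

A unified unit test asserting this invariant for every page would catch the drift whenever a column is added in one place but not the other.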
[jira] [Commented] (YARN-8511) When AM releases a container, RM removes allocation tags before it is released by NM
[ https://issues.apache.org/jira/browse/YARN-8511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539774#comment-16539774 ] genericqa commented on YARN-8511: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 45s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 71m 41s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 22s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}199m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.sls.TestSLSRunner | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8511 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931119/YARN-8511.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7d5d1c13287c 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a47ec5d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | unit |
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539748#comment-16539748 ] Szilard Nemeth commented on YARN-8468: -- Hey [~bsteinbach]! Thanks for the updated patch. LGTM, +1 (non-binding) > Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Labels: patch > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers, cannot be limited per queue, and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per-queue basis. > > The use case: the user has two pools, one for ad hoc jobs and one for enterprise > apps, and wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. > yarn.scheduler.maximum-allocation-mb sets a default maximum > container size for all queues, and the per-queue maximum is set with > the "maxContainerResources" queue config value. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf); this will cover dynamically created queues. > * if we set it on the root we override the scheduler setting, and we should > not allow that. > * make sure that the queue resource cap cannot be larger than the scheduler max > resource cap in the config. 
> * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability() in both FSParentQueue and > FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc for the queue. > * write JUnit tests. > * update the scheduler documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
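The fallback logic the suggested solution outlines can be sketched in a few lines. This is a hypothetical illustration under assumed names (`QueueMaxAllocation`, `setQueueMax`), not the real FairScheduler API; only the method name `getMaximumResourceCapability(String)` comes from the ticket:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-queue maximum-allocation lookup with a
// fall-back to the scheduler-wide cap, as outlined in the description.
public class QueueMaxAllocation {
    private final long schedulerMaxMb;                  // yarn.scheduler.maximum-allocation-mb
    private final Map<String, Long> queueMaxMb = new HashMap<>();

    public QueueMaxAllocation(long schedulerMaxMb) {
        this.schedulerMaxMb = schedulerMaxMb;
    }

    // Per-queue cap may not exceed the scheduler cap, and the root queue
    // keeps the scheduler default (two of the constraints listed above).
    public void setQueueMax(String queue, long maxMb) {
        if ("root".equals(queue) || maxMb > schedulerMaxMb) {
            throw new IllegalArgumentException("invalid per-queue cap for " + queue);
        }
        queueMaxMb.put(queue, maxMb);
    }

    // Mirrors getMaximumResourceCapability(String queueName): queue-specific
    // value if configured, otherwise the global maximum.
    public long getMaximumResourceCapability(String queue) {
        return queueMaxMb.getOrDefault(queue, schedulerMaxMb);
    }
}
```

The real implementation would work on `Resource` objects (memory and vcores) rather than a single megabyte value.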
[jira] [Commented] (YARN-8421) when moving app, activeUsers is increased, even though app does not have outstanding request
[ https://issues.apache.org/jira/browse/YARN-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539734#comment-16539734 ] genericqa commented on YARN-8421: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 47s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 58s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}131m 44s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8421 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931121/YARN-8421.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0a5b5eed3cc4 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a47ec5d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/21211/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21211/testReport/ | | Max. process+thread count | 833 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: hadoop-2.7.2.gpu-port-20180711.patch > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Fix For: 2.7.2 > > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, > hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as > a countable resource. > However, GPU placement is also very important to deep learning jobs for better > efficiency. > For example, a 2-GPU job running on GPUs {0,1} could be faster than one running on GPUs > {0,7}, if GPU 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add the support to Hadoop 2.7.2 to enable GPU locality scheduling, which > supports fine-grained GPU placement. > A 64-bit bitmap is added to yarn Resource, which indicates both GPU usage > and locality information in a node (up to 64 GPUs per node). '1' means > available and '0' otherwise in the corresponding bit position. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
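The 64-bit bitmap described above can be illustrated with plain `long` arithmetic. The grouping of GPUs under PCI-E switches below (GPUs 0–3 on one switch, 4–7 on another) is an assumed example for the sketch, not taken from the patch:

```java
// Sketch of the 64-bit GPU availability bitmap described above: bit i set
// means GPU i is free. Switch masks are assumed for illustration.
public class GpuBitmap {
    static final long SWITCH0 = 0x0FL;       // GPUs 0..3 (assumed grouping)
    static final long SWITCH1 = 0xF0L;       // GPUs 4..7 (assumed grouping)

    static int availableGpus(long bitmap) {
        return Long.bitCount(bitmap);
    }

    // True if `count` free GPUs can be found under a single switch --
    // the locality property the description argues matters for speed.
    static boolean fitsUnderOneSwitch(long bitmap, int count) {
        return Long.bitCount(bitmap & SWITCH0) >= count
            || Long.bitCount(bitmap & SWITCH1) >= count;
    }
}
```

With GPUs 0 and 1 free a 2-GPU job fits under one switch; with only GPUs 0 and 7 free, two GPUs are available but split across switches, which is exactly the {0,1} vs {0,7} example in the description.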
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: (was: hadoop-2.7.2.gpu-port-20180710.patch) > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Fix For: 2.7.2 > > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2.gpu-port.patch, hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPU as > countable resource. > However, GPU placement is also very important to deep learning job for better > efficiency. > For example, a 2-GPU job runs on gpu {0,1} could be faster than run on gpu > {0, 7}, if GPU 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add the support to Hadoop 2.7.2 to enable GPU locality scheduling, which > support fine-grained GPU placement. > A 64-bits bitmap is added to yarn Resource, which indicates both GPU usage > and locality information in a node (up to 64 GPUs per node). '1' means > available and '0' otherwise in the corresponding position of the bit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: (was: hadoop-2.7.2.gpu-port-20180711.patch) > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Fix For: 2.7.2 > > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2.gpu-port.patch, hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPU as > countable resource. > However, GPU placement is also very important to deep learning job for better > efficiency. > For example, a 2-GPU job runs on gpu {0,1} could be faster than run on gpu > {0, 7}, if GPU 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add the support to Hadoop 2.7.2 to enable GPU locality scheduling, which > support fine-grained GPU placement. > A 64-bits bitmap is added to yarn Resource, which indicates both GPU usage > and locality information in a node (up to 64 GPUs per node). '1' means > available and '0' otherwise in the corresponding position of the bit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: (was: hadoop-2.7.2.gpu-port-20180710_old.patch) > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Fix For: 2.7.2 > > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2.gpu-port.patch, hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPU as > countable resource. > However, GPU placement is also very important to deep learning job for better > efficiency. > For example, a 2-GPU job runs on gpu {0,1} could be faster than run on gpu > {0, 7}, if GPU 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add the support to Hadoop 2.7.2 to enable GPU locality scheduling, which > support fine-grained GPU placement. > A 64-bits bitmap is added to yarn Resource, which indicates both GPU usage > and locality information in a node (up to 64 GPUs per node). '1' means > available and '0' otherwise in the corresponding position of the bit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8512) ATSv2 entities are not published to HBase from second attempt onwards
[ https://issues.apache.org/jira/browse/YARN-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539649#comment-16539649 ] Hudson commented on YARN-8512: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14555 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14555/]) YARN-8512. ATSv2 entities are not published to HBase from second attempt (sunilg: rev 7f1d3d0e9dbe328fae0d43421665e0b6907b33fe) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java > ATSv2 entities are not published to HBase from second attempt onwards > - > > Key: YARN-8512 > URL: https://issues.apache.org/jira/browse/YARN-8512 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0, 2.10.0, 3.2.0, 3.0.3 >Reporter: Yesha Vora >Assignee: Rohith Sharma K S >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8512.01.patch, YARN-8512.02.patch, > YARN-8512.03.patch > > > This is observed when the first-attempt master container dies and the > second-attempt master container is launched on an NM where old containers are > running but the master container is not. 
> ||Attempt||NM1||NM2||Action|| > |attempt-1|master container i.e container-1-1|container-1-2|master container > died| > |attempt-2|NA|container-1-2 and master container container-2-1|NA| > In the above scenario, the NM doesn't identify the flowContext and logs the > warning below > {noformat} > 2018-07-10 00:44:38,285 WARN storage.HBaseTimelineWriterImpl > (HBaseTimelineWriterImpl.java:write(170)) - Found null for one of: > flowName=null appId=application_1531175172425_0001 userId=hbase > clusterId=yarn-cluster . Not proceeding with writing to hbase > 2018-07-10 00:44:38,560 WARN storage.HBaseTimelineWriterImpl > (HBaseTimelineWriterImpl.java:write(170)) - Found null for one of: > flowName=null appId=application_1531175172425_0001 userId=hbase > clusterId=yarn-cluster . Not proceeding with writing to hbase > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: hadoop-2.7.2.gpu-port-20180711.patch > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Fix For: 2.7.2 > > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2.gpu-port-20180710.patch, > hadoop-2.7.2.gpu-port-20180710_old.patch, > hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, > hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPU as > countable resource. > However, GPU placement is also very important to deep learning job for better > efficiency. > For example, a 2-GPU job runs on gpu {0,1} could be faster than run on gpu > {0, 7}, if GPU 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add the support to Hadoop 2.7.2 to enable GPU locality scheduling, which > support fine-grained GPU placement. > A 64-bits bitmap is added to yarn Resource, which indicates both GPU usage > and locality information in a node (up to 64 GPUs per node). '1' means > available and '0' otherwise in the corresponding position of the bit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: (was: hadoop-2.7.2.gpu-port-20180711.patch) > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Fix For: 2.7.2 > > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2.gpu-port-20180710.patch, > hadoop-2.7.2.gpu-port-20180710_old.patch, > hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, > hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPU as > countable resource. > However, GPU placement is also very important to deep learning job for better > efficiency. > For example, a 2-GPU job runs on gpu {0,1} could be faster than run on gpu > {0, 7}, if GPU 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add the support to Hadoop 2.7.2 to enable GPU locality scheduling, which > support fine-grained GPU placement. > A 64-bits bitmap is added to yarn Resource, which indicates both GPU usage > and locality information in a node (up to 64 GPUs per node). '1' means > available and '0' otherwise in the corresponding position of the bit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8421) when moving app, activeUsers is increased, even though app does not have outstanding request
[ https://issues.apache.org/jira/browse/YARN-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539625#comment-16539625 ] kyungwan nam commented on YARN-8421: [~eepayne], Thank you for your review. I've attached a new patch according to your comment. > when moving app, activeUsers is increased, even though app does not have > outstanding request > - > > Key: YARN-8421 > URL: https://issues.apache.org/jira/browse/YARN-8421 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.4 >Reporter: kyungwan nam >Priority: Major > Attachments: YARN-8421.001.patch, YARN-8421.002.patch, > YARN-8421.003.patch > > > All containers for app1 have been allocated. > Move app1 from the default Queue to the test Queue as follows. > {code} > yarn rmadmin application -movetoqueue app1 -queue test > {code} > _activeUsers_ of the test Queue is increased even though app1 does not > have an outstanding request. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
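The fix discussed in this issue amounts to a missing guard: when an app moves into a queue, its user should only be counted active if the app actually has outstanding resource requests. A hedged sketch under assumed names (`ActiveUsers`, `appMovedIn`), not the real scheduler's ActiveUsersManager API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the guard described above: only activate a user
// on app move when the app still has outstanding requests.
public class ActiveUsers {
    private final Map<String, Integer> activeAppsPerUser = new HashMap<>();

    public void appMovedIn(String user, boolean hasOutstandingRequests) {
        if (hasOutstandingRequests) {    // the guard the bug report asks for
            activeAppsPerUser.merge(user, 1, Integer::sum);
        }
    }

    public int activeUserCount() {
        return activeAppsPerUser.size();
    }
}
```

In the reported scenario app1's containers are all allocated, so the move should leave _activeUsers_ unchanged.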
[jira] [Updated] (YARN-8421) when moving app, activeUsers is increased, even though app does not have outstanding request
[ https://issues.apache.org/jira/browse/YARN-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kyungwan nam updated YARN-8421: --- Attachment: YARN-8421.003.patch > when moving app, activeUsers is increased, even though app does not have > outstanding request > - > > Key: YARN-8421 > URL: https://issues.apache.org/jira/browse/YARN-8421 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.4 >Reporter: kyungwan nam >Priority: Major > Attachments: YARN-8421.001.patch, YARN-8421.002.patch, > YARN-8421.003.patch > > > all containers for app1 have been allocated. > move app1 from default Queue to test Queue as follows. > {code} > yarn rmadmin application -movetoqueue app1 -queue test > {code} > _activeUsers_ of the test Queue is increased even though app1 which does not > have outstanding request. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org