[jira] [Commented] (YARN-7746) Fix PlacementProcessor to support app priority
[ https://issues.apache.org/jira/browse/YARN-7746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629774#comment-16629774 ] Arun Suresh commented on YARN-7746: --- Thanks for the patch [~maniraj...@gmail.com] Instead of having a separate {{SchedulingRequestEventAsyncThread}}, it is possible to just replace the default backing queue of the schedulingThreadPool like so: {code} this.schedulingThreadPool = new ThreadPoolExecutor(.., priorityQueue, ..); {code} You might even consider using the {{org.apache.hadoop.util.BlockingThreadPoolExecutorService}} or add a method to create a priority backed threadpool. > Fix PlacementProcessor to support app priority > -- > > Key: YARN-7746 > URL: https://issues.apache.org/jira/browse/YARN-7746 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Manikandan R >Priority: Major > Attachments: YARN-7746.001.patch > > > The Threadpools used in the Processor should be modified to take a priority > blocking queue that respects application priority. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
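To make the suggestion above concrete, here is a minimal, self-contained sketch of a {{ThreadPoolExecutor}} backed by a {{PriorityBlockingQueue}}. It is not the actual PlacementProcessor code; the PrioritizedTask class, the comparator, the pool sizes, and the priority values are made-up placeholders for illustration only.
{code:java}
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PrioritySchedulingPool {

  // Hypothetical task wrapper carrying the priority of the application it serves.
  static class PrioritizedTask implements Runnable {
    final int appPriority;
    final Runnable work;
    PrioritizedTask(int appPriority, Runnable work) {
      this.appPriority = appPriority;
      this.work = work;
    }
    @Override
    public void run() {
      work.run();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Order queued tasks so that higher application priority is dequeued first.
    Comparator<Runnable> byAppPriority = (a, b) ->
        Integer.compare(((PrioritizedTask) b).appPriority,
                        ((PrioritizedTask) a).appPriority);

    PriorityBlockingQueue<Runnable> priorityQueue =
        new PriorityBlockingQueue<>(11, byAppPriority);

    // Same shape as the snippet above: the priority queue replaces the
    // executor's default (FIFO) backing queue.
    ThreadPoolExecutor schedulingThreadPool =
        new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, priorityQueue);

    // Use execute(), not submit(): submit() wraps tasks in FutureTask objects
    // that the comparator above could not cast to PrioritizedTask.
    // Tasks that arrive while the single worker is busy are drained in
    // priority order, not arrival order.
    schedulingThreadPool.execute(
        new PrioritizedTask(1, () -> System.out.println("low-priority app request")));
    schedulingThreadPool.execute(
        new PrioritizedTask(5, () -> System.out.println("medium-priority app request")));
    schedulingThreadPool.execute(
        new PrioritizedTask(10, () -> System.out.println("high-priority app request")));

    schedulingThreadPool.shutdown();
    schedulingThreadPool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
{code}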
[jira] [Comment Edited] (YARN-7746) Fix PlacementProcessor to support app priority
[ https://issues.apache.org/jira/browse/YARN-7746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629774#comment-16629774 ] Arun Suresh edited comment on YARN-7746 at 9/27/18 5:28 AM: Thanks for the patch [~maniraj...@gmail.com] Instead of having a separate {{SchedulingRequestEventAsyncThread}}, it should be possible to just replace the default backing queue of the schedulingThreadPool like so: {code} this.schedulingThreadPool = new ThreadPoolExecutor(.., priorityQueue, ..); {code} You might even consider using the {{org.apache.hadoop.util.BlockingThreadPoolExecutorService}} or add a method to create a priority backed threadpool. was (Author: asuresh): Thanks for the patch [~maniraj...@gmail.com] Instead of having a separate {{SchedulingRequestEventAsyncThread}}, it is possible to just replace the default backing queue of the schedulingThreadPool like so: {code} this.schedulingThreadPool = new ThreadPoolExecutor(.., priorityQueue, ..); {code} You might even consider using the {{org.apache.hadoop.util.BlockingThreadPoolExecutorService}} or add a method to create a priority backed threadpool. > Fix PlacementProcessor to support app priority > -- > > Key: YARN-7746 > URL: https://issues.apache.org/jira/browse/YARN-7746 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Manikandan R >Priority: Major > Attachments: YARN-7746.001.patch > > > The Threadpools used in the Processor should be modified to take a priority > blocking queue that respects application priority. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7746) Fix PlacementProcessor to support app priority
[ https://issues.apache.org/jira/browse/YARN-7746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh reassigned YARN-7746: - Assignee: Manikandan R (was: Arun Suresh) > Fix PlacementProcessor to support app priority > -- > > Key: YARN-7746 > URL: https://issues.apache.org/jira/browse/YARN-7746 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Manikandan R >Priority: Major > Attachments: YARN-7746.001.patch > > > The Threadpools used in the Processor should be modified to take a priority > blocking queue that respects application priority. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8826) Fix lingering timeline collector after serviceStop in TimelineCollectorManager
[ https://issues.apache.org/jira/browse/YARN-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629765#comment-16629765 ] Rohith Sharma K S commented on YARN-8826: - Thanks [~prabham] for the patch. +1 lgtm. pending jenkins > Fix lingering timeline collector after serviceStop in TimelineCollectorManager > -- > > Key: YARN-8826 > URL: https://issues.apache.org/jira/browse/YARN-8826 > Project: Hadoop YARN > Issue Type: Bug > Components: ATSv2 >Reporter: Prabha Manepalli >Assignee: Prabha Manepalli >Priority: Trivial > Attachments: YARN-8826.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader
[ https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629761#comment-16629761 ] Rohith Sharma K S commented on YARN-8270: - +1 lgtm. [~vrushalic] Please do the honors of committing the patch :) > Adding JMX Metrics for Timeline Collector and Reader > > > Key: YARN-8270 > URL: https://issues.apache.org/jira/browse/YARN-8270 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2, timelineserver >Reporter: Sushil Ks >Assignee: Sushil Ks >Priority: Major > Attachments: YARN-8270.001.patch, YARN-8270.002.patch, > YARN-8270.003.patch, YARN-8270.004.patch, YARN-8270.005.patch > > > This Jira is for emitting JMX metrics for the ATSv2 Timeline Collector and > Timeline Reader. For the Timeline Collector, it tries to capture success and > failure latencies for *putEntities* and *putEntitiesAsync* from > *TimelineCollectorWebService*; similarly, it captures the success and failure > latencies of all the APIs fetching TimelineEntities from > *TimelineReaderWebServices*. This would help in monitoring and measuring > performance for ATSv2 at scale. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8827) Plumb per app, per user and per queue resource utilization from the NM to RM
[ https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-8827: -- Description: Opportunistic Containers for OverAllocation need to be allocated to pending applications in some fair manner. Rather than evaluating queue and user resource usage (allocated resource usage) and comparing against queue and user limits to decide the allocation, it might make more sense to use a snapshot of actual resource utilization of the queue and user. To facilitate this, this JIRA proposes to aggregate per user, per app (and maybe per queue) resource utilization in addition to aggregated Container and Node Utilization and send it along with the NM heartbeat. It should be fairly inexpensive to aggregate - since it can be performed in the same loop of the {{ContainersMonitorImpl}}'s Monitoring thread. A snapshot aggregate can be made every couple of seconds in the RM. This instantaneous resource utilization should be used to decide if Opportunistic containers can be allocated to an App, Queue or User. was: Opportunistic Containers for OverAllocation need to be allocated to pending applications in some fair manner. Rather than evaluating queue and user resource usage (allocated resource usage) and comparing against queue and user limits to decide the allocation, it might be make more sense to use a snapshot of actual resource utilization of the queue and user. To facilitate this, this JIRA proposes to aggregate per user, per app (and maybe per queue) resource utilization in addition to aggregated Container and Node Utilization and send it along with the NM heartbeat. It should be fairly inexpensive to aggregate - since it can be performed in the same loop of the {{ContainersMonitorImpl}}'s Monitoring thread. A snapshot aggregate can be made every couple of seconds in the RM. This instantaneous resource utilization should be used to decide if Opportunistic containers can be allocated to an App, Queue or User. > Plumb per app, per user and per queue resource utilization from the NM to RM > > > Key: YARN-8827 > URL: https://issues.apache.org/jira/browse/YARN-8827 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Major > > Opportunistic Containers for OverAllocation need to be allocated to pending > applications in some fair manner. Rather than evaluating queue and user > resource usage (allocated resource usage) and comparing against queue and > user limits to decide the allocation, it might make more sense to use a > snapshot of actual resource utilization of the queue and user. > To facilitate this, this JIRA proposes to aggregate per user, per app (and > maybe per queue) resource utilization in addition to aggregated Container and > Node Utilization and send it along with the NM heartbeat. It should be fairly > inexpensive to aggregate - since it can be performed in the same loop of the > {{ContainersMonitorImpl}}'s Monitoring thread. > A snapshot aggregate can be made every couple of seconds in the RM. This > instantaneous resource utilization should be used to decide if Opportunistic > containers can be allocated to an App, Queue or User. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
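For illustration of the per-app/per-user aggregation described in the proposal above, here is a minimal, self-contained sketch of a single aggregation pass of the kind the monitoring loop could perform. The Utilization, MonitoredContainer, and Aggregates classes are invented for this sketch and are not the actual ContainersMonitorImpl or Hadoop record types.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class UtilizationAggregator {

  // Hypothetical utilization snapshot; not the Hadoop ResourceUtilization record.
  static class Utilization {
    long pmemMB;
    float vcoresUsed;
    void add(long pmemMB, float vcoresUsed) {
      this.pmemMB += pmemMB;
      this.vcoresUsed += vcoresUsed;
    }
  }

  // Hypothetical view of one monitored container.
  static class MonitoredContainer {
    final String appId;
    final String user;
    final long pmemMB;
    final float vcoresUsed;
    MonitoredContainer(String appId, String user, long pmemMB, float vcoresUsed) {
      this.appId = appId;
      this.user = user;
      this.pmemMB = pmemMB;
      this.vcoresUsed = vcoresUsed;
    }
  }

  static class Aggregates {
    final Map<String, Utilization> byApp = new HashMap<>();
    final Map<String, Utilization> byUser = new HashMap<>();
  }

  // One pass over the containers the monitor is already iterating;
  // the result would ride along on the next NM heartbeat.
  static Aggregates aggregate(Iterable<MonitoredContainer> containers) {
    Aggregates agg = new Aggregates();
    for (MonitoredContainer c : containers) {
      agg.byApp.computeIfAbsent(c.appId, k -> new Utilization())
          .add(c.pmemMB, c.vcoresUsed);
      agg.byUser.computeIfAbsent(c.user, k -> new Utilization())
          .add(c.pmemMB, c.vcoresUsed);
    }
    return agg;
  }
}
{code}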
[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629713#comment-16629713 ] Wangda Tan commented on YARN-8800: -- [~sunilg], tried to use the command: {code:java} mvn clean site:site -Preleasedocs; mvn site:stage -DstagingDirectory=/tmp/hadoop-site{code} It works for me for all pages. Fixed a few issues. (004) > Updated documentation of Submarine with latest examples. > > > Key: YARN-8800 > URL: https://issues.apache.org/jira/browse/YARN-8800 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8800.001.patch, YARN-8800.002.patch, > YARN-8800.003.patch, YARN-8800.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8800: - Attachment: YARN-8800.004.patch > Updated documentation of Submarine with latest examples. > > > Key: YARN-8800 > URL: https://issues.apache.org/jira/browse/YARN-8800 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8800.001.patch, YARN-8800.002.patch, > YARN-8800.003.patch, YARN-8800.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8828) When configuring ReservationSystem, the RM fails to start.
yimeng created YARN-8828: Summary: When configuring ReservationSystem, the RM fails to start. Key: YARN-8828 URL: https://issues.apache.org/jira/browse/YARN-8828 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler Affects Versions: 3.1.0 Reporter: yimeng I tested ReservationSystem in Hadoop 3.0, but it seems to have a problem. 1. Config yarn.resourcemanager.reservation-system.enable = true in the RM yarn-site.xml. 2. Select a leaf queue "bbb" and config yarn.scheduler.capacity.root.bbb.reservable = true in capacity-scheduler.xml, as follows: <property> <name>yarn.scheduler.capacity.root.bbb.reservable</name> <value>true</value> </property> 3. Then restart the RM; the RM fails to start. The error stack log is as follows: 2018-09-27 11:30:15,691 | FATAL | main | Error starting ResourceManager | ResourceManager.java:1517 org.apache.hadoop.service.ServiceStateException: java.io.IOException: mapping contains invalid or non-leaf queue : bbb at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:813) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1214) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:315) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1510) Caused by: java.io.IOException: mapping contains invalid or non-leaf queue : bbb at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetQueueMapping(UserGroupMappingPlacementRule.java:316) at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.get(UserGroupMappingPlacementRule.java:280) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:668) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:689) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:716) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:360) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:425) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) ... 7 more I am sure the queue "bbb" is a leaf queue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues
[ https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629699#comment-16629699 ] Tao Yang commented on YARN-8804: Thanks [~jlowe] for the review and commit! > resourceLimits may be wrongly calculated when leaf-queue is blocked in > cluster with 3+ level queues > --- > > Key: YARN-8804 > URL: https://issues.apache.org/jira/browse/YARN-8804 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2, 2.8.6 > > Attachments: YARN-8804.001.patch, YARN-8804.002.patch, > YARN-8804.003.patch > > > This problem is due to YARN-4280, parent queue will deduct child queue's > headroom when the child queue reached its resource limit and the skipped type > is QUEUE_LIMIT, the resource limits of deepest parent queue will be correctly > calculated, but for non-deepest parent queue, its headroom may be much more > than the sum of reached-limit child queues' headroom, so that the resource > limit of non-deepest parent may be much less than its true value and block > the allocation for later queues. > To reproduce this problem with UT: > (1) Cluster has two nodes whose node resource both are <10GB, 10core> and > 3-level queues as below, among them max-capacity of "c1" is 10 and others are > all 100, so that max-capacity of queue "c1" is <2GB, 2core> > {noformat} > Root > / | \ > a bc >10 20 70 > | \ > c1 c2 > 10(max=10) 90 > {noformat} > (2) Submit app1 to queue "c1" and launch am1(resource=<1GB, 1 core>) on nm1 > (3) Submit app2 to queue "b" and launch am2(resource=<1GB, 1 core>) on nm1 > (4) app1 and app2 both ask one <2GB, 1core> containers. > (5) nm1 do 1 heartbeat > Now queue "c" has lower capacity percentage than queue "b", the allocation > sequence will be "a" -> "c" -> "b", > queue "c1" has reached queue limit so that requests of app1 should be > pending, > headroom of queue "c1" is <1GB, 1core> (=max-capacity - used), > headroom of queue "c" is <18GB, 18core> (=max-capacity - used), > after allocation for queue "c", resource limit of queue "b" will be wrongly > calculated as <2GB, 2core>, > headroom of queue "b" will be <1GB, 1core> (=resource-limit - used) > so that scheduler won't allocate one container for app2 on nm1 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8827) Plumb per app, per user and per queue resource utilization from the NM to RM
Arun Suresh created YARN-8827: - Summary: Plumb per app, per user and per queue resource utilization from the NM to RM Key: YARN-8827 URL: https://issues.apache.org/jira/browse/YARN-8827 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun Suresh Assignee: Arun Suresh Opportunistic Containers for OverAllocation need to be allocated to pending applications in some fair manner. Rather than evaluating queue and user resource usage (allocated resource usage) and comparing against queue and user limits to decide the allocation, it might be make more sense to use a snapshot of actual resource utilization of the queue and user. To facilitate this, this JIRA proposes to aggregate per user, per app (and maybe per queue) resource utilization in addition to aggregated Container and Node Utilization and send it along with the NM heartbeat. It should be fairly inexpensive to aggregate - since it can be performed in the same loop of the {{ContainersMonitorImpl}}'s Monitoring thread. A snapshot aggregate can be made every couple of seconds in the RM. This instantaneous resource utilization should be used to decide if Opportunistic containers can be allocated to an App, Queue or User. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629545#comment-16629545 ] Eric Yang edited comment on YARN-8734 at 9/26/18 11:52 PM: --- [~gsaha] Good catch, I have corrected yml accordingly. [~billie.rinaldi] Patch 006 fixed the dependencies check after the service state has been changed to running, and renamed to dependencies instead of remote_service_dependencies. was (Author: eyang): [~gsaha] Good catch, I have corrected yml accordingly. [~billie.rinaldi] I have fixed the dependencies check after the service state has been changed to running, and renamed to dependencies instead of remote_service_dependencies. > Readiness check for remote service belongs to the same user > --- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Component_dependencies.png, Dependency check vs.pdf, > Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch, > YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch, > YARN-8734.006.patch > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629545#comment-16629545 ] Eric Yang commented on YARN-8734: - [~gsaha] Good catch, I have corrected yml accordingly. [~billie.rinaldi] I have fixed the dependencies check after the service state has been changed to running, and renamed to dependencies instead of remote_service_dependencies. > Readiness check for remote service belongs to the same user > --- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Component_dependencies.png, Dependency check vs.pdf, > Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch, > YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch, > YARN-8734.006.patch > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8734: Attachment: YARN-8734.006.patch > Readiness check for remote service belongs to the same user > --- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Component_dependencies.png, Dependency check vs.pdf, > Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch, > YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch, > YARN-8734.006.patch > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues
[ https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629535#comment-16629535 ] Hudson commented on YARN-8804: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15065 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15065/]) YARN-8804. resourceLimits may be wrongly calculated when leaf-queue is (jlowe: rev 6b988d821e62d29c118e10a7213583b92c302baf) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java > resourceLimits may be wrongly calculated when leaf-queue is blocked in > cluster with 3+ level queues > --- > > Key: YARN-8804 > URL: https://issues.apache.org/jira/browse/YARN-8804 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Attachments: YARN-8804.001.patch, YARN-8804.002.patch, > YARN-8804.003.patch > > > This problem is due to YARN-4280, parent queue will deduct child queue's > headroom when the child queue reached its resource limit and the skipped type > is QUEUE_LIMIT, the resource limits of deepest parent queue will be correctly > calculated, but for non-deepest parent queue, its headroom may be much more > than the sum of reached-limit child queues' headroom, so that the resource > limit of non-deepest parent may be much less than its true value and block > the allocation for later queues. > To reproduce this problem with UT: > (1) Cluster has two nodes whose node resource both are <10GB, 10core> and > 3-level queues as below, among them max-capacity of "c1" is 10 and others are > all 100, so that max-capacity of queue "c1" is <2GB, 2core> > {noformat} > Root > / | \ > a bc >10 20 70 > | \ > c1 c2 > 10(max=10) 90 > {noformat} > (2) Submit app1 to queue "c1" and launch am1(resource=<1GB, 1 core>) on nm1 > (3) Submit app2 to queue "b" and launch am2(resource=<1GB, 1 core>) on nm1 > (4) app1 and app2 both ask one <2GB, 1core> containers. > (5) nm1 do 1 heartbeat > Now queue "c" has lower capacity percentage than queue "b", the allocation > sequence will be "a" -> "c" -> "b", > queue "c1" has reached queue limit so that requests of app1 should be > pending, > headroom of queue "c1" is <1GB, 1core> (=max-capacity - used), > headroom of queue "c" is <18GB, 18core> (=max-capacity - used), > after allocation for queue "c", resource limit of queue "b" will be wrongly > calculated as <2GB, 2core>, > headroom of queue "b" will be <1GB, 1core> (=resource-limit - used) > so that scheduler won't allocate one container for app2 on nm1 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8734: Comment: was deleted (was: [~gsaha], I see what you mean. It will be corrected in the next patch. [~billie.rinaldi] [~gsaha], there appears to be a serialization problem when service level object use the same keyword as the component level. When I changed to use keyword "dependencies", Jersey fails with a generic error message: {code} 2018-09-26 21:17:33,388 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR javax.ws.rs.WebApplicationException at com.sun.jersey.server.impl.uri.rules.TerminatingRule.accept(TerminatingRule.java:66) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:179) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98) at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1610) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629464#comment-16629464 ] Eric Yang commented on YARN-8734: - [~gsaha], I see what you mean. It will be corrected in the next patch. [~billie.rinaldi] [~gsaha], there appears to be a serialization problem when service level object use the same keyword as the component level. When I changed to use keyword "dependencies", Jersey fails with a generic error message: {code} 2018-09-26 21:17:33,388 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR javax.ws.rs.WebApplicationException at com.sun.jersey.server.impl.uri.rules.TerminatingRule.accept(TerminatingRule.java:66) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:179) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at 
org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1610) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at
[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629406#comment-16629406 ] Eric Yang commented on YARN-6456: - I am trying to figure out the relationship between the configurations: Allow the user to choose between Linux containers and Docker containers: {code} yarn.nodemanager.runtime.linux.type=default yarn.nodemanager.runtime.linux.allowed-runtimes=default,docker {code} Allow the user to run only Linux containers: {code} yarn.nodemanager.runtime.linux.type=default yarn.nodemanager.runtime.linux.allowed-runtimes=default {code} Allow the user to run only Docker containers: {code} yarn.nodemanager.runtime.linux.type=docker yarn.nodemanager.runtime.linux.allowed-runtimes=docker {code} Is this the intent? Why not use allowed-runtimes to set the single container runtime? > Allow administrators to set a single ContainerRuntime for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Craig Condit >Priority: Major > Labels: Docker > Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, > YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, > YARN-6456.004.patch, YARN-6456.005.patch > > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. > Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629336#comment-16629336 ] Eric Yang commented on YARN-6456: - [~ccondit-target] Thank you for patch 5. Is yarn.nodemanager.runtime.linux.type going to be an alias mapping to a class name? How does default map to DefaultLinuxContainerRuntime, or docker to DockerLinuxContainerRuntime? This part of the logic is unclear to me. > Allow administrators to set a single ContainerRuntime for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Craig Condit >Priority: Major > Labels: Docker > Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, > YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, > YARN-6456.004.patch, YARN-6456.005.patch > > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. > Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8665) Yarn Service Upgrade: Support cancelling upgrade
[ https://issues.apache.org/jira/browse/YARN-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629290#comment-16629290 ] Hudson commented on YARN-8665: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15064 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15064/]) YARN-8665. Added Yarn service cancel upgrade option. (eyang: rev 913f87dada27776c539dfb352400ecf8d40e7943) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/src/main/java/org/apache/hadoop/yarn/service/webapp/ApiServer.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/src/main/java/org/apache/hadoop/yarn/service/client/ApiServiceClient.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/ComponentEventType.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/component/TestComponent.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/api/records/ServiceState.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/Component.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/utils/ServiceApiUtil.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestYarnNativeServices.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/impl/pb/client/ClientAMProtocolPBClientImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ClientAMService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstanceState.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/client/TestServiceCLI.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/impl/pb/service/ClientAMProtocolPBServiceImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/ServiceTestUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstance.java * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/client/ServiceClient.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceEvent.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/AppAdminClient.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/ComponentState.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/api/records/ContainerState.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/containerlaunch/ContainerLaunchService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ClientAMProtocol.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * (edit)
[jira] [Commented] (YARN-8665) Yarn Service Upgrade: Support cancelling upgrade
[ https://issues.apache.org/jira/browse/YARN-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629278#comment-16629278 ] Chandni Singh commented on YARN-8665: - Thanks [~eyang] for reviewing and merging this. > Yarn Service Upgrade: Support cancelling upgrade > - > > Key: YARN-8665 > URL: https://issues.apache.org/jira/browse/YARN-8665 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8665.001.patch, YARN-8665.002.patch, > YARN-8665.003.patch, YARN-8665.004.patch, YARN-8665.005.patch > > > When a service is upgraded without auto-finalization or express upgrade, then > the upgrade can be cancelled. This provides the user ability to test upgrade > of a single instance and if that doesn't go well, they get a chance to cancel > it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629275#comment-16629275 ] Hadoop QA commented on YARN-8789: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 13 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 59s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 22m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 19s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 51s{color} | {color:orange} root: The patch generated 11 new + 890 unchanged - 13 fixed = 901 total (was 903) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 5s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 6s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}127m 13s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 31s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 18s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}274m 1s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | hadoop.mapreduce.v2.app.TestMRAppMaster | | | hadoop.mapreduce.v2.app.TestFetchFailure | | | hadoop.mapreduce.v2.app.TestAMInfos
[jira] [Resolved] (YARN-7512) Support service upgrade via YARN Service API and CLI
[ https://issues.apache.org/jira/browse/YARN-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang resolved YARN-7512. - Resolution: Fixed Thank you [~csingh] for working on this feature. All tasks are done, closing this story as fixed. > Support service upgrade via YARN Service API and CLI > > > Key: YARN-7512 > URL: https://issues.apache.org/jira/browse/YARN-7512 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Gour Saha >Assignee: Chandni Singh >Priority: Major > Fix For: 3.2.0 > > Attachments: _In-Place Upgrade of Long-Running Applications in > YARN_v1.pdf, _In-Place Upgrade of Long-Running Applications in YARN_v2.pdf, > _In-Place Upgrade of Long-Running Applications in YARN_v3.pdf > > > YARN Service API and CLI needs to support service (and containers) upgrade in > line with what Slider supported in SLIDER-787 > (http://slider.incubator.apache.org/docs/slider_specs/application_pkg_upgrade.html) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7512) Support service upgrade via YARN Service API and CLI
[ https://issues.apache.org/jira/browse/YARN-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7512: Target Version/s: 3.2.0 (was: 3.1.2) Fix Version/s: (was: yarn-native-services) 3.2.0 Component/s: yarn-native-services > Support service upgrade via YARN Service API and CLI > > > Key: YARN-7512 > URL: https://issues.apache.org/jira/browse/YARN-7512 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Gour Saha >Assignee: Chandni Singh >Priority: Major > Fix For: 3.2.0 > > Attachments: _In-Place Upgrade of Long-Running Applications in > YARN_v1.pdf, _In-Place Upgrade of Long-Running Applications in YARN_v2.pdf, > _In-Place Upgrade of Long-Running Applications in YARN_v3.pdf > > > YARN Service API and CLI needs to support service (and containers) upgrade in > line with what Slider supported in SLIDER-787 > (http://slider.incubator.apache.org/docs/slider_specs/application_pkg_upgrade.html) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629178#comment-16629178 ] Hadoop QA commented on YARN-8734: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8734 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8734 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21979/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Readiness check for remote service belongs to the same user > --- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Component_dependencies.png, Dependency check vs.pdf, > Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch, > YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629170#comment-16629170 ] Billie Rinaldi commented on YARN-8734: -- Oh, I see what you mean now, [~gsaha]. I didn't realize you were talking about the yaml; I thought it was about the Java objects. Thanks for clarifying. > Readiness check for remote service belongs to the same user > --- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Component_dependencies.png, Dependency check vs.pdf, > Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch, > YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629168#comment-16629168 ] Gour Saha commented on YARN-8734: - bq. Yes, component dependencies is also outside of component properties in the component section. I think this is aligned correctly. [~eyang], am I missing something here? Please see where "dependencies" is defined in Component_dependencies.png vs where it is defined in Service_dependencies.png (attached). > Readiness check for remote service belongs to the same user > --- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Component_dependencies.png, Dependency check vs.pdf, > Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch, > YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8734: Attachment: Service_dependencies.png Component_dependencies.png > Readiness check for remote service belongs to the same user > --- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Component_dependencies.png, Dependency check vs.pdf, > Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch, > YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629097#comment-16629097 ] Eric Badger commented on YARN-7644: --- bq. With this Jira, I can focus on CLEANUP_CONTAINER and CLEANUP_CONTAINER_FOR_REINIT events to be performed in a non-blocking way. That sounds like the correct approach to me > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
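To illustrate the non-blocking cleanup approach being agreed on here: the blocking {{docker stop}} (with its 10 second grace period) can be handed off to a dedicated executor so the event handler returns immediately. The sketch below is only an illustration under assumed names ({{DockerClient}}, {{cleanupExecutor}}, the pool size); it is not the NodeManager's ContainerLaunch or container runtime code.
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Minimal sketch (not the actual NM code): the blocking "docker stop" call
 * is handed off to a dedicated executor so the handler that processes
 * CLEANUP_CONTAINER events returns immediately instead of waiting up to the
 * 10 second grace period per container.
 */
public class NonBlockingCleanupSketch {

  // hypothetical wrapper around the docker CLI; stop() blocks until the
  // container exits or the grace period elapses
  interface DockerClient {
    void stop(String containerId, int gracePeriodSeconds);
    void kill(String containerId);
  }

  private final DockerClient docker;
  private final ExecutorService cleanupExecutor =
      Executors.newFixedThreadPool(8);   // pool size is an assumption

  NonBlockingCleanupSketch(DockerClient docker) {
    this.docker = docker;
  }

  /** Called from the event handler; must not block the dispatcher thread. */
  public CompletableFuture<Void> cleanupContainer(String containerId) {
    return CompletableFuture.runAsync(() -> {
      try {
        docker.stop(containerId, 10);   // blocking call, now off-thread
      } catch (RuntimeException e) {
        docker.kill(containerId);       // force kill if the graceful stop failed
      }
    }, cleanupExecutor);
  }

  public void shutdown() throws InterruptedException {
    cleanupExecutor.shutdown();
    cleanupExecutor.awaitTermination(1, TimeUnit.MINUTES);
  }
}
{code}
With this shape, the handler thread only pays the cost of submitting a task; slow or stuck containers tie up cleanup threads rather than the dispatcher, which is the back-up described in the issue.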
[jira] [Commented] (YARN-7129) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628993#comment-16628993 ] Hadoop QA commented on YARN-7129: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} YARN-7129 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-7129 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941410/YARN-7129.012.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21978/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Application Catalog for YARN applications > - > > Key: YARN-7129 > URL: https://issues.apache.org/jira/browse/YARN-7129 > Project: Hadoop YARN > Issue Type: New Feature > Components: applications >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf, YARN-7129.001.patch, > YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, > YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, > YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, > YARN-7129.011.patch, YARN-7129.012.patch > > > YARN native services provides web services API to improve usability of > application deployment on Hadoop using collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves usability of YARN for > manage the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7129) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628987#comment-16628987 ] Eric Yang commented on YARN-7129: - - Patch 012 Added ability to configure app prior to deploy. > Application Catalog for YARN applications > - > > Key: YARN-7129 > URL: https://issues.apache.org/jira/browse/YARN-7129 > Project: Hadoop YARN > Issue Type: New Feature > Components: applications >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf, YARN-7129.001.patch, > YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, > YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, > YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, > YARN-7129.011.patch, YARN-7129.012.patch > > > YARN native services provides web services API to improve usability of > application deployment on Hadoop using collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves usability of YARN for > manage the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7129) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7129: Attachment: YARN-7129.012.patch > Application Catalog for YARN applications > - > > Key: YARN-7129 > URL: https://issues.apache.org/jira/browse/YARN-7129 > Project: Hadoop YARN > Issue Type: New Feature > Components: applications >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf, YARN-7129.001.patch, > YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, > YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, > YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, > YARN-7129.011.patch, YARN-7129.012.patch > > > YARN native services provides web services API to improve usability of > application deployment on Hadoop using collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves usability of YARN for > manage the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8826) Fix lingering timeline collector after serviceStop in TimelineCollectorManager
[ https://issues.apache.org/jira/browse/YARN-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabha Manepalli updated YARN-8826: --- Attachment: YARN-8826.v1.patch > Fix lingering timeline collector after serviceStop in TimelineCollectorManager > -- > > Key: YARN-8826 > URL: https://issues.apache.org/jira/browse/YARN-8826 > Project: Hadoop YARN > Issue Type: Bug > Components: ATSv2 >Reporter: Prabha Manepalli >Assignee: Prabha Manepalli >Priority: Trivial > Attachments: YARN-8826.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8826) Fix lingering timeline collector after serviceStop in TimelineCollectorManager
[ https://issues.apache.org/jira/browse/YARN-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabha Manepalli updated YARN-8826: --- Component/s: ATSv2 > Fix lingering timeline collector after serviceStop in TimelineCollectorManager > -- > > Key: YARN-8826 > URL: https://issues.apache.org/jira/browse/YARN-8826 > Project: Hadoop YARN > Issue Type: Bug > Components: ATSv2 >Reporter: Prabha Manepalli >Assignee: Prabha Manepalli >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8826) Fix lingering timeline collector after serviceStop in TimelineCollectorManager
Prabha Manepalli created YARN-8826: -- Summary: Fix lingering timeline collector after serviceStop in TimelineCollectorManager Key: YARN-8826 URL: https://issues.apache.org/jira/browse/YARN-8826 Project: Hadoop YARN Issue Type: Bug Reporter: Prabha Manepalli Assignee: Prabha Manepalli -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.9.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, > YARN-8789.7.patch, YARN-8789.8.patch, YARN-8789.9.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some cleanup, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
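For readers following YARN-8789: the idea in the description amounts to backing the dispatcher with a capacity-limited {{BlockingQueue}} so that producers block once the queue is full instead of the queue growing without bound. The sketch below only illustrates that pattern under assumed names ({{BoundedDispatcherSketch}}, the capacity value); it is not the code from the attached patches.
{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Minimal sketch of the bounded-queue idea (not the actual AsyncDispatcher
 * patch): with a capacity-limited queue, put() blocks once the queue is full,
 * so event producers are throttled instead of the queue growing without bound
 * and exhausting the AM heap.
 */
public class BoundedDispatcherSketch<E> {

  private final BlockingQueue<E> eventQueue;
  private final Thread eventHandlingThread;
  private volatile boolean stopped = false;

  public BoundedDispatcherSketch(int capacity) {
    // a real implementation might fall back to an unbounded queue
    // when no capacity is configured
    this.eventQueue = new ArrayBlockingQueue<>(capacity);
    this.eventHandlingThread = new Thread(this::runLoop, "event-handler");
  }

  public void start() {
    eventHandlingThread.start();
  }

  /** Producers call this; blocks when the queue is at capacity. */
  public void dispatch(E event) throws InterruptedException {
    eventQueue.put(event);
  }

  private void runLoop() {
    while (!stopped && !Thread.currentThread().isInterrupted()) {
      try {
        handle(eventQueue.take());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  protected void handle(E event) {
    // a real dispatcher would look up the registered handler for the event type
    System.out.println("handled " + event);
  }

  public void stop() {
    stopped = true;
    eventHandlingThread.interrupt();
  }
}
{code}
Whether blocking producers is acceptable depends on the caller; the actual patch may choose a different policy, so treat this purely as a sketch of the bounded-queue behaviour the description asks for.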
[jira] [Commented] (YARN-8824) App Nodelabel missed after RM restart for finished apps
[ https://issues.apache.org/jira/browse/YARN-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628768#comment-16628768 ] Hadoop QA commented on YARN-8824: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 25m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 44s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 81m 47s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}174m 36s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:a607c02 | | JIRA Issue | YARN-8824 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941362/YARN-8824-branch-3.1.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a6fc012b0170 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / 7dbdb75 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21974/testReport/ | | Max. process+thread count | 827 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21974/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (YARN-7825) Maintain constant horizontal application info bar for all pages
[ https://issues.apache.org/jira/browse/YARN-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628716#comment-16628716 ] Hadoop QA commented on YARN-7825: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 30m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 45m 35s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-7825 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941378/YARN-7825.003.patch | | Optional Tests | dupname asflicense shadedclient | | uname | Linux 08542462358d 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e5287a4 | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 331 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21976/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Maintain constant horizontal application info bar for all pages > --- > > Key: YARN-7825 > URL: https://issues.apache.org/jira/browse/YARN-7825 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Yesha Vora >Assignee: Akhil PB >Priority: Major > Attachments: Screen Shot 2018-04-10 at 11.06.27 AM.png, Screen Shot > 2018-04-10 at 11.06.40 AM.png, Screen Shot 2018-04-10 at 11.07.07 AM.png, > Screen Shot 2018-04-10 at 11.07.29 AM.png, Screen Shot 2018-04-10 at 11.15.27 > AM.png, YARN-7825.001.patch, YARN-7825.002.patch, YARN-7825.003.patch > > > Steps: > 1) enable Ats v2 > 2) Start Yarn service application ( Httpd ) > 3) Fix horizontal info bar for below pages. 
> * component page > * Component Instance info page > * Application attempt Info -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader
[ https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628666#comment-16628666 ] Hadoop QA commented on YARN-8270: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 1s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 50m 29s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8270 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941376/YARN-8270.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 3a8a57ddcd6d 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e5287a4 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21975/testReport/ | | Max. process+thread count | 302 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21975/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Adding JMX Metrics for Timeline Collector and
[jira] [Updated] (YARN-7825) Maintain constant horizontal application info bar for all pages
[ https://issues.apache.org/jira/browse/YARN-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB updated YARN-7825: --- Attachment: YARN-7825.003.patch > Maintain constant horizontal application info bar for all pages > --- > > Key: YARN-7825 > URL: https://issues.apache.org/jira/browse/YARN-7825 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Yesha Vora >Assignee: Akhil PB >Priority: Major > Attachments: Screen Shot 2018-04-10 at 11.06.27 AM.png, Screen Shot > 2018-04-10 at 11.06.40 AM.png, Screen Shot 2018-04-10 at 11.07.07 AM.png, > Screen Shot 2018-04-10 at 11.07.29 AM.png, Screen Shot 2018-04-10 at 11.15.27 > AM.png, YARN-7825.001.patch, YARN-7825.002.patch, YARN-7825.003.patch > > > Steps: > 1) enable Ats v2 > 2) Start Yarn service application ( Httpd ) > 3) Fix horizontal info bar for below pages. > * component page > * Component Instance info page > * Application attempt Info -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628560#comment-16628560 ] Weiwei Yang commented on YARN-5939: --- Hi [~bibinchundatt] Ah I see your concern, good point. In the localization code, I think it keeps track of the state of each local resource, so it avoids downloading the same file concurrently using the same file system instance. So it should be fine with the patch. But then I took a deeper look at this and found that YARN-58 has added {code:java} FileSystem.closeAllForUGI(ugi); {code} when ContainerLocalizer exits, so it should be able to avoid resource leaks. In this case, I don't think we need this patch anymore. I am not sure why [~lxw668899] raised this issue in the first place; maybe it was related to their self-defined file system implementation? As a safer approach, let's close this as a dup of YARN-58, unless any further comments are received. Thanks [~bibinchundatt] for raising this, appreciate that. > FSDownload leaks FileSystem resources > - > > Key: YARN-5939 > URL: https://issues.apache.org/jira/browse/YARN-5939 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1, 2.7.3 >Reporter: liuxiangwei >Assignee: Weiwei Yang >Priority: Major > Labels: leak > Attachments: YARN-5939.004.patch, YARN-5939.005.patch, > YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Background > To use our self-defined FileSystem class, the item of configuration > "fs.%s.impl.disable.cache" should set to true. > In YARN's source code, the class named > "org.apache.hadoop.yarn.util.FSDownload" use getFileSystem but never close, > which leading to file descriptor leak because our self-defined FileSystem > class close the file descriptor when the close function is invoked. > My Question below: > 1. whether invoking "getFileSystem" but never close is YARN's expected > behavior > 2. what should we do in our self-defined FileSystem resolve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
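For context on the YARN-58 behaviour Weiwei refers to, the cleanup boils down to closing every cached {{FileSystem}} instance created under the localizer's UGI once localization has finished. The following is a minimal sketch of that pattern, not the actual ContainerLocalizer code; the class name and the resource path are placeholders.
{code:java}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

/**
 * Minimal sketch of the YARN-58 style cleanup: all cached FileSystem
 * instances created under the localizer's UGI are closed once localization
 * is done, so nothing leaks even if an individual download never called
 * FileSystem#close itself. Not the actual ContainerLocalizer code.
 */
public class LocalizerCleanupSketch {

  public static void runLocalization(Configuration conf, Path resource)
      throws IOException, InterruptedException {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    try {
      ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
        // downloads performed here may create cached FileSystem instances
        FileSystem fs = resource.getFileSystem(conf);
        fs.getFileStatus(resource);
        return null;
      });
    } finally {
      // closes every FileSystem instance cached for this UGI in one place
      FileSystem.closeAllForUGI(ugi);
    }
  }
}
{code}
Because the close happens after the localizer is done with all downloads, it sidesteps the shared-cache concern raised in the earlier comments about calling {{close()}} per download.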
[jira] [Commented] (YARN-8824) App Nodelabel missed after RM restart for finished apps
[ https://issues.apache.org/jira/browse/YARN-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628509#comment-16628509 ] Bibin A Chundatt commented on YARN-8824: Thank you [~rohithsharma] for review and commit. Added 3.1 patch too.. > App Nodelabel missed after RM restart for finished apps > > > Key: YARN-8824 > URL: https://issues.apache.org/jira/browse/YARN-8824 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8824-branch-3.1.001.patch, YARN-8824.001.patch > > > Similar to YARN-8815 nodelabel for application is lost for finished > application after restart -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8303) YarnClient should contact TimelineReader for application/attempt/container report
[ https://issues.apache.org/jira/browse/YARN-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628494#comment-16628494 ] Abhishek Modi commented on YARN-8303: - [~rohithsharma] I will submit an updated patch by EOD today. Thanks. > YarnClient should contact TimelineReader for application/attempt/container > report > - > > Key: YARN-8303 > URL: https://issues.apache.org/jira/browse/YARN-8303 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Abhishek Modi >Priority: Critical > Attachments: YARN-8303.poc.patch > > > YarnClient get app/attempt/container information from RM. If RM doesn't have > then queried to ahsClient. When ATSv2 is only enabled, yarnClient will result > empty. > YarnClient is used by many users which result in empty information for > app/attempt/container report. > Proposal is to have adapter from yarn client so that app/attempt/container > reports can be generated from AHSv2Client which does REST API to > TimelineReader and get the entity and convert it into app/attempt/container > report. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
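The proposal in the YARN-8303 description is essentially a fallback: ask the RM first and, when the application is no longer known there, build the report from ATSv2 via the TimelineReader. The sketch below only illustrates that fallback shape; {{TimelineReaderReportClient}} is a hypothetical adapter, not an existing Hadoop class, and the real patch would wire this into the YARN client itself.
{code:java}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

/**
 * Minimal sketch of the fallback described in the issue: ask the RM first,
 * and only when the RM no longer knows the application, build the report
 * from ATSv2 via the TimelineReader. TimelineReaderReportClient is a
 * hypothetical adapter, not an existing Hadoop class.
 */
public class ReportFallbackSketch {

  /** Hypothetical adapter that queries the TimelineReader REST API. */
  interface TimelineReaderReportClient {
    ApplicationReport getApplicationReport(ApplicationId appId)
        throws IOException, YarnException;
  }

  private final YarnClient rmClient;
  private final TimelineReaderReportClient atsV2Client;

  ReportFallbackSketch(YarnClient rmClient,
      TimelineReaderReportClient atsV2Client) {
    this.rmClient = rmClient;
    this.atsV2Client = atsV2Client;
  }

  public ApplicationReport getApplicationReport(ApplicationId appId)
      throws IOException, YarnException {
    try {
      return rmClient.getApplicationReport(appId);
    } catch (ApplicationNotFoundException e) {
      // RM has forgotten the app (e.g. it finished long ago); fall back to ATSv2
      return atsV2Client.getApplicationReport(appId);
    }
  }
}
{code}
The same shape would presumably apply to attempt and container reports mentioned in the description.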
[jira] [Updated] (YARN-8824) App Nodelabel missed after RM restart for finished apps
[ https://issues.apache.org/jira/browse/YARN-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8824: --- Attachment: YARN-8824-branch-3.1.001.patch > App Nodelabel missed after RM restart for finished apps > > > Key: YARN-8824 > URL: https://issues.apache.org/jira/browse/YARN-8824 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8824-branch-3.1.001.patch, YARN-8824.001.patch > > > Similar to YARN-8815 nodelabel for application is lost for finished > application after restart -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628484#comment-16628484 ] Bibin A Chundatt edited comment on YARN-5939 at 9/26/18 9:44 AM: - {quote}It firstly gets a FileSystem instance {quote} For multiple files, FileSystem instance can be the same rt ??. Incase of cache enabled CACHEKEY of FileSystem is combination of schema+authority,not complete uri. So DistributedFileSystem.close -> All the open streams will be closed of that FileSystem instance. Did i miss something? *Code* {code} 267 try (FileSystem sourceFs = sCopy.getFileSystem(conf)){ 268 FileStatus sStat = sourceFs.getFileStatus(sCopy); {code} was (Author: bibinchundatt): {quote}It firstly gets a FileSystem instance {quote} For multiple files, FileSystem instance can be the same rt ??. Incase of cache enabled CACHEKEY of FileSystem is combination of schema+authority,not complete uri. So DistributedFileSystem.close -> All the open streams will be closed of that FileSystem instance. Did i miss something?? > FSDownload leaks FileSystem resources > - > > Key: YARN-5939 > URL: https://issues.apache.org/jira/browse/YARN-5939 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1, 2.7.3 >Reporter: liuxiangwei >Assignee: Weiwei Yang >Priority: Major > Labels: leak > Attachments: YARN-5939.004.patch, YARN-5939.005.patch, > YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Background > To use our self-defined FileSystem class, the item of configuration > "fs.%s.impl.disable.cache" should set to true. > In YARN's source code, the class named > "org.apache.hadoop.yarn.util.FSDownload" use getFileSystem but never close, > which leading to file descriptor leak because our self-defined FileSystem > class close the file descriptor when the close function is invoked. > My Question below: > 1. whether invoking "getFileSystem" but never close is YARN's expected > behavior > 2. what should we do in our self-defined FileSystem resolve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628484#comment-16628484 ] Bibin A Chundatt commented on YARN-5939: {quote}It firstly gets a FileSystem instance {quote} For multiple files, the FileSystem instance can be the same, right? In case the cache is enabled, the CACHEKEY of FileSystem is a combination of scheme+authority, not the complete URI. So DistributedFileSystem.close -> all the open streams of that FileSystem instance will be closed. Did I miss something? > FSDownload leaks FileSystem resources > - > > Key: YARN-5939 > URL: https://issues.apache.org/jira/browse/YARN-5939 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1, 2.7.3 >Reporter: liuxiangwei >Assignee: Weiwei Yang >Priority: Major > Labels: leak > Attachments: YARN-5939.004.patch, YARN-5939.005.patch, > YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Background > To use our self-defined FileSystem class, the item of configuration > "fs.%s.impl.disable.cache" should set to true. > In YARN's source code, the class named > "org.apache.hadoop.yarn.util.FSDownload" use getFileSystem but never close, > which leading to file descriptor leak because our self-defined FileSystem > class close the file descriptor when the close function is invoked. > My Question below: > 1. whether invoking "getFileSystem" but never close is YARN's expected > behavior > 2. what should we do in our self-defined FileSystem resolve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
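To make the caching point in this comment concrete: with the default cache enabled, {{FileSystem.get}} keys the cache on scheme + authority (plus UGI), so lookups for different paths on the same cluster return the same instance, and closing it would also close streams opened through any other reference to it. The following is a minimal sketch, assuming a placeholder namenode address; it is not code from the patch.
{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

/**
 * Minimal sketch of the caching behaviour behind the concern above: with the
 * cache enabled (the default), FileSystem.get keys the cache on
 * scheme + authority + UGI, not on the full path, so the two lookups below
 * return the same instance, and close() on one would also close streams
 * opened through the other. The namenode address is a placeholder.
 */
public class FileSystemCacheSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    FileSystem fs1 = FileSystem.get(
        URI.create("hdfs://nn-host:8020/user/a/resource1.jar"), conf);
    FileSystem fs2 = FileSystem.get(
        URI.create("hdfs://nn-host:8020/user/b/resource2.jar"), conf);

    // same scheme + authority => same cached instance
    System.out.println(fs1 == fs2);   // prints true with caching enabled

    // closing fs1 therefore closes the instance fs2 also refers to,
    // including any streams still open on it
    fs1.close();
  }
}
{code}
This is exactly why a per-download {{close()}} is risky unless the cache is disabled or the instance is private to that download.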
[jira] [Comment Edited] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628380#comment-16628380 ] Weiwei Yang edited comment on YARN-5939 at 9/26/18 8:03 AM: Hi [~bibinchundatt] Let me make sure I understand your question. Lets assume {{ResourceLocalizationService#PublicLocalizer}} has configured thread pool size 4, then there are 4 {{FSDownload}} threads doing the localization concurrently. Each of such thread will be initiated with a new {{FSDownload}} instance {code:java} pending.put(queue.submit(new FSDownload(lfs, null, conf, publicDirDestPath, resource, request.getContext().getStatCache())), request); {code} then each FSDownload instance will take care of downloading resources from a file system. When it starts to download, e.g calling {{downloadAndUnpack}}. It firstly gets a {{FileSystem}} instance, and normally it creates a few {{DFSOutputStream}} for I/O operations on certain files. After download is accomplished, calling #close like what added in this patch will close those streams. So it is supposed to only close streams for a certain {{FSDownload}} thread, not others. Does that make sense to you? was (Author: cheersyang): Hi [~bibinchundatt] Let me make sure I understand your question. Lets assume {{ResourceLocalizationService#PublicLocalizer}} has configured thread pool size 4, then there are 4 {{FSDownload}} threads doing the localization concurrently. Each of such thread will be initiated with a new {{FSDownload}} instance {code:java} pending.put(queue.submit(new FSDownload(lfs, null, conf, publicDirDestPath, resource, request.getContext().getStatCache())), request); {code} then each FSDownload instance will take care of downloading resources from a file system. When it starts to download, e.g calling {{downloadAndUnpack.}}It firstly gets a {{FileSystem}} instance, and normally it creates a few {{DFSOutputStream}} for I/O operations on certain files. After download is accomplished, calling #close like what added in this patch will close those streams. So it is supposed to only close streams for a certain {{FSDownload}} thread, not others. Does that make sense to you? > FSDownload leaks FileSystem resources > - > > Key: YARN-5939 > URL: https://issues.apache.org/jira/browse/YARN-5939 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1, 2.7.3 >Reporter: liuxiangwei >Assignee: Weiwei Yang >Priority: Major > Labels: leak > Attachments: YARN-5939.004.patch, YARN-5939.005.patch, > YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Background > To use our self-defined FileSystem class, the item of configuration > "fs.%s.impl.disable.cache" should set to true. > In YARN's source code, the class named > "org.apache.hadoop.yarn.util.FSDownload" use getFileSystem but never close, > which leading to file descriptor leak because our self-defined FileSystem > class close the file descriptor when the close function is invoked. > My Question below: > 1. whether invoking "getFileSystem" but never close is YARN's expected > behavior > 2. what should we do in our self-defined FileSystem resolve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628380#comment-16628380 ] Weiwei Yang commented on YARN-5939: --- Hi [~bibinchundatt] Let me make sure I understand your question. Lets assume {{ResourceLocalizationService#PublicLocalizer}} has configured thread pool size 4, then there are 4 {{FSDownload}} threads doing the localization concurrently. Each of such thread will be initiated with a new {{FSDownload}} instance {code:java} pending.put(queue.submit(new FSDownload(lfs, null, conf, publicDirDestPath, resource, request.getContext().getStatCache())), request); {code} then each FSDownload instance will take care of downloading resources from a file system. When it starts to download, e.g calling {{downloadAndUnpack.}}It firstly gets a {{FileSystem}} instance, and normally it creates a few {{DFSOutputStream}} for I/O operations on certain files. After download is accomplished, calling #close like what added in this patch will close those streams. So it is supposed to only close streams for a certain {{FSDownload}} thread, not others. Does that make sense to you? > FSDownload leaks FileSystem resources > - > > Key: YARN-5939 > URL: https://issues.apache.org/jira/browse/YARN-5939 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1, 2.7.3 >Reporter: liuxiangwei >Assignee: Weiwei Yang >Priority: Major > Labels: leak > Attachments: YARN-5939.004.patch, YARN-5939.005.patch, > YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Background > To use our self-defined FileSystem class, the item of configuration > "fs.%s.impl.disable.cache" should set to true. > In YARN's source code, the class named > "org.apache.hadoop.yarn.util.FSDownload" use getFileSystem but never close, > which leading to file descriptor leak because our self-defined FileSystem > class close the file descriptor when the close function is invoked. > My Question below: > 1. whether invoking "getFileSystem" but never close is YARN's expected > behavior > 2. what should we do in our self-defined FileSystem resolve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8824) App Nodelabel missed after RM restart for finished apps
[ https://issues.apache.org/jira/browse/YARN-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628350#comment-16628350 ] Hudson commented on YARN-8824: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15061 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15061/]) YARN-8824. App Nodelabel missed after RM restart for finished apps. (rohithsharmaks: rev e5287a4fe0bb03d929f066fc50eb0e7bd74bb759) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockMemoryRMStateStore.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java > App Nodelabel missed after RM restart for finished apps > > > Key: YARN-8824 > URL: https://issues.apache.org/jira/browse/YARN-8824 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8824.001.patch > > > Similar to YARN-8815 nodelabel for application is lost for finished > application after restart -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources
[ https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628337#comment-16628337 ] Bibin A Chundatt commented on YARN-5939: [~cheersyang]/[~sunilg] Can you clarify one query regarding this: IIUC, filesystem.close() will close all the DFS streams that are open. So in the case of public localization with a thread pool size of 4, with the cache enabled, wouldn't this close other streams? > FSDownload leaks FileSystem resources > - > > Key: YARN-5939 > URL: https://issues.apache.org/jira/browse/YARN-5939 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1, 2.7.3 >Reporter: liuxiangwei >Assignee: Weiwei Yang >Priority: Major > Labels: leak > Attachments: YARN-5939.004.patch, YARN-5939.005.patch, > YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Background > To use our self-defined FileSystem class, the item of configuration > "fs.%s.impl.disable.cache" should set to true. > In YARN's source code, the class named > "org.apache.hadoop.yarn.util.FSDownload" use getFileSystem but never close, > which leading to file descriptor leak because our self-defined FileSystem > class close the file descriptor when the close function is invoked. > My Question below: > 1. whether invoking "getFileSystem" but never close is YARN's expected > behavior > 2. what should we do in our self-defined FileSystem resolve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8824) App Nodelabel missed after RM restart for finished apps
[ https://issues.apache.org/jira/browse/YARN-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628332#comment-16628332 ] Rohith Sharma K S commented on YARN-8824: - Committed to trunk. The branch-3.1 cherry-pick gave errors. [~bibinchundatt], can you attach a patch for branch-3.1? > App Nodelabel missed after RM restart for finished apps > > > Key: YARN-8824 > URL: https://issues.apache.org/jira/browse/YARN-8824 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8824.001.patch > > > Similar to YARN-8815 nodelabel for application is lost for finished > application after restart -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8824) App Nodelabel missed after RM restart for finished apps
[ https://issues.apache.org/jira/browse/YARN-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628320#comment-16628320 ] Rohith Sharma K S commented on YARN-8824: - +1 committing shortly > App Nodelabel missed after RM restart for finished apps > > > Key: YARN-8824 > URL: https://issues.apache.org/jira/browse/YARN-8824 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8824.001.patch > > > Similar to YARN-8815 nodelabel for application is lost for finished > application after restart -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8303) YarnClient should contact TimelineReader for application/attempt/container report
[ https://issues.apache.org/jira/browse/YARN-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628318#comment-16628318 ] Rohith Sharma K S commented on YARN-8303: - [~abmodi] any update on this? > YarnClient should contact TimelineReader for application/attempt/container > report > - > > Key: YARN-8303 > URL: https://issues.apache.org/jira/browse/YARN-8303 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Abhishek Modi >Priority: Critical > Attachments: YARN-8303.poc.patch > > > YarnClient get app/attempt/container information from RM. If RM doesn't have > then queried to ahsClient. When ATSv2 is only enabled, yarnClient will result > empty. > YarnClient is used by many users which result in empty information for > app/attempt/container report. > Proposal is to have adapter from yarn client so that app/attempt/container > reports can be generated from AHSv2Client which does REST API to > TimelineReader and get the entity and convert it into app/attempt/container > report. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org