[jira] [Commented] (YARN-7953) [GQ] Data structures for federation global queues calculations
[ https://issues.apache.org/jira/browse/YARN-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388325#comment-16388325 ]

Carlo Curino commented on YARN-7953:

[~asuresh] I like this suggestion, and from our offline convo I know you are looking into it; please let me know whether it looks promising once you have tested the ideas. Since the FederationQueue objects I have here are transformed into other objects for the algorithmic calculations in YARN-7403 and YARN-7834, this should be pretty doable in terms of the rest of the YARN-7402 work items.

Small caveats:
# The reason I initially did not use QueueMetrics is that I often saw them broken/off in live clusters, so I thought they were maintained a bit sloppily. If we can ensure they are correct and consistent, I think it is good to have them.
# We should also validate whether polling QueueMetrics is acceptable performance-wise (it might even be better, thanks to the already-maintained objects and the delta protocol, but I want to make sure).
# The other advantage of the FedQueue objects was that it was very easy to build tests by constructing scenarios in .json. If we can do the same for QueueMetrics and the like, I think it should be good.

> [GQ] Data structures for federation global queues calculations
>
> Key: YARN-7953
> URL: https://issues.apache.org/jira/browse/YARN-7953
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7953.v1.patch
>
> This Jira tracks data structures and helper classes used by the core
> algorithms of YARN-7402 umbrella Jira (currently YARN-7403, and YARN-7834).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7403) [GQ] Compute global and local "IdealAllocation"
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carlo Curino updated YARN-7403:

Attachment: YARN-7403.v3.patch

> [GQ] Compute global and local "IdealAllocation"
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: federation
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch,
> YARN-7403.draft3.patch, YARN-7403.v1.patch, YARN-7403.v2.patch,
> YARN-7403.v3.patch, global-queues-preemption.PNG
>
> This JIRA tracks the algorithmic effort to combine the local queue views of
> capacity guarantee/use/demand and compute the global ideal allocation and
> the respective local allocations. This will inform the RMs in each
> sub-cluster on how to allocate more containers to each queue (allowing for
> temporary over/under allocations that are locally excessive, but globally
> correct).
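To make the YARN-7403 description above more concrete, here is a minimal Java sketch of combining per-sub-cluster views of one queue into a global ideal allocation and mapping it back down. It is not the algorithm in the attached patches: the class name `IdealAllocationSketch`, the "min(guarantee, demand)" rule, and the demand-proportional split are all assumptions chosen for illustration, and the sketch ignores current use and sub-cluster capacity limits.

```java
import java.util.Arrays;

/** Hypothetical sketch; names and rules are assumptions, not the YARN-7403 code. */
class IdealAllocationSketch {

  /** Sums a per-sub-cluster vector into a single global value. */
  static long sum(long[] perSubCluster) {
    long total = 0;
    for (long v : perSubCluster) {
      total += v;
    }
    return total;
  }

  /** Global ideal for one queue: its demand, capped by its global guarantee. */
  static long globalIdeal(long[] guarantee, long[] demand) {
    return Math.min(sum(guarantee), sum(demand));
  }

  /**
   * Splits the global ideal across sub-clusters proportionally to where the
   * demand is. A sub-cluster may end up above its local guarantee, which is
   * the "locally excessive, but globally correct" case.
   */
  static long[] localIdeal(long[] guarantee, long[] demand) {
    long ideal = globalIdeal(guarantee, demand);
    long totalDemand = sum(demand);
    long[] local = new long[demand.length];
    if (totalDemand == 0) {
      return local; // no demand anywhere: nothing to allocate
    }
    for (int i = 0; i < demand.length; i++) {
      local[i] = ideal * demand[i] / totalDemand; // integer division rounds down
    }
    return local;
  }

  public static void main(String[] args) {
    long[] guarantee = {50, 50}; // the queue is guaranteed 50 in each sub-cluster
    long[] demand = {90, 0};     // but all of its demand sits in the first one
    System.out.println(Arrays.toString(localIdeal(guarantee, demand)));
  }
}
```

In the usage example the queue is guaranteed 50 in each of two sub-clusters but asks for 90 in the first only, so the sketch assigns all 90 there: above the local guarantee, yet within the global one.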
[jira] [Commented] (YARN-7403) [GQ] Compute global and local "IdealAllocation"
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372324#comment-16372324 ]

Carlo Curino commented on YARN-7403:

[~kkaranasos] thanks for looking at this. I initially put it all together because it is not easy to understand why we have certain data structures without the code that uses them, but if it is easier for you to review, I am OK to split.
# YARN-7953 is now a data-structure-only patch (with minor refactoring it should now compile fine and be reasonably self-contained).
# YARN-7403 (this patch) is now algo-only and depends on YARN-7953 and YARN-7934 (the patch with the hook in the CS/preemption code).

BTW, the choice of JAXB is because we are considering a REST endpoint as the way to communicate between the RM and the GPG.

> [GQ] Compute global and local "IdealAllocation"
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: federation
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch,
> YARN-7403.draft3.patch, YARN-7403.v1.patch, YARN-7403.v2.patch,
> global-queues-preemption.PNG
>
> This JIRA tracks the algorithmic effort to combine the local queue views of
> capacity guarantee/use/demand and compute the global ideal allocation and
> the respective local allocations. This will inform the RMs in each
> sub-cluster on how to allocate more containers to each queue (allowing for
> temporary over/under allocations that are locally excessive, but globally
> correct).
[jira] [Comment Edited] (YARN-7953) [GQ] Data structures for federation global queues calculations
[ https://issues.apache.org/jira/browse/YARN-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372305#comment-16372305 ]

Carlo Curino edited comment on YARN-7953 at 2/22/18 1:41 AM:

Per [this |#comment-16370912] ask by [~kkaranasos], I am splitting YARN-7403 into a data-only patch (this one) and the algo side in YARN-7403.

was (Author: curino):
Per [this |#comment-16370912] ask by, I am splitting YARN-7403 into a data-only patch, this one, and the algo side in YARN-7403.

> [GQ] Data structures for federation global queues calculations
>
> Key: YARN-7953
> URL: https://issues.apache.org/jira/browse/YARN-7953
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7953.v1.patch
>
> This Jira tracks data structures and helper classes used by the core
> algorithms of YARN-7402 umbrella Jira (currently YARN-7403, and YARN-7834).
[jira] [Assigned] (YARN-7953) [GQ] Data structures for federation global queues calculations
[ https://issues.apache.org/jira/browse/YARN-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carlo Curino reassigned YARN-7953:

Assignee: Carlo Curino

> [GQ] Data structures for federation global queues calculations
>
> Key: YARN-7953
> URL: https://issues.apache.org/jira/browse/YARN-7953
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7953.v1.patch
>
> This Jira tracks data structures and helper classes used by the core
> algorithms of YARN-7402 umbrella Jira (currently YARN-7403, and YARN-7834).
[jira] [Updated] (YARN-7953) [GQ] Data structures for federation global queues calculations
[ https://issues.apache.org/jira/browse/YARN-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carlo Curino updated YARN-7953:

Attachment: YARN-7953.v1.patch

> [GQ] Data structures for federation global queues calculations
>
> Key: YARN-7953
> URL: https://issues.apache.org/jira/browse/YARN-7953
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7953.v1.patch
>
> This Jira tracks data structures and helper classes used by the core
> algorithms of YARN-7402 umbrella Jira (currently YARN-7403, and YARN-7834).
[jira] [Created] (YARN-7953) [GQ] Data structures for federation global queues calculations
Carlo Curino created YARN-7953:

Summary: [GQ] Data structures for federation global queues calculations
Key: YARN-7953
URL: https://issues.apache.org/jira/browse/YARN-7953
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Carlo Curino

This Jira tracks data structures and helper classes used by the core algorithms of YARN-7402 umbrella Jira (currently YARN-7403, and YARN-7834).
[jira] [Commented] (YARN-7732) Support Generic AM Simulator from SynthGenerator
[ https://issues.apache.org/jira/browse/YARN-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371824#comment-16371824 ]

Carlo Curino commented on YARN-7732:

Thanks [~leftnoteasy]. Should we then push to branch-3.0 (for all future 3.x branches)?

> Support Generic AM Simulator from SynthGenerator
>
> Key: YARN-7732
> URL: https://issues.apache.org/jira/browse/YARN-7732
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler-load-simulator
> Reporter: Young Chen
> Assignee: Young Chen
> Priority: Minor
> Attachments: YARN-7732-YARN-7798.01.patch,
> YARN-7732-YARN-7798.02.patch, YARN-7732.01.patch, YARN-7732.02.patch,
> YARN-7732.03.patch, YARN-7732.04.patch, YARN-7732.05.patch, YARN-7732.06.patch
>
> Extract the MapReduce-specific set-up in the SLSRunner into the
> MRAMSimulator, and enable support for pluggable AMSimulators.
> Previously, the AM set-up in SLSRunner had the MRAMSimulator type hard-coded;
> for example, startAMFromSynthGenerator() calls this:
>
> {code:java}
> runNewAM(SLSUtils.DEFAULT_JOB_TYPE, user, jobQueue, oldJobId,
>     jobStartTimeMS, jobFinishTimeMS, containerList, reservationId,
>     job.getDeadline(), getAMContainerResource(null));
> {code}
> where SLSUtils.DEFAULT_JOB_TYPE = "mapreduce"
>
> The container set-up was also only suitable for mapreduce:
>
> {code:java}
> // Source: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java
>
> // map tasks
> for (int i = 0; i < job.getNumberMaps(); i++) {
>   TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.MAP, i, 0);
>   RMNode node =
>       nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size())))
>           .getNode();
>   String hostname = "/" + node.getRackName() + "/" + node.getHostName();
>   long containerLifeTime = tai.getRuntime();
>   Resource containerResource =
>       Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(),
>           (int) tai.getTaskInfo().getTaskVCores());
>   containerList.add(new ContainerSimulator(containerResource,
>       containerLifeTime, hostname, DEFAULT_MAPPER_PRIORITY, "map"));
> }
> // reduce tasks
> for (int i = 0; i < job.getNumberReduces(); i++) {
>   TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.REDUCE, i, 0);
>   RMNode node =
>       nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size())))
>           .getNode();
>   String hostname = "/" + node.getRackName() + "/" + node.getHostName();
>   long containerLifeTime = tai.getRuntime();
>   Resource containerResource =
>       Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(),
>           (int) tai.getTaskInfo().getTaskVCores());
>   containerList.add(
>       new ContainerSimulator(containerResource, containerLifeTime,
>           hostname, DEFAULT_REDUCER_PRIORITY, "reduce"));
> }
> {code}
>
> In addition, the syn.json format supported only mapreduce (the parameters
> were very specific: mtime, rtime, mtasks, rtasks, etc.).
> This patch aims to introduce a new syn.json format that can describe generic
> jobs, and the SLS set-up required to support the synth generation of generic
> jobs.
> See syn_generic.json for an equivalent of the previous syn.json in the new
> format.
> Using the new generic format, we describe a StreamAMSimulator that simulates a
> long-running streaming service maintaining N containers for the
> lifetime of the AM. See syn_stream.json.
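The "pluggable AMSimulators" idea in the YARN-7732 description above can be sketched as a registry that resolves a job-type string to a simulator factory, instead of hard-coding the "mapreduce" type. This is only an illustration under stated assumptions: `SimulatedAm`, `MapReduceAm`, `StreamAm`, and `AmSimulatorRegistry` are hypothetical names, not the SLS classes touched by the patch, and the real SLS wiring may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical registry mapping a job-type string to a simulator factory. */
class AmSimulatorRegistry {
  private final Map<String, Supplier<SimulatedAm>> factories = new HashMap<>();

  /** Registers (or replaces) the factory used for the given job type. */
  void register(String jobType, Supplier<SimulatedAm> factory) {
    factories.put(jobType, factory);
  }

  /** Creates a simulator for the job type, failing fast on unknown types. */
  SimulatedAm create(String jobType) {
    Supplier<SimulatedAm> factory = factories.get(jobType);
    if (factory == null) {
      throw new IllegalArgumentException("No AM simulator for type: " + jobType);
    }
    return factory.get();
  }

  /** Registry pre-populated with the two illustrative job types. */
  static AmSimulatorRegistry withDefaults() {
    AmSimulatorRegistry registry = new AmSimulatorRegistry();
    registry.register("mapreduce", MapReduceAm::new);
    registry.register("stream", StreamAm::new);
    return registry;
  }

  public static void main(String[] args) {
    AmSimulatorRegistry registry = withDefaults();
    System.out.println(registry.create("mapreduce").describe());
    System.out.println(registry.create("stream").describe());
  }
}

/** Hypothetical stand-in for an AM simulator; not the SLS AMSimulator class. */
interface SimulatedAm {
  String describe();
}

/** Illustrative MapReduce-style AM with map and reduce containers. */
class MapReduceAm implements SimulatedAm {
  @Override
  public String describe() {
    return "mapreduce AM: map and reduce containers";
  }
}

/** Illustrative streaming AM that keeps a fixed pool of containers alive. */
class StreamAm implements SimulatedAm {
  @Override
  public String describe() {
    return "stream AM: long-lived container pool";
  }
}
```

With a lookup like this, the job type read from a generic trace file selects the simulator, and adding a new job type becomes a registration rather than a change to the runner.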
[jira] [Commented] (YARN-7798) Refactor SLS Reservation Creation
[ https://issues.apache.org/jira/browse/YARN-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370822#comment-16370822 ]

Carlo Curino commented on YARN-7798:

Backported to branch-3 with a clean cherry-pick (spot checks of the SLS tests ran fine).

> Refactor SLS Reservation Creation
>
> Key: YARN-7798
> URL: https://issues.apache.org/jira/browse/YARN-7798
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Young Chen
> Assignee: Young Chen
> Priority: Minor
> Fix For: 3.1.0
> Attachments: YARN-7798.01.patch, YARN-7798.02.patch,
> YARN-7798.03.patch
>
> Move the reservation request creation out of SLSRunner and delegate to the
> AMSimulator instance.
[jira] [Commented] (YARN-7732) Support Generic AM Simulator from SynthGenerator
[ https://issues.apache.org/jira/browse/YARN-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370821#comment-16370821 ]

Carlo Curino commented on YARN-7732:

Thanks [~youchen] for the contribution, and [~leftnoteasy] for reviewing. I committed this to trunk and cherry-picked this patch (and YARN-7798) back to branch-3, since it was a clean cherry-pick and spot runs of the SLS tests look good.

[~leftnoteasy] and [~yufeigu], if you see any issue with this cherry-pick, let me know and we can easily revert. I would like, as much as possible, to have all the newer SLS magic available in all branches, as it is very useful for regression/integration/performance testing.

[~youchen] can you check why YARN-7798 does not apply to branch-2? It might be a very simple fix, in which case please provide a patch for both YARN-7798 and YARN-7732 that works on branch-2, so we can backport there as well.

> Support Generic AM Simulator from SynthGenerator
>
> Key: YARN-7732
> URL: https://issues.apache.org/jira/browse/YARN-7732
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler-load-simulator
> Reporter: Young Chen
> Assignee: Young Chen
> Priority: Minor
> Attachments: YARN-7732-YARN-7798.01.patch,
> YARN-7732-YARN-7798.02.patch, YARN-7732.01.patch, YARN-7732.02.patch,
> YARN-7732.03.patch, YARN-7732.04.patch, YARN-7732.05.patch, YARN-7732.06.patch
[jira] [Commented] (YARN-7732) Support Generic AM Simulator from SynthGenerator
[ https://issues.apache.org/jira/browse/YARN-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370610#comment-16370610 ]

Carlo Curino commented on YARN-7732:

Thanks [~leftnoteasy] for the review. [~youchen] please fix the ASF license issue (by adding an exclusion in pom.xml), and I will commit to trunk based on Wangda's review (and a quick skim from me).

> Support Generic AM Simulator from SynthGenerator
>
> Key: YARN-7732
> URL: https://issues.apache.org/jira/browse/YARN-7732
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler-load-simulator
> Reporter: Young Chen
> Assignee: Young Chen
> Priority: Minor
> Attachments: YARN-7732-YARN-7798.01.patch,
> YARN-7732-YARN-7798.02.patch, YARN-7732.01.patch, YARN-7732.02.patch,
> YARN-7732.03.patch, YARN-7732.04.patch, YARN-7732.05.patch
[jira] [Commented] (YARN-7403) [GQ] Compute global and local "IdealAllocation"
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370506#comment-16370506 ]

Carlo Curino commented on YARN-7403:

Fixing the TestYarnConfigurationFields unit test failure (the rest still does not compile, as it depends on YARN-7403).

> [GQ] Compute global and local "IdealAllocation"
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: federation
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch,
> YARN-7403.draft3.patch, YARN-7403.v1.patch, YARN-7403.v2.patch,
> global-queues-preemption.PNG
>
> This JIRA tracks the algorithmic effort to combine the local queue views of
> capacity guarantee/use/demand and compute the global ideal allocation and
> the respective local allocations. This will inform the RMs in each
> sub-cluster on how to allocate more containers to each queue (allowing for
> temporary over/under allocations that are locally excessive, but globally
> correct).
[jira] [Updated] (YARN-7403) [GQ] Compute global and local "IdealAllocation"
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carlo Curino updated YARN-7403:

Attachment: YARN-7403.v2.patch

> [GQ] Compute global and local "IdealAllocation"
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: federation
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch,
> YARN-7403.draft3.patch, YARN-7403.v1.patch, YARN-7403.v2.patch,
> global-queues-preemption.PNG
>
> This JIRA tracks the algorithmic effort to combine the local queue views of
> capacity guarantee/use/demand and compute the global ideal allocation and
> the respective local allocations. This will inform the RMs in each
> sub-cluster on how to allocate more containers to each queue (allowing for
> temporary over/under allocations that are locally excessive, but globally
> correct).
[jira] [Commented] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
[ https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368062#comment-16368062 ]

Carlo Curino commented on YARN-7934:

[~subru] thanks for the review. I agree the test failure seems unrelated: the test passed in v3 (which differs from v4 only in comments), so it is likely just a flaky one.

> [GQ] Refactor preemption calculators to allow overriding for Federation
> Global Algos
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch,
> YARN-7934.v3.patch, YARN-7934.v4.patch
>
> This Jira tracks minimal changes in the capacity scheduler preemption
> mechanics that allow for sub-classing and overriding of certain behaviors,
> which we use to implement federation global algorithms, e.g., in YARN-7403.
[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
[ https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carlo Curino updated YARN-7934:

Attachment: YARN-7934.v4.patch

> [GQ] Refactor preemption calculators to allow overriding for Federation
> Global Algos
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch,
> YARN-7934.v3.patch, YARN-7934.v4.patch
>
> This Jira tracks minimal changes in the capacity scheduler preemption
> mechanics that allow for sub-classing and overriding of certain behaviors,
> which we use to implement federation global algorithms, e.g., in YARN-7403.
[jira] [Commented] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
[ https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366516#comment-16366516 ]

Carlo Curino commented on YARN-7934:

[~subru] thanks for the quick review. I have addressed the javadoc comment issues in patch v3. Regarding consumers, you are correct: they are not in this patch but in YARN-7403. This is by design; the purpose of this patch is to commit to trunk the most basic refactoring needed while we develop the algos and the bigger pieces in the YARN-7402 feature branch (to limit churn on the touch points of the branch work).

> [GQ] Refactor preemption calculators to allow overriding for Federation
> Global Algos
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch,
> YARN-7934.v3.patch
>
> This Jira tracks minimal changes in the capacity scheduler preemption
> mechanics that allow for sub-classing and overriding of certain behaviors,
> which we use to implement federation global algorithms, e.g., in YARN-7403.
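The kind of refactoring discussed above (a base calculator in trunk with an overridable step, and a consumer elsewhere that overrides it) can be sketched as follows. The class and method names are hypothetical and do not correspond to the capacity scheduler preemption classes changed by the YARN-7934 patch; the sketch only illustrates the sub-classing pattern under that assumption.

```java
/** Hypothetical base calculator: the ideal share defaults to the local guarantee. */
class LocalPreemptionCalculator {

  /** Sums, over all queues, the usage that exceeds each queue's ideal share. */
  long totalToPreempt(long[] used, long[] guaranteed) {
    long total = 0;
    for (int q = 0; q < used.length; q++) {
      total += Math.max(0L, used[q] - idealShare(q, guaranteed[q]));
    }
    return total;
  }

  /** Overridable hook: subclasses may substitute a different notion of "ideal". */
  protected long idealShare(int queue, long localGuarantee) {
    return localGuarantee;
  }

  public static void main(String[] args) {
    long[] used = {60, 40};
    long[] guaranteed = {50, 50};
    // Local view: queue 0 is 10 over its guarantee.
    System.out.println(new LocalPreemptionCalculator().totalToPreempt(used, guaranteed));
    // Global view: externally computed ideals allow queue 0 up to 80 here.
    long[] globalIdeal = {80, 40};
    System.out.println(
        new GlobalPreemptionCalculator(globalIdeal).totalToPreempt(used, guaranteed));
  }
}

/** Hypothetical subclass that plugs in ideal shares computed from a global view. */
class GlobalPreemptionCalculator extends LocalPreemptionCalculator {
  private final long[] globalIdeal;

  GlobalPreemptionCalculator(long[] globalIdeal) {
    this.globalIdeal = globalIdeal.clone();
  }

  @Override
  protected long idealShare(int queue, long localGuarantee) {
    return globalIdeal[queue]; // ignore the local guarantee in favor of the global ideal
  }
}
```

In this example the base class would preempt 10 units from the first queue, while the subclass preempts nothing, because the globally computed ideal tolerates an allocation that looks excessive locally.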
[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
[ https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carlo Curino updated YARN-7934:

Attachment: YARN-7934.v3.patch

> [GQ] Refactor preemption calculators to allow overriding for Federation
> Global Algos
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch,
> YARN-7934.v3.patch
>
> This Jira tracks minimal changes in the capacity scheduler preemption
> mechanics that allow for sub-classing and overriding of certain behaviors,
> which we use to implement federation global algorithms, e.g., in YARN-7403.
[jira] [Commented] (YARN-7834) [GQ] Rebalance queue configuration for load-balancing and locality affinities
[ https://issues.apache.org/jira/browse/YARN-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366372#comment-16366372 ]

Carlo Curino commented on YARN-7834:

The uploaded patch provides a Linear Programming (LP) implementation of this algorithm, leveraging the ojAlgo solver (which already ships with Hadoop and contains a pure-Java solver, as well as hooks to leverage a more powerful external solver such as Gurobi or CPLEX). The formulation is designed to:
# Guarantee that all queues are allocated fully.
# Guarantee that no sub-cluster is allocated more capacity than it can take.
# Maximize load-balancing (as the primary objective).
# Maximize queue-to-sub-cluster affinity (as a secondary objective), subject to not degrading load-balancing by more than a configurable delta (zero by default).

Objectives 3 and 4 are in a primary-secondary relationship (instead of a weighted linear combination) because, in our production experience, load-balancing is the most pressing concern; only once it is satisfied do we optimize for locality.

> [GQ] Rebalance queue configuration for load-balancing and locality affinities
>
> Key: YARN-7834
> URL: https://issues.apache.org/jira/browse/YARN-7834
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7834.v1.patch
>
> This Jira tracks algorithmic work, which will run in the GPG and will
> rebalance the mapping of queues to sub-clusters. The current design supports
> both balancing the "load" across sub-clusters (proportionally to their size)
> and, as a second objective, maximizing the affinity between queues and the
> sub-clusters where they historically have most demand.
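One way to write down the four design goals from the comment above as a two-stage LP is sketched below. This is an assumed formulation for illustration, not the one in YARN-7834.v1.patch: the symbols (queue capacity $c_q$, sub-cluster capacity $C_s$, affinity weight $a_{q,s}$, decision variable $x_{q,s}$ for the capacity of queue $q$ placed in sub-cluster $s$, tolerance $\delta$) and the choice of "maximum relative load" as the balance metric are all hypothetical.

```latex
% Stage 1 (primary objective): minimize the maximum relative load z.
\begin{align*}
z^{*} = \min_{x,\,z}\quad & z \\
\text{s.t.}\quad
  & \sum_{s \in S} x_{q,s} = c_q        && \forall q \in Q \quad \text{(every queue fully allocated)} \\
  & \sum_{q \in Q} x_{q,s} \le C_s      && \forall s \in S \quad \text{(no sub-cluster over capacity)} \\
  & \sum_{q \in Q} x_{q,s} \le z \, C_s && \forall s \in S \quad \text{(relative load bounded by } z\text{)} \\
  & x_{q,s} \ge 0                       && \forall q \in Q,\ s \in S
\end{align*}

% Stage 2 (secondary objective): maximize affinity without degrading
% the stage-1 balance by more than the configurable delta.
\begin{align*}
\max_{x}\quad & \sum_{q \in Q} \sum_{s \in S} a_{q,s} \, x_{q,s} \\
\text{s.t.}\quad
  & \sum_{s \in S} x_{q,s} = c_q                         && \forall q \in Q \\
  & \sum_{q \in Q} x_{q,s} \le C_s                       && \forall s \in S \\
  & \sum_{q \in Q} x_{q,s} \le (z^{*} + \delta) \, C_s   && \forall s \in S \\
  & x_{q,s} \ge 0                                        && \forall q \in Q,\ s \in S
\end{align*}
```

Solving the two stages in sequence, rather than mixing both objectives with weights, is what gives the primary-secondary relationship: with $\delta = 0$, affinity can only break ties among allocations that are already optimally balanced.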
[jira] [Updated] (YARN-7834) [GQ] Rebalance queue configuration for load-balancing and locality affinities
[ https://issues.apache.org/jira/browse/YARN-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carlo Curino updated YARN-7834:

Attachment: YARN-7834.v1.patch

> [GQ] Rebalance queue configuration for load-balancing and locality affinities
>
> Key: YARN-7834
> URL: https://issues.apache.org/jira/browse/YARN-7834
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7834.v1.patch
>
> This Jira tracks algorithmic work, which will run in the GPG and will
> rebalance the mapping of queues to sub-clusters. The current design supports
> both balancing the "load" across sub-clusters (proportionally to their size)
> and, as a second objective, maximizing the affinity between queues and the
> sub-clusters where they historically have most demand.
[jira] [Resolved] (YARN-7725) [GQ] Compute global "ideal allocation" including locality biases
[ https://issues.apache.org/jira/browse/YARN-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carlo Curino resolved YARN-7725.
    Resolution: Duplicate
    Fix Version/s: yarn-7403

Newer version of YARN-7403 subsumes this task.

> [GQ] Compute global "ideal allocation" including locality biases
>
> Key: YARN-7725
> URL: https://issues.apache.org/jira/browse/YARN-7725
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: federation
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Fix For: yarn-7403
>
> This JIRA tracks an algorithmic effort to compute the global ideal allocation. We also take into account the locality demand/availability gap, and map the global allocation down to the sub-cluster level, computing the delta+ and delta- for each queue in each sub-cluster.
[jira] [Updated] (YARN-7403) [GQ] Compute global and local "IdealAllocation"
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carlo Curino updated YARN-7403:
    Description: This JIRA tracks the algorithmic effort to combine the local queue views of capacity guarantee/use/demand and compute the global ideal allocation and the respective local allocations. This will inform the RMs in each sub-cluster on how to allocate more containers to each queue (allowing for temporary over/under-allocations that are locally excessive, but globally correct). (was: This JIRA tracks algorithmic effort to combine the local queue views of capacity guarantee/use/demand and compute the global amount of preemption, and based on that, "where" (in which sub-cluster) preemption will be enacted.)

> [GQ] Compute global and local "IdealAllocation"
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: federation
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, YARN-7403.draft3.patch, YARN-7403.v1.patch, global-queues-preemption.PNG
>
> This JIRA tracks the algorithmic effort to combine the local queue views of capacity guarantee/use/demand and compute the global ideal allocation and the respective local allocations. This will inform the RMs in each sub-cluster on how to allocate more containers to each queue (allowing for temporary over/under-allocations that are locally excessive, but globally correct).
[jira] [Commented] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
[ https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366215#comment-16366215 ]

Carlo Curino commented on YARN-7934:

Patch v2 attempts to please the YETUS gods. This patch does not change any behavior; it just defines hooks to be used by sub-classes in YARN-7403, hence it doesn't require any new tests.

> [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch
>
> This Jira tracks minimal changes in the capacity scheduler preemption mechanics that allow for sub-classing and overriding of certain behaviors, which we use to implement federation global algorithms, e.g., in YARN-7403.
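For readers unfamiliar with the pattern, "hooks to be used by sub-classes" with no behavior change can be illustrated with a minimal sketch. The class and method names below are hypothetical stand-ins, not the classes touched by the YARN-7934 patch; the point is only that the base class keeps its flow and default result while exposing one overridable step.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a preemption calculator; NOT a real Hadoop class.
class BaseCalculatorSketch {
  // Fixed overall flow: visit every queue and record its ideal share.
  final Map<String, Integer> computeIdeal(Map<String, Integer> demand) {
    Map<String, Integer> ideal = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> e : demand.entrySet()) {
      ideal.put(e.getKey(), adjust(e.getKey(), e.getValue()));
    }
    return ideal;
  }

  // Hook: the default returns the input unchanged, so existing callers
  // observe exactly the same behavior as before the hook existed.
  protected int adjust(String queue, int demand) {
    return demand;
  }
}

// Hypothetical federation-aware sub-class that overrides only the hook.
class FederatedCalculatorSketch extends BaseCalculatorSketch {
  private final int cap; // illustrative global cap per queue

  FederatedCalculatorSketch(int cap) {
    this.cap = cap;
  }

  @Override
  protected int adjust(String queue, int demand) {
    return Math.min(demand, cap);
  }
}
```

Because the default hook is an identity step, a refactoring of this shape plausibly needs no new tests, which matches the argument made in the comment.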
[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
[ https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7934: --- Attachment: YARN-7934.v2.patch > [GQ] Refactor preemption calculators to allow overriding for Federation > Global Algos > > > Key: YARN-7934 > URL: https://issues.apache.org/jira/browse/YARN-7934 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino >Priority: Major > Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch > > > This Jira tracks minimal changes in the capacity scheduler preemption > mechanics that allow for sub-classing and overriding of certain behaviors, > which we use to implement federation global algorithms, e.g., in YARN-7403. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7403) [GQ] Compute global and local "IdealAllocation"
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365063#comment-16365063 ]

Carlo Curino commented on YARN-7403:

The v1 patch contains a heavily reworked version of the code. It depends on YARN-7934, so it will not compile on its own.

The key idea here is to provide algorithms that run every few seconds in the GPG and observe the overall state of a federated cluster. The algorithm leverages some of the {{PreemptableResourceCalculator}} logic (with several additions) to compute the ideal allocation for each queue. The extra effort goes into handling "locality" among sub-clusters, which requires some careful consideration. Various heuristics can be chosen from (we implement two as reference), and once YARN-7885/YARN-7886 are ready to use we can experiment to find which one most closely approximates the behavior of a single {{CapacityScheduler}} overseeing the entire federation.

> [GQ] Compute global and local "IdealAllocation"
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: federation
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, YARN-7403.draft3.patch, YARN-7403.v1.patch, global-queues-preemption.PNG
>
> This JIRA tracks algorithmic effort to combine the local queue views of capacity guarantee/use/demand and compute the global amount of preemption, and based on that, "where" (in which sub-cluster) preemption will be enacted.
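To make "combine the local queue views and compute the ideal allocation" concrete, here is a deliberately simplified sketch. It is an assumption-laden illustration, not the code in YARN-7403.v1.patch and not the {{PreemptableResourceCalculator}} logic: it handles a single resource and a flat list of queues, ignores locality entirely, and all names (IdealAllocationSketch, globalView, ideal) are hypothetical. Demand is summed across sub-clusters, then capacity is handed out in rounds proportionally to each queue's guarantee until demand or capacity runs out.

```java
// Hypothetical sketch; NOT the algorithm from the patch.
class IdealAllocationSketch {

  // Sums per-queue values reported by each sub-cluster into one global view.
  // localViews[s][q] is the value for queue q in sub-cluster s.
  static double[] globalView(double[][] localViews) {
    double[] global = new double[localViews[0].length];
    for (double[] local : localViews) {
      for (int q = 0; q < global.length; q++) {
        global[q] += local[q];
      }
    }
    return global;
  }

  // Distributes `total` capacity across queues. In each round, unsatisfied
  // queues are offered a share proportional to their guarantee; a queue never
  // takes more than its demand, and what it leaves is re-offered next round.
  static double[] ideal(double[] guarantee, double[] demand, double total) {
    final double eps = 1e-9;
    int n = guarantee.length;
    double[] alloc = new double[n];
    boolean[] satisfied = new boolean[n];
    double remaining = total;
    while (remaining > eps) {
      double weight = 0;
      for (int i = 0; i < n; i++) {
        if (!satisfied[i]) {
          weight += guarantee[i];
        }
      }
      if (weight <= 0) {
        break; // nobody left who can be offered capacity
      }
      double handedOut = 0;
      for (int i = 0; i < n; i++) {
        if (satisfied[i]) {
          continue;
        }
        double offer = remaining * guarantee[i] / weight;
        double take = Math.min(offer, demand[i] - alloc[i]);
        alloc[i] += take;
        handedOut += take;
        if (demand[i] - alloc[i] <= eps) {
          satisfied[i] = true;
        }
      }
      remaining -= handedOut;
      if (handedOut <= eps) {
        break; // no progress possible
      }
    }
    return alloc;
  }
}
```

For example, with two sub-clusters reporting demands {10, 40} and {10, 60}, the global demand is {20, 100}; with guarantees {50, 50} and capacity 100, the loop settles on {20, 80}. Mapping that global result back to per-sub-cluster allocations is where the locality heuristics mentioned in the comment would come in, and that step is not sketched here.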
[jira] [Updated] (YARN-7403) [GQ] Compute global and local "IdealAllocation"
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7403: --- Attachment: YARN-7403.v1.patch > [GQ] Compute global and local "IdealAllocation" > --- > > Key: YARN-7403 > URL: https://issues.apache.org/jira/browse/YARN-7403 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >Assignee: Carlo Curino >Priority: Major > Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, > YARN-7403.draft3.patch, YARN-7403.v1.patch, global-queues-preemption.PNG > > > This JIRA tracks algorithmic effort to combine the local queue views of > capacity guarantee/use/demand and compute the global amount of preemption, > and based on that, "where" (in which sub-cluster) preemption will be enacted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
[ https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364966#comment-16364966 ]

Carlo Curino commented on YARN-7934:

[~leftnoteasy] the intention is to commit this directly to trunk to avoid churn, as the rest of the development will continue in the YARN-7402 branch. Please check it out if you can; if no one complains and YETUS is happy, this will go straight to trunk.

> [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Priority: Major
> Attachments: YARN-7934.v1.patch
>
> This Jira tracks minimal changes in the capacity scheduler preemption mechanics that allow for sub-classing and overriding of certain behaviors, which we use to implement federation global algorithms, e.g., in YARN-7403.
[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
[ https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7934: --- Attachment: (was: More.url) > [GQ] Refactor preemption calculators to allow overriding for Federation > Global Algos > > > Key: YARN-7934 > URL: https://issues.apache.org/jira/browse/YARN-7934 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino >Priority: Major > Attachments: YARN-7934.v1.patch > > > This Jira tracks minimal changes in the capacity scheduler preemption > mechanics that allow for sub-classing and overriding of certain behaviors, > which we use to implement federation global algorithms, e.g., in YARN-7403. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
[ https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7934: --- Attachment: YARN-7934.v1.patch > [GQ] Refactor preemption calculators to allow overriding for Federation > Global Algos > > > Key: YARN-7934 > URL: https://issues.apache.org/jira/browse/YARN-7934 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino >Priority: Major > Attachments: More.url, YARN-7934.v1.patch > > > This Jira tracks minimal changes in the capacity scheduler preemption > mechanics that allow for sub-classing and overriding of certain behaviors, > which we use to implement federation global algorithms, e.g., in YARN-7403. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
[ https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7934: --- Attachment: More.url > [GQ] Refactor preemption calculators to allow overriding for Federation > Global Algos > > > Key: YARN-7934 > URL: https://issues.apache.org/jira/browse/YARN-7934 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino >Priority: Major > Attachments: More.url > > > This Jira tracks minimal changes in the capacity scheduler preemption > mechanics that allow for sub-classing and overriding of certain behaviors, > which we use to implement federation global algorithms, e.g., in YARN-7403. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
[ https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7934: -- Assignee: Carlo Curino > [GQ] Refactor preemption calculators to allow overriding for Federation > Global Algos > > > Key: YARN-7934 > URL: https://issues.apache.org/jira/browse/YARN-7934 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino >Priority: Major > > This Jira tracks minimal changes in the capacity scheduler preemption > mechanics that allow for sub-classing and overriding of certain behaviors, > which we use to implement federation global algorithms, e.g., in YARN-7403. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos
Carlo Curino created YARN-7934: -- Summary: [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos Key: YARN-7934 URL: https://issues.apache.org/jira/browse/YARN-7934 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino This Jira tracks minimal changes in the capacity scheduler preemption mechanics that allow for sub-classing and overriding of certain behaviors, which we use to implement federation global algorithms, e.g., in YARN-7403. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6528) [PERF/TEST] Add JMX metrics for Plan Follower and Agent Placement and Plan Operations
[ https://issues.apache.org/jira/browse/YARN-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6528: --- Summary: [PERF/TEST] Add JMX metrics for Plan Follower and Agent Placement and Plan Operations (was: Add JMX metrics for Plan Follower and Agent Placement and Plan Operations) > [PERF/TEST] Add JMX metrics for Plan Follower and Agent Placement and Plan > Operations > - > > Key: YARN-6528 > URL: https://issues.apache.org/jira/browse/YARN-6528 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sean Po >Assignee: Xiaohua (Victor) Liang >Priority: Major > Attachments: YARN-6528.v001.patch, YARN-6528.v002.patch, > YARN-6528.v003.patch, YARN-6528.v004.patch, YARN-6528.v005.patch, > YARN-6528.v006.patch, YARN-6528.v007.patch > > > YARN-1051 introduced a ReservationSytem that enables the YARN RM to handle > time explicitly, i.e. users can now "reserve" capacity ahead of time which is > predictably allocated to them. In order to understand in finer detail the > performance of Rayon, YARN-6528 proposes to include JMX metrics in the Plan > Follower, Agent Placement and Plan Operations components of Rayon. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6528) Add JMX metrics for Plan Follower and Agent Placement and Plan Operations
[ https://issues.apache.org/jira/browse/YARN-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364757#comment-16364757 ]

Carlo Curino commented on YARN-6528:

Thanks [~seanpo03], I am OK with the left-over checkstyle warnings. I am assigning this to [~lxhfirenking], who volunteered to rebase and extend this; I will mark you both as contributors when we get to commit this.

[~lxhfirenking] please see if you can silence checkstyle using {{@SuppressWarnings("checkstyle:XYZ")}}, with XYZ being the right checkstyle rule (or similar tricks).

> Add JMX metrics for Plan Follower and Agent Placement and Plan Operations
>
> Key: YARN-6528
> URL: https://issues.apache.org/jira/browse/YARN-6528
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Sean Po
> Assignee: Xiaohua (Victor) Liang
> Priority: Major
> Attachments: YARN-6528.v001.patch, YARN-6528.v002.patch, YARN-6528.v003.patch, YARN-6528.v004.patch, YARN-6528.v005.patch, YARN-6528.v006.patch, YARN-6528.v007.patch
>
> YARN-1051 introduced a ReservationSystem that enables the YARN RM to handle time explicitly, i.e. users can now "reserve" capacity ahead of time which is predictably allocated to them. In order to understand in finer detail the performance of Rayon, YARN-6528 proposes to include JMX metrics in the Plan Follower, Agent Placement and Plan Operations components of Rayon.
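A minimal sketch of the annotation trick mentioned in the YARN-6528 comment follows. It assumes the project's Checkstyle configuration enables the SuppressWarningsFilter together with SuppressWarningsHolder, which is what lets Checkstyle honor {{@SuppressWarnings}}. The comment leaves the rule name as XYZ; "parameternumber" (the ParameterNumber check) is chosen here purely as an illustration, and the class and method are hypothetical.

```java
// Hypothetical class; NOT part of the YARN-6528 patch.
class PlanMetricsSketch {

  // Silences only the ParameterNumber check, and only for this method.
  // The Java compiler itself ignores warning names it does not recognize.
  @SuppressWarnings("checkstyle:parameternumber")
  static long totalLatency(long a, long b, long c, long d,
      long e, long f, long g, long h) {
    return a + b + c + d + e + f + g + h;
  }
}
```

Scoping the suppression to one method and one rule keeps every other check active for the rest of the file, which is usually preferable to excluding the whole class in the Checkstyle configuration.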
[jira] [Assigned] (YARN-6528) Add JMX metrics for Plan Follower and Agent Placement and Plan Operations
[ https://issues.apache.org/jira/browse/YARN-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-6528: -- Assignee: Xiaohua (Victor) Liang (was: Carlo Curino) > Add JMX metrics for Plan Follower and Agent Placement and Plan Operations > - > > Key: YARN-6528 > URL: https://issues.apache.org/jira/browse/YARN-6528 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sean Po >Assignee: Xiaohua (Victor) Liang >Priority: Major > Attachments: YARN-6528.v001.patch, YARN-6528.v002.patch, > YARN-6528.v003.patch, YARN-6528.v004.patch, YARN-6528.v005.patch, > YARN-6528.v006.patch, YARN-6528.v007.patch > > > YARN-1051 introduced a ReservationSytem that enables the YARN RM to handle > time explicitly, i.e. users can now "reserve" capacity ahead of time which is > predictably allocated to them. In order to understand in finer detail the > performance of Rayon, YARN-6528 proposes to include JMX metrics in the Plan > Follower, Agent Placement and Plan Operations components of Rayon. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6528) Add JMX metrics for Plan Follower and Agent Placement and Plan Operations
[ https://issues.apache.org/jira/browse/YARN-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6528: --- Issue Type: Sub-task (was: Task) Parent: YARN-7402 > Add JMX metrics for Plan Follower and Agent Placement and Plan Operations > - > > Key: YARN-6528 > URL: https://issues.apache.org/jira/browse/YARN-6528 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sean Po >Assignee: Sean Po >Priority: Major > Attachments: YARN-6528.v001.patch, YARN-6528.v002.patch, > YARN-6528.v003.patch, YARN-6528.v004.patch, YARN-6528.v005.patch, > YARN-6528.v006.patch, YARN-6528.v007.patch > > > YARN-1051 introduced a ReservationSytem that enables the YARN RM to handle > time explicitly, i.e. users can now "reserve" capacity ahead of time which is > predictably allocated to them. In order to understand in finer detail the > performance of Rayon, YARN-6528 proposes to include JMX metrics in the Plan > Follower, Agent Placement and Plan Operations components of Rayon. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7614) [RESERVATION] Support Reservation APIs in Federation Router
[ https://issues.apache.org/jira/browse/YARN-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7614: -- Assignee: Giovanni Matteo Fumarola > [RESERVATION] Support Reservation APIs in Federation Router > --- > > Key: YARN-7614 > URL: https://issues.apache.org/jira/browse/YARN-7614 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, reservation system >Reporter: Carlo Curino >Assignee: Giovanni Matteo Fumarola >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7404) [GQ] propagate to GPG queue-level utilization/pending information
[ https://issues.apache.org/jira/browse/YARN-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7404: -- Assignee: Jose Miguel Arreola > [GQ] propagate to GPG queue-level utilization/pending information > - > > Key: YARN-7404 > URL: https://issues.apache.org/jira/browse/YARN-7404 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >Assignee: Jose Miguel Arreola >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7870) [PERF/TEST] Performance testing of ReservationSystem at high job submission rates
[ https://issues.apache.org/jira/browse/YARN-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7870: -- Assignee: Xiaohua (Victor) Liang > [PERF/TEST] Performance testing of ReservationSystem at high job submission > rates > - > > Key: YARN-7870 > URL: https://issues.apache.org/jira/browse/YARN-7870 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Xiaohua (Victor) Liang >Priority: Major > > To leverage the ReservationSystem as a gang-semantics enforcer for all jobs > of a large federation, we need to evaluate it can sustain large number of > job submissions (and replanning) per second. This Jira tracks this validation > effort. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7869) [PERF/TEST] Performance testing of CapacityScheduler at many-thousands of queues
[ https://issues.apache.org/jira/browse/YARN-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carlo Curino reassigned YARN-7869:
    Assignee: Abhishek Modi

> [PERF/TEST] Performance testing of CapacityScheduler at many-thousands of queues
>
> Key: YARN-7869
> URL: https://issues.apache.org/jira/browse/YARN-7869
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Assignee: Abhishek Modi
> Priority: Major
>
> The CapacityScheduler is known to work well at tens to hundreds of queues. This Jira tracks performance testing at much larger scale: thousands of queues and deep queue hierarchies (>10 levels).
[jira] [Commented] (YARN-7870) [PERF/TEST] Performance testing of ReservationSystem at high job submission rates
[ https://issues.apache.org/jira/browse/YARN-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347741#comment-16347741 ]

Carlo Curino commented on YARN-7870:

Yes! It is, in fact, already extended to support reservations (YARN-6363 if I am not mistaken), and to run a {{MetricsInvariantChecker}} (YARN-6451 and YARN-6547) to validate some of the performance/correctness.

In this Jira (and others in the same umbrella and in the SLS umbrella, e.g., YARN-7798) we plan to build upon it to give us a solid testing and perf-testing platform for the various algorithmic/protocol additions that we are planning in YARN-7402 (and YARN in general).

> [PERF/TEST] Performance testing of ReservationSystem at high job submission rates
>
> Key: YARN-7870
> URL: https://issues.apache.org/jira/browse/YARN-7870
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Carlo Curino
> Priority: Major
>
> To leverage the ReservationSystem as a gang-semantics enforcer for all jobs of a large federation, we need to evaluate whether it can sustain a large number of job submissions (and replanning) per second. This Jira tracks this validation effort.
[jira] [Updated] (YARN-7833) [PERF/TEST] Extend SLS to support simulation of a Federated Environment
[ https://issues.apache.org/jira/browse/YARN-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7833: --- Summary: [PERF/TEST] Extend SLS to support simulation of a Federated Environment (was: Extend SLS to support simulation of a Federated Environment) > [PERF/TEST] Extend SLS to support simulation of a Federated Environment > --- > > Key: YARN-7833 > URL: https://issues.apache.org/jira/browse/YARN-7833 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Jose Miguel Arreola >Priority: Major > > To develop algorithms for federation, it would be of great help to have a > version of SLS that supports multi RMs and GPG. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7870) [PERF/TEST] Performance testing of ReservationSystem at high job submission rates
Carlo Curino created YARN-7870: -- Summary: [PERF/TEST] Performance testing of ReservationSystem at high job submission rates Key: YARN-7870 URL: https://issues.apache.org/jira/browse/YARN-7870 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino To leverage the ReservationSystem as a gang-semantics enforcer for all jobs of a large federation, we need to evaluate it can sustain large number of job submissions (and replanning) per second. This Jira tracks this validation effort. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7869) [PERF/TEST] Performance testing of CapacityScheduler at many-thousands of queues
Carlo Curino created YARN-7869:

    Summary: [PERF/TEST] Performance testing of CapacityScheduler at many-thousands of queues
    Key: YARN-7869
    URL: https://issues.apache.org/jira/browse/YARN-7869
    Project: Hadoop YARN
    Issue Type: Sub-task
    Reporter: Carlo Curino

The CapacityScheduler is known to work well at tens to hundreds of queues. This Jira tracks performance testing at much larger scale: thousands of queues and deep queue hierarchies (>10 levels).
[jira] [Assigned] (YARN-7833) Extend SLS to support simulation of a Federated Environment
[ https://issues.apache.org/jira/browse/YARN-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7833: -- Assignee: Jose Miguel Arreola > Extend SLS to support simulation of a Federated Environment > --- > > Key: YARN-7833 > URL: https://issues.apache.org/jira/browse/YARN-7833 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Jose Miguel Arreola >Priority: Major > > To develop algorithms for federation, it would be of great help to have a > version of SLS that supports multi RMs and GPG. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7615) [RESERVATION] Federation StateStore: support storage/retrieval of reservations
[ https://issues.apache.org/jira/browse/YARN-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7615: -- Assignee: Giovanni Matteo Fumarola > [RESERVATION] Federation StateStore: support storage/retrieval of reservations > -- > > Key: YARN-7615 > URL: https://issues.apache.org/jira/browse/YARN-7615 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >Assignee: Giovanni Matteo Fumarola >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7405) [GQ] Bias container allocations based on global view
[ https://issues.apache.org/jira/browse/YARN-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7405: -- Assignee: Subru Krishnan > [GQ] Bias container allocations based on global view > > > Key: YARN-7405 > URL: https://issues.apache.org/jira/browse/YARN-7405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >Assignee: Subru Krishnan >Priority: Major > > Each RM in a federation should bias its local allocations of containers based > on the global over/under utilization of queues. As part of this the local RM > should account for the work that other RMs will be doing in between the > updates we receive via the heartbeats of YARN-7404 (the mechanics used for > synchronization). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7834) [GQ] Rebalance queue configuration for load-balancing and locality affinities
Carlo Curino created YARN-7834: -- Summary: [GQ] Rebalance queue configuration for load-balancing and locality affinities Key: YARN-7834 URL: https://issues.apache.org/jira/browse/YARN-7834 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino This Jira tracks algorithmic work, which will run in the GPG and will rebalance the mapping of queues to sub-clusters. The current design supports both balancing the "load" across sub-clusters (proportionally to their size) and as a second objective to maximize the affinity between queues and the sub-clusters where they historically have most demand. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7834) [GQ] Rebalance queue configuration for load-balancing and locality affinities
[ https://issues.apache.org/jira/browse/YARN-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7834: -- Assignee: Carlo Curino > [GQ] Rebalance queue configuration for load-balancing and locality affinities > - > > Key: YARN-7834 > URL: https://issues.apache.org/jira/browse/YARN-7834 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino >Priority: Major > > This Jira tracks algorithmic work, which will run in the GPG and will > rebalance the mapping of queues to sub-clusters. The current design supports > both balancing the "load" across sub-clusters (proportionally to their size) > and as a second objective to maximize the affinity between queues and the > sub-clusters where they historically have most demand. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7403) [GQ] Compute global and local "IdealAllocation"
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7403: --- Summary: [GQ] Compute global and local "IdealAllocation" (was: [GQ] Compute global and local preemption) > [GQ] Compute global and local "IdealAllocation" > --- > > Key: YARN-7403 > URL: https://issues.apache.org/jira/browse/YARN-7403 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >Assignee: Carlo Curino >Priority: Major > Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, > YARN-7403.draft3.patch, global-queues-preemption.PNG > > > This JIRA tracks algorithmic effort to combine the local queue views of > capacity guarantee/use/demand and compute the global amount of preemption, > and based on that, "where" (in which sub-cluster) preemption will be enacted.
[jira] [Created] (YARN-7833) Extend SLS to support simulation of a Federated Environment
Carlo Curino created YARN-7833: -- Summary: Extend SLS to support simulation of a Federated Environment Key: YARN-7833 URL: https://issues.apache.org/jira/browse/YARN-7833 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino To develop algorithms for federation, it would be very helpful to have a version of SLS that supports multiple RMs and the GPG.
[jira] [Commented] (YARN-6648) [GPG] Add SubClusterCleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338144#comment-16338144 ] Carlo Curino commented on YARN-6648: [~botong] thanks for updating the patch, +1 from me, with the following minor issues:
# Fix the findbugs exclusion (I see you are already trying to do so, but it seems that your offerings have not pleased the Yetus gods yet :)).
# (minor) In {{SubClusterCleaner}} line 98, the {{LOG.info}} seems a bit redundant, maybe {{LOG.debug}}? I see it being useful while debugging, but during normal operations it is somewhat unnecessary.
> [GPG] Add SubClusterCleaner in Global Policy Generator > -- > > Key: YARN-6648 > URL: https://issues.apache.org/jira/browse/YARN-6648 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-6648-YARN-2915.v1.patch, > YARN-6648-YARN-7402.v2.patch, YARN-6648-YARN-7402.v3.patch, > YARN-6648-YARN-7402.v4.patch, YARN-6648-YARN-7402.v5.patch, > YARN-6648-YARN-7402.v6.patch > >
[jira] [Comment Edited] (YARN-6648) [GPG] Add SubClusterCleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332694#comment-16332694 ] Carlo Curino edited comment on YARN-6648 at 1/19/18 6:37 PM: - [~botong] thanks for the updated patch, I think it is nicer to have them combined (easier to follow). Here are a few questions/suggestions (some pretty minor, some more important):
# In {{MemoryFederationStateStore.setSubClusterLastHeartbeat}}, why do you go through {{getSubcluster}} instead of just doing {{membership.get(subClusterId).setLastHeartBeat(longHeartBeat)}}?
# In {{GPGUtils}}, consider using {{DurationFormatUtils.formatDuration(long, string_format)}} instead of the code you have.
# In {{GlobalPolicyGenerator}}:
## Should we keep the string constants here, or have them in {{YarnConfiguration}} or other places where those are usually defined?
## Is the {{SubClusterCleanerService}} required by every Federation deployment, or is it something we might want to make configurable (runs only if turned on)? More generally, should we have a generic mechanism to "start services" in the GPG?
# In {{SubClusterCleaner}}:
## Line 77: is there a way for us to "check" whether the format in the {{StateStore}} is local or UTC? Related is the code around line 100: you seem to doubt the format and are conservative about it, which might mean the clean-up could at times be delayed by many hours. Anything better than assuming things and/or being overly conservative?
## Line 87: maybe a bit verbose? Should some of this be {{LOG.debug}} instead (if so, wrap it in the usual {{if(debugEnabled)}} check)?
## What do you do in case the subCluster {{isUnusable()}}?
# In {{SubClusterCleanerService}}:
## Typo in Javadoc: GPE.
## I assume we will have many similar "actions run on a schedule"; can you make this class more generic (templatize it, so we can re-use it)?
## If the thread crashes, do we have something that restarts it?
I see it throws {{Exception}}, anyone restarting the service if it throws? was (Author: curino): [~botong] thanks for the updated patch, I think it is nicer to have them combined (easier to follow). Here a few questions/suggestions (some pretty minor, some more important): # in {{MemoryFederationStateStore.setSubClusterLastHeartbeat}} why do you go through {{getSubcluster}} instead of just doing {{membership.get(subClusterId).setLastHeartBeat(longHeartBeat)}} ? # In {{GPGUtils}} consider using {{DurationFormatUtils.formatDuration(long, string_format)}}, instead of the code you have. # In {{GlobalPolicyGenerator}} ## should we keep the string constants here, or have them in {{YarnConfiguration}} or other places where those are usually defined? ## Is the {{SubClusterCleanerService}} required by every Federation deployment, or is it something we might want to make configurable (runs only if turned on). More generally, should we have a generic mechanism to "start services" in the GPG? # In {{SubClusterCleaner}} ## line 77, is there a way for us to "check" whether the format in the {{StateStore}} is local or UTC? Related is the code around line 100, you seem to doubt the format, and be conservative about it, which might mean the clean-up is at times could be delayed by many hours. Anything better than assuming things and/or being overly conservative? ## In {{SubClusterCleaner}} line 87, maybe a bit verbose? Should some of this be {{LOG.debug}} instead (if so, wrap it in the usual {{if(debugEnabled)}} check)? ## What do you do in case the subCluster {{isUnusable()}}? #In \{{SubClusterCleanerService }} ## type in Javadoc GPE ## I assume we will have many similar "actions run on a schedule", can you make this class more generic (templatize it, so we can re-use it)? ## If the threads crashes, do we have something that restarts it? I see it throws {{Exception}}, anyone restarting the service if it throws? 
> [GPG] Add SubClusterCleaner in Global Policy Generator > -- > > Key: YARN-6648 > URL: https://issues.apache.org/jira/browse/YARN-6648 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-6648-YARN-2915.v1.patch, > YARN-6648-YARN-7402.v2.patch, YARN-6648-YARN-7402.v3.patch > >
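On the two questions above about a reusable "action run on a schedule" and about what happens when the thread crashes, a minimal sketch using only the JDK may help frame the discussion. It assumes nothing about the actual {{SubClusterCleanerService}} code: the class name `PeriodicTask` and its methods are hypothetical. The one fact it relies on is documented JDK behaviour: a task scheduled with `scheduleWithFixedDelay` has all subsequent executions suppressed if one execution throws, so the wrapper catches runtime exceptions instead of needing an external restart.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical generic periodic runner; not the SubClusterCleanerService API. */
final class PeriodicTask implements AutoCloseable {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  private final AtomicInteger failures = new AtomicInteger();

  PeriodicTask(Runnable action, long period, TimeUnit unit) {
    // An uncaught exception would silently cancel all future runs,
    // so each run is wrapped and failures are only counted.
    executor.scheduleWithFixedDelay(() -> {
      try {
        action.run();
      } catch (RuntimeException e) {
        failures.incrementAndGet();
      }
    }, 0, period, unit);
  }

  /** Number of runs that ended with a runtime exception so far. */
  int failureCount() {
    return failures.get();
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

Any cleaner-like action could be passed in as the `Runnable`; a real service would presumably log the exception rather than just count it.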
[jira] [Commented] (YARN-6648) [GPG] Add SubClusterCleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332694#comment-16332694 ] Carlo Curino commented on YARN-6648: [~botong] thanks for the updated patch, I think it is nicer to have them combined (easier to follow). Here are a few questions/suggestions (some pretty minor, some more important):
# In {{MemoryFederationStateStore.setSubClusterLastHeartbeat}}, why do you go through {{getSubcluster}} instead of just doing {{membership.get(subClusterId).setLastHeartBeat(longHeartBeat)}}?
# In {{GPGUtils}}, consider using {{DurationFormatUtils.formatDuration(long, string_format)}} instead of the code you have.
# In {{GlobalPolicyGenerator}}:
## Should we keep the string constants here, or have them in {{YarnConfiguration}} or other places where those are usually defined?
## Is the {{SubClusterCleanerService}} required by every Federation deployment, or is it something we might want to make configurable (runs only if turned on)? More generally, should we have a generic mechanism to "start services" in the GPG?
# In {{SubClusterCleaner}}:
## Line 77: is there a way for us to "check" whether the format in the {{StateStore}} is local or UTC? Related is the code around line 100: you seem to doubt the format and are conservative about it, which might mean the clean-up could at times be delayed by many hours. Anything better than assuming things and/or being overly conservative?
## Line 87: maybe a bit verbose? Should some of this be {{LOG.debug}} instead (if so, wrap it in the usual {{if(debugEnabled)}} check)?
## What do you do in case the subCluster {{isUnusable()}}?
# In {{SubClusterCleanerService}}:
## Typo in Javadoc: GPE.
## I assume we will have many similar "actions run on a schedule"; can you make this class more generic (templatize it, so we can re-use it)?
## If the thread crashes, do we have something that restarts it?
I see it throws {{Exception}}, anyone restarting the service if it throws? > [GPG] Add SubClusterCleaner in Global Policy Generator > -- > > Key: YARN-6648 > URL: https://issues.apache.org/jira/browse/YARN-6648 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-6648-YARN-2915.v1.patch, > YARN-6648-YARN-7402.v2.patch, YARN-6648-YARN-7402.v3.patch > >
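Regarding the local-versus-UTC question in the comment above: one way to sidestep the ambiguity, sketched here under the assumption that the last heartbeat can be stored as epoch milliseconds, is to compare instants rather than formatted timestamps, since epoch milliseconds do not depend on a time zone. Whether the {{StateStore}} actually stores heartbeats this way is not stated in this thread; the class `HeartbeatExpiry` and its parameters are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical expiry check; names do not come from the patch. */
final class HeartbeatExpiry {
  private HeartbeatExpiry() {}

  /** Returns true if the last heartbeat is older than maxAge as of 'now'. */
  static boolean isExpired(long lastHeartbeatEpochMillis, Instant now, Duration maxAge) {
    // Epoch milliseconds identify a point in time regardless of time zone,
    // so no safety margin for a possible zone offset is needed.
    Instant last = Instant.ofEpochMilli(lastHeartbeatEpochMillis);
    return Duration.between(last, now).compareTo(maxAge) > 0;
  }
}
```

Passing `now` in explicitly keeps the check deterministic in unit tests.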
[jira] [Commented] (YARN-3660) [GPG] Federation Global Policy Generator (service hook only)
[ https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331562#comment-16331562 ] Carlo Curino commented on YARN-3660: [~botong] thanks for the contribution, v4 patch looks good. I committed it to the dev-branch YARN-7402. > [GPG] Federation Global Policy Generator (service hook only) > > > Key: YARN-3660 > URL: https://issues.apache.org/jira/browse/YARN-3660 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Botong Huang >Priority: Major > Labels: federation, gpg > Attachments: YARN-3660-YARN-7402.v1.patch, > YARN-3660-YARN-7402.v2.patch, YARN-3660-YARN-7402.v3.patch, > YARN-3660-YARN-7402.v3.patch, YARN-3660-YARN-7402.v3.patch, > YARN-3660-YARN-7402.v4.patch > > > In a federated environment, local impairments of one sub-cluster might > unfairly affect users/queues that are mapped to that sub-cluster. A > centralized component (GPG) runs out-of-band and edits the policies governing > how users/queues are allocated to sub-clusters. This allows us to enforce > global invariants (by dynamically updating locally-enforced invariants).
[jira] [Commented] (YARN-6648) [GPG] Add FederationStateStore interfaces for Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324373#comment-16324373 ] Carlo Curino commented on YARN-6648: [~botong] the changes in this JIRA seem fine/harmless. However, since I don't see the code that will use them, they are a bit pointless as is. There is a bit of a trade-off between breaking things down into small, easy-to-review JIRAs and keeping things together so that changes are justified. In this case, I think we might have been over-zealous in keeping patches small. Please combine this with the JIRA that uses them, and mark this as duplicate. > [GPG] Add FederationStateStore interfaces for Global Policy Generator > - > > Key: YARN-6648 > URL: https://issues.apache.org/jira/browse/YARN-6648 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-6648-YARN-2915.v1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3660) [GPG] Federation Global Policy Generator (service hook only)
[ https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324362#comment-16324362 ] Carlo Curino commented on YARN-3660: [~botong] the patch seems generally ok but please:
# Address checkstyle issues and write some good Javadoc for the top classes.
# Please provide some basic tests, even just checking that the boot is correct or fails when federation is disabled; the empty Test class is not ok (better none at all, and explain why when QA complains).
# This JIRA is an empty service that you will need in other patches, let's rename the JIRA (my attempt is meh, if you can find a better title).
# The patch is reasonably small, so please include the bash/PowerShell scripts to start/stop/restart the GPG (you can look at what was done for the Federation Router).
> [GPG] Federation Global Policy Generator (service hook only) > > > Key: YARN-3660 > URL: https://issues.apache.org/jira/browse/YARN-3660 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Botong Huang > Labels: federation, gpg > Attachments: YARN-3660-YARN-7402.v1.patch > > > In a federated environment, local impairments of one sub-cluster might > unfairly affect users/queues that are mapped to that sub-cluster. A > centralized component (GPG) runs out-of-band and edits the policies governing > how users/queues are allocated to sub-clusters. This allows us to enforce > global invariants (by dynamically updating locally-enforced invariants).
[jira] [Updated] (YARN-3660) [GPG] Federation Global Policy Generator (service hook only)
[ https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-3660: --- Summary: [GPG] Federation Global Policy Generator (service hook only) (was: [GPG] Federation Global Policy Generator (load balancing)) > [GPG] Federation Global Policy Generator (service hook only) > > > Key: YARN-3660 > URL: https://issues.apache.org/jira/browse/YARN-3660 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Botong Huang > Labels: federation, gpg > Attachments: YARN-3660-YARN-7402.v1.patch > > > In a federated environment, local impairments of one sub-cluster might > unfairly affect users/queues that are mapped to that sub-cluster. A > centralized component (GPG) runs out-of-band and edits the policies governing > how users/queues are allocated to sub-clusters. This allows us to enforce > global invariants (by dynamically updating locally-enforced invariants).
[jira] [Commented] (YARN-7402) Federation V2: Global Optimizations
[ https://issues.apache.org/jira/browse/YARN-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319464#comment-16319464 ] Carlo Curino commented on YARN-7402: I created a dev-branch YARN-7402 for the activities of this umbrella JIRA. > Federation V2: Global Optimizations > --- > > Key: YARN-7402 > URL: https://issues.apache.org/jira/browse/YARN-7402 > Project: Hadoop YARN > Issue Type: New Feature > Components: federation >Reporter: Carlo Curino >Assignee: Carlo Curino > > YARN Federation today requires manual configuration of queues within each > sub-cluster, and each RM operates "in isolation". This has few issues: > # Preemption is computed locally (and might far exceed the global need) > # Jobs within a queue are forced to consume their resources "evenly" based on > queue mapping > This umbrella JIRA tracks a new feature that leverages the > FederationStateStore as a synchronization mechanism among RMs, and allows for > allocation and preemption decisions to be based on a (close to up-to-date) > global view of the cluster allocation and demand. The JIRA also tracks > algorithms to automatically generate policies for Router and AMRMProxy to > shape the traffic to each sub-cluster, and general "maintenance" of the > FederationStateStore.
[jira] [Created] (YARN-7725) [GQ] Compute global "ideal allocation" including locality biases
Carlo Curino created YARN-7725: -- Summary: [GQ] Compute global "ideal allocation" including locality biases Key: YARN-7725 URL: https://issues.apache.org/jira/browse/YARN-7725 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino This JIRA tracks an algorithmic effort to compute the global ideal allocation. We also take into account the locality demand/availability gap, and map the global allocation down to the sub-cluster level, computing the delta+ and delta- for each queue in each sub-cluster.
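As an illustration of the last step, the sketch below derives a delta+ and a delta- for one queue in one sub-cluster from an ideal and a current allocation. The interpretation is an assumption made for this example (delta+ as what the queue should gain, delta- as what it should give up), as are the class `AllocationDelta` and the use of a single scalar resource; the issue does not define these quantities further.

```java
/** Hypothetical illustration of per-queue deltas; not the YARN-7725 algorithm. */
final class AllocationDelta {
  /** Resources the queue should gain in this sub-cluster (assumed meaning of delta+). */
  final long deltaPlus;
  /** Resources the queue should give up in this sub-cluster (assumed meaning of delta-). */
  final long deltaMinus;

  private AllocationDelta(long deltaPlus, long deltaMinus) {
    this.deltaPlus = deltaPlus;
    this.deltaMinus = deltaMinus;
  }

  /** Compares the ideal allocation with the current one; at most one delta is non-zero. */
  static AllocationDelta of(long idealAllocation, long currentAllocation) {
    long diff = idealAllocation - currentAllocation;
    return new AllocationDelta(Math.max(diff, 0), Math.max(-diff, 0));
  }
}
```

For example, an ideal allocation of 120 against a current allocation of 100 gives delta+ = 20 and delta- = 0 under this reading.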
[jira] [Assigned] (YARN-7725) [GQ] Compute global "ideal allocation" including locality biases
[ https://issues.apache.org/jira/browse/YARN-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7725: -- Assignee: Carlo Curino > [GQ] Compute global "ideal allocation" including locality biases > > > Key: YARN-7725 > URL: https://issues.apache.org/jira/browse/YARN-7725 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >Assignee: Carlo Curino > > This JIRA tracks an algorithmic effort to compute the global ideal > allocation. We also take into account of locality demand/availability gap, > and map down the global allocation to sub-cluster level, computing the delta+ > and delta- for each queue in each sub-cluster.
[jira] [Updated] (YARN-7403) [GQ] Compute global and local preemption
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7403: --- Summary: [GQ] Compute global and local preemption (was: Compute global and local preemption) > [GQ] Compute global and local preemption > > > Key: YARN-7403 > URL: https://issues.apache.org/jira/browse/YARN-7403 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, > YARN-7403.draft3.patch, global-queues-preemption.PNG > > > This JIRA tracks algorithmic effort to combine the local queue views of > capacity guarantee/use/demand and compute the global amount of preemption, > and based on that, "where" (in which sub-cluster) preemption will be enacted.
[jira] [Updated] (YARN-7405) [GQ] Bias container allocations based on global view
[ https://issues.apache.org/jira/browse/YARN-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7405: --- Summary: [GQ] Bias container allocations based on global view (was: Bias container allocations based on global view) > [GQ] Bias container allocations based on global view > > > Key: YARN-7405 > URL: https://issues.apache.org/jira/browse/YARN-7405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino > > Each RM in a federation should bias its local allocations of containers based > on the global over/under utilization of queues. As part of this the local RM > should account for the work that other RMs will be doing in between the > updates we receive via the heartbeats of YARN-7404 (the mechanics used for > synchronization).
[jira] [Updated] (YARN-7404) [GQ] propagate to GPG queue-level utilization/pending information
[ https://issues.apache.org/jira/browse/YARN-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7404: --- Summary: [GQ] propagate to GPG queue-level utilization/pending information (was: RM federation heartbeat to StateStore must include "queue state" ) > [GQ] propagate to GPG queue-level utilization/pending information > - > > Key: YARN-7404 > URL: https://issues.apache.org/jira/browse/YARN-7404 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >
[jira] [Updated] (YARN-7614) Support Reservation APIs in Federation Router
[ https://issues.apache.org/jira/browse/YARN-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7614: --- Component/s: reservation system > Support Reservation APIs in Federation Router > - > > Key: YARN-7614 > URL: https://issues.apache.org/jira/browse/YARN-7614 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, reservation system >Reporter: Carlo Curino >
[jira] [Updated] (YARN-7615) [RESERVATION] Federation StateStore: support storage/retrieval of reservations
[ https://issues.apache.org/jira/browse/YARN-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7615: --- Summary: [RESERVATION] Federation StateStore: support storage/retrieval of reservations (was: Federation StateStore: support storage/retrieval of reservations) > [RESERVATION] Federation StateStore: support storage/retrieval of reservations > -- > > Key: YARN-7615 > URL: https://issues.apache.org/jira/browse/YARN-7615 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >
[jira] [Updated] (YARN-7614) [RESERVATION] Support Reservation APIs in Federation Router
[ https://issues.apache.org/jira/browse/YARN-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7614: --- Summary: [RESERVATION] Support Reservation APIs in Federation Router (was: Support Reservation APIs in Federation Router) > [RESERVATION] Support Reservation APIs in Federation Router > --- > > Key: YARN-7614 > URL: https://issues.apache.org/jira/browse/YARN-7614 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, reservation system >Reporter: Carlo Curino >
[jira] [Updated] (YARN-5871) [RESERVATION] Add support for reservation-based routing.
[ https://issues.apache.org/jira/browse/YARN-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-5871: --- Labels: federation reservation (was: federation) > [RESERVATION] Add support for reservation-based routing. > > > Key: YARN-5871 > URL: https://issues.apache.org/jira/browse/YARN-5871 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-2915 >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: federation, reservation > Attachments: YARN-5871-YARN-2915.01.patch, > YARN-5871-YARN-2915.01.patch, YARN-5871-YARN-2915.02.patch, > YARN-5871-YARN-2915.03.patch, YARN-5871-YARN-2915.04.patch > > > Adding policies that can route reservations, and that then route applications > to where the reservation have been placed.
[jira] [Updated] (YARN-5871) [RESERVATION] Add support for reservation-based routing.
[ https://issues.apache.org/jira/browse/YARN-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-5871: --- Summary: [RESERVATION] Add support for reservation-based routing. (was: Add support for reservation-based routing.) > [RESERVATION] Add support for reservation-based routing. > > > Key: YARN-5871 > URL: https://issues.apache.org/jira/browse/YARN-5871 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-2915 >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: federation, reservation > Attachments: YARN-5871-YARN-2915.01.patch, > YARN-5871-YARN-2915.01.patch, YARN-5871-YARN-2915.02.patch, > YARN-5871-YARN-2915.03.patch, YARN-5871-YARN-2915.04.patch > > > Adding policies that can route reservations, and that then route applications > to where the reservation have been placed.
[jira] [Updated] (YARN-7402) Federation V2: Global Optimizations
[ https://issues.apache.org/jira/browse/YARN-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7402: --- Summary: Federation V2: Global Optimizations (was: Federation: Global Queues) Description: YARN Federation today requires manual configuration of queues within each sub-cluster, and each RM operates "in isolation". This has few issues: # Preemption is computed locally (and might far exceed the global need) # Jobs within a queue are forced to consume their resources "evenly" based on queue mapping This umbrella JIRA tracks a new feature that leverages the FederationStateStore as a synchronization mechanism among RMs, and allows for allocation and preemption decisions to be based on a (close to up-to-date) global view of the cluster allocation and demand. The JIRA also tracks algorithms to automatically generate policies for Router and AMRMProxy to shape the traffic to each sub-cluster, and general "maintenance" of the FederationStateStore. was: YARN Federation today requires manual configuration of queues within each sub-cluster, and each RM operates "in isolation". This has few issues: # Preemption is computed locally (and might far exceed the global need) # Jobs within a queue are forced to consume their resources "evenly" based on queue mapping This umbrella JIRA tracks a new feature that leverages the FederationStateStore as a synchronization mechanism among RMs, and allows for allocation and preemption decisions to be based on a (close to up-to-date) global view of the cluster allocation and demand. > Federation V2: Global Optimizations > --- > > Key: YARN-7402 > URL: https://issues.apache.org/jira/browse/YARN-7402 > Project: Hadoop YARN > Issue Type: New Feature > Components: federation >Reporter: Carlo Curino >Assignee: Carlo Curino > > YARN Federation today requires manual configuration of queues within each > sub-cluster, and each RM operates "in isolation". 
This has few issues: > # Preemption is computed locally (and might far exceed the global need) > # Jobs within a queue are forced to consume their resources "evenly" based on > queue mapping > This umbrella JIRA tracks a new feature that leverages the > FederationStateStore as a synchronization mechanism among RMs, and allows for > allocation and preemption decisions to be based on a (close to up-to-date) > global view of the cluster allocation and demand. The JIRA also tracks > algorithms to automatically generate policies for Router and AMRMProxy to > shape the traffic to each sub-cluster, and general "maintenance" of the > FederationStateStore.
[jira] [Updated] (YARN-7708) [GPG] Load based policy generator
[ https://issues.apache.org/jira/browse/YARN-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7708: --- Parent Issue: YARN-7402 (was: YARN-5597) > [GPG] Load based policy generator > - > > Key: YARN-7708 > URL: https://issues.apache.org/jira/browse/YARN-7708 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Young Chen > > This policy reads load from the "pendingQueueLength" metrics and provides > scaling into a set of weights that influence the AMRMProxy and Router > behaviors.
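To illustrate what "scaling into a set of weights" could look like, here is a minimal sketch that maps a per-sub-cluster pending queue length to normalized weights, giving less weight to more heavily loaded sub-clusters. The scaling function 1 / (1 + pending) is an assumption chosen for this example and is not taken from YARN-7708; the class `LoadBasedWeights`, its method, and the string sub-cluster ids are hypothetical, and pending lengths are assumed to be non-negative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical load-to-weight scaling; not the YARN-7708 policy. */
final class LoadBasedWeights {
  private LoadBasedWeights() {}

  /** Maps non-negative pending queue lengths to weights that sum to 1. */
  static Map<String, Double> fromPending(Map<String, Integer> pendingQueueLength) {
    Map<String, Double> raw = new LinkedHashMap<>();
    double sum = 0;
    for (Map.Entry<String, Integer> e : pendingQueueLength.entrySet()) {
      // Longer pending queues yield smaller raw weights (assumed scaling).
      double w = 1.0 / (1.0 + e.getValue());
      raw.put(e.getKey(), w);
      sum += w;
    }
    Map<String, Double> weights = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : raw.entrySet()) {
      weights.put(e.getKey(), e.getValue() / sum);
    }
    return weights;
  }
}
```

With pending lengths of 0 and 1, the raw weights 1 and 0.5 normalize to 2/3 and 1/3, so the idle sub-cluster would attract twice the traffic in this sketch.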
[jira] [Updated] (YARN-7599) [GPG] Application cleaner and subcluster cleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7599: --- Parent Issue: YARN-7402 (was: YARN-5597) > [GPG] Application cleaner and subcluster cleaner in Global Policy Generator > --- > > Key: YARN-7599 > URL: https://issues.apache.org/jira/browse/YARN-7599 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > > In Federation, we need a cleanup service for StateStore as well as Yarn > Registry. For the former, we need to remove old application records as well > as inactive subclusters. For the latter, failed and killed applications might > leave records in the Yarn Registry (see YARN-6128). We plan to add both > cleanup service in GPG
[jira] [Updated] (YARN-7707) [GPG] Policy generator framework
[ https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7707: --- Parent Issue: YARN-7402 (was: YARN-5597) > [GPG] Policy generator framework > > > Key: YARN-7707 > URL: https://issues.apache.org/jira/browse/YARN-7707 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Young Chen > Labels: federation, gpg > > This JIRA tracks the development of a generic framework for querying > sub-clusters for metrics, running policies, and updating them in the > FederationStateStore.
[jira] [Updated] (YARN-6648) [GPG] Add FederationStateStore interfaces for Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6648: --- Parent Issue: YARN-7402 (was: YARN-5597) > [GPG] Add FederationStateStore interfaces for Global Policy Generator > - > > Key: YARN-6648 > URL: https://issues.apache.org/jira/browse/YARN-6648 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-6648-YARN-2915.v1.patch
[jira] [Updated] (YARN-3660) [GPG] Federation Global Policy Generator (load balancing)
[ https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-3660: --- Parent Issue: YARN-7402 (was: YARN-5597) > [GPG] Federation Global Policy Generator (load balancing) > - > > Key: YARN-3660 > URL: https://issues.apache.org/jira/browse/YARN-3660 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Botong Huang > Labels: federation, gpg > > In a federated environment, local impairments of one sub-cluster might > unfairly affect users/queues that are mapped to that sub-cluster. A > centralized component (GPG) runs out-of-band and edits the policies governing > how users/queues are allocated to sub-clusters. This allows us to enforce > global invariants (by dynamically updating locally-enforced invariants).
[jira] [Updated] (YARN-5597) YARN Federation improvements
[ https://issues.apache.org/jira/browse/YARN-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-5597: --- Summary: YARN Federation improvements (was: YARN Federation phase 2) > YARN Federation improvements > > > Key: YARN-5597 > URL: https://issues.apache.org/jira/browse/YARN-5597 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Subru Krishnan >Assignee: Subru Krishnan > > This umbrella JIRA tracks a set of improvements over the YARN Federation MVP > (YARN-2915).
[jira] [Assigned] (YARN-7708) [GPG] Load based policy generator
[ https://issues.apache.org/jira/browse/YARN-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7708: -- Assignee: Young Chen > [GPG] Load based policy generator > - > > Key: YARN-7708 > URL: https://issues.apache.org/jira/browse/YARN-7708 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Young Chen > > This policy reads load from the "pendingQueueLength" metrics and provides > scaling into a set of weights that influence the AMRMProxy and Router > behaviors.
[jira] [Assigned] (YARN-7707) [GPG] Policy generator framework
[ https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7707: -- Assignee: Young Chen (was: Carlo Curino) > [GPG] Policy generator framework > > > Key: YARN-7707 > URL: https://issues.apache.org/jira/browse/YARN-7707 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Young Chen > Labels: federation, gpg > > This JIRA tracks the development of a generic framework for querying > sub-clusters for metrics, running policies, and updating them in the > FederationStateStore.
[jira] [Updated] (YARN-7707) [GPG] Policy generator framework
[ https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7707: --- Labels: federation gpg (was: ) > [GPG] Policy generator framework > > > Key: YARN-7707 > URL: https://issues.apache.org/jira/browse/YARN-7707 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Young Chen > Labels: federation, gpg > > This JIRA tracks the development of a generic framework for querying > sub-clusters for metrics, running policies, and updating them in the > FederationStateStore.
[jira] [Assigned] (YARN-7707) [GPG] Policy generator framework
[ https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7707: -- Assignee: Carlo Curino (was: Young Chen) > [GPG] Policy generator framework > > > Key: YARN-7707 > URL: https://issues.apache.org/jira/browse/YARN-7707 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino > > This JIRA tracks the development of a generic framework for querying > sub-clusters for metrics, running policies, and updating them in the > FederationStateStore.
[jira] [Assigned] (YARN-7707) [GPG] Policy generator framework
[ https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-7707: -- Assignee: Young Chen > [GPG] Policy generator framework > > > Key: YARN-7707 > URL: https://issues.apache.org/jira/browse/YARN-7707 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Young Chen > > This JIRA tracks the development of a generic framework for querying > sub-clusters for metrics, running policies, and updating them in the > FederationStateStore.
[jira] [Created] (YARN-7708) [GPG] Load based policy generator
Carlo Curino created YARN-7708: -- Summary: [GPG] Load based policy generator Key: YARN-7708 URL: https://issues.apache.org/jira/browse/YARN-7708 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino This policy reads load from the "pendingQueueLength" metrics and provides scaling into a set of weights that influence the AMRMProxy and Router behaviors.
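One possible reading of "scaling load into weights" is sketched below. This is only an illustration, not the policy tracked by YARN-7708: the inverse-load formula, the function name `load_to_weights`, and the sample queue lengths are all assumptions made for the example.

```python
def load_to_weights(pending):
    """Map per-sub-cluster pending queue lengths to normalized routing weights.

    Assumption: a sub-cluster with more pending work should receive less new
    traffic, so the raw weight is 1 / (1 + pending). The real policy may use
    a different scaling function.
    """
    raw = {sc: 1.0 / (1 + max(0, p)) for sc, p in pending.items()}
    total = sum(raw.values())
    # Normalize so that the weights sum to 1 across sub-clusters.
    return {sc: w / total for sc, w in raw.items()}


# Hypothetical "pendingQueueLength" readings for two sub-clusters.
weights = load_to_weights({"SC1": 0, "SC2": 9})
```

With these sample readings the idle sub-cluster SC1 gets the larger weight, so a Router or AMRMProxy policy consuming the weights would steer most new traffic to it.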
[jira] [Created] (YARN-7707) [GPG] Policy generator framework
Carlo Curino created YARN-7707: -- Summary: [GPG] Policy generator framework Key: YARN-7707 URL: https://issues.apache.org/jira/browse/YARN-7707 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino This JIRA tracks the development of a generic framework for querying sub-clusters for metrics, running policies, and updating them in the FederationStateStore.
[jira] [Updated] (YARN-7599) [GPG] Application cleaner and subcluster cleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7599: --- Summary: [GPG] Application cleaner and subcluster cleaner in Global Policy Generator (was: Application cleaner and subcluster cleaner in Global Policy Generator) > [GPG] Application cleaner and subcluster cleaner in Global Policy Generator > --- > > Key: YARN-7599 > URL: https://issues.apache.org/jira/browse/YARN-7599 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > > In Federation, we need a cleanup service for StateStore as well as Yarn > Registry. For the former, we need to remove old application records as well > as inactive subclusters. For the latter, failed and killed applications might > leave records in the Yarn Registry (see YARN-6128). We plan to add both > cleanup services in GPG.
[jira] [Updated] (YARN-7599) [GPG] Application cleaner and subcluster cleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7599: --- Labels: federation gpg (was: ) > [GPG] Application cleaner and subcluster cleaner in Global Policy Generator > --- > > Key: YARN-7599 > URL: https://issues.apache.org/jira/browse/YARN-7599 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > > In Federation, we need a cleanup service for StateStore as well as Yarn > Registry. For the former, we need to remove old application records as well > as inactive subclusters. For the latter, failed and killed applications might > leave records in the Yarn Registry (see YARN-6128). We plan to add both > cleanup services in GPG.
[jira] [Updated] (YARN-6648) [GPG] Add FederationStateStore interfaces for Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6648: --- Labels: federation gpg (was: ) > [GPG] Add FederationStateStore interfaces for Global Policy Generator > - > > Key: YARN-6648 > URL: https://issues.apache.org/jira/browse/YARN-6648 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-6648-YARN-2915.v1.patch
[jira] [Updated] (YARN-3660) [GPG] Federation Global Policy Generator (load balancing)
[ https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-3660: --- Summary: [GPG] Federation Global Policy Generator (load balancing) (was: Federation Global Policy Generator (load balancing)) > [GPG] Federation Global Policy Generator (load balancing) > - > > Key: YARN-3660 > URL: https://issues.apache.org/jira/browse/YARN-3660 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Botong Huang > Labels: federation, gpg > > In a federated environment, local impairments of one sub-cluster might > unfairly affect users/queues that are mapped to that sub-cluster. A > centralized component (GPG) runs out-of-band and edits the policies governing > how users/queues are allocated to sub-clusters. This allows us to enforce > global invariants (by dynamically updating locally-enforced invariants).
[jira] [Updated] (YARN-3660) [GPG] Federation Global Policy Generator (load balancing)
[ https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-3660: --- Labels: federation gpg (was: ) > [GPG] Federation Global Policy Generator (load balancing) > - > > Key: YARN-3660 > URL: https://issues.apache.org/jira/browse/YARN-3660 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Botong Huang > Labels: federation, gpg > > In a federated environment, local impairments of one sub-cluster might > unfairly affect users/queues that are mapped to that sub-cluster. A > centralized component (GPG) runs out-of-band and edits the policies governing > how users/queues are allocated to sub-clusters. This allows us to enforce > global invariants (by dynamically updating locally-enforced invariants).
[jira] [Updated] (YARN-6648) [GPG] Add FederationStateStore interfaces for Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6648: --- Summary: [GPG] Add FederationStateStore interfaces for Global Policy Generator (was: Add FederationStateStore interfaces for Global Policy Generator) > [GPG] Add FederationStateStore interfaces for Global Policy Generator > - > > Key: YARN-6648 > URL: https://issues.apache.org/jira/browse/YARN-6648 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-6648-YARN-2915.v1.patch
[jira] [Assigned] (YARN-3660) Federation Global Policy Generator (load balancing)
[ https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-3660: -- Assignee: Botong Huang (was: Subru Krishnan) > Federation Global Policy Generator (load balancing) > --- > > Key: YARN-3660 > URL: https://issues.apache.org/jira/browse/YARN-3660 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Botong Huang > > In a federated environment, local impairments of one sub-cluster might > unfairly affect users/queues that are mapped to that sub-cluster. A > centralized component (GPG) runs out-of-band and edits the policies governing > how users/queues are allocated to sub-clusters. This allows us to enforce > global invariants (by dynamically updating locally-enforced invariants).
[jira] [Created] (YARN-7615) Federation StateStore: support storage/retrieval of reservations
Carlo Curino created YARN-7615: -- Summary: Federation StateStore: support storage/retrieval of reservations Key: YARN-7615 URL: https://issues.apache.org/jira/browse/YARN-7615 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino
[jira] [Created] (YARN-7614) Support Reservation APIs in Federation Router
Carlo Curino created YARN-7614: -- Summary: Support Reservation APIs in Federation Router Key: YARN-7614 URL: https://issues.apache.org/jira/browse/YARN-7614 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino
[jira] [Commented] (YARN-7439) Minor improvements to Reservation System documentation/exceptions
[ https://issues.apache.org/jira/browse/YARN-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238099#comment-16238099 ] Carlo Curino commented on YARN-7439: # The main documentation page for the ReservationSystem (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ReservationSystem.html) should call out how to switch on the reservation system in yarn-site.xml. # The submission-reservation.json example in (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Reservation_API_Submit) is missing a comma after the reservation id. # When the reservation system is disabled and we attempt to invoke new-reservation we get: {code} { "RemoteException": { "exception": "YarnRuntimeException", "message": "Unable to create new reservation from RM web service", "javaClassName": "org.apache.hadoop.yarn.exceptions.YarnRuntimeException" } } {code} which is not the most telling message. By contrast, the submission throws back a more appropriate: {code} { "RemoteException": { "exception": "BadRequestException", "message": "java.lang.Exception: Reservation is not enabled. Please enable & try again", "javaClassName": "org.apache.hadoop.yarn.webapp.BadRequestException" } } {code} > Minor improvements to Reservation System documentation/exceptions > - > > Key: YARN-7439 > URL: https://issues.apache.org/jira/browse/YARN-7439 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino > > This JIRA tracks a couple of minor issues with docs and exceptions for the > reservation system.
[jira] [Created] (YARN-7439) Minor improvements to Reservation System documentation/exceptions
Carlo Curino created YARN-7439: -- Summary: Minor improvements to Reservation System documentation/exceptions Key: YARN-7439 URL: https://issues.apache.org/jira/browse/YARN-7439 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino This JIRA tracks a couple of minor issues with docs and exceptions for the reservation system.
[jira] [Commented] (YARN-7434) Router getApps REST invocation fails with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236910#comment-16236910 ] Carlo Curino commented on YARN-7434: Thanks [~elgoiri] for the patch. LGTM, let's wait for Yetus. Also as soon as this is checked by YETUS please upload the version for branch-2/branch-2.9. > Router getApps REST invocation fails with multiple RMs > -- > > Key: YARN-7434 > URL: https://issues.apache.org/jira/browse/YARN-7434 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Subru Krishnan >Assignee: Íñigo Goiri >Priority: Critical > Attachments: YARN-7434.000.patch > > > Router uses threads to invoke getApps in parallel with multiple RMs and has a > concurrency bug caused by sharing of the HTTP request object. This jira > tracks the changes to fix the multi-threading issue by cloning the request.
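The general shape of the fix described in YARN-7434 (give each worker thread its own copy of the request instead of sharing one object) can be sketched as follows. This is a language-neutral illustration in Python, not the Router code or the attached patch; `get_apps`, `fan_out`, and the dictionary standing in for the HTTP request are hypothetical.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def get_apps(rm, request):
    # Hypothetical stand-in for a per-RM REST call that mutates its request.
    # With a shared request object, concurrent calls would overwrite each
    # other's "target" field.
    request["target"] = rm
    return rm, request["target"]


def fan_out(rms, request):
    # Each task receives its own deep copy, so no state is shared across
    # threads and the caller's request is left untouched.
    with ThreadPoolExecutor(max_workers=len(rms)) as pool:
        futures = [pool.submit(get_apps, rm, copy.deepcopy(request)) for rm in rms]
        return [f.result() for f in futures]


shared = {"path": "/apps", "target": None}
results = fan_out(["rm1", "rm2", "rm3"], shared)
```

Cloning trades a small per-call allocation for not having to reason about locking around the request object.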
[jira] [Commented] (YARN-7431) resource estimator has findbugs problems
[ https://issues.apache.org/jira/browse/YARN-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236804#comment-16236804 ] Carlo Curino commented on YARN-7431: The patch LGTM vs the issues listed, though let's see what yetus says. > resource estimator has findbugs problems > > > Key: YARN-7431 > URL: https://issues.apache.org/jira/browse/YARN-7431 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.1.0 >Reporter: Allen Wittenauer >Assignee: Arun Suresh >Priority: Blocker > Attachments: YARN-7431.001.patch > > > Just see any recent report.
[jira] [Commented] (YARN-7403) Compute global and local preemption
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227507#comment-16227507 ] Carlo Curino commented on YARN-7403: The attached screenshot shows an example of "globally" calculated local preemption. In particular, it tries to highlight the problem of locality of the demand vs availability of preemptable containers. The code in the draft3 patch computes the total preemption to be 100 containers, splits it between SC1 and SC2 based on B's demand (so 66/33), and caps the preemption by the number of preemptable containers in A1, which is 20 in SC2. Other "splitting" decisions can be made, enforcing different invariants, e.g., that all 100 containers are preempted, etc. I think the current policy is reasonable when combined with a stateful AMRMProxy policy that "relaxes" locality demand, as the requests from B will eventually be migrated towards the sub-cluster where demand is being fulfilled, i.e., at a later time B's demand should be in SC1 and more preemption of A1 containers in SC1 should kick in. Thoughts? > Compute global and local preemption > --- > > Key: YARN-7403 > URL: https://issues.apache.org/jira/browse/YARN-7403 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, > YARN-7403.draft3.patch, global-queues-preemption.PNG > > > This JIRA tracks algorithmic effort to combine the local queue views of > capacity guarantee/use/demand and compute the global amount of preemption, > and based on that, "where" (in which sub-cluster) preemption will be enacted.
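The split-and-cap arithmetic from the comment above can be worked through with a small sketch. It is not the code in the attached patches: the function name, the 2:1 demand figures chosen to reproduce the 66/33 split, and the number of preemptable containers in SC1 are assumptions; only the 100-container total and the cap of 20 in SC2 come from the comment.

```python
def split_preemption(total, demand, preemptable):
    """Split a global preemption target across sub-clusters.

    Each sub-cluster gets a share proportional to the demand located there
    (rounded down), capped by the containers that can actually be preempted
    in that sub-cluster.
    """
    total_demand = sum(demand.values())
    if total_demand == 0:
        return {sc: 0 for sc in demand}
    plan = {}
    for sc, d in demand.items():
        share = total * d // total_demand            # proportional share
        plan[sc] = min(share, preemptable.get(sc, 0))  # cap by availability
    return plan


demand_b = {"SC1": 200, "SC2": 100}      # hypothetical 2:1 demand from queue B
preemptable_a1 = {"SC1": 80, "SC2": 20}  # SC1 value is hypothetical; SC2 is from the comment
plan = split_preemption(100, demand_b, preemptable_a1)
```

Under these assumed inputs the plan preempts fewer than 100 containers in total, which matches the comment's remark that this policy does not enforce the "all 100 containers are preempted" invariant.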
[jira] [Updated] (YARN-7403) Compute global and local preemption
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7403: --- Attachment: global-queues-preemption.PNG > Compute global and local preemption > --- > > Key: YARN-7403 > URL: https://issues.apache.org/jira/browse/YARN-7403 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, > YARN-7403.draft3.patch, global-queues-preemption.PNG > > > This JIRA tracks algorithmic effort to combine the local queue views of > capacity guarantee/use/demand and compute the global amount of preemption, > and based on that, "where" (in which sub-cluster) preemption will be enacted.
[jira] [Updated] (YARN-7403) Compute global and local preemption
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7403: --- Attachment: YARN-7403.draft3.patch > Compute global and local preemption > --- > > Key: YARN-7403 > URL: https://issues.apache.org/jira/browse/YARN-7403 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, > YARN-7403.draft3.patch > > > This JIRA tracks algorithmic effort to combine the local queue views of > capacity guarantee/use/demand and compute the global amount of preemption, > and based on that, "where" (in which sub-cluster) preemption will be enacted.
[jira] [Updated] (YARN-7403) Compute global and local preemption
[ https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-7403: --- Attachment: YARN-7403.draft2.patch Fixing ASF license. > Compute global and local preemption > --- > > Key: YARN-7403 > URL: https://issues.apache.org/jira/browse/YARN-7403 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch > > > This JIRA tracks algorithmic effort to combine the local queue views of > capacity guarantee/use/demand and compute the global amount of preemption, > and based on that, "where" (in which sub-cluster) preemption will be enacted.
[jira] [Created] (YARN-7405) Bias container allocations based on global view
Carlo Curino created YARN-7405: -- Summary: Bias container allocations based on global view Key: YARN-7405 URL: https://issues.apache.org/jira/browse/YARN-7405 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Each RM in a federation should bias its local allocations of containers based on the global over/under utilization of queues. As part of this, the local RM should account for the work that other RMs will be doing in between the updates we receive via the heartbeats of YARN-7404 (the mechanics used for synchronization).