[jira] [Commented] (YARN-7953) [GQ] Data structures for federation global queues calculations

2018-03-06 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388325#comment-16388325
 ] 

Carlo Curino commented on YARN-7953:


[~asuresh] I like this suggestion, and from our offline convo I know you are 
looking into it. Please let me know whether it looks promising once you have 
tested the ideas.

Since the FederationQueue objects I have here are transformed in YARN-7403 and 
YARN-7834 into other objects for algorithmic calculations, 
this should be pretty doable in terms of the rest of the YARN-7402 work items.

Small caveats:
 # The reason I had initially not used QueueMetrics is that I often saw them 
being broken/off in live clusters, so I thought they were maintained a bit 
sloppily. If we can ensure they are correct and consistent, I think it is good 
to have them.
 # We should also validate whether polling QueueMetrics is acceptable for 
performance (it might be better, thanks to the already-maintained objects and 
the delta protocol, but I want to make sure).
 # The other advantage of the FedQueue objects was that it was very easy 
to build tests by constructing scenarios in .json. If we can do 
the same for QueueMetrics and the like, I think it should be good.

> [GQ] Data structures for federation global queues calculations
> --
>
> Key: YARN-7953
> URL: https://issues.apache.org/jira/browse/YARN-7953
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7953.v1.patch
>
>
> This Jira tracks data structures and helper classes used by the core 
> algorithms of YARN-7402 umbrella Jira (currently YARN-7403, and YARN-7834).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7403) [GQ] Compute global and local "IdealAllocation"

2018-02-21 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7403:
---
Attachment: YARN-7403.v3.patch

> [GQ] Compute global and local "IdealAllocation"
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch, YARN-7403.v1.patch, YARN-7403.v2.patch, 
> YARN-7403.v3.patch, global-queues-preemption.PNG
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global ideal allocation, and 
> the respective local allocations. This will inform the RMs in each 
> sub-cluster on how to allocate more containers to each queue (allowing for 
> temporary over/under allocations that are locally excessive, but globally 
> correct).
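
As a rough illustration of the description above, the following is a minimal 
sketch of how local guarantee/use/demand views might be combined into a global 
ideal allocation and then split back into local allocations. It is not the 
YARN-7403 code: the class IdealAllocationSketch, the LocalView type, and the 
proportional-split rule are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical. */
final class IdealAllocationSketch {

  /** One queue's view in one sub-cluster, in abstract resource units. */
  static final class LocalView {
    final long guarantee; // capacity guaranteed locally
    final long used;      // capacity currently in use locally
    final long demand;    // pending demand on top of what is used

    LocalView(long guarantee, long used, long demand) {
      this.guarantee = guarantee;
      this.used = used;
      this.demand = demand;
    }

    /** Total capacity the queue would like to have in this sub-cluster. */
    long wanted() {
      return used + demand;
    }
  }

  /** Global ideal: total wanted capacity, capped by the total guarantee. */
  static long globalIdeal(Map<String, LocalView> views) {
    long totalGuarantee = 0;
    long totalWanted = 0;
    for (LocalView v : views.values()) {
      totalGuarantee += v.guarantee;
      totalWanted += v.wanted();
    }
    return Math.min(totalGuarantee, totalWanted);
  }

  /**
   * Splits the global ideal across sub-clusters in proportion to the local
   * wanted capacity (rounding down). A local share may exceed the local
   * guarantee as long as the global total stays within the global guarantee.
   */
  static Map<String, Long> localIdeals(Map<String, LocalView> views) {
    long ideal = globalIdeal(views);
    long totalWanted = 0;
    for (LocalView v : views.values()) {
      totalWanted += v.wanted();
    }
    Map<String, Long> result = new LinkedHashMap<>();
    for (Map.Entry<String, LocalView> e : views.entrySet()) {
      long share =
          totalWanted == 0 ? 0 : ideal * e.getValue().wanted() / totalWanted;
      result.put(e.getKey(), share);
    }
    return result;
  }
}
```

With two sub-clusters that each guarantee 50 units, where one wants 80 and the 
other wants 10, the sketch assigns 80 and 10 respectively: the first share is 
above its local guarantee, while the total of 90 stays within the global 
guarantee of 100.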






[jira] [Commented] (YARN-7403) [GQ] Compute global and local "IdealAllocation"

2018-02-21 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372324#comment-16372324
 ] 

Carlo Curino commented on YARN-7403:


[~kkaranasos] thanks for looking at this. I initially put it together because 
it is not easy to understand why we have certain data structures without the 
code that uses them, but if it is easier for you to review, I am OK to split.
# YARN-7953 is now a data-structure-only patch (with minor refactoring it should 
now compile fine, and be reasonably self-contained)
# YARN-7403 (this patch) is now algo-only and depends on YARN-7953 and 
YARN-7934 (the hook in CS/preemption code patch)

BTW the choice of JAXB is because we are considering a REST endpoint as a way to 
communicate between RM and GPG.

> [GQ] Compute global and local "IdealAllocation"
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch, YARN-7403.v1.patch, YARN-7403.v2.patch, 
> global-queues-preemption.PNG
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global ideal allocation, and 
> the respective local allocations. This will inform the RMs in each 
> sub-cluster on how to allocate more containers to each queue (allowing for 
> temporary over/under allocations that are locally excessive, but globally 
> correct).






[jira] [Comment Edited] (YARN-7953) [GQ] Data structures for federation global queues calculations

2018-02-21 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372305#comment-16372305
 ] 

Carlo Curino edited comment on YARN-7953 at 2/22/18 1:41 AM:
-

Per [this |#comment-16370912] ask by [~kkaranasos], I am splitting YARN-7403 
into a data-only patch, this one, and the algo side in YARN-7403.


was (Author: curino):
Per [this |#comment-16370912] ask by, I am splitting YARN-7403 into a data-only 
patch, this one, and the algo side in YARN-7403.

> [GQ] Data structures for federation global queues calculations
> --
>
> Key: YARN-7953
> URL: https://issues.apache.org/jira/browse/YARN-7953
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7953.v1.patch
>
>
> This Jira tracks data structures and helper classes used by the core 
> algorithms of YARN-7402 umbrella Jira (currently YARN-7403, and YARN-7834).






[jira] [Assigned] (YARN-7953) [GQ] Data structures for federation global queues calculations

2018-02-21 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7953:
--

Assignee: Carlo Curino

> [GQ] Data structures for federation global queues calculations
> --
>
> Key: YARN-7953
> URL: https://issues.apache.org/jira/browse/YARN-7953
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7953.v1.patch
>
>
> This Jira tracks data structures and helper classes used by the core 
> algorithms of YARN-7402 umbrella Jira (currently YARN-7403, and YARN-7834).






[jira] [Updated] (YARN-7953) [GQ] Data structures for federation global queues calculations

2018-02-21 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7953:
---
Attachment: YARN-7953.v1.patch

> [GQ] Data structures for federation global queues calculations
> --
>
> Key: YARN-7953
> URL: https://issues.apache.org/jira/browse/YARN-7953
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7953.v1.patch
>
>
> This Jira tracks data structures and helper classes used by the core 
> algorithms of YARN-7402 umbrella Jira (currently YARN-7403, and YARN-7834).






[jira] [Created] (YARN-7953) [GQ] Data structures for federation global queues calculations

2018-02-21 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7953:
--

 Summary: [GQ] Data structures for federation global queues 
calculations
 Key: YARN-7953
 URL: https://issues.apache.org/jira/browse/YARN-7953
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


This Jira tracks data structures and helper classes used by the core algorithms 
of YARN-7402 umbrella Jira (currently YARN-7403, and YARN-7834).






[jira] [Commented] (YARN-7732) Support Generic AM Simulator from SynthGenerator

2018-02-21 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371824#comment-16371824
 ] 

Carlo Curino commented on YARN-7732:


Thanks [~leftnoteasy], so should we then push to branch-3.0 (for all future 3.x 
branches)?

> Support Generic AM Simulator from SynthGenerator
> 
>
> Key: YARN-7732
> URL: https://issues.apache.org/jira/browse/YARN-7732
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Young Chen
>Assignee: Young Chen
>Priority: Minor
> Attachments: YARN-7732-YARN-7798.01.patch, 
> YARN-7732-YARN-7798.02.patch, YARN-7732.01.patch, YARN-7732.02.patch, 
> YARN-7732.03.patch, YARN-7732.04.patch, YARN-7732.05.patch, YARN-7732.06.patch
>
>
> Extract the MapReduce specific set-up in the SLSRunner into the 
> MRAMSimulator, and enable support for pluggable AMSimulators.
> Previously, the AM set up in SLSRunner had the MRAMSimulator type hard coded, 
> for example startAMFromSynthGenerator() calls this:
>  
> {code:java}
> runNewAM(SLSUtils.DEFAULT_JOB_TYPE, user, jobQueue, oldJobId,
> jobStartTimeMS, jobFinishTimeMS, containerList, reservationId,
> job.getDeadline(), getAMContainerResource(null));
> {code}
> where SLSUtils.DEFAULT_JOB_TYPE = "mapreduce"
> The container set up was also only suitable for mapreduce: 
>  
> {code:java}
> // Source: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java
> // map tasks
> for (int i = 0; i < job.getNumberMaps(); i++) {
>   TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.MAP, i, 0);
>   // pick a random node to host this container
>   RMNode node =
>       nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size())))
>           .getNode();
>   String hostname = "/" + node.getRackName() + "/" + node.getHostName();
>   long containerLifeTime = tai.getRuntime();
>   Resource containerResource =
>       Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(),
>           (int) tai.getTaskInfo().getTaskVCores());
>   containerList.add(new ContainerSimulator(containerResource,
>       containerLifeTime, hostname, DEFAULT_MAPPER_PRIORITY, "map"));
> }
> // reduce tasks
> for (int i = 0; i < job.getNumberReduces(); i++) {
>   TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.REDUCE, i, 0);
>   // pick a random node to host this container
>   RMNode node =
>       nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size())))
>           .getNode();
>   String hostname = "/" + node.getRackName() + "/" + node.getHostName();
>   long containerLifeTime = tai.getRuntime();
>   Resource containerResource =
>       Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(),
>           (int) tai.getTaskInfo().getTaskVCores());
>   containerList.add(
>       new ContainerSimulator(containerResource, containerLifeTime,
>           hostname, DEFAULT_REDUCER_PRIORITY, "reduce"));
> }
> {code}
>  
> In addition, the syn.json format supported only mapreduce (the parameters 
> were very specific: mtime, rtime, mtasks, rtasks, etc.).
> This patch aims to introduce a new syn.json format that can describe generic 
> jobs, and the SLS setup required to support the synth generation of generic 
> jobs.
> See syn_generic.json for an equivalent of the previous syn.json in the new 
> format.
> Using the new generic format, we describe a StreamAMSimulator that simulates a 
> long-running streaming service that maintains N containers for the 
> lifetime of the AM. See syn_stream.json.
>  
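
To make the idea of pluggable AM simulators in the description above concrete, 
here is a minimal sketch of a registry that maps a job type string to a 
simulator factory, instead of hard-coding "mapreduce". This is not the SLS 
implementation: AmSimulatorRegistrySketch, AmSimulator, MapReduceAm and 
StreamAm are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Illustrative sketch only; none of these types are SLS classes. */
final class AmSimulatorRegistrySketch {

  /** Minimal stand-in for an application master simulator. */
  interface AmSimulator {
    String describe();
  }

  /** Stand-in for a MapReduce-style simulator. */
  static final class MapReduceAm implements AmSimulator {
    @Override
    public String describe() {
      return "mapreduce: map containers followed by reduce containers";
    }
  }

  /** Stand-in for a long-running streaming simulator. */
  static final class StreamAm implements AmSimulator {
    @Override
    public String describe() {
      return "stream: N containers kept for the lifetime of the AM";
    }
  }

  // job type (as it would appear in the trace) -> factory for its simulator
  private final Map<String, Supplier<AmSimulator>> factories = new HashMap<>();

  /** Registers (or replaces) the factory used for the given job type. */
  void register(String jobType, Supplier<AmSimulator> factory) {
    factories.put(jobType, factory);
  }

  /** Creates a new simulator for the job type, or fails if none is known. */
  AmSimulator create(String jobType) {
    Supplier<AmSimulator> factory = factories.get(jobType);
    if (factory == null) {
      throw new IllegalArgumentException(
          "No AM simulator registered for job type: " + jobType);
    }
    return factory.get();
  }
}
```

The point of the design is that the runner only looks up the job type and 
delegates container set-up to whatever simulator is registered, so adding a new 
workload type does not require touching the runner.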






[jira] [Commented] (YARN-7798) Refactor SLS Reservation Creation

2018-02-20 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370822#comment-16370822
 ] 

Carlo Curino commented on YARN-7798:


Cherry-picked back to branch-3 with a clean cherry-pick (and spot checks of SLS 
tests running fine)

> Refactor SLS Reservation Creation
> -
>
> Key: YARN-7798
> URL: https://issues.apache.org/jira/browse/YARN-7798
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Young Chen
>Assignee: Young Chen
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: YARN-7798.01.patch, YARN-7798.02.patch, 
> YARN-7798.03.patch
>
>
> Move the reservation request creation out of SLSRunner and delegate to the 
> AMSimulator instance.






[jira] [Commented] (YARN-7732) Support Generic AM Simulator from SynthGenerator

2018-02-20 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370821#comment-16370821
 ] 

Carlo Curino commented on YARN-7732:


Thanks [~youchen] for the contribution, and [~leftnoteasy] for reviewing. I 
committed this to trunk, and cherry-picked this patch (and YARN-7798) back to 
branch-3, since it was a clean cherry-pick and spot runs of SLS tests look 
good.
[~leftnoteasy] and [~yufeigu], if you see any issue with this cherry-pick, let 
me know and we can easily revert. I would like, as much as possible, to have 
all the newer SLS magic available in all branches, as it is very useful for 
regression/integration/performance testing.

[~youchen] can you see why YARN-7798 does not apply to branch-2? It might be a 
very simple fix, in which case please provide a patch for both YARN-7798 and 
YARN-7732 that works in branch-2, so we can backport there as well.

> Support Generic AM Simulator from SynthGenerator
> 
>
> Key: YARN-7732
> URL: https://issues.apache.org/jira/browse/YARN-7732
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Young Chen
>Assignee: Young Chen
>Priority: Minor
> Attachments: YARN-7732-YARN-7798.01.patch, 
> YARN-7732-YARN-7798.02.patch, YARN-7732.01.patch, YARN-7732.02.patch, 
> YARN-7732.03.patch, YARN-7732.04.patch, YARN-7732.05.patch, YARN-7732.06.patch
>
>






[jira] [Commented] (YARN-7732) Support Generic AM Simulator from SynthGenerator

2018-02-20 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370610#comment-16370610
 ] 

Carlo Curino commented on YARN-7732:


Thanks [~leftnoteasy] for the review. [~youchen] please fix the ASF license 
issue (by adding an exclusion in pom.xml), and I will commit to trunk based on 
Wangda's review (and a quick skim from me).

> Support Generic AM Simulator from SynthGenerator
> 
>
> Key: YARN-7732
> URL: https://issues.apache.org/jira/browse/YARN-7732
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Young Chen
>Assignee: Young Chen
>Priority: Minor
> Attachments: YARN-7732-YARN-7798.01.patch, 
> YARN-7732-YARN-7798.02.patch, YARN-7732.01.patch, YARN-7732.02.patch, 
> YARN-7732.03.patch, YARN-7732.04.patch, YARN-7732.05.patch
>
>






[jira] [Commented] (YARN-7403) [GQ] Compute global and local "IdealAllocation"

2018-02-20 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370506#comment-16370506
 ] 

Carlo Curino commented on YARN-7403:


Fixing the TestYarnConfigurationFields unit test failure (the rest still does 
not compile, as it depends on YARN-7403).

> [GQ] Compute global and local "IdealAllocation"
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch, YARN-7403.v1.patch, YARN-7403.v2.patch, 
> global-queues-preemption.PNG
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global ideal allocation, and 
> the respective local allocations. This will inform the RMs in each 
> sub-cluster on how to allocate more containers to each queue (allowing for 
> temporary over/under allocations that are locally excessive, but globally 
> correct).






[jira] [Updated] (YARN-7403) [GQ] Compute global and local "IdealAllocation"

2018-02-20 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7403:
---
Attachment: YARN-7403.v2.patch

> [GQ] Compute global and local "IdealAllocation"
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch, YARN-7403.v1.patch, YARN-7403.v2.patch, 
> global-queues-preemption.PNG
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global ideal allocation, and 
> the respective local allocations. This will inform the RMs in each 
> sub-cluster on how to allocate more containers to each queue (allowing for 
> temporary over/under allocations that are locally excessive, but globally 
> correct).






[jira] [Commented] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368062#comment-16368062
 ] 

Carlo Curino commented on YARN-7934:


[~subru] thanks for the review. I agree the test seems unrelated; it passed in 
v3 (which differs from v4 only in comments), so it is likely just a flaky one.

> [GQ] Refactor preemption calculators to allow overriding for Federation 
> Global Algos
> 
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch, 
> YARN-7934.v3.patch, YARN-7934.v4.patch
>
>
> This Jira tracks minimal changes in the capacity scheduler preemption 
> mechanics that allow for sub-classing and overriding of certain behaviors, 
> which we use to implement federation global algorithms, e.g., in YARN-7403.
>  






[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7934:
---
Attachment: YARN-7934.v4.patch

> [GQ] Refactor preemption calculators to allow overriding for Federation 
> Global Algos
> 
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch, 
> YARN-7934.v3.patch, YARN-7934.v4.patch
>
>
> This Jira tracks minimal changes in the capacity scheduler preemption 
> mechanics that allow for sub-classing and overriding of certain behaviors, 
> which we use to implement federation global algorithms, e.g., in YARN-7403.
>  






[jira] [Commented] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-15 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366516#comment-16366516
 ] 

Carlo Curino commented on YARN-7934:


[~subru] thanks for the quick review. I have addressed the javadoc comments 
issue in patch v3.

Regarding consumers, you are correct: they are not in this patch, but in 
YARN-7403. This is by design; the purpose of this patch is to commit to trunk 
the most basic refactoring needed while we develop algos and big stuff in the 
YARN-7402 feature branch (to limit churn on the touch points of the branch 
work).

> [GQ] Refactor preemption calculators to allow overriding for Federation 
> Global Algos
> 
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch, 
> YARN-7934.v3.patch
>
>
> This Jira tracks minimal changes in the capacity scheduler preemption 
> mechanics that allow for sub-classing and overriding of certain behaviors, 
> which we use to implement federation global algorithms, e.g., in YARN-7403.
>  






[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-15 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7934:
---
Attachment: YARN-7934.v3.patch

> [GQ] Refactor preemption calculators to allow overriding for Federation 
> Global Algos
> 
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch, 
> YARN-7934.v3.patch
>
>
> This Jira tracks minimal changes in the capacity scheduler preemption 
> mechanics that allow for sub-classing and overriding of certain behaviors, 
> which we use to implement federation global algorithms, e.g., in YARN-7403.
>  






[jira] [Commented] (YARN-7834) [GQ] Rebalance queue configuration for load-balancing and locality affinities

2018-02-15 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366372#comment-16366372
 ] 

Carlo Curino commented on YARN-7834:


The uploaded patch provides a Linear Programming (LP) implementation of this 
algorithm, leveraging the ojAlgo solver (which already ships with Hadoop, and 
contains a pure-Java solver, as well as hooks to leverage a more powerful 
external solver such as Gurobi or CPLEX).

The formulation is designed to:
 # Guarantee that all queues will be allocated fully.
 # Guarantee that none of the sub-clusters is allocated more capacity than it 
can take.
 # Maximize load-balancing (as a primary objective).
 # Subject to not impacting load-balancing by more than a configurable delta 
(zero by default), maximize queue-to-sub-cluster affinity (as a secondary 
objective).

The reason 3 and 4 are in a primary-secondary relationship (instead of a 
weighted linear combination) is that in our production experience 
load-balancing is the most pressing concern; only secondarily do we aim at 
optimizing for locality.
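
One way to write down a two-phase formulation with these four properties is 
sketched below. This is an assumption-laden illustration, not necessarily the 
formulation in the patch: the symbols $x_{q,s}$ (capacity of queue $q$ placed in 
sub-cluster $s$), $c_q$ (queue capacity), $C_s$ (sub-cluster capacity), 
$a_{q,s}$ (affinity weight), $z$ (maximum relative load) and $\delta$ (the 
configurable delta) are all introduced here for the example.

```latex
% Shared constraints (properties 1 and 2), with x_{q,s} >= 0:
\sum_{s} x_{q,s} = c_q \quad \forall q
\qquad\qquad
\sum_{q} x_{q,s} \le C_s \quad \forall s

% Phase 1 (property 3): minimize the maximum relative load z
\min \; z
\quad \text{s.t.} \quad
\frac{1}{C_s} \sum_{q} x_{q,s} \le z \quad \forall s

% Phase 2 (property 4): with z^* the optimum of phase 1,
% maximize affinity without degrading balance by more than \delta
\max \; \sum_{q} \sum_{s} a_{q,s} \, x_{q,s}
\quad \text{s.t.} \quad
\frac{1}{C_s} \sum_{q} x_{q,s} \le z^* + \delta \quad \forall s
```

Solving the two phases in sequence, rather than optimizing a single weighted 
sum, is what makes load-balancing strictly primary: the affinity objective can 
only choose among solutions whose balance is within $\delta$ of the best 
achievable one.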

> [GQ] Rebalance queue configuration for load-balancing and locality affinities
> -
>
> Key: YARN-7834
> URL: https://issues.apache.org/jira/browse/YARN-7834
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7834.v1.patch
>
>
> This Jira tracks algorithmic work, which will run in the GPG and will 
> rebalance the mapping of queues to sub-clusters. The current design supports 
> both balancing the "load" across sub-clusters (proportionally to their size) 
> and as a second objective to maximize the affinity between queues and the 
> sub-clusters where they historically have most demand.






[jira] [Updated] (YARN-7834) [GQ] Rebalance queue configuration for load-balancing and locality affinities

2018-02-15 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7834:
---
Attachment: YARN-7834.v1.patch

> [GQ] Rebalance queue configuration for load-balancing and locality affinities
> -
>
> Key: YARN-7834
> URL: https://issues.apache.org/jira/browse/YARN-7834
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7834.v1.patch
>
>
> This Jira tracks algorithmic work, which will run in the GPG and will 
> rebalance the mapping of queues to sub-clusters. The current design supports 
> both balancing the "load" across sub-clusters (proportionally to their size) 
> and as a second objective to maximize the affinity between queues and the 
> sub-clusters where they historically have most demand.






[jira] [Resolved] (YARN-7725) [GQ] Compute global "ideal allocation" including locality biases

2018-02-15 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino resolved YARN-7725.

   Resolution: Duplicate
Fix Version/s: yarn-7403

The newer version of YARN-7403 subsumes this task.

> [GQ] Compute global "ideal allocation" including locality biases
> 
>
> Key: YARN-7725
> URL: https://issues.apache.org/jira/browse/YARN-7725
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Fix For: yarn-7403
>
>
> This JIRA tracks an algorithmic effort to compute the global ideal 
> allocation. We also take into account the locality demand/availability gap, 
> and map the global allocation down to the sub-cluster level, computing the 
> delta+ and delta- for each queue in each sub-cluster.






[jira] [Updated] (YARN-7403) [GQ] Compute global and local "IdealAllocation"

2018-02-15 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7403:
---
Description: This JIRA tracks algorithmic effort to combine the local queue 
views of capacity guarantee/use/demand and compute the global ideal allocation, 
and the respective local allocations. This will inform the RMs in each 
sub-cluster on how to allocate more containers to each queue (allowing for 
temporary over/under allocations that are locally excessive, but globally 
correct).  (was: This JIRA tracks algorithmic effort to combine the local queue 
views of capacity guarantee/use/demand and compute the global amount of 
preemption, and based on that, "where" (in which sub-cluster) preemption will 
be enacted.)

> [GQ] Compute global and local "IdealAllocation"
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch, YARN-7403.v1.patch, global-queues-preemption.PNG
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global ideal allocation, and 
> the respective local allocations. This will inform the RMs in each 
> sub-cluster on how to allocate more containers to each queue (allowing for 
> temporary over/under allocations that are locally excessive, but globally 
> correct).






[jira] [Commented] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-15 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366215#comment-16366215
 ] 

Carlo Curino commented on YARN-7934:


Patch v2 attempts to please the YETUS gods.

This patch does not change any behavior; it just defines hooks to be used by 
sub-classes in YARN-7403, hence it doesn't require any new tests.
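
To picture what "hooks to be used by sub-classes" can look like, here is a 
minimal template-method sketch. It is not code from the patch: the class names 
{{BasePreemptionCalculator}} and {{GlobalViewCalculator}}, the method 
{{idealFor}}, and the use of plain longs for resources are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical base calculator: the algorithm is fixed, one step is a hook. */
class BasePreemptionCalculator {
  private final Map<String, Long> localGuarantee;

  BasePreemptionCalculator(Map<String, Long> localGuarantee) {
    this.localGuarantee = localGuarantee;
  }

  /** Computes, per queue, how much usage exceeds the ideal allocation. */
  public final Map<String, Long> computeToPreempt(Map<String, Long> used) {
    Map<String, Long> result = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : used.entrySet()) {
      long ideal = idealFor(e.getKey()); // the overridable step
      result.put(e.getKey(), Math.max(0L, e.getValue() - ideal));
    }
    return result;
  }

  /** Hook: by default the ideal allocation is the local guarantee. */
  protected long idealFor(String queue) {
    return localGuarantee.getOrDefault(queue, 0L);
  }
}

/** Hypothetical federation-aware sub-class that overrides only the hook. */
class GlobalViewCalculator extends BasePreemptionCalculator {
  private final Map<String, Long> globalIdeal;

  GlobalViewCalculator(Map<String, Long> localGuarantee,
      Map<String, Long> globalIdeal) {
    super(localGuarantee);
    this.globalIdeal = globalIdeal;
  }

  @Override
  protected long idealFor(String queue) {
    Long ideal = globalIdeal.get(queue);
    return ideal != null ? ideal : super.idealFor(queue);
  }
}
```

Because the default hook reproduces the existing behavior, code that uses only 
the base class sees no change, which is the usual argument for why such a 
refactoring needs no new tests.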

> [GQ] Refactor preemption calculators to allow overriding for Federation 
> Global Algos
> 
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch
>
>
> This Jira tracks minimal changes in the capacity scheduler preemption 
> mechanics that allow for sub-classing and overriding of certain behaviors, 
> which we use to implement federation global algorithms, e.g., in YARN-7403.
>  






[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-15 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7934:
---
Attachment: YARN-7934.v2.patch

> [GQ] Refactor preemption calculators to allow overriding for Federation 
> Global Algos
> 
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7934.v1.patch, YARN-7934.v2.patch
>
>
> This Jira tracks minimal changes in the capacity scheduler preemption 
> mechanics that allow for sub-classing and overriding of certain behaviors, 
> which we use to implement federation global algorithms, e.g., in YARN-7403.
>  






[jira] [Commented] (YARN-7403) [GQ] Compute global and local "IdealAllocation"

2018-02-14 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365063#comment-16365063
 ] 

Carlo Curino commented on YARN-7403:


The v1 patch contains a much reworked version of the code. It depends on 
YARN-7934, so it will not compile without it. The key idea here is to provide 
algorithms that will run every few seconds in the GPG and observe the overall 
state of a federated cluster. The algorithm will leverage some of the 
{{PreemptableResourceCalculator}} logic (with several additions) to compute 
the ideal allocation for each queue. The extra effort goes into considering 
"locality" among sub-clusters, which requires some careful consideration. 
Various heuristics can be chosen from (we implement two as reference), and 
once YARN-7885/YARN-7886 are ready to use we can experiment to see which most 
closely approximates the behavior of a single {{CapacityScheduler}} overseeing 
the entire federation.
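
As a rough illustration of the global-then-local computation described above, 
the sketch below aggregates one queue's per-sub-cluster guarantee and demand 
into a global ideal, then splits that ideal back by local demand. It is not the 
algorithm in the patch: the class {{IdealAllocationSketch}} and its methods are 
hypothetical, resources are plain longs, capacity left unused by other queues 
is not redistributed, sub-cluster capacity limits are ignored, and the 
demand-proportional split is just one conceivable locality heuristic.

```java
/** Hypothetical illustration; not the classes in the YARN-7403 patch. */
final class IdealAllocationSketch {
  private IdealAllocationSketch() {
  }

  /**
   * Global step: the queue's ideal allocation is its federation-wide demand,
   * capped by its federation-wide guarantee.
   */
  static long globalIdeal(long[] guaranteePerSubCluster,
      long[] demandPerSubCluster) {
    long guarantee = 0;
    long demand = 0;
    for (long g : guaranteePerSubCluster) {
      guarantee += g;
    }
    for (long d : demandPerSubCluster) {
      demand += d;
    }
    return Math.min(guarantee, demand);
  }

  /**
   * Local step: split the global ideal across sub-clusters in proportion to
   * local demand. Integer division rounds each share down.
   */
  static long[] localIdeal(long globalIdeal, long[] demandPerSubCluster) {
    long totalDemand = 0;
    for (long d : demandPerSubCluster) {
      totalDemand += d;
    }
    long[] local = new long[demandPerSubCluster.length];
    if (totalDemand == 0) {
      return local; // no demand anywhere: nothing to allocate
    }
    for (int i = 0; i < local.length; i++) {
      local[i] = globalIdeal * demandPerSubCluster[i] / totalDemand;
    }
    return local;
  }
}
```

With guarantees of 50 in each of two sub-clusters and all 90 units of demand in 
the first one, the global ideal is 90 and the first sub-cluster's local ideal 
is also 90: above its local guarantee of 50, but within the queue's global 
guarantee of 100.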

 

> [GQ] Compute global and local "IdealAllocation"
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch, YARN-7403.v1.patch, global-queues-preemption.PNG
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global amount of preemption, 
> and based on that, "where" (in which sub-cluster) preemption will be enacted.






[jira] [Updated] (YARN-7403) [GQ] Compute global and local "IdealAllocation"

2018-02-14 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7403:
---
Attachment: YARN-7403.v1.patch

> [GQ] Compute global and local "IdealAllocation"
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch, YARN-7403.v1.patch, global-queues-preemption.PNG
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global amount of preemption, 
> and based on that, "where" (in which sub-cluster) preemption will be enacted.






[jira] [Commented] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-14 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364966#comment-16364966
 ] 

Carlo Curino commented on YARN-7934:


[~leftnoteasy] the intention is to commit this directly to trunk, so we avoid 
churn, as the rest of the development will continue in the YARN-7402 branch. 
Please check it out if you can; if no one complains and YETUS is happy, this 
will go straight to trunk.

> [GQ] Refactor preemption calculators to allow overriding for Federation 
> Global Algos
> 
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7934.v1.patch
>
>
> This Jira tracks minimal changes in the capacity scheduler preemption 
> mechanics that allow for sub-classing and overriding of certain behaviors, 
> which we use to implement federation global algorithms, e.g., in YARN-7403.
>  






[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-14 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7934:
---
Attachment: (was: More.url)

> [GQ] Refactor preemption calculators to allow overriding for Federation 
> Global Algos
> 
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7934.v1.patch
>
>
> This Jira tracks minimal changes in the capacity scheduler preemption 
> mechanics that allow for sub-classing and overriding of certain behaviors, 
> which we use to implement federation global algorithms, e.g., in YARN-7403.
>  






[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-14 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7934:
---
Attachment: YARN-7934.v1.patch

> [GQ] Refactor preemption calculators to allow overriding for Federation 
> Global Algos
> 
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: More.url, YARN-7934.v1.patch
>
>
> This Jira tracks minimal changes in the capacity scheduler preemption 
> mechanics that allow for sub-classing and overriding of certain behaviors, 
> which we use to implement federation global algorithms, e.g., in YARN-7403.
>  






[jira] [Updated] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-14 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7934:
---
Attachment: More.url

> [GQ] Refactor preemption calculators to allow overriding for Federation 
> Global Algos
> 
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: More.url
>
>
> This Jira tracks minimal changes in the capacity scheduler preemption 
> mechanics that allow for sub-classing and overriding of certain behaviors, 
> which we use to implement federation global algorithms, e.g., in YARN-7403.
>  






[jira] [Assigned] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-14 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7934:
--

Assignee: Carlo Curino

> [GQ] Refactor preemption calculators to allow overriding for Federation 
> Global Algos
> 
>
> Key: YARN-7934
> URL: https://issues.apache.org/jira/browse/YARN-7934
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
>
> This Jira tracks minimal changes in the capacity scheduler preemption 
> mechanics that allow for sub-classing and overriding of certain behaviors, 
> which we use to implement federation global algorithms, e.g., in YARN-7403.
>  






[jira] [Created] (YARN-7934) [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos

2018-02-14 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7934:
--

 Summary: [GQ] Refactor preemption calculators to allow overriding 
for Federation Global Algos
 Key: YARN-7934
 URL: https://issues.apache.org/jira/browse/YARN-7934
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


This Jira tracks minimal changes in the capacity scheduler preemption mechanics 
that allow for sub-classing and overriding of certain behaviors, which we use 
to implement federation global algorithms, e.g., in YARN-7403.

 






[jira] [Updated] (YARN-6528) [PERF/TEST] Add JMX metrics for Plan Follower and Agent Placement and Plan Operations

2018-02-14 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-6528:
---
Summary: [PERF/TEST] Add JMX metrics for Plan Follower and Agent Placement 
and Plan Operations  (was: Add JMX metrics for Plan Follower and Agent 
Placement and Plan Operations)

> [PERF/TEST] Add JMX metrics for Plan Follower and Agent Placement and Plan 
> Operations
> -
>
> Key: YARN-6528
> URL: https://issues.apache.org/jira/browse/YARN-6528
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sean Po
>Assignee: Xiaohua (Victor) Liang
>Priority: Major
> Attachments: YARN-6528.v001.patch, YARN-6528.v002.patch, 
> YARN-6528.v003.patch, YARN-6528.v004.patch, YARN-6528.v005.patch, 
> YARN-6528.v006.patch, YARN-6528.v007.patch
>
>
> YARN-1051 introduced a ReservationSystem that enables the YARN RM to handle 
> time explicitly, i.e. users can now "reserve" capacity ahead of time which is 
> predictably allocated to them. In order to understand in finer detail the 
> performance of Rayon, YARN-6528 proposes to include JMX metrics in the Plan 
> Follower, Agent Placement and Plan Operations components of Rayon.






[jira] [Commented] (YARN-6528) Add JMX metrics for Plan Follower and Agent Placement and Plan Operations

2018-02-14 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364757#comment-16364757
 ] 

Carlo Curino commented on YARN-6528:


Thanks [~seanpo03], I am ok with the left-over checkstyle. I am assigning this 
to [~lxhfirenking], who volunteered to rebase and extend this; I will mark you 
both as contributors when we get to commit this.

[~lxhfirenking] please see if you can shush the checkstyle using 
{{@SuppressWarnings("checkstyle:XYZ")}}, with XYZ being the right checkstyle 
rule (or similar tricks).
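
For reference, the annotation form mentioned above looks roughly like the 
sketch below. The class and method are hypothetical, and {{magicnumber}} only 
stands in for XYZ. Note that Checkstyle honors this annotation only when its 
configuration enables {{SuppressWarningsHolder}} and 
{{SuppressWarningsFilter}}; whether the Hadoop checkstyle configuration does so 
is an assumption to verify.

```java
/** Hypothetical class, used only to show where the annotation goes. */
class PlanFollowerMetricsExample {
  // "magicnumber" stands in for XYZ: use the name of the check that fires.
  @SuppressWarnings("checkstyle:magicnumber")
  static long bucketWidthMillis() {
    return 250L;
  }
}
```

Scoping the annotation to the single offending method (rather than the whole 
class) keeps the check active for the rest of the file.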

> Add JMX metrics for Plan Follower and Agent Placement and Plan Operations
> -
>
> Key: YARN-6528
> URL: https://issues.apache.org/jira/browse/YARN-6528
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sean Po
>Assignee: Xiaohua (Victor) Liang
>Priority: Major
> Attachments: YARN-6528.v001.patch, YARN-6528.v002.patch, 
> YARN-6528.v003.patch, YARN-6528.v004.patch, YARN-6528.v005.patch, 
> YARN-6528.v006.patch, YARN-6528.v007.patch
>
>
> YARN-1051 introduced a ReservationSystem that enables the YARN RM to handle 
> time explicitly, i.e. users can now "reserve" capacity ahead of time which is 
> predictably allocated to them. In order to understand in finer detail the 
> performance of Rayon, YARN-6528 proposes to include JMX metrics in the Plan 
> Follower, Agent Placement and Plan Operations components of Rayon.






[jira] [Assigned] (YARN-6528) Add JMX metrics for Plan Follower and Agent Placement and Plan Operations

2018-02-14 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-6528:
--

Assignee: Xiaohua (Victor) Liang  (was: Carlo Curino)

> Add JMX metrics for Plan Follower and Agent Placement and Plan Operations
> -
>
> Key: YARN-6528
> URL: https://issues.apache.org/jira/browse/YARN-6528
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sean Po
>Assignee: Xiaohua (Victor) Liang
>Priority: Major
> Attachments: YARN-6528.v001.patch, YARN-6528.v002.patch, 
> YARN-6528.v003.patch, YARN-6528.v004.patch, YARN-6528.v005.patch, 
> YARN-6528.v006.patch, YARN-6528.v007.patch
>
>
> YARN-1051 introduced a ReservationSystem that enables the YARN RM to handle 
> time explicitly, i.e. users can now "reserve" capacity ahead of time which is 
> predictably allocated to them. In order to understand in finer detail the 
> performance of Rayon, YARN-6528 proposes to include JMX metrics in the Plan 
> Follower, Agent Placement and Plan Operations components of Rayon.






[jira] [Updated] (YARN-6528) Add JMX metrics for Plan Follower and Agent Placement and Plan Operations

2018-02-14 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-6528:
---
Issue Type: Sub-task  (was: Task)
Parent: YARN-7402

> Add JMX metrics for Plan Follower and Agent Placement and Plan Operations
> -
>
> Key: YARN-6528
> URL: https://issues.apache.org/jira/browse/YARN-6528
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sean Po
>Assignee: Sean Po
>Priority: Major
> Attachments: YARN-6528.v001.patch, YARN-6528.v002.patch, 
> YARN-6528.v003.patch, YARN-6528.v004.patch, YARN-6528.v005.patch, 
> YARN-6528.v006.patch, YARN-6528.v007.patch
>
>
> YARN-1051 introduced a ReservationSystem that enables the YARN RM to handle 
> time explicitly, i.e. users can now "reserve" capacity ahead of time which is 
> predictably allocated to them. In order to understand in finer detail the 
> performance of Rayon, YARN-6528 proposes to include JMX metrics in the Plan 
> Follower, Agent Placement and Plan Operations components of Rayon.






[jira] [Assigned] (YARN-7614) [RESERVATION] Support Reservation APIs in Federation Router

2018-01-31 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7614:
--

Assignee: Giovanni Matteo Fumarola

> [RESERVATION] Support Reservation APIs in Federation Router
> ---
>
> Key: YARN-7614
> URL: https://issues.apache.org/jira/browse/YARN-7614
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, reservation system
>Reporter: Carlo Curino
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
>







[jira] [Assigned] (YARN-7404) [GQ] propagate to GPG queue-level utilization/pending information

2018-01-31 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7404:
--

Assignee: Jose Miguel Arreola

> [GQ] propagate to GPG queue-level utilization/pending information
> -
>
> Key: YARN-7404
> URL: https://issues.apache.org/jira/browse/YARN-7404
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Jose Miguel Arreola
>Priority: Major
>







[jira] [Assigned] (YARN-7870) [PERF/TEST] Performance testing of ReservationSystem at high job submission rates

2018-01-31 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7870:
--

Assignee: Xiaohua (Victor) Liang

> [PERF/TEST] Performance testing of ReservationSystem at high job submission 
> rates
> -
>
> Key: YARN-7870
> URL: https://issues.apache.org/jira/browse/YARN-7870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Xiaohua (Victor) Liang
>Priority: Major
>
> To leverage the ReservationSystem as a gang-semantics enforcer for all jobs 
> of a large federation, we need to evaluate whether it can sustain a large 
> number of job submissions (and replanning) per second. This Jira tracks this 
> validation effort.






[jira] [Assigned] (YARN-7869) [PERF/TEST] Performance testing of CapacityScheduler at many-thousands of queues

2018-01-31 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7869:
--

Assignee: Abhishek Modi

> [PERF/TEST] Performance testing of CapacityScheduler at many-thousands of 
> queues
> 
>
> Key: YARN-7869
> URL: https://issues.apache.org/jira/browse/YARN-7869
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Abhishek Modi
>Priority: Major
>
> The CapacityScheduler is known to work well at tens to hundreds of queues. 
> This Jira tracks performance testing at much larger scale: thousands of 
> queues, and deep queue hierarchies (>10 levels). 






[jira] [Commented] (YARN-7870) [PERF/TEST] Performance testing of ReservationSystem at high job submission rates

2018-01-31 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347741#comment-16347741
 ] 

Carlo Curino commented on YARN-7870:


Yes! It is, in fact, already extended to support reservations (YARN-6363 if I 
am not mistaken), and to run a {{MetricsInvariantChecker}} (YARN-6451 and 
YARN-6547) to validate some of the performance/correctness. In this Jira (and 
others in the same umbrella and in the SLS umbrella, e.g., YARN-7798) we plan 
to build upon it to give us a solid testing and perf-testing platform for the 
various algorithmic/protocol additions that we are planning in YARN-7402 (and 
YARN in general). 
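
To give a feel for invariant checking over metrics, here is a small sketch of 
the general idea: named conditions are evaluated against a snapshot of metric 
values and the violated ones are reported. This is not the 
{{MetricsInvariantChecker}} API. The class {{InvariantSketch}}, its methods, 
and the metric names used with it are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch; this is not the MetricsInvariantChecker API. */
final class InvariantSketch {
  // Invariant name -> condition that must hold on a metrics snapshot.
  private final Map<String, Predicate<Map<String, Long>>> invariants =
      new LinkedHashMap<>();

  void add(String name, Predicate<Map<String, Long>> check) {
    invariants.put(name, check);
  }

  /** Returns the names of the invariants violated by the given snapshot. */
  List<String> violations(Map<String, Long> metrics) {
    List<String> failed = new ArrayList<>();
    for (Map.Entry<String, Predicate<Map<String, Long>>> e
        : invariants.entrySet()) {
      if (!e.getValue().test(metrics)) {
        failed.add(e.getKey());
      }
    }
    return failed;
  }
}
```

A simulation run could register a check such as "pending containers never go 
negative" and fail (or just log) whenever {{violations}} returns a non-empty 
list.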

> [PERF/TEST] Performance testing of ReservationSystem at high job submission 
> rates
> -
>
> Key: YARN-7870
> URL: https://issues.apache.org/jira/browse/YARN-7870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Priority: Major
>
> To leverage the ReservationSystem as a gang-semantics enforcer for all jobs 
> of a large federation, we need to evaluate whether it can sustain a large 
> number of job submissions (and replanning) per second. This Jira tracks this 
> validation effort.






[jira] [Updated] (YARN-7833) [PERF/TEST] Extend SLS to support simulation of a Federated Environment

2018-01-31 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7833:
---
Summary: [PERF/TEST] Extend SLS to support simulation of a Federated 
Environment  (was: Extend SLS to support simulation of a Federated Environment)

> [PERF/TEST] Extend SLS to support simulation of a Federated Environment
> ---
>
> Key: YARN-7833
> URL: https://issues.apache.org/jira/browse/YARN-7833
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Jose Miguel Arreola
>Priority: Major
>
> To develop algorithms for federation, it would be of great help to have a 
> version of SLS that supports multiple RMs and the GPG.






[jira] [Created] (YARN-7870) [PERF/TEST] Performance testing of ReservationSystem at high job submission rates

2018-01-31 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7870:
--

 Summary: [PERF/TEST] Performance testing of ReservationSystem at 
high job submission rates
 Key: YARN-7870
 URL: https://issues.apache.org/jira/browse/YARN-7870
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


To leverage the ReservationSystem as a gang-semantics enforcer for all jobs of 
a large federation, we need to evaluate whether it can sustain a large number 
of job submissions (and replanning) per second. This Jira tracks this 
validation effort.






[jira] [Created] (YARN-7869) [PERF/TEST] Performance testing of CapacityScheduler at many-thousands of queues

2018-01-31 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7869:
--

 Summary: [PERF/TEST] Performance testing of CapacityScheduler at 
many-thousands of queues
 Key: YARN-7869
 URL: https://issues.apache.org/jira/browse/YARN-7869
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


The CapacityScheduler is known to work well at tens to hundreds of queues. This 
Jira tracks performance testing at much larger scale: thousands of queues, and 
deep queue hierarchies (>10 levels). 






[jira] [Assigned] (YARN-7833) Extend SLS to support simulation of a Federated Environment

2018-01-26 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7833:
--

Assignee: Jose Miguel Arreola

> Extend SLS to support simulation of a Federated Environment
> ---
>
> Key: YARN-7833
> URL: https://issues.apache.org/jira/browse/YARN-7833
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Jose Miguel Arreola
>Priority: Major
>
> To develop algorithms for federation, it would be of great help to have a 
> version of SLS that supports multiple RMs and the GPG.






[jira] [Assigned] (YARN-7615) [RESERVATION] Federation StateStore: support storage/retrieval of reservations

2018-01-26 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7615:
--

Assignee: Giovanni Matteo Fumarola

> [RESERVATION] Federation StateStore: support storage/retrieval of reservations
> --
>
> Key: YARN-7615
> URL: https://issues.apache.org/jira/browse/YARN-7615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
>




[jira] [Assigned] (YARN-7405) [GQ] Bias container allocations based on global view

2018-01-26 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7405:
--

Assignee: Subru Krishnan

> [GQ] Bias container allocations based on global view
> 
>
> Key: YARN-7405
> URL: https://issues.apache.org/jira/browse/YARN-7405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Subru Krishnan
>Priority: Major
>
> Each RM in a federation should bias its local allocations of containers based 
> on the global over/under utilization of queues. As part of this the local RM 
> should account for the work that other RMs will be doing in between the 
> updates we receive via the heartbeats of YARN-7404 (the mechanics used for 
> synchronization).



[jira] [Created] (YARN-7834) [GQ] Rebalance queue configuration for load-balancing and locality affinities

2018-01-26 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7834:
--

 Summary: [GQ] Rebalance queue configuration for load-balancing and 
locality affinities
 Key: YARN-7834
 URL: https://issues.apache.org/jira/browse/YARN-7834
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


This Jira tracks algorithmic work, which will run in the GPG and will rebalance 
the mapping of queues to sub-clusters. The current design supports both 
balancing the "load" across sub-clusters (proportionally to their size) and, as 
a second objective, maximizing the affinity between queues and the sub-clusters 
where they historically have the most demand.
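
To make the two objectives concrete, here is a rough greedy sketch. It is not the algorithm tracked by this Jira: the class {{QueueRebalanceSketch}}, its inputs, and the heuristic itself are assumptions made for illustration. Each queue (largest demand first) goes to the sub-cluster whose load-to-size ratio would be lowest after the assignment, and historical affinity is used only to break ties. It assumes at least one sub-cluster is given.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical greedy heuristic: balance load/size first, affinity second.
final class QueueRebalanceSketch {
  private static final double EPS = 1e-9;

  // demand:   queue -> resource demand
  // size:     sub-cluster -> capacity (same unit as demand)
  // affinity: queue -> sub-cluster with most historical demand (optional)
  static Map<String, String> rebalance(Map<String, Double> demand,
      Map<String, Double> size, Map<String, String> affinity) {
    Map<String, Double> load = new LinkedHashMap<>();
    for (String sc : size.keySet()) {
      load.put(sc, 0.0);
    }
    // Place the largest queues first; break demand ties by name.
    List<String> queues = new ArrayList<>(demand.keySet());
    queues.sort((a, b) -> {
      int byDemand = Double.compare(demand.get(b), demand.get(a));
      return byDemand != 0 ? byDemand : a.compareTo(b);
    });
    Map<String, String> mapping = new LinkedHashMap<>();
    for (String q : queues) {
      String best = null;
      double bestScore = Double.POSITIVE_INFINITY;
      for (String sc : size.keySet()) {
        // Relative load of this sub-cluster if it also hosted q.
        double score = (load.get(sc) + demand.get(q)) / size.get(sc);
        boolean tie = Math.abs(score - bestScore) <= EPS;
        boolean preferred = sc.equals(affinity.get(q));
        if (score < bestScore - EPS || (tie && preferred)) {
          best = sc;
          bestScore = score;
        }
      }
      mapping.put(q, best);
      load.put(best, load.get(best) + demand.get(q));
    }
    return mapping;
  }
}
```

A greedy pass like this is not guaranteed to find the best balance; it only shows how "load proportional to size" and "affinity as a secondary objective" can be expressed as a primary score plus a tie-breaker.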



[jira] [Assigned] (YARN-7834) [GQ] Rebalance queue configuration for load-balancing and locality affinities

2018-01-26 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7834:
--

Assignee: Carlo Curino

> [GQ] Rebalance queue configuration for load-balancing and locality affinities
> -
>
> Key: YARN-7834
> URL: https://issues.apache.org/jira/browse/YARN-7834
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
>
> This Jira tracks algorithmic work, which will run in the GPG and will 
> rebalance the mapping of queues to sub-clusters. The current design supports 
> both balancing the "load" across sub-clusters (proportionally to their size) 
> and, as a second objective, maximizing the affinity between queues and the 
> sub-clusters where they historically have the most demand.



[jira] [Updated] (YARN-7403) [GQ] Compute global and local "IdealAllocation"

2018-01-26 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7403:
---
Summary: [GQ] Compute global and local "IdealAllocation"  (was: [GQ] 
Compute global and local preemption)

> [GQ] Compute global and local "IdealAllocation"
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch, global-queues-preemption.PNG
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global amount of preemption, 
> and based on that, "where" (in which sub-cluster) preemption will be enacted.
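
The gap between local and global views can be shown with a small worked example. This is a sketch under stated assumptions, not the code in the attached patches: {{GlobalPreemptionSketch}} is a hypothetical class, and "shortfall" here means the resources a queue could reclaim, taken as min(pending, max(0, guarantee - used)). The sketch only compares the sum of per-sub-cluster shortfalls with the shortfall computed on the combined view; deciding "where" to preempt is left out.

```java
// Hypothetical illustration: per-sub-cluster views vs. the combined view.
final class GlobalPreemptionSketch {
  // Resources a queue is still owed: bounded by its pending demand and by
  // how far it is below its guarantee.
  static long shortfall(long guarantee, long used, long pending) {
    return Math.min(pending, Math.max(0L, guarantee - used));
  }

  static long sum(long[] values) {
    long total = 0;
    for (long v : values) {
      total += v;
    }
    return total;
  }

  // Each RM computes its own shortfall in isolation; add them up.
  static long sumOfLocalShortfalls(long[] guarantee, long[] used,
      long[] pending) {
    long total = 0;
    for (int i = 0; i < guarantee.length; i++) {
      total += shortfall(guarantee[i], used[i], pending[i]);
    }
    return total;
  }

  // Combine the local views first, then compute a single shortfall.
  static long globalShortfall(long[] guarantee, long[] used,
      long[] pending) {
    return shortfall(sum(guarantee), sum(used), sum(pending));
  }
}
```

For a queue guaranteed 50 in each of two sub-clusters, with use {10, 90} and pending {40, 0}, the local views add up to a shortfall of 40 while the global view sees none, since the queue already uses its full global guarantee of 100. With use {10, 50} and pending {0, 40} the opposite happens: the local views report nothing, while the global view reports 40.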



[jira] [Created] (YARN-7833) Extend SLS to support simulation of a Federated Environment

2018-01-26 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7833:
--

 Summary: Extend SLS to support simulation of a Federated 
Environment
 Key: YARN-7833
 URL: https://issues.apache.org/jira/browse/YARN-7833
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


To develop algorithms for federation, it would be of great help to have a 
version of SLS that supports multiple RMs and the GPG.



[jira] [Commented] (YARN-6648) [GPG] Add SubClusterCleaner in Global Policy Generator

2018-01-24 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338144#comment-16338144
 ] 

Carlo Curino commented on YARN-6648:


[~botong] thanks for updating the patch, +1 from me, with the following minor 
issues:
 # Fix the findbugs exclusion (I see you are already trying to do so, but it 
seems that your offerings have not pleased the Yetus gods yet :)).
 # (minor) In {{SubclusterCleaner}} line 98, the {{LOG.info}} seems a bit 
redundant, maybe {{LOG.debug}}? I see it being useful while debugging, but 
during normal operations it is somewhat unnecessary.

> [GPG] Add SubClusterCleaner in Global Policy Generator
> --
>
> Key: YARN-6648
> URL: https://issues.apache.org/jira/browse/YARN-6648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-6648-YARN-2915.v1.patch, 
> YARN-6648-YARN-7402.v2.patch, YARN-6648-YARN-7402.v3.patch, 
> YARN-6648-YARN-7402.v4.patch, YARN-6648-YARN-7402.v5.patch, 
> YARN-6648-YARN-7402.v6.patch
>
>




[jira] [Comment Edited] (YARN-6648) [GPG] Add SubClusterCleaner in Global Policy Generator

2018-01-19 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332694#comment-16332694
 ] 

Carlo Curino edited comment on YARN-6648 at 1/19/18 6:37 PM:
-

[~botong] thanks for the updated patch, I think it is nicer to have them 
combined (easier to follow).

Here are a few questions/suggestions (some pretty minor, some more important):
 # in {{MemoryFederationStateStore.setSubClusterLastHeartbeat}} why do you go 
through {{getSubcluster}} instead of just doing 
{{membership.get(subClusterId).setLastHeartBeat(longHeartBeat)}} ?
 # In {{GPGUtils}} consider using {{DurationFormatUtils.formatDuration(long, 
string_format)}}, instead of the code you have.
 # In {{GlobalPolicyGenerator}}
 ## should we keep the string constants here, or have them in 
{{YarnConfiguration}} or other places where those are usually defined?
 ## Is the {{SubClusterCleanerService}} required by every Federation 
deployment, or is it something we might want to make configurable (runs only if 
turned on). More generally, should we have a generic mechanism to "start 
services" in the GPG?
 # In {{SubClusterCleaner}}
 ## line 77, is there a way for us to "check" whether the format in the 
{{StateStore}} is local or UTC? Related is the code around line 100, you seem 
to doubt the format, and be conservative about it, which might mean the 
clean-up could at times be delayed by many hours. Anything better than 
assuming things and/or being overly conservative?
 ## In {{SubClusterCleaner}} line 87, maybe a bit verbose? Should some of this 
be {{LOG.debug}} instead (if so, wrap it in the usual {{if(debugEnabled)}} 
check)?
 ## What do you do in case the subCluster {{isUnusable()}}?
 # In {{SubClusterCleanerService}}
 ## typo in Javadoc: GPE
 ## I assume we will have many similar "actions run on a schedule", can you 
make this class more generic (templatize it, so we can re-use it)?
 ## If the thread crashes, do we have something that restarts it? I see it 
throws {{Exception}}; is anyone restarting the service if it throws?
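
On the last two points, a generic "run an action on a schedule" wrapper could look roughly like the sketch below. It is not taken from the patch: {{PeriodicTaskRunner}} is a hypothetical name, and the sketch uses only {{java.util.concurrent}}. It relies on a documented property of {{ScheduledExecutorService}}: if one execution of a periodic task throws, all subsequent executions are suppressed. Catching the exception inside the wrapper keeps the schedule alive without needing an external restart.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical reusable runner for any periodic action.
final class PeriodicTaskRunner implements AutoCloseable {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  private final AtomicLong failures = new AtomicLong();

  // Runs `task` immediately and then again `periodMillis` after each run
  // completes.
  void schedule(Runnable task, long periodMillis) {
    executor.scheduleWithFixedDelay(() -> {
      try {
        task.run();
      } catch (RuntimeException e) {
        // An exception escaping here would cancel all future runs, so
        // record it and let the next scheduled run proceed.
        failures.incrementAndGet();
      }
    }, 0, periodMillis, TimeUnit.MILLISECONDS);
  }

  long failureCount() {
    return failures.get();
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

A real service would log the failure rather than just count it, and might want to distinguish fatal errors from transient ones; the counter is only there to keep the sketch observable.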



> [GPG] Add SubClusterCleaner in Global Policy Generator
> --
>
> Key: YARN-6648
> URL: https://issues.apache.org/jira/browse/YARN-6648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-6648-YARN-2915.v1.patch, 
> YARN-6648-YARN-7402.v2.patch, YARN-6648-YARN-7402.v3.patch
>
>




[jira] [Commented] (YARN-6648) [GPG] Add SubClusterCleaner in Global Policy Generator

2018-01-19 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332694#comment-16332694
 ] 

Carlo Curino commented on YARN-6648:


[~botong] thanks for the updated patch, I think it is nicer to have them 
combined (easier to follow).

Here are a few questions/suggestions (some pretty minor, some more important):
 # in {{MemoryFederationStateStore.setSubClusterLastHeartbeat}} why do you go 
through {{getSubcluster}} instead of just doing 
{{membership.get(subClusterId).setLastHeartBeat(longHeartBeat)}} ?
 # In {{GPGUtils}} consider using {{DurationFormatUtils.formatDuration(long, 
string_format)}}, instead of the code you have.
 # In {{GlobalPolicyGenerator}}
 ## should we keep the string constants here, or have them in 
{{YarnConfiguration}} or other places where those are usually defined?
 ## Is the {{SubClusterCleanerService}} required by every Federation 
deployment, or is it something we might want to make configurable (runs only if 
turned on). More generally, should we have a generic mechanism to "start 
services" in the GPG?
 # In {{SubClusterCleaner}}
 ## line 77, is there a way for us to "check" whether the format in the 
{{StateStore}} is local or UTC? Related is the code around line 100, you seem 
to doubt the format, and be conservative about it, which might mean the 
clean-up could at times be delayed by many hours. Anything better than 
assuming things and/or being overly conservative?
 ## In {{SubClusterCleaner}} line 87, maybe a bit verbose? Should some of this 
be {{LOG.debug}} instead (if so, wrap it in the usual {{if(debugEnabled)}} 
check)?
 ## What do you do in case the subCluster {{isUnusable()}}?
 # In {{SubClusterCleanerService}}
 ## typo in Javadoc: GPE
 ## I assume we will have many similar "actions run on a schedule", can you 
make this class more generic (templatize it, so we can re-use it)?
 ## If the thread crashes, do we have something that restarts it? I see it 
throws {{Exception}}; is anyone restarting the service if it throws?
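
Regarding the local-vs-UTC question on the heartbeat timestamps: one way to avoid guessing is to compare instants rather than wall-clock values. The sketch below is only an illustration and assumes the store records the last heartbeat as epoch milliseconds, which is exactly the point the review asks to verify. {{HeartbeatExpiry}} and its parameters are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical expiry check based on time-zone independent instants.
final class HeartbeatExpiry {
  // True if more than `expiration` has elapsed between the last heartbeat
  // and `now`. Epoch milliseconds carry no time zone, so no safety margin
  // for zone offsets is needed.
  static boolean isExpired(long lastHeartbeatEpochMillis, Instant now,
      Duration expiration) {
    Instant last = Instant.ofEpochMilli(lastHeartbeatEpochMillis);
    return Duration.between(last, now).compareTo(expiration) > 0;
  }
}
```

If the stored value turns out to be a formatted local time instead, it would have to be converted with its zone before a comparison like this is meaningful.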

> [GPG] Add SubClusterCleaner in Global Policy Generator
> --
>
> Key: YARN-6648
> URL: https://issues.apache.org/jira/browse/YARN-6648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-6648-YARN-2915.v1.patch, 
> YARN-6648-YARN-7402.v2.patch, YARN-6648-YARN-7402.v3.patch
>
>




[jira] [Commented] (YARN-3660) [GPG] Federation Global Policy Generator (service hook only)

2018-01-18 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331562#comment-16331562
 ] 

Carlo Curino commented on YARN-3660:


[~botong] thanks for the contribution, v4 patch looks good. I committed it to 
the dev-branch YARN-7402.

> [GPG] Federation Global Policy Generator (service hook only)
> 
>
> Key: YARN-3660
> URL: https://issues.apache.org/jira/browse/YARN-3660
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Botong Huang
>Priority: Major
>  Labels: federation, gpg
> Attachments: YARN-3660-YARN-7402.v1.patch, 
> YARN-3660-YARN-7402.v2.patch, YARN-3660-YARN-7402.v3.patch, 
> YARN-3660-YARN-7402.v3.patch, YARN-3660-YARN-7402.v3.patch, 
> YARN-3660-YARN-7402.v4.patch
>
>
> In a federated environment, local impairments of one sub-cluster might 
> unfairly affect users/queues that are mapped to that sub-cluster. A 
> centralized component (GPG) runs out-of-band and edits the policies governing 
> how users/queues are allocated to sub-clusters. This allows us to enforce 
> global invariants (by dynamically updating locally-enforced invariants).



[jira] [Commented] (YARN-6648) [GPG] Add FederationStateStore interfaces for Global Policy Generator

2018-01-12 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324373#comment-16324373
 ] 

Carlo Curino commented on YARN-6648:


[~botong] the changes in this JIRA seem fine/harmless. However, since I don't 
see the code that will use them, they are a bit pointless as is. 
There is a bit of a trade-off between breaking things down into small, 
easy-to-review JIRAs and keeping things together so that changes 
are justified. In this case, I think we might have been over-zealous in keeping 
patches small. Please combine this with the JIRA that uses 
them, and mark this as a duplicate.

> [GPG] Add FederationStateStore interfaces for Global Policy Generator
> -
>
> Key: YARN-6648
> URL: https://issues.apache.org/jira/browse/YARN-6648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-6648-YARN-2915.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3660) [GPG] Federation Global Policy Generator (service hook only)

2018-01-12 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324362#comment-16324362
 ] 

Carlo Curino commented on YARN-3660:


[~botong] the patch seems generally ok but please:
# Address the checkstyle issues and write some good Javadoc for the top classes.
# Please provide some basic test, even just checking that the boot is correct or 
fails when federation is disabled; the empty Test class is not ok (better none 
at all, and explain why when QA complains).
# This JIRA is an empty service that you will need in other patches, so let's 
rename the JIRA (my attempt is meh, so use a better title if you can find one).
# The patch is reasonably small, so please include the bash/PowerShell scripts 
to start/stop/restart the GPG (you can look at what was done for the Federation 
Router).

> [GPG] Federation Global Policy Generator (service hook only)
> 
>
> Key: YARN-3660
> URL: https://issues.apache.org/jira/browse/YARN-3660
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Botong Huang
>  Labels: federation, gpg
> Attachments: YARN-3660-YARN-7402.v1.patch
>
>
> In a federated environment, local impairments of one sub-cluster might 
> unfairly affect users/queues that are mapped to that sub-cluster. A 
> centralized component (GPG) runs out-of-band and edits the policies governing 
> how users/queues are allocated to sub-clusters. This allows us to enforce 
> global invariants (by dynamically updating locally-enforced invariants).



[jira] [Updated] (YARN-3660) [GPG] Federation Global Policy Generator (service hook only)

2018-01-12 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-3660:
---
Summary: [GPG] Federation Global Policy Generator (service hook only)  
(was: [GPG] Federation Global Policy Generator (load balancing))

> [GPG] Federation Global Policy Generator (service hook only)
> 
>
> Key: YARN-3660
> URL: https://issues.apache.org/jira/browse/YARN-3660
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Botong Huang
>  Labels: federation, gpg
> Attachments: YARN-3660-YARN-7402.v1.patch
>
>
> In a federated environment, local impairments of one sub-cluster might 
> unfairly affect users/queues that are mapped to that sub-cluster. A 
> centralized component (GPG) runs out-of-band and edits the policies governing 
> how users/queues are allocated to sub-clusters. This allows us to enforce 
> global invariants (by dynamically updating locally-enforced invariants).



[jira] [Commented] (YARN-7402) Federation V2: Global Optimizations

2018-01-09 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319464#comment-16319464
 ] 

Carlo Curino commented on YARN-7402:


I created a dev-branch YARN-7402 for the activities of this umbrella JIRA.

> Federation V2: Global Optimizations
> ---
>
> Key: YARN-7402
> URL: https://issues.apache.org/jira/browse/YARN-7402
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> YARN Federation today requires manual configuration of queues within each 
> sub-cluster, and each RM operates "in isolation". This has a few issues:
> # Preemption is computed locally (and might far exceed the global need)
> # Jobs within a queue are forced to consume their resources "evenly" based on 
> queue mapping
> This umbrella JIRA tracks a new feature that leverages the 
> FederationStateStore as a synchronization mechanism among RMs, and allows for 
> allocation and preemption decisions to be based on a (close to up-to-date) 
> global view of the cluster allocation and demand. The JIRA also tracks 
> algorithms to automatically generate policies for Router and AMRMProxy to 
> shape the traffic to each sub-cluster, and general "maintenance" of the 
> FederationStateStore.



[jira] [Created] (YARN-7725) [GQ] Compute global "ideal allocation" including locality biases

2018-01-09 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7725:
--

 Summary: [GQ] Compute global "ideal allocation" including locality 
biases
 Key: YARN-7725
 URL: https://issues.apache.org/jira/browse/YARN-7725
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


This JIRA tracks an algorithmic effort to compute the global ideal allocation. 
We also take into account the locality demand/availability gap, and map down the 
global allocation to sub-cluster level, computing the delta+ and delta- for 
each queue in each sub-cluster.
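
As a reading aid, the per-queue, per-sub-cluster deltas can be sketched as follows. This is an interpretation, not the implementation: it assumes delta+ is what a queue should gain in a sub-cluster (ideal above current) and delta- is what it should give up (current above ideal). {{AllocationDeltaSketch}} and the array layout {{[queue][subCluster]}} are hypothetical.

```java
// Hypothetical helper: compares ideal and current allocations, indexed as
// [queue][subCluster].
final class AllocationDeltaSketch {
  // delta+: resources the queue should gain in each sub-cluster.
  static long[][] deltaPlus(long[][] ideal, long[][] current) {
    long[][] out = new long[ideal.length][];
    for (int q = 0; q < ideal.length; q++) {
      out[q] = new long[ideal[q].length];
      for (int s = 0; s < ideal[q].length; s++) {
        out[q][s] = Math.max(0L, ideal[q][s] - current[q][s]);
      }
    }
    return out;
  }

  // delta-: resources the queue should give up in each sub-cluster. This
  // is delta+ with the roles of ideal and current swapped.
  static long[][] deltaMinus(long[][] ideal, long[][] current) {
    return deltaPlus(current, ideal);
  }
}
```

For a queue with ideal allocation {30, 20} and current allocation {40, 10} across two sub-clusters, this gives delta+ = {0, 10} and delta- = {10, 0}.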



[jira] [Assigned] (YARN-7725) [GQ] Compute global "ideal allocation" including locality biases

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7725:
--

Assignee: Carlo Curino

> [GQ] Compute global "ideal allocation" including locality biases
> 
>
> Key: YARN-7725
> URL: https://issues.apache.org/jira/browse/YARN-7725
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> This JIRA tracks an algorithmic effort to compute the global ideal 
> allocation. We also take into account the locality demand/availability gap, 
> and map down the global allocation to sub-cluster level, computing the delta+ 
> and delta- for each queue in each sub-cluster.



[jira] [Updated] (YARN-7403) [GQ] Compute global and local preemption

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7403:
---
Summary: [GQ] Compute global and local preemption  (was: Compute global and 
local preemption)

> [GQ] Compute global and local preemption
> 
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch, global-queues-preemption.PNG
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global amount of preemption, 
> and based on that, "where" (in which sub-cluster) preemption will be enacted.



[jira] [Updated] (YARN-7405) [GQ] Bias container allocations based on global view

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7405:
---
Summary: [GQ] Bias container allocations based on global view  (was: Bias 
container allocations based on global view)

> [GQ] Bias container allocations based on global view
> 
>
> Key: YARN-7405
> URL: https://issues.apache.org/jira/browse/YARN-7405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>
> Each RM in a federation should bias its local allocations of containers based 
> on the global over/under utilization of queues. As part of this the local RM 
> should account for the work that other RMs will be doing in between the 
> updates we receive via the heartbeats of YARN-7404 (the mechanics used for 
> synchronization).



[jira] [Updated] (YARN-7404) [GQ] propagate to GPG queue-level utilization/pending information

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7404:
---
Summary: [GQ] propagate to GPG queue-level utilization/pending information  
(was: RM federation heartbeat to StateStore must include "queue state" )

> [GQ] propagate to GPG queue-level utilization/pending information
> -
>
> Key: YARN-7404
> URL: https://issues.apache.org/jira/browse/YARN-7404
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>




[jira] [Updated] (YARN-7614) Support Reservation APIs in Federation Router

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7614:
---
Component/s: reservation system

> Support Reservation APIs in Federation Router
> -
>
> Key: YARN-7614
> URL: https://issues.apache.org/jira/browse/YARN-7614
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, reservation system
>Reporter: Carlo Curino
>




[jira] [Updated] (YARN-7615) [RESERVATION] Federation StateStore: support storage/retrieval of reservations

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7615:
---
Summary: [RESERVATION] Federation StateStore: support storage/retrieval of 
reservations  (was: Federation StateStore: support storage/retrieval of 
reservations)

> [RESERVATION] Federation StateStore: support storage/retrieval of reservations
> --
>
> Key: YARN-7615
> URL: https://issues.apache.org/jira/browse/YARN-7615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>




[jira] [Updated] (YARN-7614) [RESERVATION] Support Reservation APIs in Federation Router

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7614:
---
Summary: [RESERVATION] Support Reservation APIs in Federation Router  (was: 
Support Reservation APIs in Federation Router)

> [RESERVATION] Support Reservation APIs in Federation Router
> ---
>
> Key: YARN-7614
> URL: https://issues.apache.org/jira/browse/YARN-7614
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, reservation system
>Reporter: Carlo Curino
>




[jira] [Updated] (YARN-5871) [RESERVATION] Add support for reservation-based routing.

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-5871:
---
Labels: federation reservation  (was: federation)

> [RESERVATION] Add support for reservation-based routing.
> 
>
> Key: YARN-5871
> URL: https://issues.apache.org/jira/browse/YARN-5871
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>  Labels: federation, reservation
> Attachments: YARN-5871-YARN-2915.01.patch, 
> YARN-5871-YARN-2915.01.patch, YARN-5871-YARN-2915.02.patch, 
> YARN-5871-YARN-2915.03.patch, YARN-5871-YARN-2915.04.patch
>
>
> Adding policies that can route reservations, and that then route applications 
> to where the reservations have been placed.



[jira] [Updated] (YARN-5871) [RESERVATION] Add support for reservation-based routing.

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-5871:
---
Summary: [RESERVATION] Add support for reservation-based routing.  (was: 
Add support for reservation-based routing.)

> [RESERVATION] Add support for reservation-based routing.
> 
>
> Key: YARN-5871
> URL: https://issues.apache.org/jira/browse/YARN-5871
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>  Labels: federation, reservation
> Attachments: YARN-5871-YARN-2915.01.patch, 
> YARN-5871-YARN-2915.01.patch, YARN-5871-YARN-2915.02.patch, 
> YARN-5871-YARN-2915.03.patch, YARN-5871-YARN-2915.04.patch
>
>
> Adding policies that can route reservations, and that then route applications 
> to where the reservations have been placed.






[jira] [Updated] (YARN-7402) Federation V2: Global Optimizations

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7402:
---
Summary: Federation V2: Global Optimizations  (was: Federation: Global 
Queues)
Description: 
YARN Federation today requires manual configuration of queues within each 
sub-cluster, and each RM operates "in isolation". This has a few issues:
# Preemption is computed locally (and might far exceed the global need)
# Jobs within a queue are forced to consume their resources "evenly" based on 
queue mapping

This umbrella JIRA tracks a new feature that leverages the FederationStateStore 
as a synchronization mechanism among RMs, and allows for allocation and 
preemption decisions to be based on a (close to up-to-date) global view of the 
cluster allocation and demand. The JIRA also tracks algorithms to automatically 
generate policies for Router and AMRMProxy to shape the traffic to each 
sub-cluster, and general "maintenance" of the FederationStateStore.


  was:
YARN Federation today requires manual configuration of queues within each 
sub-cluster, and each RM operates "in isolation". This has a few issues:
# Preemption is computed locally (and might far exceed the global need)
# Jobs within a queue are forced to consume their resources "evenly" based on 
queue mapping

This umbrella JIRA tracks a new feature that leverages the FederationStateStore 
as a synchronization mechanism among RMs, and allows for allocation and 
preemption decisions to be based on a (close to up-to-date) global view of the 
cluster allocation and demand.



> Federation V2: Global Optimizations
> ---
>
> Key: YARN-7402
> URL: https://issues.apache.org/jira/browse/YARN-7402
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> YARN Federation today requires manual configuration of queues within each 
> sub-cluster, and each RM operates "in isolation". This has a few issues:
> # Preemption is computed locally (and might far exceed the global need)
> # Jobs within a queue are forced to consume their resources "evenly" based on 
> queue mapping
> This umbrella JIRA tracks a new feature that leverages the 
> FederationStateStore as a synchronization mechanism among RMs, and allows for 
> allocation and preemption decisions to be based on a (close to up-to-date) 
> global view of the cluster allocation and demand. The JIRA also tracks 
> algorithms to automatically generate policies for Router and AMRMProxy to 
> shape the traffic to each sub-cluster, and general "maintenance" of the 
> FederationStateStore.






[jira] [Updated] (YARN-7708) [GPG] Load based policy generator

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7708:
---
Parent Issue: YARN-7402  (was: YARN-5597)

> [GPG] Load based policy generator
> -
>
> Key: YARN-7708
> URL: https://issues.apache.org/jira/browse/YARN-7708
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Young Chen
>
> This policy reads load from the "pendingQueueLength" metrics and provides 
> scaling into a set of weights that influence the AMRMProxy and Router 
> behaviors.






[jira] [Updated] (YARN-7599) [GPG] Application cleaner and subcluster cleaner in Global Policy Generator

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7599:
---
Parent Issue: YARN-7402  (was: YARN-5597)

> [GPG] Application cleaner and subcluster cleaner in Global Policy Generator
> ---
>
> Key: YARN-7599
> URL: https://issues.apache.org/jira/browse/YARN-7599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
>
> In Federation, we need a cleanup service for StateStore as well as Yarn 
> Registry. For the former, we need to remove old application records as well 
> as inactive subclusters. For the latter, failed and killed applications might 
> leave records in the Yarn Registry (see YARN-6128). We plan to add both 
> cleanup services in GPG.






[jira] [Updated] (YARN-7707) [GPG] Policy generator framework

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7707:
---
Parent Issue: YARN-7402  (was: YARN-5597)

> [GPG] Policy generator framework
> 
>
> Key: YARN-7707
> URL: https://issues.apache.org/jira/browse/YARN-7707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Young Chen
>  Labels: federation, gpg
>
> This JIRA tracks the development of a generic framework for querying 
> sub-clusters for metrics, running policies, and updating them in the 
> FederationStateStore.






[jira] [Updated] (YARN-6648) [GPG] Add FederationStateStore interfaces for Global Policy Generator

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-6648:
---
Parent Issue: YARN-7402  (was: YARN-5597)

> [GPG] Add FederationStateStore interfaces for Global Policy Generator
> -
>
> Key: YARN-6648
> URL: https://issues.apache.org/jira/browse/YARN-6648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-6648-YARN-2915.v1.patch
>
>







[jira] [Updated] (YARN-3660) [GPG] Federation Global Policy Generator (load balancing)

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-3660:
---
Parent Issue: YARN-7402  (was: YARN-5597)

> [GPG] Federation Global Policy Generator (load balancing)
> -
>
> Key: YARN-3660
> URL: https://issues.apache.org/jira/browse/YARN-3660
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Botong Huang
>  Labels: federation, gpg
>
> In a federated environment, local impairments of one sub-cluster might 
> unfairly affect users/queues that are mapped to that sub-cluster. A 
> centralized component (GPG) runs out-of-band and edits the policies governing 
> how users/queues are allocated to sub-clusters. This allows us to enforce 
> global invariants (by dynamically updating locally-enforced invariants).






[jira] [Updated] (YARN-5597) YARN Federation improvements

2018-01-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-5597:
---
Summary: YARN Federation improvements  (was: YARN Federation phase 2)

> YARN Federation improvements
> 
>
> Key: YARN-5597
> URL: https://issues.apache.org/jira/browse/YARN-5597
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>
> This umbrella JIRA tracks a set of improvements over the YARN Federation MVP 
> (YARN-2915)






[jira] [Assigned] (YARN-7708) [GPG] Load based policy generator

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7708:
--

Assignee: Young Chen

> [GPG] Load based policy generator
> -
>
> Key: YARN-7708
> URL: https://issues.apache.org/jira/browse/YARN-7708
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Young Chen
>
> This policy reads load from the "pendingQueueLength" metrics and provides 
> scaling into a set of weights that influence the AMRMProxy and Router 
> behaviors.






[jira] [Assigned] (YARN-7707) [GPG] Policy generator framework

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7707:
--

Assignee: Young Chen  (was: Carlo Curino)

> [GPG] Policy generator framework
> 
>
> Key: YARN-7707
> URL: https://issues.apache.org/jira/browse/YARN-7707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Young Chen
>  Labels: federation, gpg
>
> This JIRA tracks the development of a generic framework for querying 
> sub-clusters for metrics, running policies, and updating them in the 
> FederationStateStore.






[jira] [Updated] (YARN-7707) [GPG] Policy generator framework

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7707:
---
Labels: federation gpg  (was: )

> [GPG] Policy generator framework
> 
>
> Key: YARN-7707
> URL: https://issues.apache.org/jira/browse/YARN-7707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Young Chen
>  Labels: federation, gpg
>
> This JIRA tracks the development of a generic framework for querying 
> sub-clusters for metrics, running policies, and updating them in the 
> FederationStateStore.






[jira] [Assigned] (YARN-7707) [GPG] Policy generator framework

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7707:
--

Assignee: Carlo Curino  (was: Young Chen)

> [GPG] Policy generator framework
> 
>
> Key: YARN-7707
> URL: https://issues.apache.org/jira/browse/YARN-7707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> This JIRA tracks the development of a generic framework for querying 
> sub-clusters for metrics, running policies, and updating them in the 
> FederationStateStore.






[jira] [Assigned] (YARN-7707) [GPG] Policy generator framework

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-7707:
--

Assignee: Young Chen

> [GPG] Policy generator framework
> 
>
> Key: YARN-7707
> URL: https://issues.apache.org/jira/browse/YARN-7707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Young Chen
>
> This JIRA tracks the development of a generic framework for querying 
> sub-clusters for metrics, running policies, and updating them in the 
> FederationStateStore.






[jira] [Created] (YARN-7708) [GPG] Load based policy generator

2018-01-05 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7708:
--

 Summary: [GPG] Load based policy generator
 Key: YARN-7708
 URL: https://issues.apache.org/jira/browse/YARN-7708
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


This policy reads load from the "pendingQueueLength" metrics and provides 
scaling into a set of weights that influence the AMRMProxy and Router behaviors.
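
The description does not say how the scaling is done, so the following is only a minimal sketch of one possible mapping from load to weights. It assumes each sub-cluster reports a single pending count (standing in for "pendingQueueLength") and gives less loaded sub-clusters a larger share. The function name and the inverse-load formula are hypothetical and are not taken from any patch.

```python
def load_based_weights(pending):
    """Map sub-cluster -> pending queue length to sub-cluster -> weight.

    Hypothetical scaling: weight is proportional to 1 / (1 + pending),
    then normalized so that all weights sum to 1.
    """
    # Negative readings are treated as zero load (assumption).
    raw = {sc: 1.0 / (1 + max(0, n)) for sc, n in pending.items()}
    total = sum(raw.values())
    return {sc: w / total for sc, w in raw.items()}


# Example: SC1 is idle, SC2 has 9 pending requests.
weights = load_based_weights({"SC1": 0, "SC2": 9})
print(weights)
```

Weights of this shape could then be written into the policies consumed by the Router and AMRMProxy; how that is done is outside this sketch.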






[jira] [Created] (YARN-7707) [GPG] Policy generator framework

2018-01-05 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7707:
--

 Summary: [GPG] Policy generator framework
 Key: YARN-7707
 URL: https://issues.apache.org/jira/browse/YARN-7707
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


This JIRA tracks the development of a generic framework for querying 
sub-clusters for metrics, running policies, and updating them in the 
FederationStateStore.






[jira] [Updated] (YARN-7599) [GPG] Application cleaner and subcluster cleaner in Global Policy Generator

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7599:
---
Summary: [GPG] Application cleaner and subcluster cleaner in Global Policy 
Generator  (was: Application cleaner and subcluster cleaner in Global Policy 
Generator)

> [GPG] Application cleaner and subcluster cleaner in Global Policy Generator
> ---
>
> Key: YARN-7599
> URL: https://issues.apache.org/jira/browse/YARN-7599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
>
> In Federation, we need a cleanup service for StateStore as well as Yarn 
> Registry. For the former, we need to remove old application records as well 
> as inactive subclusters. For the latter, failed and killed applications might 
> leave records in the Yarn Registry (see YARN-6128). We plan to add both 
> cleanup services in GPG.






[jira] [Updated] (YARN-7599) [GPG] Application cleaner and subcluster cleaner in Global Policy Generator

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7599:
---
Labels: federation gpg  (was: )

> [GPG] Application cleaner and subcluster cleaner in Global Policy Generator
> ---
>
> Key: YARN-7599
> URL: https://issues.apache.org/jira/browse/YARN-7599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
>
> In Federation, we need a cleanup service for StateStore as well as Yarn 
> Registry. For the former, we need to remove old application records as well 
> as inactive subclusters. For the latter, failed and killed applications might 
> leave records in the Yarn Registry (see YARN-6128). We plan to add both 
> cleanup services in GPG.






[jira] [Updated] (YARN-6648) [GPG] Add FederationStateStore interfaces for Global Policy Generator

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-6648:
---
Labels: federation gpg  (was: )

> [GPG] Add FederationStateStore interfaces for Global Policy Generator
> -
>
> Key: YARN-6648
> URL: https://issues.apache.org/jira/browse/YARN-6648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-6648-YARN-2915.v1.patch
>
>







[jira] [Updated] (YARN-3660) [GPG] Federation Global Policy Generator (load balancing)

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-3660:
---
Summary: [GPG] Federation Global Policy Generator (load balancing)  (was: 
Federation Global Policy Generator (load balancing))

> [GPG] Federation Global Policy Generator (load balancing)
> -
>
> Key: YARN-3660
> URL: https://issues.apache.org/jira/browse/YARN-3660
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Botong Huang
>  Labels: federation, gpg
>
> In a federated environment, local impairments of one sub-cluster might 
> unfairly affect users/queues that are mapped to that sub-cluster. A 
> centralized component (GPG) runs out-of-band and edits the policies governing 
> how users/queues are allocated to sub-clusters. This allows us to enforce 
> global invariants (by dynamically updating locally-enforced invariants).






[jira] [Updated] (YARN-3660) [GPG] Federation Global Policy Generator (load balancing)

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-3660:
---
Labels: federation gpg  (was: )

> [GPG] Federation Global Policy Generator (load balancing)
> -
>
> Key: YARN-3660
> URL: https://issues.apache.org/jira/browse/YARN-3660
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Botong Huang
>  Labels: federation, gpg
>
> In a federated environment, local impairments of one sub-cluster might 
> unfairly affect users/queues that are mapped to that sub-cluster. A 
> centralized component (GPG) runs out-of-band and edits the policies governing 
> how users/queues are allocated to sub-clusters. This allows us to enforce 
> global invariants (by dynamically updating locally-enforced invariants).






[jira] [Updated] (YARN-6648) [GPG] Add FederationStateStore interfaces for Global Policy Generator

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-6648:
---
Summary: [GPG] Add FederationStateStore interfaces for Global Policy 
Generator  (was: Add FederationStateStore interfaces for Global Policy 
Generator)

> [GPG] Add FederationStateStore interfaces for Global Policy Generator
> -
>
> Key: YARN-6648
> URL: https://issues.apache.org/jira/browse/YARN-6648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-6648-YARN-2915.v1.patch
>
>







[jira] [Assigned] (YARN-3660) Federation Global Policy Generator (load balancing)

2018-01-05 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-3660:
--

Assignee: Botong Huang  (was: Subru Krishnan)

> Federation Global Policy Generator (load balancing)
> ---
>
> Key: YARN-3660
> URL: https://issues.apache.org/jira/browse/YARN-3660
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Botong Huang
>
> In a federated environment, local impairments of one sub-cluster might 
> unfairly affect users/queues that are mapped to that sub-cluster. A 
> centralized component (GPG) runs out-of-band and edits the policies governing 
> how users/queues are allocated to sub-clusters. This allows us to enforce 
> global invariants (by dynamically updating locally-enforced invariants).






[jira] [Created] (YARN-7615) Federation StateStore: support storage/retrieval of reservations

2017-12-05 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7615:
--

 Summary: Federation StateStore: support storage/retrieval of 
reservations
 Key: YARN-7615
 URL: https://issues.apache.org/jira/browse/YARN-7615
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino









[jira] [Created] (YARN-7614) Support Reservation APIs in Federation Router

2017-12-05 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7614:
--

 Summary: Support Reservation APIs in Federation Router
 Key: YARN-7614
 URL: https://issues.apache.org/jira/browse/YARN-7614
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino









[jira] [Commented] (YARN-7439) Minor improvements to Reservation System documentation/exceptions

2017-11-03 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238099#comment-16238099
 ] 

Carlo Curino commented on YARN-7439:



# The main documentation page for the ReservationSystem 
(http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ReservationSystem.html)
 should call out how to switch on the reservation system in yarn-site.xml.
# The submission-reservation.json example in 
(http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Reservation_API_Submit)
 is missing a comma after the reservation id.
# When the reservation system is disabled and we attempt to invoke 
new-reservation we get:

{code}
{
  "RemoteException": {
    "exception": "YarnRuntimeException",
    "message": "Unable to create new reservation from RM web service",
    "javaClassName": "org.apache.hadoop.yarn.exceptions.YarnRuntimeException"
  }
}
{code}

which is not the most telling message. By contrast, the submission throws back 
a more appropriate one:

{code}
{
  "RemoteException": {
    "exception": "BadRequestException",
    "message": "java.lang.Exception: Reservation is not enabled. Please enable & try again",
    "javaClassName": "org.apache.hadoop.yarn.webapp.BadRequestException"
  }
}
{code}
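
To illustrate why the first message is less useful to a caller, here is a small client-side sketch that inspects the two error bodies quoted above. The helper names are hypothetical, and matching on the message text is only a heuristic assumed for this example, not part of the RM REST contract.

```python
import json


def summarize_remote_exception(body):
    """Return (exception, message) from an RM error body, or None if absent."""
    remote = json.loads(body).get("RemoteException")
    if remote is None:
        return None
    return remote.get("exception"), remote.get("message")


def reservation_disabled(body):
    """Heuristic: true only when the message says reservations are switched off."""
    summary = summarize_remote_exception(body)
    return summary is not None and "Reservation is not enabled" in (summary[1] or "")


# The two payloads quoted in the comment above.
new_reservation_body = json.dumps({"RemoteException": {
    "exception": "YarnRuntimeException",
    "message": "Unable to create new reservation from RM web service",
    "javaClassName": "org.apache.hadoop.yarn.exceptions.YarnRuntimeException"}})
submission_body = json.dumps({"RemoteException": {
    "exception": "BadRequestException",
    "message": "java.lang.Exception: Reservation is not enabled. Please enable & try again",
    "javaClassName": "org.apache.hadoop.yarn.webapp.BadRequestException"}})

print(reservation_disabled(new_reservation_body))
print(reservation_disabled(submission_body))
```

With the current messages, only the submission response lets a caller detect that the reservation system is off; the new-reservation response gives no hint of the cause.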

> Minor improvements to Reservation System documentation/exceptions
> -
>
> Key: YARN-7439
> URL: https://issues.apache.org/jira/browse/YARN-7439
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>
> This JIRA tracks a couple of minor issues with docs and exceptions for the 
> reservation system.






[jira] [Created] (YARN-7439) Minor improvements to Reservation System documentation/exceptions

2017-11-03 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7439:
--

 Summary: Minor improvements to Reservation System 
documentation/exceptions
 Key: YARN-7439
 URL: https://issues.apache.org/jira/browse/YARN-7439
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Carlo Curino


This JIRA tracks a couple of minor issues with docs and exceptions for the 
reservation system.






[jira] [Commented] (YARN-7434) Router getApps REST invocation fails with multiple RMs

2017-11-02 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236910#comment-16236910
 ] 

Carlo Curino commented on YARN-7434:


Thanks [~elgoiri] for the patch. LGTM, let's wait for Yetus. Also, as soon as 
this is checked by Yetus, please upload the version for branch-2/branch-2.9.


> Router getApps REST invocation fails with multiple RMs
> --
>
> Key: YARN-7434
> URL: https://issues.apache.org/jira/browse/YARN-7434
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Subru Krishnan
>Assignee: Íñigo Goiri
>Priority: Critical
> Attachments: YARN-7434.000.patch
>
>
> Router uses threads to invoke getApps in parallel with multiple RMs and has a 
> concurrency bug caused by sharing of the HTTP request object. This jira 
> tracks the changes to fix the multi-threading issue by cloning the request.
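
The cloning idea in the description can be sketched as follows. This is not the Router code: the `Request` class, `call_rm`, and `fan_out` are hypothetical stand-ins, used only to show each worker thread receiving its own copy of the request instead of sharing one mutable object.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class Request:
    """Hypothetical stand-in for a mutable HTTP request object."""

    def __init__(self, path, headers):
        self.path = path
        self.headers = headers


def call_rm(rm, request):
    # Each worker mutates only its private copy, so there is no shared state.
    request.headers["target-rm"] = rm
    return rm, request.headers["target-rm"]


def fan_out(rms, request):
    """Invoke every RM in parallel, handing each thread a clone of the request."""
    with ThreadPoolExecutor(max_workers=max(1, len(rms))) as pool:
        futures = [pool.submit(call_rm, rm, copy.deepcopy(request)) for rm in rms]
        return [f.result() for f in futures]


original = Request("/apps", {})
results = fan_out(["rm1", "rm2", "rm3"], original)
print(results)
```

If the same `request` were passed to every worker, the header written by one thread could be overwritten by another before it is read, which is the kind of interference that cloning avoids.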






[jira] [Commented] (YARN-7431) resource estimator has findbugs problems

2017-11-02 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236804#comment-16236804
 ] 

Carlo Curino commented on YARN-7431:


The patch LGTM vs the issues listed, though let's see what Yetus says.

> resource estimator has findbugs problems
> 
>
> Key: YARN-7431
> URL: https://issues.apache.org/jira/browse/YARN-7431
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Assignee: Arun Suresh
>Priority: Blocker
> Attachments: YARN-7431.001.patch
>
>
> Just see any recent report.






[jira] [Commented] (YARN-7403) Compute global and local preemption

2017-10-31 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227507#comment-16227507
 ] 

Carlo Curino commented on YARN-7403:


The attached screenshot shows an example of "globally" calculated local 
preemption. In particular, it tries to highlight the problem of locality of the 
demand vs. availability of preemptable containers. The code in the draft3 patch 
computes the total preemption to be 100 containers, splits it among SC1 and 
SC2 based on B's demand (so 66/33), and caps the preemption by the number of 
preemptable containers in A1, which is 20 in SC2.
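
To make the arithmetic concrete, here is a minimal sketch of the split-then-cap step. It is not the code in the draft3 patch. The demand figures for B (200 in SC1, 100 in SC2) and the 80 preemptable A1 containers in SC1 are assumptions chosen to reproduce the 66/33 split; only the total of 100 and the cap of 20 in SC2 come from the example.

```python
def split_preemption(total, demand, preemptable):
    """Split a global preemption target across sub-clusters.

    total: number of containers to preempt globally.
    demand: sub-cluster -> pending demand of the starved queue (B).
    preemptable: sub-cluster -> preemptable containers of the victim queue (A1).
    """
    total_demand = sum(demand.values())
    if total_demand == 0:
        return {sc: 0 for sc in demand}
    plan = {}
    for sc, d in demand.items():
        share = total * d // total_demand  # proportional share, rounded down
        plan[sc] = min(share, preemptable.get(sc, 0))  # cap by what can be preempted
    return plan


plan = split_preemption(
    total=100,
    demand={"SC1": 200, "SC2": 100},     # hypothetical demand of queue B
    preemptable={"SC1": 80, "SC2": 20},  # SC2 value from the example, SC1 assumed
)
print(plan)
```

Under these assumptions SC1 preempts its full share of 66 while SC2 is capped at 20, so 86 containers are preempted in this round rather than the full 100.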

Other "splitting" decisions can be made, enforcing different invariants, e.g., 
that all 100 containers are preempted. I think the current policy is 
reasonable when combined with a stateful AMRMProxy policy that "relaxes" 
locality demand, as the requests from B will eventually be migrated towards the 
sub-cluster where demand is being fulfilled, i.e., at a later time B's demand 
should be in SC1 and more preemption of A1 containers in SC1 should kick in.

Thoughts?



> Compute global and local preemption
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch, global-queues-preemption.PNG
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global amount of preemption, 
> and based on that, "where" (in which sub-cluster) preemption will be enacted.






[jira] [Updated] (YARN-7403) Compute global and local preemption

2017-10-31 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7403:
---
Attachment: global-queues-preemption.PNG

> Compute global and local preemption
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch, global-queues-preemption.PNG
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global amount of preemption, 
> and based on that, "where" (in which sub-cluster) preemption will be enacted.






[jira] [Updated] (YARN-7403) Compute global and local preemption

2017-10-31 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7403:
---
Attachment: YARN-7403.draft3.patch

> Compute global and local preemption
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch, 
> YARN-7403.draft3.patch
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global amount of preemption, 
> and based on that, "where" (in which sub-cluster) preemption will be enacted.






[jira] [Updated] (YARN-7403) Compute global and local preemption

2017-10-26 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7403:
---
Attachment: YARN-7403.draft2.patch

Fixing ASF license.

> Compute global and local preemption
> ---
>
> Key: YARN-7403
> URL: https://issues.apache.org/jira/browse/YARN-7403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-7403.draft.patch, YARN-7403.draft2.patch
>
>
> This JIRA tracks algorithmic effort to combine the local queue views of 
> capacity guarantee/use/demand and compute the global amount of preemption, 
> and based on that, "where" (in which sub-cluster) preemption will be enacted.






[jira] [Created] (YARN-7405) Bias container allocations based on global view

2017-10-26 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-7405:
--

 Summary: Bias container allocations based on global view
 Key: YARN-7405
 URL: https://issues.apache.org/jira/browse/YARN-7405
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


Each RM in a federation should bias its local allocations of containers based 
on the global over/under utilization of queues. As part of this, the local RM 
should account for the work that other RMs will be doing in between the updates 
it receives via the heartbeats of YARN-7404 (the mechanism used for 
synchronization).
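
As a rough illustration of the idea, and not the design of this JIRA, the sketch below extrapolates a queue's global usage between heartbeats and turns the remaining global headroom into a bias in [0, 1]. The class name, the linear extrapolation of remote work, and the bias formula are assumptions made for this example.

```java
// Illustrative only: names and formulas are assumptions, not the YARN-7405 design.
final class AllocationBiasSketch {

  /**
   * Estimate current global usage of a queue from the last heartbeat snapshot,
   * what this RM allocated since then, and an assumed allocation rate of the
   * other RMs (resource units per second).
   */
  static double estimateGlobalUsed(double lastGlobalUsed,
                                   double localAllocatedSince,
                                   double remoteRatePerSec,
                                   double secondsSinceHeartbeat) {
    return lastGlobalUsed + localAllocatedSince
        + remoteRatePerSec * secondsSinceHeartbeat;
  }

  /**
   * Bias in [0, 1]: 1 means the queue is far below its global guarantee and
   * should be favored locally, 0 means it is at or above the guarantee.
   */
  static double bias(double globalGuarantee, double estimatedGlobalUsed) {
    if (globalGuarantee <= 0) {
      return 0.0;
    }
    double headroom = (globalGuarantee - estimatedGlobalUsed) / globalGuarantee;
    return Math.max(0.0, Math.min(1.0, headroom));
  }
}
```

A local scheduler could multiply a queue's ordering weight by this bias; how stale snapshots and bursty remote allocations should be handled is left open here.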





