[jira] [Updated] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-08-08 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-3:
---

Attachment: YARN-3-lce_only-v1.patch

This patch augments the LinuxContainerExecutor with an LCEResourcesHandler, 
which can be used to enforce resource limits using either cgroups (in this 
patch) or sched_setaffinity/taskset (future patch). A 
DefaultLCEResourcesHandler is also provided which does not enforce any new 
resource limits.

The LCEResourcesHandler interface (and concrete classes) are introduced to keep 
the LinuxContainerExecutor Java class simple, and to separate the cgroups and 
future sched_setaffinity logic.
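
As a rough illustration of that separation, the handler can be pictured as a small interface with a no-op default and a cgroups-flavoured implementation. This is only a sketch: every class name, method name, and signature below is hypothetical and chosen for the example, not taken from the patch, and the cgroups stand-in merely tracks paths instead of touching the filesystem.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and signatures are illustrative, not from the patch.
interface ResourcesHandlerSketch {
    // Called before launch so the handler can prepare limits for the container.
    void preExecute(String containerId);

    // Value handed to the native binary through the launch command's resources option.
    String getResourcesOption(String containerId);

    // Called after the container exits so the handler can clean up.
    void postExecute(String containerId);
}

// Plays the role of a default handler: it enforces no new limits.
final class NoOpHandlerSketch implements ResourcesHandlerSketch {
    public void preExecute(String containerId) { }
    public String getResourcesOption(String containerId) { return "none"; }
    public void postExecute(String containerId) { }
}

// Stand-in for a cgroups-backed handler. It only tracks the cgroup paths it
// would create; a real handler would create and remove directories.
final class CgroupPathHandlerSketch implements ResourcesHandlerSketch {
    private final String cpuHierarchy;
    private final List<String> active = new ArrayList<>();

    CgroupPathHandlerSketch(String cpuHierarchy) {
        this.cpuHierarchy = cpuHierarchy;
    }

    public void preExecute(String containerId) {
        active.add(cpuHierarchy + "/" + containerId);
    }

    // In cgroups v1, a process joins a group when its pid is written to the
    // group's "tasks" file, so that file is what the launcher needs to know.
    public String getResourcesOption(String containerId) {
        return cpuHierarchy + "/" + containerId + "/tasks";
    }

    public void postExecute(String containerId) {
        active.remove(cpuHierarchy + "/" + containerId);
    }

    int activeCount() {
        return active.size();
    }
}
```

With this shape, the executor only ever calls the three interface methods, so swapping the cgroups logic for a future sched_setaffinity handler would not touch the executor itself.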

The resources handler code is split across the introduced Java classes, and the 
existing container-executor native binary. This is done to minimize the amount 
of code added to the native binary, to provide easy logging via Java 
mechanisms, and because a singleton is needed to track CPU assignments when 
using sched_setaffinity/taskset.

The handler operates synchronously with the execution of the container.

The changes to the native code are:
1) A resources option has been added to the LaunchContainer command. This 
option is used to convey a list of cgroups into which the container should be 
placed before the user command is launched, and, in the future, will 
alternatively convey a list of CPUs to which the process should be pinned, if 
using sched_setaffinity instead of cgroups.

2) A --mount-cgroups command has been added to the native code. This command 
will mount cgroups controllers and create hierarchies for the NodeManager to 
manage. This feature is optional (see below), and exposed to the Java code via 
a new method in LinuxContainerExecutor.java.

The following configuration options are introduced:

yarn.nodemanager.linux-container-executor.resources-handler.class -- The class 
which should assist the LCE in handling resources.

yarn.nodemanager.linux-container-executor.cgroups.hierarchy -- The cgroups 
hierarchy under which to place YARN processes (cannot contain commas). Only 
used when the LCE resources handler is set to the CgroupsLCEResourcesHandler.

yarn.nodemanager.linux-container-executor.cgroups.mount -- Whether the LCE 
should attempt to mount cgroups if not found. Only used when the LCE resources 
handler is set to the CgroupsLCEResourcesHandler.

yarn.nodemanager.linux-container-executor.cgroups.mount-path -- Where the LCE 
should attempt to mount cgroups if not found. Common locations include 
/sys/fs/cgroup and /cgroup. The path must exist before the NodeManager is 
launched. Only used when the LCE resources handler is set to the 
CgroupsLCEResourcesHandler.
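
To make the interplay of these options concrete, here is a minimal sketch of how they might be sanity-checked at startup. It uses java.util.Properties as a stand-in for Hadoop's Configuration class, and the validate method, its messages, and the assumed default of "false" for the mount option are all hypothetical. The two rules it checks come from this thread: the hierarchy must not contain commas, and a mount path has to be supplied when mounting is requested.

```java
import java.util.Properties;

// Hypothetical sketch: Properties stands in for Hadoop's Configuration.
final class CgroupsConfigSketch {
    static final String PREFIX = "yarn.nodemanager.linux-container-executor.";
    static final String HANDLER_CLASS = PREFIX + "resources-handler.class";
    static final String HIERARCHY = PREFIX + "cgroups.hierarchy";
    static final String MOUNT = PREFIX + "cgroups.mount";
    static final String MOUNT_PATH = PREFIX + "cgroups.mount-path";

    // Returns null when the settings look usable, otherwise a short
    // description of the first problem found.
    static String validate(Properties conf) {
        String handler = conf.getProperty(HANDLER_CLASS, "");
        if (!handler.endsWith("CgroupsLCEResourcesHandler")) {
            return null; // the cgroups options are ignored by other handlers
        }
        String hierarchy = conf.getProperty(HIERARCHY, "");
        if (hierarchy.contains(",")) {
            return HIERARCHY + " must not contain commas";
        }
        // Assumed default: do not mount unless asked to.
        boolean mount = Boolean.parseBoolean(conf.getProperty(MOUNT, "false"));
        String mountPath = conf.getProperty(MOUNT_PATH);
        if (mount && (mountPath == null || mountPath.isEmpty())) {
            return MOUNT_PATH + " must be set when " + MOUNT + " is true";
        }
        return null;
    }
}
```

The sketch deliberately does not check that the mount path exists on disk; that check depends on the host and belongs wherever the NodeManager actually starts.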

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-3
> URL: https://issues.apache.org/jira/browse/YARN-3
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
> MAPREDUCE-4334-pre2-with_cpu.patch, MAPREDUCE-4334-pre2.patch, 
> MAPREDUCE-4334-pre3-with_cpu.patch, MAPREDUCE-4334-pre3.patch, 
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch, 
> mapreduce-4334-design-doc-v2.txt, mapreduce-4334-design-doc.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-08-20 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438416#comment-13438416
 ] 

Andrew Ferguson commented on YARN-3:


hi,

I would like to mark this JIRA as "patch available" for the patch I uploaded on 
August 9th. however, it doesn't seem to be available in my list of "More 
Actions".  perhaps there is some other step I need to take?

thanks!
Andrew

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-3
> URL: https://issues.apache.org/jira/browse/YARN-3
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: mapreduce-4334-design-doc.txt, 
> mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch
>
>






[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-14 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v1.patch

updated patch as per Tucu's review

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-147
> URL: https://issues.apache.org/jira/browse/YARN-147
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Andrew Ferguson
> Fix For: 2.0.3-alpha
>
> Attachments: YARN-147-v1.patch, YARN-3.patch
>
>
> This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
> show the SUBMIT PATCH button.



[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-14 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475909#comment-13475909
 ] 

Andrew Ferguson commented on YARN-147:
--

hi Tucu,

thanks very much for opening this new jira and reviewing the patch. I've 
uploaded a new version which addresses most of your comments.

answers to the questions in your review:
bq. cgroupMountPath, if there is no default we should fail if not set, can't we 
have a sensible default?

I've added a check to fail if not set. as far as I can tell, there isn't a 
single default path for cgroups -- some distributions use "/sys/fs/cgroup", 
some use "/cgroup", others, "/cgroups". I've even seen "/mnt/cgroup" (Debian 
perhaps?); these also vary across releases of the same distro. :-(

bq. default value for cgroupPrefix has '/', here will produce a '//' in the path

yes, I made that choice deliberately. I wanted to convey that cgroupPrefix can 
be a path (which is why I kept the '/') and when I use it, I also added a '/' 
in case the user did not put a '/' at the right place in the prefix. my 
understanding is that on Unix, '//' in a path is interpreted as '/', no?
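
On the '//' question: POSIX pathname resolution treats a run of slashes in the middle of a path as a single slash, so the doubled separator is harmless to the kernel. If a tidy string is wanted for logs or comparisons, the pieces can be joined defensively instead. The helper below is a hypothetical sketch, not code from the patch; the class, method, and parameter names are made up for illustration.

```java
// Hypothetical helper: joins path pieces and collapses repeated slashes.
final class CgroupPathJoinSketch {
    // Always inserts a '/' between the pieces (in case the caller left one
    // out), then collapses any run of two or more slashes into one.
    static String join(String controllerPath, String prefix, String name) {
        String joined = controllerPath + "/" + prefix + "/" + name;
        return joined.replaceAll("/{2,}", "/");
    }
}
```

For example, joining "/sys/fs/cgroup/cpu", "/hadoop-yarn" and "container_01" yields "/sys/fs/cgroup/cpu/hadoop-yarn/container_01" whether or not the prefix carries its own leading or trailing slash.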

bq. If the filereader cannot be open/read, is this acceptable or should stop 
execution by throwing exception?

eh, we could go either way here, but I think it's reasonable to not throw the 
exception. if the file can't be read, then the map from cgroup controller to 
path isn't built, and we already have existing checks which skip controllers 
which can't be found in the path (say, if the file can be read correctly, but 
the CPU controller isn't mounted).
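
The behaviour described here can be sketched as follows. This is an illustration under assumptions, not the patch's code: it assumes the file is a mount table in /proc/mounts format, and the class and method names are hypothetical. An unreadable source produces an empty map, and callers then skip any controller they cannot find in it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of building the controller -> path map.
final class CgroupMountTableSketch {
    // Parses text in /proc/mounts format:
    //   device mountpoint fstype options dump pass
    // If the source cannot be read, the result is simply empty.
    static Map<String, String> parse(Reader source) {
        Map<String, String> controllerToPath = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.trim().split("\\s+");
                if (fields.length < 4 || !"cgroup".equals(fields[2])) {
                    continue; // not a cgroup mount
                }
                // Controllers appear among the mount options (e.g. "rw,cpu").
                // Generic options such as "rw" also land in the map; lookups
                // only ever ask for controller names, so they are harmless.
                for (String option : fields[3].split(",")) {
                    controllerToPath.putIfAbsent(option, fields[1]);
                }
            }
        } catch (IOException e) {
            // Unreadable mount table: fall through with whatever was parsed.
        }
        return controllerToPath;
    }
}
```

A caller would then do something like `String cpuPath = map.get("cpu");` and skip CPU enforcement when the result is null, which covers both the unreadable-file case and the case where the CPU controller is not mounted.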


ok, great. I'm going to mark this as "patch available" and see if the findbugs 
warning has gone away (I can't seem to get it to run locally).


thanks!!
Andrew




[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-16 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v2.patch

thanks for the additional comments, Tucu!  I've updated the patch as per your 
review. hopefully I have done everything correctly.

btw, I have this patch in github:
https://github.com/adferguson/hadoop-common/tree/adf-yarn-147
you can see the changes for this patch in the most recent commit.


thanks!
Andrew

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-147
> URL: https://issues.apache.org/jira/browse/YARN-147
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Andrew Ferguson
> Fix For: 2.0.3-alpha
>
> Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-3.patch
>
>
> This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
> show the SUBMIT PATCH button.



[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-18 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v3.patch

update native code per review by Colin

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-147
> URL: https://issues.apache.org/jira/browse/YARN-147
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Andrew Ferguson
> Fix For: 2.0.3-alpha
>
> Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
> YARN-3.patch
>
>
> This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
> show the SUBMIT PATCH button.



[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-18 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479387#comment-13479387
 ] 

Andrew Ferguson commented on YARN-147:
--

hi Colin,

thanks for looking at the native code. since the changes were pretty extensive, 
would you mind taking a careful look again? if it's easier for you, the 
incremental changes can be seen here:
https://github.com/adferguson/hadoop-common/commits/adf-yarn-147

I hope I've faithfully implemented the new key-value API you suggested -- let 
me know if that's not the case.

If the mount fails, I let the exception bubble all the way up to stop the 
NodeManager, as Tucu suggested before about a different error.

The one thing I did not do is change the open / write / close methods to fopen 
/ fprintf / fclose, as the rest of the native code does not use those methods. 
Which would you prefer to see: adjust my patch to use fopen, etc., or fix my 
use of open, etc.?

Yes, I totally agree that it would be better if main.c used getopt_long; it 
definitely smells like another JIRA to me. :-)


thanks!
Andrew



[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-18 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v4.patch

small fix in two places: don't log & re-throw the same exception -- construct 
new exceptions with better context, and use the previous one as the cause.
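
The pattern being described is the usual wrap-with-cause idiom. The sketch below shows it in isolation; the class, method names, and messages are hypothetical and only the idiom itself is the point.

```java
import java.io.IOException;

// Hypothetical sketch of wrapping a failure with context instead of
// logging it and re-throwing the same exception.
final class ExceptionContextSketch {
    // Stand-in for a low-level operation that fails.
    static void writeCgroupFile(String path) throws IOException {
        throw new IOException("Permission denied: " + path);
    }

    // Adds context about what was being attempted; the original exception
    // is preserved as the cause, so no information is lost and nothing is
    // logged twice on the way up.
    static void setupLimits(String containerId, String path) throws IOException {
        try {
            writeCgroupFile(path);
        } catch (IOException e) {
            throw new IOException("Unable to set up limits for " + containerId, e);
        }
    }
}
```

Whoever finally handles the exception can log it once, and the stack trace will show both the high-level context and the low-level cause.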

thanks Tucu for pointing this out!

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-147
> URL: https://issues.apache.org/jira/browse/YARN-147
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Andrew Ferguson
> Fix For: 2.0.3-alpha
>
> Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
> YARN-147-v4.patch, YARN-3.patch
>
>
> This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
> show the SUBMIT PATCH button.



[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-19 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480437#comment-13480437
 ] 

Andrew Ferguson commented on YARN-147:
--

thanks again, Colin.  I'm going to stick with {{strchr}} instead of {{strspn}} 
because I think it's clearer that we get NULL when there is no = in the input 
string, rather than having to check if the length of the "key" is the same as 
the length of the string in that case.
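
The native code is C, but the same trade-off can be shown by analogy in Java, where String.indexOf plays the role of strchr: a missing '=' is reported directly (as -1, where strchr gives NULL) rather than inferred from a length comparison. The class and method below are hypothetical and are not part of the patch.

```java
// Hypothetical Java analogue of the key=value split discussed above.
final class KeyValueSketch {
    // Splits "key=value" at the first '='. Returns null when the input
    // contains no '=', mirroring the NULL that strchr would return.
    static String[] split(String pair) {
        int eq = pair.indexOf('=');
        if (eq < 0) {
            return null;
        }
        return new String[] { pair.substring(0, eq), pair.substring(eq + 1) };
    }
}
```

Only the first '=' is treated as the separator, so a value may itself contain '=' characters.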

thanks for noticing that the error handling wasn't quite right in 
{{mount_cgroup}}.

updated patch to follow in a sec.



[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-19 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v5.patch

updated patch with most recent suggestions from Colin.

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-147
> URL: https://issues.apache.org/jira/browse/YARN-147
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Andrew Ferguson
> Fix For: 2.0.3-alpha
>
> Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
> YARN-147-v4.patch, YARN-147-v5.patch, YARN-3.patch
>
>
> This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
> show the SUBMIT PATCH button.



[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores

2012-10-21 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481081#comment-13481081
 ] 

Andrew Ferguson commented on YARN-2:


hi Arun,

this patch is looking GREAT! in particular, the ResourceCalculator class is 
super useful -- I really like it. :-)  my version, without it, is definitely 
much harder to follow...

before some specific feedback, I want to say that I agree that cores should be 
floats/fractional-units for three reasons:
# they make sense for long-running services, which may require little CPU, but 
should be available on each node, with the ease of having been scheduled by 
YARN.
# this gives us a fine-grained knob for implementing dynamic re-adjustment one 
day; ie, I may want to increase an executing job's weight by 10%, or decrease 
by 15%, etc.
# the publicly released traces of resource requests & usage in Google's cluster 
(to my knowledge, the only traces of their kind) include fractional amounts for 
CPU; having fractional CPU requests in YARN may make it easier to translate 
insights from that dataset to making better resource requests in a YARN cluster.

ok, here are some specific comments on the patch:
* *YarnConfiguration.java*: duplicate import of 
{{com.google.common.base.Joiner}}

* *DefaultContainer.java*: {{divideAndCeil}} explicitly uses the two-argument 
form of {{createResource}} to create a resource with 0 cores, whereas other 
Resources created in this calculator create resources with 1 core. this seems 
counter-intuitive to me, as {{divideAndCeil}} tends to result in an 
_overestimate_ of resource consumption, rather than an _underestimate_. either 
way, perhaps a comment would be helpful, as it is the only time this method is 
used this way in the memory-only comparator

* *MultiResourceCalculator.java*: in {{compare()}}, you are looking to order 
the resources by how dominant they are, and then compare by most-dominant 
resource, second most-dominant, etc. ... I think the boolean flag to 
{{getResourceAsValue()}} doesn't make this clear. with the flag, the question 
in my mind would be "wait, why would I want the non-dominant resource?". simply 
having a boolean flag makes extending to three or more resources less clean. I 
implemented this by treating each resource request as a vector, normalizing by 
clusterResources, and then sorting the components by dominance.

* *MultiResourceCalculator.java*, *DefaultCalculator.java*, *Resources.java*: 
for the {{multiplyAndNormalizeUp}} and {{multiplyAndNormalizeDown}} methods, 
consider renaming the third argument to "stepping" instead of "factor", as it's 
not a factor used for the multiplication; rather, it's a unit of discretization 
to round to ("stepping" may not be the best word, but perhaps it's closer). 
just a thought...

* *CSQueueUtils.java*: extra spaces in front of {{@Lock(CSQueue.class)}}

* *CapacityScheduler.java*: in the {{allocate()}} method, there's a call to 
normalize the request (after a comment about sanity checks). currently, it only 
normalizes the memory; I think the patch should also normalize the number of 
CPUs requested, no?

* *LeafQueue.java*: in {{assignReservedContainer}} consider changing 
{{Resources.divide}} to {{Resources.ratio}} when calculating 
{{potentialNewCapacity}} (and the current capacity). While both calls "should" 
give the same result, {{ratio}} has fewer floating-point operations, and, 
better yet, is semantically what is meant in this case -- we're calculating the 
ratio between (used + requested) and available. Frankly, this is perhaps 
something to take a closer look at (as [~vinodkv] pointed out): whether both 
{{divide}} and {{ratio}} are needed, and if so, which should be used in each 
case.
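
Two of the points above lend themselves to a small sketch: comparing requests by dominance using normalized vectors, and rounding to a "stepping". The code below is an illustration of those ideas as described in this comment, not code from either patch. It assumes resources are plain arrays in a fixed order (for example memory, then cores), that every cluster component is positive, and the class and method names are hypothetical apart from multiplyAndNormalizeUp, which borrows the name under discussion.

```java
import java.util.Arrays;

// Hypothetical sketch: dominance-ordered comparison and stepping-based rounding.
final class DominanceSketch {
    // Each component divided by the cluster total, sorted so that the most
    // dominant share comes first.
    static double[] sortedShares(double[] request, double[] cluster) {
        double[] shares = new double[request.length];
        for (int i = 0; i < request.length; i++) {
            shares[i] = request[i] / cluster[i];
        }
        Arrays.sort(shares); // ascending
        for (int i = 0, j = shares.length - 1; i < j; i++, j--) {
            double tmp = shares[i];
            shares[i] = shares[j];
            shares[j] = tmp;
        }
        return shares;
    }

    // Compares by most dominant share, then second most dominant, and so on.
    // Works unchanged for three or more resource types.
    static int compare(double[] a, double[] b, double[] cluster) {
        double[] sa = sortedShares(a, cluster);
        double[] sb = sortedShares(b, cluster);
        for (int i = 0; i < sa.length; i++) {
            int c = Double.compare(sa[i], sb[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Rounds value up to the next multiple of stepping (stepping > 0).
    static int roundUp(int value, int stepping) {
        return ((value + stepping - 1) / stepping) * stepping;
    }

    // The multiplier scales the value; the stepping only sets the grid
    // that the result is rounded onto.
    static int multiplyAndNormalizeUp(int value, double by, int stepping) {
        return roundUp((int) Math.ceil(value * by), stepping);
    }
}
```

Because the shares are sorted before comparison, no flag is needed to ask for "the dominant" or "the non-dominant" resource, and the third argument of multiplyAndNormalizeUp visibly acts as a rounding unit rather than as a factor.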


Also, both *ContainerTokenIdentifier.java* and *BuilderUtils.java* assume that 
memory is the only resource; I'm not certain they should be updated, but I 
wanted to mention them just in case.

Oh, and should *yarn-default.xml* be updated with values for 
{{yarn.scheduler.minimum-allocation-cores}} and 
{{yarn.scheduler.maximum-allocation-cores}} ?


Hope this helps, Arun!  depending on how the discussion of integral vs 
fractional cores shakes out, I think this patch is good to go.


cheers,
Andrew

> Enhance CS to schedule accounting for both memory and cpu cores
> ---
>
> Key: YARN-2
> URL: https://issues.apache.org/jira/browse/YARN-2
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler, scheduler
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 2.0.3-alpha
>
> Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, 
> MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, 
> MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, 

[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores

2012-10-21 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481082#comment-13481082
 ] 

Andrew Ferguson commented on YARN-2:


oops, quick typo fix: by *DefaultContainer.java*, I meant 
*DefaultCalculator.java* .. thanks!

> Enhance CS to schedule accounting for both memory and cpu cores
> ---
>
> Key: YARN-2
> URL: https://issues.apache.org/jira/browse/YARN-2
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler, scheduler
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 2.0.3-alpha
>
> Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, 
> MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, 
> MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, 
> YARN-2.patch, YARN-2.patch, YARN-2.patch
>
>
> With YARN being a general purpose system, it would be useful for several 
> applications (MPI et al) to specify not just memory but also CPU (cores) for 
> their resource requirements. Thus, it would be useful to the 
> CapacityScheduler to account for both.



[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-10-24 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483266#comment-13483266
 ] 

Andrew Ferguson commented on YARN-3:


(replying to comments on YARN-147 here instead as per [~acmurthy]'s request)

thanks for catching that bug [~sseth]! I've updated my git repo [1], and will 
post a new patch after addressing the review from [~vinodkv]. I successfully 
tested it quite a bit with and without cgroups back in the summer, but it seems 
the patch has shifted enough since the testing that I should do it again.

[1] https://github.com/adferguson/hadoop-common/commits/adf-yarn-147



[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-24 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483268#comment-13483268
 ] 

Andrew Ferguson commented on YARN-147:
--

hi [~acmurthy], I've started posting replies on YARN-3 instead. the LCE bug is 
fixed and I'll post a new patch after addressing [~vinodkv]'s comments. thanks!



[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-10-24 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483412#comment-13483412
 ] 

Andrew Ferguson commented on YARN-3:


thanks for the review [~vinodkv]. I'll post an updated patch on YARN-147. 
there's a lot of food for thought here (design questions), so here are some 
comments:

bq. yarn.nodemanager.linux-container-executor.cgroups.mount has different 
defaults in code and in yarn-default.xml

yeah -- personally, I think the default should be false since it's not clear 
what a sensible default mount path is. I had changed the line in the code in 
response to Tucu's comment [1], but I'm changing it back to false since true 
doesn't seem sensible to me. if anyone in the community has a sensible default 
mount path, then we can surely change the default to true in both the code and 
yarn-default.xml :-/

bq. Can you explain this? Is this sleep necessary. Depending on its importance, 
we'll need to fix the following Id check, AMs don't always have ID equaling one.

the sleep is necessary as sometimes the LCE reports that the container has 
exited, even though the AM process has not terminated. hence, because the 
process is still running, we can't remove the cgroup yet; therefore, the code 
sleeps briefly.

since the AM doesn't always have the ID of 1, what do you suggest I do to 
determine whether the container has the AM or not? if there isn't a good rule, 
the code can just always sleep before removing the cgroup.
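
For what it's worth, a bounded retry is one alternative to a fixed sleep, since removing a cgroup directory fails while tasks are still attached to it. The sketch below is not what the patch does; it is a hypothetical variation, with made-up names and parameters, that works on any directory and simply retries the removal a few times with a pause in between.

```java
import java.io.File;

// Hypothetical sketch: retry removal instead of sleeping once up front.
final class CgroupCleanupSketch {
    // Tries to remove dir up to `attempts` times, pausing between tries.
    // Returns true once the directory is gone, false if it never could be
    // removed (for example because it is still in use).
    static boolean deleteWithRetry(File dir, int attempts, long pauseMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (!dir.exists() || dir.delete()) {
                return true;
            }
            Thread.sleep(pauseMillis);
        }
        return false;
    }
}
```

With this shape there is no need to know whether the container held the AM: a container whose process has already exited is cleaned up on the first try, and a lingering one just costs a few short pauses.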

bq. container-executor.c: If a mount-point is already mounted, mount gives a 
EBUSY error, mount_cgroup() will need to be fixed to support remounts (for e.g. 
on NM restarts). We could unmount cgroup fs on shutdown but that isn't always 
guaranteed.

great catch! thanks! I've made this non-fatal. now, the NM will attempt to 
re-mount the cgroup, will print a message that it can't do that because it's 
mounted, and everything will work, because it will simply work as in the case 
where the cluster admin has already mounted the cgroups.

bq. Not sure of the benefit of configurable 
yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't NM just 
always mount to a path that it creates and owns? Similar comment for the 
hierarchy-prefix.

for the hierarchy-prefix, this needs to be configurable since, in the scenario 
where the admin creates the cgroups in advance, the NM doesn't have privileges 
to create its own hierarchy.

for the mount-path, this is a good question. Linux distributions mount the 
cgroup controllers in various locations, so I thought it was better to keep it 
configurable, since I figured it would be confusing if the OS had already 
mounted some of the cgroup controllers on /cgroup/ or /sys/fs/cgroup/, and then 
the NM started mounting additional controllers in /path/nm/owns/cgroup/.

bq. CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple 
places - updateCgroup() and createCgroup(). In the latter, if cgroups are 
enabled, and we can't create the file, it is a critical error?

I'm fine either way. what would people prefer to see? is it better to launch a 
container even if we can't enforce the limits? or is it better to prevent the 
container from launching? happy to make the necessary quick change.

bq. Make ResourcesHandler top level. I'd like to merge the ContainersMonitor 
functionality with this so as to monitor/enforce memory limits also. 
ContainersMonitor is top-level, we should make ResourcesHandler also top-level 
so that other platforms don't need to create this type-hierarchy all over again 
when they wish to implement some or all of this functionality.

if I'm reading this correctly, yes, that is what I first wanted to do when I 
started this patch (see discussions at the top of this YARN-3 thread, the early 
patches for MAPREDUCE-4334, and the current YARN-4). however, it seems we have 
decided to go another way.



thank you,
Andrew


[1] 
https://issues.apache.org/jira/browse/YARN-147?focusedCommentId=13470926&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13470926

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-3
> URL: https://issues.apache.org/jira/browse/YARN-3
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: mapreduce-4334-design-doc.txt, 
> mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN

[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-24 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v6.patch

updated as per the review comments here and on YARN-3.

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-147
> URL: https://issues.apache.org/jira/browse/YARN-147
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Andrew Ferguson
> Fix For: 2.0.3-alpha
>
> Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
> YARN-147-v4.patch, YARN-147-v5.patch, YARN-147-v6.patch, YARN-3.patch
>
>
> This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
> show the SUBMIT PATCH button.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-12-02 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v8.patch

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-147
> URL: https://issues.apache.org/jira/browse/YARN-147
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Andrew Ferguson
> Fix For: 2.0.3-alpha
>
> Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
> YARN-147-v4.patch, YARN-147-v5.patch, YARN-147-v6.patch, YARN-147-v8.patch, 
> YARN-3.patch
>
>
> This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
> show the SUBMIT PATCH button.



[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-12-02 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508411#comment-13508411
 ] 

Andrew Ferguson commented on YARN-3:


hi everyone, sorry for the delay on this patch -- the east coast hurricane & 
other events set me behind schedule.

I have attached a new version of this work to YARN-147 (v8); it is based on the 
latest version of trunk. as always, you can see my github tree for exact 
changes: https://github.com/adferguson/hadoop-common/

this patch has been tested (and confirmed to work) as follows:
- default executor, no cgroups
- Linux executor, no cgroups
- Linux executor, with cgroups
- Linux executor, mount cgroups automatically
- Linux executor, cgroups already mounted & asked to mount
- error condition: cgroups already mounted & cannot write to cgroup
- error condition: asked to mount cgroups, but cannot mount

both error conditions result in the NodeManager halting, as we have discussed 
above.


[~bikassaha], to answer your first question: mountCgroups is a function in 
LinuxContainerExecutor because that class is simply a Java wrapper for the 
functions provided by the LCE.

[~bikassaha], to answer your second question: if we use cgroups to limit CPU 
and there is only one container running on the machine, the current design will 
allow the container to access all of the CPU resources until other tasks start 
running (a work-conserving design). this design uses the CPU weights feature of 
cgroups, rather than the CPU bandwidth feature (or the entirely separate 
cpusets controller), either of which would impose a hard limit (a 
non-work-conserving design).
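
A minimal sketch of the work-conserving behavior described above. It is a simplified model of weight-based sharing, not the kernel scheduler or the patch's code; the function name, container names, and weight values are all hypothetical.

```python
def cpu_fractions(weights, runnable):
    """Split one machine's CPU among runnable containers in proportion to weight.

    Simplified model: containers that are not runnable get nothing, and the
    CPU they would have used is redistributed to the others.
    """
    active = {name: w for name, w in weights.items() if name in runnable}
    total = sum(active.values())
    if total == 0:
        return {}
    return {name: w / total for name, w in active.items()}


# Hypothetical weights for two containers of equal size.
weights = {"container_a": 1024, "container_b": 1024}

# Only one container is running: it may use the whole machine.
print(cpu_fractions(weights, {"container_a"}))
# Both are running: the CPU is split according to the weights.
print(cpu_fractions(weights, {"container_a", "container_b"}))
```

Under a hard cap (the non-work-conserving alternative), the first call would instead leave half of the machine idle.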


thank you,
Andrew

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-3
> URL: https://issues.apache.org/jira/browse/YARN-3
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: mapreduce-4334-design-doc.txt, 
> mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch
>
>




[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-12-18 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535111#comment-13535111
 ] 

Andrew Ferguson commented on YARN-3:


[~vinodkv] you bet! I will fix these today.

thanks,
Andrew

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-3
> URL: https://issues.apache.org/jira/browse/YARN-3
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: mapreduce-4334-design-doc.txt, 
> mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch
>
>




[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-12-18 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v9.patch

hi,

this version fixes the broken test case in the previous patch, and should 
hopefully fix the findbugs warning.

thanks!
Andrew

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-147
> URL: https://issues.apache.org/jira/browse/YARN-147
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Andrew Ferguson
> Fix For: 2.0.3-alpha
>
> Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
> YARN-147-v4.patch, YARN-147-v5.patch, YARN-147-v6.patch, YARN-147-v8.patch, 
> YARN-147-v9.patch, YARN-3.patch
>
>
> This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
> show the SUBMIT PATCH button.



[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2013-02-07 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574187#comment-13574187
 ] 

Andrew Ferguson commented on YARN-3:


[~acmurthy] thanks for the merge Arun!

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: YARN-3
> URL: https://issues.apache.org/jira/browse/YARN-3
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Fix For: 2.0.3-alpha
>
> Attachments: mapreduce-4334-design-doc.txt, 
> mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch
>
>




[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler

2013-04-25 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642286#comment-13642286
 ] 

Andrew Ferguson commented on YARN-326:
--

hi Sandy,

I'm wondering if you want minimum and maximum shares to actually be fractions 
of the cluster, rather than resource vectors? that would fit more with the 
"fairness" aspect of the FairScheduler, but it's completely a design decision.

for example, what happens if the sum of the minimum shares for each queue 
exceeds the size of the cluster? (or the size of the cluster during a failure?)

or, if my queue has been given a minimum share of (2 CPU, 240 GB RAM) because 
I was originally running high-memory tasks, what happens if I decide to switch 
to high-CPU, low-memory tasks? I think a minimum share of "1/8" might make more 
sense, since it would allow the queue's users to request the resources as they 
see fit.
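
To make the first question concrete, here is a small sketch that checks whether a set of minimum-share vectors adds up to more than the cluster can hold. The cluster size, queue names, and vectors are hypothetical, and this is not how the FairScheduler itself validates its configuration.

```python
def oversubscribed(min_shares, cluster):
    """Return the resources whose summed minimum shares exceed cluster capacity."""
    totals = {}
    for share in min_shares.values():
        for resource, amount in share.items():
            totals[resource] = totals.get(resource, 0) + amount
    return {r: t for r, t in totals.items() if t > cluster.get(r, 0)}


# Hypothetical cluster and per-queue minimum shares.
cluster = {"cpu": 80, "memory_gb": 640}
min_shares = {
    "queue_a": {"cpu": 2, "memory_gb": 240},
    "queue_b": {"cpu": 20, "memory_gb": 240},
    "queue_c": {"cpu": 20, "memory_gb": 240},
}

# Memory minimums sum to 720 GB, more than the 640 GB available.
print(oversubscribed(min_shares, cluster))

# After losing one eighth of the cluster, the same vectors are even further off.
degraded = {"cpu": 70, "memory_gb": 560}
print(oversubscribed(min_shares, degraded))
```

With fractional minimums the equivalent check is simply whether the fractions sum to more than 1, and it does not change when nodes fail.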


anyway, just a thought.


cheers,
Andrew

> Add multi-resource scheduling to the fair scheduler
> ---
>
> Key: YARN-326
> URL: https://issues.apache.org/jira/browse/YARN-326
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: FairSchedulerDRFDesignDoc.pdf, YARN-326.patch
>
>
> With YARN-2 in, the capacity scheduler has the ability to schedule based on 
> multiple resources, using dominant resource fairness.  The fair scheduler 
> should be able to do multiple resource scheduling as well, also using 
> dominant resource fairness.
> More details to come on how the corner cases with fair scheduler configs such 
> as min and max resources will be handled.



[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler

2013-04-26 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643070#comment-13643070
 ] 

Andrew Ferguson commented on YARN-326:
--

hey Sandy,

sure, I certainly see the appeal of the absolute values approach -- like I 
said, it's a design tradeoff.

however, one point of DRF is that we can sensibly consider "fractions" of 
multidimensional resource vectors since the "fraction" is defined as the 
fraction of the cluster consumed by the most dominant resource. having 
single-dimensional fractions like this is nice because we can then a) weight 
them, and b) calculate max-min fairness as in the one-dimensional (eg, 
memory) case.
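
A minimal sketch of the "fraction" described above: the dominant share is the largest ratio of usage to cluster capacity across resources, and dividing it by a weight gives a single number per queue that can be compared as in one-dimensional max-min fairness. The cluster size, usage vector, and function names are hypothetical.

```python
def dominant_share(usage, cluster):
    """Return (resource, fraction) for the resource with the largest usage/capacity ratio."""
    shares = {r: usage.get(r, 0) / capacity for r, capacity in cluster.items()}
    resource = max(shares, key=shares.get)
    return resource, shares[resource]


def weighted_dominant_share(usage, cluster, weight):
    """Dominant share scaled by the queue's weight, for comparing queues."""
    _, share = dominant_share(usage, cluster)
    return share / weight


# Hypothetical cluster and queue usage.
cluster = {"cpu": 64, "memory_gb": 256}
usage = {"cpu": 8, "memory_gb": 128}

print(dominant_share(usage, cluster))              # memory is dominant: 128/256
print(weighted_dominant_share(usage, cluster, 2))  # same share, weight 2
```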

consider the history and geology departments you introduced above. let's say 
our policy is that each queue gets equal weight (since the departments went 
in on the purchase of the cluster 50/50), and that each queue should be 
guaranteed a minimum of 1/4 of the cluster (so that a queue fresh with jobs 
ramps-up to 1/4 of the cluster quickly).

in your proposal, since the departments have different shaped demands (one for 
high-memory, the other for high-cpu), we would configure their minimum share 
vectors based on these different shaped demands. this would work fine as long 
as the departments continued to submit resource requests which had these same, 
pre-configured shapes.

however, if we establish the minimums using fractions, then the departments can 
easily change between different shaped jobs, and still have the minimums work 
out for them sensibly. does this make sense?

let's be concrete:

10 nodes, each with 8 CPUs and 64 GB of RAM (80 CPUs and 640 GB in total)

say history usually submits jobs for (1 CPU, 16 GB) and geology for (2 CPU, 8 
GB). with your proposal, we might define history's minimum allocation to be (10 
CPU, 160 GB) (1/4 of the dominant resource) and geology's to be (20 CPU, 80 GB) 
(again, 1/4 of the dominant resource). if either department changed the shape of 
its requests, it wouldn't get full use of its minimum.
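
The arithmetic behind these vectors can be sketched as follows. The helper names are hypothetical, and the shape-change scenario at the end (history submitting geology-shaped tasks) is an illustration added here, not something taken from the design doc.

```python
from fractions import Fraction

# Cluster from the example: 10 nodes, each with 8 CPUs and 64 GB of RAM.
CLUSTER = {"cpu": 10 * 8, "memory_gb": 10 * 64}


def tasks_that_fit(task, budget):
    """Number of tasks of shape `task` that fit inside the resource vector `budget`."""
    return min(Fraction(budget[r]) / task[r] for r in task)


def min_vector(task, fraction, cluster=CLUSTER):
    """Scale a task shape until its dominant resource reaches `fraction` of the cluster."""
    budget = {r: cluster[r] * fraction for r in cluster}
    n = tasks_that_fit(task, budget)
    return {r: int(n * task[r]) for r in task}


HISTORY = {"cpu": 1, "memory_gb": 16}
GEOLOGY = {"cpu": 2, "memory_gb": 8}

history_min = min_vector(HISTORY, Fraction(1, 4))  # (10 CPU, 160 GB)
geology_min = min_vector(GEOLOGY, Fraction(1, 4))  # (20 CPU, 80 GB)

# If history switches to geology-shaped tasks, its fixed vector only fits 5 of
# them, so its dominant share drops to 10/80 = 1/8 of the cluster.
n = tasks_that_fit(GEOLOGY, history_min)
dominant = max(n * GEOLOGY[r] / CLUSTER[r] for r in CLUSTER)
print(history_min, geology_min, n, dominant)
```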

so, what if we listed the minimums as simply 1/4 * cluster size, but not 
considering DRF? ie, giving (20 CPU and 160 GB) as the minimum allocation to 
each? well, if the departments continued to submit the different shaped jobs (1 
CPU, 16 GB) and (2 CPU, 8 GB), the design described would continue to see the 
queues as being below their "minimum allocation", even after the bottleneck 
resource fully consumed its amount of the minimum allocation. in the extreme 
case, I highly suspect a job could get *more* than its DRF-based fair share, 
simply by having one of its non-dominant resources remain below the amount 
listed in its minimum share. (can you see this? if not, I'll work out an 
example)
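
A sketch of the suspected problem, under one assumed reading of the design: a queue counts as "below its minimum" while any single resource is under the listed amount. The actual code may use a different rule; the helper names are hypothetical.

```python
CLUSTER = {"cpu": 80, "memory_gb": 640}
MINIMUM = {"cpu": 20, "memory_gb": 160}  # 1/4 of the cluster in every dimension
HISTORY = {"cpu": 1, "memory_gb": 16}


def usage_after(task, count):
    """Total resources held by `count` tasks of shape `task`."""
    return {r: count * task[r] for r in task}


def below_min_any(usage, minimum):
    """Assumed rule: starved if ANY resource is still under its minimum."""
    return any(usage[r] < minimum[r] for r in minimum)


def dominant_fraction(usage, cluster):
    """Largest usage/capacity ratio across resources (the DRF dominant share)."""
    return max(usage[r] / cluster[r] for r in cluster)


for count in (10, 19, 20):
    usage = usage_after(HISTORY, count)
    print(count, usage, below_min_any(usage, MINIMUM), dominant_fraction(usage, CLUSTER))
```

At 10 tasks history already holds 1/4 of the cluster's memory, yet under this rule it still looks starved because it holds only 10 of its 20 CPUs. It keeps looking starved until 20 tasks, by which point it holds half of the memory.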

the beauty of the fractions approach, in my mind, is that it will apply no 
matter which resource is the bottleneck resource.


hope this example is clear. sorry I haven't had time to look at your code -- 
this is just based on my reading of your design doc. perhaps all is well and 
good in the code itself. :-)


cheers,
Andrew

> Add multi-resource scheduling to the fair scheduler
> ---
>
> Key: YARN-326
> URL: https://issues.apache.org/jira/browse/YARN-326
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: FairSchedulerDRFDesignDoc.pdf, YARN-326.patch, 
> YARN-326.patch
>
>
> With YARN-2 in, the capacity scheduler has the ability to schedule based on 
> multiple resources, using dominant resource fairness.  The fair scheduler 
> should be able to do multiple resource scheduling as well, also using 
> dominant resource fairness.
> More details to come on how the corner cases with fair scheduler configs such 
> as min and max resources will be handled.



[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler

2013-04-28 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644208#comment-13644208
 ] 

Andrew Ferguson commented on YARN-326:
--

[~sandyr] bingo. that was exactly the concern I alluded to before. glad we 
found it while thinking about the design. :-)

[~kkambatl] yup, that's the idea -- fractional min-share, which would be 
interpreted as a fraction of the dominant resource (which wouldn't be 
pre-specified, so the queue's dominant resource could adapt based on the jobs 
submitted) ... I wrote my example a bit quickly, sorry! let me know if 
something's still not clear.

the new plan sounds like a good approach. I like it.



> Add multi-resource scheduling to the fair scheduler
> ---
>
> Key: YARN-326
> URL: https://issues.apache.org/jira/browse/YARN-326
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: FairSchedulerDRFDesignDoc.pdf, YARN-326.patch, 
> YARN-326.patch
>
>
> With YARN-2 in, the capacity scheduler has the ability to schedule based on 
> multiple resources, using dominant resource fairness.  The fair scheduler 
> should be able to do multiple resource scheduling as well, also using 
> dominant resource fairness.
> More details to come on how the corner cases with fair scheduler configs such 
> as min and max resources will be handled.



[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler

2013-04-28 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644209#comment-13644209
 ] 

Andrew Ferguson commented on YARN-326:
--

ps -- I forgot to include a pointer to the newest paper in the DRF line of 
work: http://www.cs.berkeley.edu/~matei/papers/2013/eurosys_choosy.pdf

> Add multi-resource scheduling to the fair scheduler
> ---
>
> Key: YARN-326
> URL: https://issues.apache.org/jira/browse/YARN-326
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: FairSchedulerDRFDesignDoc.pdf, YARN-326.patch, 
> YARN-326.patch
>
>
> With YARN-2 in, the capacity scheduler has the ability to schedule based on 
> multiple resources, using dominant resource fairness.  The fair scheduler 
> should be able to do multiple resource scheduling as well, also using 
> dominant resource fairness.
> More details to come on how the corner cases with fair scheduler configs such 
> as min and max resources will be handled.
