[ https://issues.apache.org/jira/browse/YARN-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572512#comment-13572512 ]

Robert Joseph Evans commented on YARN-371:
------------------------------------------

I don't see any change like this happening in the 2.0 time frame.  Especially 
not a change that is going to break backwards compatibility. I also agree with 
Tom completely that how well YARN is adopted outside of Map/Reduce will be 
determined by how well it can *easily* accommodate other computing/scheduling 
needs.  The current API can be used to support all of the use cases I 
mentioned, but in a highly inefficient manner.  For gang scheduling I can 
request all of the containers and just have them sit idle until the final one 
shows up.  The others are similar, if not more wasteful.
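
As a rough illustration of that workaround, here is a minimal sketch, assuming 
the AMRMClient API of the later 2.x client libraries (the 2.0.2-alpha API 
differs slightly).  Nothing here is prescribed by YARN: acquireGang() and the 
1 GB / 1 vcore sizing are placeholders, and the caller is assumed to hold an 
already started and registered AMRMClient.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class GangAcquirer {
  // Requests the whole gang up front, then blocks until every member has
  // been allocated.  Containers allocated early just sit idle (and hold
  // cluster resources) until the final one shows up.
  static List<Container> acquireGang(AMRMClient<ContainerRequest> amClient,
      int gangSize) throws Exception {
    for (int i = 0; i < gangSize; i++) {
      amClient.addContainerRequest(new ContainerRequest(
          Resource.newInstance(1024, 1),  // 1 GB, 1 vcore per member
          null, null,                     // no node or rack preference
          Priority.newInstance(0)));
    }
    List<Container> gang = new ArrayList<Container>();
    while (gang.size() < gangSize) {
      AllocateResponse resp = amClient.allocate(0.0f);  // heartbeat the RM
      gang.addAll(resp.getAllocatedContainers());       // hold, don't launch
      Thread.sleep(1000);
    }
    return gang;  // only now can any member start doing work
  }
}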

I would encourage people interested in the scheduler or any other part of 
Hadoop to make a prototype with a specific set of problems in mind.  The 
original problem on this JIRA was that delay scheduling was not having the 
desired impact on data locality, which may be a good enough reason to make a 
protocol change.  After making the prototype, measure the impact of your changes and 
present those findings with the patch.  This will give people hard numbers to 
look at and code that they can use to benchmark on other clusters.  The 
performance impact of a change is rarely predictable, especially in very large 
distributed systems like Hadoop. Even for small programs running on a single 
computer the rule is always: measure, change, measure again.  Try to stress 
your changes.  A change that speeds up a 100 node cluster is great, but I 
personally don't run any 100 node clusters, and if the change makes it worse 
for me to run a 4000 node cluster churning through 90,000 apps a day, I am 
likely to give it a -1 unless it can be fixed.  I am already dealing with too 
many issues with YARN at that scale. But if it solves a real problem and does 
not make anything else worse, I would be honored to be the one to check it in.
                
> Resource-centric compression in AM-RM protocol limits scheduling
> ----------------------------------------------------------------
>
>                 Key: YARN-371
>                 URL: https://issues.apache.org/jira/browse/YARN-371
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api, resourcemanager, scheduler
>    Affects Versions: 2.0.2-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> Each AMRM heartbeat consists of a list of resource requests. Currently, each 
> resource request consists of a container count, a resource vector, and a 
> location, which may be a node, a rack, or "*". When an application wishes to 
> request that a task run in multiple locations, it must issue a request for each 
> location.  This means that for a node-local task, it must issue three 
> requests, one at the node-level, one at the rack-level, and one with * (any). 
> These requests are not linked with each other, so when a container is 
> allocated for one of them, the RM has no way of knowing which others to get 
> rid of. When a node-local container is allocated, this is handled by 
> decrementing the number of requests on that node's rack and in *. But when 
> the scheduler allocates a task with a node-local request on its rack, the 
> request on the node is left there.  This can cause delay-scheduling to try to 
> assign a container on a node that nobody cares about anymore.
> Additionally, unless I am missing something, the current model does not allow 
> requests for containers only on a specific node or specific rack. While this 
> is not a use case for MapReduce currently, it is conceivable that it might be 
> something useful to support in the future, for example to schedule 
> long-running services that persist state in a particular location, or for 
> applications that generally care less about latency than data-locality.
> Lastly, the ability to understand which requests are for the same task will 
> possibly allow future schedulers to make more intelligent scheduling 
> decisions, as well as permit a more exact understanding of request load.
> I would propose the tweak of allowing a single ResourceRequest to encapsulate 
> all the location information for a task.  So instead of just a single 
> location, a ResourceRequest would contain an array of locations, including 
> nodes that it would be happy with, racks that it would be happy with, and 
> possibly *.  Side effects of this change would be a reduction in the amount 
> of data that needs to be transferred in a heartbeat, as well as in the RM's 
> memory footprint, because what used to be separate requests for the same 
> task can now share common data.
> While this change breaks compatibility, if it is going to happen, it makes 
> sense to do it now, before YARN becomes beta.
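
For concreteness, a hypothetical sketch of the two request shapes the 
description above walks through.  None of the names below exist in YARN; 
LocationRequest, TaskRequest, "host17", and "/rack4" are illustrative 
stand-ins only.

import java.util.Arrays;
import java.util.List;

public class RequestShapes {
  // Stand-in for today's per-location ResourceRequest.
  static class LocationRequest {
    final String location;     // a node, a rack, or "*"
    final int memoryMb;        // resource vector, simplified to memory
    final int numContainers;
    LocationRequest(String location, int memoryMb, int numContainers) {
      this.location = location;
      this.memoryMb = memoryMb;
      this.numContainers = numContainers;
    }
  }

  // Hypothetical task-centric request: one record carries every location
  // the task would accept, so the RM can retire all the alternatives as
  // soon as one of them is satisfied.
  static class TaskRequest {
    final int memoryMb;           // shared by every location below
    final List<String> nodes;     // nodes the task would be happy with
    final List<String> racks;     // racks the task would be happy with
    final boolean anyLocation;    // whether "*" is acceptable
    final int numTasks;           // identical tasks sharing this shape
    TaskRequest(int memoryMb, List<String> nodes, List<String> racks,
        boolean anyLocation, int numTasks) {
      this.memoryMb = memoryMb;
      this.nodes = nodes;
      this.racks = racks;
      this.anyLocation = anyLocation;
      this.numTasks = numTasks;
    }
  }

  public static void main(String[] args) {
    // Today: one node-local task becomes three unlinked requests that the
    // RM cannot tie back together once any one of them is allocated.
    LocationRequest[] current = {
        new LocationRequest("host17", 1024, 1),
        new LocationRequest("/rack4", 1024, 1),
        new LocationRequest("*", 1024, 1),
    };

    // Proposed: the same task expressed as a single request.
    TaskRequest proposed = new TaskRequest(
        1024, Arrays.asList("host17"), Arrays.asList("/rack4"), true, 1);
  }
}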

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
