[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070186#comment-15070186 ]
Wangda Tan commented on YARN-3870:
----------------------------------

Hi [~grey],

Thanks for raising this; we definitely need such a mechanism to better describe our resource requests.

[~asuresh], I'm not sure how the unique id would work. Are you planning to add it as a key to the AppSchedulingInfo resource requests map? (e.g. {{Map = <priority, <id, <resourceName, request>>>}})

> Providing raw container request information for fine scheduling
> ---------------------------------------------------------------
>
>                 Key: YARN-3870
>                 URL: https://issues.apache.org/jira/browse/YARN-3870
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications, capacityscheduler, fairscheduler,
> resourcemanager, scheduler, yarn
>            Reporter: Lei Guo
>
> Currently, when AM sends container requests to RM and scheduler, it expands
> individual container requests into host/rack/any format. For instance, if I
> am asking for container request with preference "host1, host2, host3",
> assuming all are in the same rack rack1, instead of sending one raw container
> request to RM/Scheduler with raw preference list, it basically expand it to
> become 5 different objects with host1, host2, host3, rack1 and any in there.
> When scheduler receives information, it basically already lost the raw
> request. This is ok for single container request, but it will cause trouble
> when dealing with multiple container requests from the same application.
> Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When application requests two containers with different data locality
> preference:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This will end up with following container request list when client sending
> request to RM/Scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for scheduler to make a right judgement without
> knowing the raw container request. The situation will get worse when dealing
> with affinity and anti-affinity or even gang scheduling etc.
> We need some way to provide raw container request information for fine
> scheduling purpose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
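
To make the example in the issue description concrete, here is a minimal, self-contained sketch of the host/rack/any expansion. It is not YARN code: the class {{RawRequestExpansion}}, the {{RACK_OF}} topology map and the {{expand}} method are hypothetical names made up for illustration, and the sketch assumes each raw request asks for exactly one container. It reproduces the aggregated counts listed in the description and shows that a different pair of raw requests collapses to the same table, which is the information loss being discussed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only: hypothetical names, not part of the YARN API.
final class RawRequestExpansion {

    // Topology from the issue description: host -> rack.
    static final Map<String, String> RACK_OF = new HashMap<>();
    static {
        for (String h : Arrays.asList("host1", "host2", "host3")) {
            RACK_OF.put(h, "rack1");
        }
        for (String h : Arrays.asList("host4", "host5", "host6")) {
            RACK_OF.put(h, "rack2");
        }
    }

    // Expands raw requests (one container each, given as a list of preferred
    // hosts) into aggregated counts per resource name: host, rack and "any".
    static Map<String, Integer> expand(List<List<String>> rawRequests) {
        Map<String, Integer> table = new TreeMap<>();
        for (List<String> hosts : rawRequests) {
            Set<String> racks = new TreeSet<>();
            for (String host : hosts) {
                table.merge(host, 1, Integer::sum);
                racks.add(RACK_OF.get(host));
            }
            // Each rack is counted once per raw request, however many of the
            // preferred hosts live in it.
            for (String rack : racks) {
                table.merge(rack, 1, Integer::sum);
            }
            table.merge("any", 1, Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        // c1 and c2 from the issue description.
        List<List<String>> original = Arrays.asList(
                Arrays.asList("host1", "host2", "host4"),
                Arrays.asList("host2", "host3", "host5"));
        // A different pair of raw requests (host4 and host5 swapped).
        List<List<String>> swapped = Arrays.asList(
                Arrays.asList("host1", "host2", "host5"),
                Arrays.asList("host2", "host3", "host4"));

        System.out.println(expand(original));
        // Both inputs aggregate to the same table, so the scheduler cannot
        // tell them apart: prints true.
        System.out.println(expand(original).equals(expand(swapped)));
    }
}
```

Keying the requests by an extra id, as in the {{<priority, <id, <resourceName, request>>>}} map asked about in the comment, would be one way to keep the two cases distinguishable, since each raw request would then keep its own host/rack/any entries instead of being summed with the others.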