[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075740#comment-15075740 ]

Karthik Kambatla commented on YARN-3870:
----------------------------------------

bq. I think this JIRA and YARN-4485 are different:

bq. I think we cannot simply update the timestamp when a new resource request 
arrives. For example, at T1 the AM asks for 100 * 1G containers; after 2 mins 
(T2), assuming no containers have been allocated, the AM asks for 100 * 1G 
containers again. We cannot say the resource request was added at T2. Instead, 
we should only set a new timestamp for incremental asks.

For YARN-4485, I was planning on taking this exact approach. While the two 
JIRAs and their purposes are different, the ability to identify a set of 
requests that arrived at one point in time requires similar updates to the data 
structures we use in AppSchedulingInfo. 

bq. what do you think of using <appAttempt#, timestamp> for id?
Timestamps for IDs might not be a good idea, especially since an AM can 
restart. Also, there might be merit in differentiating two ResourceRequests 
(say, at different priorities) received at the same time. 

Discussed this with [~asuresh] and [~subru] offline. We felt the following 
changes would help us address multiple JIRAs (as [~xinxianyin] listed):
# Add an ID field to ResourceRequest - this can be a sequence number for each 
application. On AM restart, a subsequent attempt could choose to resume from 
the appropriate sequence number. If the AM doesn't add an ID, the RM could add 
one. Or, we could have the RM add the IDs and return them to the AM to help 
with bookkeeping. 
# YARN-4485 would likely want to add a timestamp in addition to this. Given the 
IDs, we likely don't have to do special delta handling. 
# In case the number of containers in an existing ResourceRequest increases, 
the delta is given a new ID. E.g., if an app increases a request from 3 
containers to 7 containers of the same capability, the first three would have 
ID '1' and the next four would have ID '2'. 
# In case the number of containers corresponding to an existing ResourceRequest 
decreases, the number of containers is reduced from the largest ID to the 
smallest ID until the decrease is accounted for. E.g., if an app asks for 3, 7 
and 2 containers in subsequent allocate calls, once these calls are processed, 
the app has 2 containers with ID '1'. (The first sketch after this list 
reproduces this example.) 
# The resource-request data structure in AppSchedulingInfo would become 
{{Map<Priority, Map<ID, Map<String, Map<Resource, ResourceRequest>>>>}} (the 
second sketch after this list shows this structure). This would help YARN-314 
as well, though YARN-314 will need a few more changes to fix up the matching in 
each of the schedulers.
# Note that we will still be expanding a ResourceRequest to node-local, 
rack-local and ANY requests. These would now be tied to an ID and hence can 
be updated correctly.
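
To make the increase/decrease semantics concrete, here is a minimal sketch of 
the delta-ID bookkeeping for a single (priority, resource-name, capability) 
bucket. The class and its method names are assumptions for illustration, not 
actual YARN code:

{code:java}
import java.util.TreeMap;

// A sketch only: this class and its methods are hypothetical, not YARN code.
// It models one (priority, resource-name, capability) bucket of the proposed
// Map<Priority, Map<ID, Map<String, Map<Resource, ResourceRequest>>>>.
public class DeltaIdBookkeeping {

  // ID -> number of containers asked under that ID, ordered by ID.
  private final TreeMap<Long, Integer> countsById = new TreeMap<>();
  // Per-application sequence number; a later AM attempt could resume from
  // the last ID it saw (item 1).
  private long nextId = 1;
  private int total = 0;

  // Apply a new absolute ask of newTotal containers.
  public void updateAsk(int newTotal) {
    if (newTotal > total) {
      // Increase: only the delta gets a fresh ID (item 3).
      countsById.put(nextId++, newTotal - total);
    } else {
      // Decrease: trim from the largest ID down to the smallest (item 4).
      int toRemove = total - newTotal;
      while (toRemove > 0) {
        long largest = countsById.lastKey();
        int count = countsById.get(largest);
        if (count <= toRemove) {
          countsById.remove(largest);
          toRemove -= count;
        } else {
          countsById.put(largest, count - toRemove);
          toRemove = 0;
        }
      }
    }
    total = newTotal;
  }

  public static void main(String[] args) {
    DeltaIdBookkeeping bucket = new DeltaIdBookkeeping();
    bucket.updateAsk(3);  // {1=3}
    bucket.updateAsk(7);  // {1=3, 2=4}: the delta of 4 gets ID '2'
    bucket.updateAsk(2);  // {1=2}: trimmed from ID '2' downwards
    System.out.println(bucket.countsById);  // prints {1=2}, as in item 4
  }
}
{code}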
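
And a sketch of the proposed AppSchedulingInfo structure itself, again with a 
hypothetical class and method (only the ID level is new relative to the 
current Priority -> resource-name -> request nesting):

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// A sketch of the structure from item 5; not a patch against AppSchedulingInfo.
public class RequestsByIdSketch {

  private final Map<Priority, Map<Long, Map<String, Map<Resource, ResourceRequest>>>>
      requests = new HashMap<>();

  // Record the node-local/rack-local/ANY expansion of one ask under a single
  // ID (item 6), so all three rows can later be updated together.
  public void put(Priority pri, long id, Resource cap, int num, String... names) {
    Map<String, Map<Resource, ResourceRequest>> byName = requests
        .computeIfAbsent(pri, p -> new HashMap<>())
        .computeIfAbsent(id, i -> new HashMap<>());
    for (String name : names) {
      byName.computeIfAbsent(name, n -> new HashMap<>())
          .put(cap, ResourceRequest.newInstance(pri, name, cap, num));
    }
  }

  public static void main(String[] args) {
    RequestsByIdSketch info = new RequestsByIdSketch();
    Resource oneGig = Resource.newInstance(1024, 1);
    // One ask expanded to host, rack and ANY, all tied to ID 1.
    info.put(Priority.newInstance(1), 1L, oneGig, 3,
        "host1", "rack1", ResourceRequest.ANY);
    System.out.println(info.requests);
  }
}
{code}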

If folks feel this would address all requirements, I could take a stab at the 
first patch. [~asuresh] and [~subru] have graciously offered to iterate on my 
preliminary patch to fix up any issues in FairScheduler and CapacityScheduler. 

> Providing raw container request information for fine scheduling
> ---------------------------------------------------------------
>
>                 Key: YARN-3870
>                 URL: https://issues.apache.org/jira/browse/YARN-3870
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications, capacityscheduler, fairscheduler, 
> resourcemanager, scheduler, yarn
>            Reporter: Lei Guo
>
> Currently, when the AM sends container requests to the RM and scheduler, it 
> expands individual container requests into host/rack/any format. For 
> instance, if I ask for a container with the preference "host1, host2, 
> host3", assuming all are in the same rack rack1, instead of sending one raw 
> container request to the RM/scheduler with the raw preference list, it 
> expands it into 5 different objects: host1, host2, host3, rack1 and any. By 
> the time the scheduler receives this information, the raw request is already 
> lost. This is fine for a single container request, but it causes trouble 
> when dealing with multiple container requests from the same application. 
> Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When the application requests two containers with different data locality 
> preferences:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This ends up as the following container request list when the client sends 
> the request to the RM/scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgment 
> without knowing the raw container requests. The situation gets worse when 
> dealing with affinity and anti-affinity, or even gang scheduling. We need 
> some way to provide raw container request information for fine-grained 
> scheduling. (A sketch of this expansion follows below.)
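
For illustration, a toy sketch of the expansion described above. This is not 
the actual AMRMClient code, and the rack lookup is hard-coded for this 
example; it shows how the per-request grouping is lost once the counts are 
aggregated:

{code:java}
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A toy sketch, not the actual AMRMClient expansion code.
public class ExpansionDemo {

  // Hard-coded topology for the example: rack1(host1-3), rack2(host4-6).
  static String rackOf(String host) {
    return Arrays.asList("host1", "host2", "host3").contains(host)
        ? "rack1" : "rack2";
  }

  public static void main(String[] args) {
    // c1 and c2, each one container with its own locality preference list.
    List<List<String>> requests = Arrays.asList(
        Arrays.asList("host1", "host2", "host4"),   // c1
        Arrays.asList("host2", "host3", "host5"));  // c2

    // Expansion: each request adds 1 instance at every preferred host,
    // 1 at each distinct rack it touches, and 1 at ANY ("*").
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (List<String> req : requests) {
      for (String host : req) {
        counts.merge(host, 1, Integer::sum);
      }
      req.stream().map(ExpansionDemo::rackOf).distinct()
          .forEach(rack -> counts.merge(rack, 1, Integer::sum));
      counts.merge("*", 1, Integer::sum);
    }

    // Totals match the list above (host2=2, rack1=2, rack2=2, *=2); the
    // scheduler can no longer tell which hosts belonged to which request.
    counts.forEach((k, v) -> System.out.println(k + ": " + v));
  }
}
{code}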


