[ https://issues.apache.org/jira/browse/YARN-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220475#comment-15220475 ]
Vinod Kumar Vavilapalli commented on YARN-4879:
-----------------------------------------------

Tx for the doc, [~subru] and [~asuresh]! +1 overall for a unique identifier.

h4. Comments on your doc
 - I'd rather call it "an enhancement to identify requests explicitly" instead of "simple (delta) allocate protocol". We used to use the phrase "delta protocol" in a slightly different context - see YARN-110.
 - bq. The RM will attempt to allocate containers in decreasing sequence number order,
Why are we putting priority semantics onto the ID? We should just follow the existing priority ordering.
 - bq. In our proposal, we could potentially have requests for each container at worst case.
It is both network / memory overhead as well as scheduler's CPU time. Till we move off to global scheduling completely, we should be cautious about this. Of course, by inverting the ResourceRequest and still keying by ResourceName in the API, we are limiting the total entries to be of the order of the cluster-size. I already suggested on YARN-1547 that we also have an upper limit on the total number of requests - see [here|https://issues.apache.org/jira/browse/YARN-1547?focusedCommentId=15218681&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15218681]. But I strongly suggest that we have additional limits on the total number of IDs that can be used - this will fit our narrative at YARN-4902 too.

h4. Comments from YARN-4902
Copy-edit-pasting here a few comments that we posted in the document for YARN-4902, and those I think were not laid out in the doc explicitly. We were calling it Allocation-ID there, I guess I now like Request-ID better. If some or all of them make sense, you can add them to your doc
 - *Scope*: This ID is a unique identifier for different ResourceRequests from the *same application* - essentially IDs can conflict across applications.
 - *Generation*: The application should simply generate a unique identifier within the application - if not, the client libraries can do so if desired by the application.
 - *Non-binding nature*: Applications can continue to completely ignore the returned Allocation-ID in the response and use the allocation for any of their outstanding requests.
 - *Responses*: The scheduler may return multiple responses corresponding to the same Allocation-ID - as and when the scheduler returns allocations.
 - *Deeper details on updates*: Similar to the current API, an update of only selected fields against a previously existing Allocation-ID will only update the object (as opposed to replacing it). For example, say a ResourceRequest first gets created with Allocation-ID "76589" and with _"host: *"_. A future ResourceRequest with the same Allocation-ID but with contents _"rack05: 10"_ will only append the rack information to the existing object. This is how one can replace parts of an object and is similar to how the existing per-record-deltas based protocol works.
 - *Deletes*: Similarly, if one wishes to replace an entire ResourceRequest corresponding to a specific Allocation-ID, they can simply cancel the corresponding ResourceRequest and submit a new one afresh.

h4. Other responses
bq. If a node local allocation is made for node N1, we can immediately lookup the entries for rack and ANY by using the ID key and decrement them instead of linearly scanning the rack/ANY entries.
+1, ID is really the logical grouping key.

bq. While making these changes, would it possible to address YARN-314 too?
I'm okay if we can get two in a shot, but I'd caution against risking this effort by blowing up the size.
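
To make the update, delete, and ID-keyed lookup points above concrete, here is a minimal Java sketch. It is not YARN code: the class {{RequestTable}} and its methods are hypothetical names invented for illustration, requests are reduced to a plain map of resource name to container count, and the sketch assumes that "host: *" means the ANY resource name. It only illustrates the merge-by-ID behaviour described in the comment, under those assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration only; no such class exists in YARN.
 * Keeps the outstanding requests of ONE application, keyed by request ID.
 */
class RequestTable {
    // request ID -> (resource name such as "*", "rack05", "node1" -> outstanding containers)
    private final Map<Long, Map<String, Integer>> byId = new HashMap<>();

    /**
     * Update: merges into the existing object for this ID instead of replacing it.
     * A new resource name is appended; an existing one is overwritten.
     */
    void update(long requestId, String resourceName, int numContainers) {
        byId.computeIfAbsent(requestId, id -> new HashMap<>())
            .put(resourceName, numContainers);
    }

    /** Delete: cancel the whole request; the caller may then submit a fresh one. */
    void cancel(long requestId) {
        byId.remove(requestId);
    }

    /**
     * Allocation: the ID is the grouping key, so the node, rack and ANY entries
     * are found with one lookup and decremented, without scanning other requests.
     */
    void onAllocated(long requestId, String... resourceNames) {
        Map<String, Integer> entries = byId.get(requestId);
        if (entries == null) {
            return; // unknown or already cancelled ID
        }
        for (String name : resourceNames) {
            // Returning null from the remapping function removes the entry.
            entries.computeIfPresent(name, (k, count) -> count > 1 ? count - 1 : null);
        }
        if (entries.isEmpty()) {
            byId.remove(requestId);
        }
    }

    /** Read-only snapshot of the entries stored under an ID (empty if none). */
    Map<String, Integer> entries(long requestId) {
        Map<String, Integer> entries = byId.get(requestId);
        return entries == null
            ? Collections.<String, Integer>emptyMap()
            : Collections.unmodifiableMap(new HashMap<>(entries));
    }
}
```

With this sketch, calling {{update(76589L, "*", 10)}} followed by {{update(76589L, "rack05", 10)}} leaves both entries stored under ID 76589, which mirrors the append behaviour in the example above (the container counts are made up). Replacing the whole request is {{cancel(76589L)}} followed by fresh {{update}} calls.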

> Proposal for a simple (delta) allocate protocol
> -----------------------------------------------
>
> Key: YARN-4879
> URL: https://issues.apache.org/jira/browse/YARN-4879
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: applications, resourcemanager
> Reporter: Subru Krishnan
> Assignee: Subru Krishnan
> Attachments: SimpleAllocateProtocolProposal-v1.pdf
>
>
> For legacy reasons, the current allocate protocol expects expanded requests
> which represent the cumulative request for any change in resource
> constraints. This is not only very difficult to comprehend but makes it
> impossible for the scheduler to associate container allocations to the
> original requests. This problem is amplified by the fact that the expansion
> is managed by the AMRMClient which makes it cumbersome for non-Java clients
> as they all have to replicate the non-trivial logic. In this JIRA, we are
> proposing a delta allocate protocol where the AM will need to only specify
> changes in resource constraints.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)