[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726075#comment-14726075
 ] 

MENG DING commented on YARN-1651:
---------------------------------

Thanks [~leftnoteasy] for the explanation.

bq. For the example 1, I think it's not very confusing. IIRC, we have discussed 
this before. We should track latest container increase/decrease status in 
AMRMClient. If AM send increase token to NM after it asks decrease the same 
container, it's AM's fault
This makes sense to me now.
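
To make sure I understand the contract on the AM side, something like this tiny
sketch is what I have in mind for tracking the latest requested change per
container (purely illustrative names, not the actual AMRMClient code):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: keep just the latest pending size-change per container on
// the AM side, so a stale increase token is not sent to the NM after a later
// decrease was requested for the same container.
public class PendingChangeTracker {

  enum ChangeType { INCREASE, DECREASE }

  // containerId -> latest requested change; any token from an earlier request
  // is considered stale.
  private final Map<Long, ChangeType> latestChange = new ConcurrentHashMap<>();

  public void onChangeRequested(long containerId, ChangeType type) {
    latestChange.put(containerId, type);
  }

  // The AM should only use an increase token if an increase is still the
  // latest requested change for that container.
  public boolean mayUseIncreaseToken(long containerId) {
    return latestChange.get(containerId) == ChangeType.INCREASE;
  }
}
{code}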

bq. For example 2, actually this is handled by existing code: 
AbstractYarnScheduler#containerIncreasedOnNode. Killing the container after 
increase expires is just a temp solution. I did this only because existing 
patch is too big, I don't want to add more confusing to it . 
bq. We should fix it as soon as we can, and it should be in the same release of 
YARN-1197. Do you think does it still confuse you if we fix the killing 
container after increase expires issue?
This should be fine as long as we handle resource rollback correctly when 
multiple increase requests expire. For example:

1. Initially a container uses 4GB
2. The first increase request increases it to 6GB (token1)
3. The second increase request increases it to 8GB (token2)

If only token2 is used, expiration of token1 will be ignored.
If token1 is used but token2 expires, the scheduler should be able to roll back 
to 6GB.
If both token1 and token2 expire, the scheduler must be able to roll back to 
4GB.

In other words, the increase expiration logic must remember all ongoing 
increase requests for a container, not just the latest one.
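
Just to make the rollback requirement concrete, here is a minimal sketch of
what I mean (class and method names are hypothetical, not from the patch, and
allocation ids are assumed to increase with request time):

{code:java}
import java.util.TreeMap;

// Hypothetical sketch: the scheduler remembers every outstanding increase for
// a container so that expiry of any token rolls back to the correct size.
public class PendingIncreaseTracker {

  // allocationId -> target memory (MB) for increases that have been granted
  // but whose tokens have not yet been used on the NM.
  private final TreeMap<Long, Integer> pendingIncreases = new TreeMap<>();

  // Last resource value confirmed on the NM (initially the container's size).
  private int confirmedMB;

  public PendingIncreaseTracker(int initialMB) {
    this.confirmedMB = initialMB;   // e.g. 4GB in the example above
  }

  public synchronized void recordIncrease(long allocationId, int targetMB) {
    pendingIncreases.put(allocationId, targetMB);
  }

  // NM reported that this increase token was used: the new size is confirmed,
  // and all earlier pending increases for the container become irrelevant.
  public synchronized void confirmIncrease(long allocationId) {
    Integer target = pendingIncreases.get(allocationId);
    if (target != null) {
      confirmedMB = Math.max(confirmedMB, target);
    }
    pendingIncreases.headMap(allocationId, true).clear();
  }

  // An increase token expired unused: forget it and report the size the
  // container should be rolled back to.
  public synchronized int expireIncrease(long allocationId) {
    pendingIncreases.remove(allocationId);
    // 4GB if nothing was confirmed, 6GB if token1 was used, 8GB if token2 was.
    return confirmedMB;
  }
}
{code}

With the three cases above: expiring token1 after token2 was used returns 8GB
(so the expiration is ignored), expiring token2 after only token1 was used
returns 6GB, and expiring both returns the original 4GB.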

bq.  Frankly I don't like existing naming as well, but I don't get a better 
name.
I think it is the name {{CombinedContainerAllocator}} that confuses me the most 
:-). How about naming {{CombinedContainerAllocator}} just 
{{ContainerAllocator}}, and renaming the current {{ContainerAllocator}} to 
{{AbstractContainerAllocator}}? Anyway, maybe other people will have better 
suggestions.

I also have another suggestion regarding {{newlyDecreasedContainers}}. Right 
now, as soon as the scheduler decreases the container resource, it adds the 
container to this list, which is then pulled by the AM in the next heartbeat. I 
don't think this adds much value. I was wondering if it could be enhanced so 
that the list is only updated after the decreased containers have actually been 
delivered to the NM. Implementation-wise, it can be similar to the idea of 
RMAppAttemptImpl.justFinishedContainers and 
RMAppAttemptImpl.finishedContainersSentToAM. Basically, RMNodeImpl keeps the 
containers from the *toBeDecreasedContainers* list in memory for one more 
heartbeat cycle, and once the RM receives the next heartbeat request from the 
NM, it implies that the toBeDecreasedContainers from the last heartbeat 
response have been successfully received by the NM. The benefit of this 
approach is that once the AM receives the decreased list, it knows the decrease 
has been received by the NM. Maybe we can go a step further and make the 
decrease action synchronous in NodeStatusUpdaterImpl (it involves a one-line 
code change, and we already do blocking increases on the NM); then once the AM 
receives the decreased list, it knows the decrease has completed on the NM, 
eliminating the need for status polling. A rough sketch of the buffering I have 
in mind is below. Thoughts?
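
Here is the sketch, purely for illustration (the field and method names are
made up; only the analogy to RMAppAttemptImpl.justFinishedContainers /
finishedContainersSentToAM is taken from the existing code):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative buffer: a decrease becomes visible to the AM only after the
// next NM heartbeat request implicitly acknowledges the previous response.
public class DecreasedContainerBuffer {

  // Decreases the scheduler has applied but that have not yet been put in a
  // heartbeat response to the NM.
  private final List<Long> toBeDecreasedContainers = new ArrayList<>();

  // Decreases sent to the NM in the previous heartbeat response, waiting for
  // the next heartbeat request as an implicit ack.
  private final List<Long> decreasedContainersSentToNM = new ArrayList<>();

  // Decreases confirmed delivered to the NM; these are what the AM should see
  // in newlyDecreasedContainers on its next allocate() call.
  private final List<Long> confirmedDecreasedContainers = new ArrayList<>();

  public synchronized void onSchedulerDecrease(long containerId) {
    toBeDecreasedContainers.add(containerId);
  }

  // Called when building the heartbeat response for the NM.
  public synchronized List<Long> pullToBeDecreased() {
    List<Long> out = new ArrayList<>(toBeDecreasedContainers);
    decreasedContainersSentToNM.addAll(toBeDecreasedContainers);
    toBeDecreasedContainers.clear();
    return out;
  }

  // Called when the next heartbeat request arrives from the NM: the previous
  // response was received, so the decreases in flight are now confirmed.
  public synchronized void onNodeHeartbeat() {
    confirmedDecreasedContainers.addAll(decreasedContainersSentToNM);
    decreasedContainersSentToNM.clear();
  }

  // Called when the AM heartbeats; only NM-confirmed decreases are reported.
  public synchronized List<Long> pullNewlyDecreasedForAM() {
    List<Long> out = new ArrayList<>(confirmedDecreasedContainers);
    confirmedDecreasedContainers.clear();
    return out;
  }
}
{code}

The key point is that a decrease only moves into the AM-visible list after the
NM has (implicitly) acknowledged the heartbeat response that carried it.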

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> -------------------------------------------------------------------------------
>
>                 Key: YARN-1651
>                 URL: https://issues.apache.org/jira/browse/YARN-1651
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-1651-1.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
