[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726075#comment-14726075 ]
MENG DING commented on YARN-1651:
---------------------------------

Thanks [~leftnoteasy] for the explanation.

bq. For the example 1, I think it's not very confusing. IIRC, we have discussed this before. We should track latest container increase/decrease status in AMRMClient. If AM send increase token to NM after it asks decrease the same container, it's AM's fault.

This makes sense to me now.

bq. For example 2, actually this is handled by existing code: AbstractYarnScheduler#containerIncreasedOnNode. Killing the container after increase expires is just a temp solution. I did this only because existing patch is too big, I don't want to add more confusing to it.

bq. We should fix it as soon as we can, and it should be in the same release of YARN-1197. Do you think does it still confuse you if we fix the killing container after increase expires issue?

This should be fine, as long as we handle the resource rollback correctly when multiple increase requests expire. For example:
1. Initially a container uses 4GB.
2. The first increase request increases it to 6GB (token1).
3. The second increase request increases it to 8GB (token2).

If only token2 is used, the expiration of token1 should be ignored. If token1 is used but token2 expires, the scheduler should be able to roll back to 6GB. If both token1 and token2 expire, the scheduler must be able to roll back to 4GB. In other words, the increase expiration logic must remember all outstanding increase requests for a container, not just the latest one (a sketch of this bookkeeping is at the end of this comment).

bq. Frankly I don't like existing naming as well, but I don't get a better name.

I think it is the name of {{CombinedContainerAllocator}} that is most confusing to me :-). How about naming {{CombinedContainerAllocator}} just {{ContainerAllocator}}, and renaming the current {{ContainerAllocator}} to {{AbstractContainerAllocator}}? Anyway, maybe other people will have better suggestions.

I also have another suggestion, regarding {{newlyDecreasedContainers}}. Right now, as soon as the scheduler decreases a container's resource, it adds the container to this list, which the AM pulls in its next heartbeat. I don't think this has much value as-is. I was wondering if it could be enhanced so that the list is only updated once the decreased containers have actually been sent to the NM. Implementation-wise, it could be similar to the idea behind {{RMAppAttemptImpl.justFinishedContainers}} and {{RMAppAttemptImpl.finishedContainersSentToAM}}: {{RMNodeImpl}} keeps the containers from the *toBeDecreasedContainers* list in memory for one more heartbeat cycle, and once the RM receives the next heartbeat request from the NM, that implies the toBeDecreasedContainers from the last heartbeat response were successfully received by the NM (see the second sketch below). The benefit of this approach is that once the AM receives the decreased list, it knows the decrease has been received by the NM. Maybe we can take a step further and make the decrease action synchronous in {{NodeStatusUpdaterImpl}} (a one-line code change, and we already do a blocking increase on the NM); then once the AM receives the decreased list, it knows the decrease has completed on the NM, eliminating the need for status polling. Thoughts?
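To make the expiration bookkeeping concrete, here is a minimal sketch of per-container tracking of outstanding increases. This is just an illustration of the idea, not the actual scheduler code: the class and method names ({{OutstandingIncreases}}, {{granted}}, {{confirm}}, {{expire}}) are made up, it tracks memory only, and it assumes increase targets are distinct and growing as in the example above.

{code:java}
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical per-container bookkeeping for outstanding increase tokens. */
public class OutstandingIncreases {
  private long confirmedMemoryMB;  // last size the NM actually applied
  // Outstanding (unconfirmed) increase targets: target size in MB -> token id.
  private final NavigableMap<Long, Long> pending = new TreeMap<>();

  public OutstandingIncreases(long originalMemoryMB) {
    this.confirmedMemoryMB = originalMemoryMB;  // e.g. 4096 (4GB)
  }

  /** Scheduler granted an increase token for the given target size. */
  public synchronized void granted(long tokenId, long targetMB) {
    pending.put(targetMB, tokenId);
  }

  /** NM reported the container now running at targetMB (a token was used). */
  public synchronized void confirm(long targetMB) {
    confirmedMemoryMB = Math.max(confirmedMemoryMB, targetMB);
    // Every outstanding increase at or below the confirmed size is satisfied;
    // its later expiration must be ignored (e.g. token1/6GB after token2/8GB ran).
    pending.headMap(targetMB, true).clear();
  }

  /**
   * An increase token expired unused. Returns the size the scheduler
   * should roll the allocation back to.
   */
  public synchronized long expire(long targetMB) {
    pending.remove(targetMB);
    if (targetMB <= confirmedMemoryMB) {
      return confirmedMemoryMB;  // already superseded; nothing to roll back
    }
    // Roll back to the largest still-outstanding target, or the confirmed size.
    return pending.isEmpty()
        ? confirmedMemoryMB  // both tokens expired -> back to the original 4GB
        : Math.max(confirmedMemoryMB, pending.lastKey());
  }
}
{code}

Walking through the example: after {{granted(1, 6144)}} and {{granted(2, 8192)}}, if {{expire(8192)}} fires first the allocation rolls back to 6144 (token1 is still outstanding or confirmed); if {{expire(6144)}} then fires with nothing confirmed, it rolls back to 4096; and if {{confirm(8192)}} happens first, both expirations become no-ops.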
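And here is a rough sketch of the deferred decrease list, mirroring the {{justFinishedContainers}}/{{finishedContainersSentToAM}} pattern. Again, the names ({{NodeDecreaseTracker}}, {{pullToBeDecreased}}, etc.) are hypothetical, not the actual {{RMNodeImpl}} fields, and this glosses over NM resync/reconnect, where the sent-but-unacked list would have to be merged back:

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;

/** Hypothetical per-node tracking of decreases across heartbeat cycles. */
public class NodeDecreaseTracker {
  // Decreases the scheduler has decided on, not yet sent to the NM.
  private final List<Container> toBeDecreasedContainers = new ArrayList<>();
  // Decreases included in the last heartbeat response, awaiting implicit ack.
  private final List<Container> decreasedContainersSentToNM = new ArrayList<>();
  // Decreases the NM has (implicitly) acked; safe to expose to the AM.
  private final List<Container> newlyDecreasedContainers = new ArrayList<>();

  /** Called when building the heartbeat response for this node. */
  public synchronized List<Container> pullToBeDecreased() {
    List<Container> toSend = new ArrayList<>(toBeDecreasedContainers);
    decreasedContainersSentToNM.addAll(toSend);
    toBeDecreasedContainers.clear();
    return toSend;
  }

  /**
   * Called when the next heartbeat request arrives from this node: the
   * previous response (and its decrease list) was received by the NM.
   */
  public synchronized void onHeartbeatReceived() {
    newlyDecreasedContainers.addAll(decreasedContainersSentToNM);
    decreasedContainersSentToNM.clear();
  }

  /** Pulled when the AM heartbeats; the AM now knows the NM has the decrease. */
  public synchronized List<Container> pullNewlyDecreased() {
    List<Container> result = new ArrayList<>(newlyDecreasedContainers);
    newlyDecreasedContainers.clear();
    return result;
  }
}
{code}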
> CapacityScheduler side changes to support increase/decrease container resource
> -------------------------------------------------------------------------------
>
>                 Key: YARN-1651
>                 URL: https://issues.apache.org/jira/browse/YARN-1651
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-1651-1.YARN-1197.patch