[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784061#comment-13784061 ]
Wangda Tan commented on YARN-1197:
----------------------------------

To [~bikassaha],

Thanks for your comments. See my opinions below.

{quote}
Still thinking through the RM-NM interactions. The request for change should probably be a new object that is basically a map of (containerId, Resource) where Resource is new value for the existing containerId. Not quite sure how we would use the new container token for a running container since it's only used in start container.
{quote}
Agreed. If we make the change request an independent object, we need to update the interface of YarnScheduler.allocate to accept it as a parameter. And, as you mention below, we can use the new token to update the NM's resource-monitoring limits for the container.

{quote}
If we wait for RM to sync with NM about the increased resources then it might be too slow since this happens on a heartbeat and the heartbeat interval can be in the order of seconds. An alternative would be a new NM API to allow AMs to increase resources and this would be signed with new container token. But this would burden the AMs by requiring them to make that additional call.
{quote}
Agreed, this is much faster than going through RM-NM communication. Yes, changing the container size costs both the AM and the NM something, but the AM should be disciplined enough not to do this too frequently.

{quote}
There could be a race between a new container token coming in with increased resources for an acquired container and the old container token being used by the NMClient to launch the container (in case the AM decides to launch the smaller container while it was waiting for an increase).
{quote}
Hmmm... thanks for the reminder, this is really a problem.
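
To make the first point concrete, here is a rough sketch of what a change request shaped as a map of (containerId, Resource) might look like. This is only an illustration, not a proposed patch: `ResourceSketch` and `ResourceChangeRequestSketch` are hypothetical stand-ins for YARN's real Resource and request types, container IDs are plain strings, and nothing here touches the actual YarnScheduler.allocate signature.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for YARN's Resource; only memory and vcores are modeled.
final class ResourceSketch {
    final int memoryMb;
    final int vcores;

    ResourceSketch(int memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

// Hypothetical change request: maps an existing container ID to its new target size.
final class ResourceChangeRequestSketch {
    private final Map<String, ResourceSketch> targets = new LinkedHashMap<>();

    // Records the new target size for an existing container.
    // A later call for the same container ID replaces the earlier target.
    ResourceChangeRequestSketch change(String containerId, ResourceSketch target) {
        if (containerId == null || target == null) {
            throw new IllegalArgumentException("containerId and target are required");
        }
        if (target.memoryMb <= 0 || target.vcores <= 0) {
            throw new IllegalArgumentException("target resource must be positive");
        }
        targets.put(containerId, target);
        return this;
    }

    // Read-only view of the requested changes, in insertion order.
    Map<String, ResourceSketch> targets() {
        return Collections.unmodifiableMap(targets);
    }
}
```

Keeping one entry per container ID means a second request for the same container simply overrides the first, so the receiver only ever sees the latest target size for each container.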

I found another issue: the AM may "lie" to the RM/NM about its resource usage. The AM can
1) allocate a big container and launch it,
2) ask to decrease the container, so that the RM releases the resources on the corresponding node/application,
3) but never tell the NM about the decrease, so the container can keep using the resources it had before the decrease.

I don't have a good idea for solving this problem yet. I hope to get more ideas from you about this, and I will think it through as well.

> Support changing resources of an allocated container
> ----------------------------------------------------
>
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: yarn-1197.pdf
>
>
> Currently, YARN does not support merging several containers on one node into a
> bigger container, which would let us request resources incrementally, merge
> them into a bigger one, and launch our processes. The user scenario is
> described in the comments.

--
This message was sent by Atlassian JIRA
(v6.1#6144)