[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wangda Tan updated YARN-6620:
-----------------------------
    Attachment: YARN-6620.011.patch

Thanks [~zhengchenyu] for your review. I replied to your comments here:

bq. When storeAssignedResources fails, there will be a leak, because postComplete will never be called. It means usedDevices won't be removed. So cleanupAssignGpus should be added in the catch section.

Fixed and added UT.

bq. I think NM recovery for GPU doesn't take effect, because RecoveredContainerState's ResourceMappings are not added to the recovered Container.

They are added to ContainerImpl in the constructor for a recovered container:
{code}
  // constructor for a recovered container
  public ContainerImpl(Configuration conf, Dispatcher dispatcher,
      ContainerLaunchContext launchContext, Credentials creds,
      NodeManagerMetrics metrics,
      ContainerTokenIdentifier containerTokenIdentifier, Context context,
      RecoveredContainerState rcs) {
    this(conf, dispatcher, launchContext, creds, metrics,
        containerTokenIdentifier, context, rcs.getStartTime());
    // ....
    this.resourceMappings = rcs.getResourceMappings();
  }
{code}

Please let me know if I missed anything.

Attached ver.11 patch.

> [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-6620
>                 URL: https://issues.apache.org/jira/browse/YARN-6620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-6620.001.patch, YARN-6620.002.patch, YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch
>
>
> This JIRA plans to add support for:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups. (Java side).
> 3) NM restart and recovery of allocated GPU devices
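
For readers following the first review comment, here is a minimal sketch of the leak and the kind of fix discussed: release the devices in the catch block when persisting the assignment fails. It is not the code from the patch. Only the names storeAssignedResources, cleanupAssignGpus, and usedDevices come from the comment above; the StateStore interface, the GpuAllocatorSketch class, assignGpus, usedDeviceCount, and all signatures are hypothetical stand-ins chosen for illustration.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the NM state store; not the real YARN interface.
interface StateStore {
  void storeAssignedResources(String containerId, Set<Integer> devices)
      throws IOException;
}

// Hypothetical allocator used only to illustrate the cleanup-on-failure idea.
class GpuAllocatorSketch {
  private final int totalDevices;
  private final StateStore store;
  // device index -> id of the container that currently holds it
  private final Map<Integer, String> usedDevices = new HashMap<>();

  GpuAllocatorSketch(int totalDevices, StateStore store) {
    this.totalDevices = totalDevices;
    this.store = store;
  }

  synchronized Set<Integer> assignGpus(String containerId, int requested)
      throws IOException {
    Set<Integer> assigned = new TreeSet<>();
    // Mark the first free devices as used by this container.
    for (int dev = 0; dev < totalDevices && assigned.size() < requested; dev++) {
      if (!usedDevices.containsKey(dev)) {
        usedDevices.put(dev, containerId);
        assigned.add(dev);
      }
    }
    if (assigned.size() < requested) {
      cleanupAssignGpus(containerId);
      throw new IOException("Not enough free GPU devices for " + containerId);
    }
    try {
      store.storeAssignedResources(containerId, assigned);
    } catch (IOException e) {
      // Without this cleanup the devices would stay marked as used,
      // since no completion hook runs for a container that never started.
      cleanupAssignGpus(containerId);
      throw e;
    }
    return assigned;
  }

  // Releases every device currently held by the given container.
  synchronized void cleanupAssignGpus(String containerId) {
    usedDevices.values().removeIf(containerId::equals);
  }

  synchronized int usedDeviceCount() {
    return usedDevices.size();
  }
}
```

With a store that always throws, assignGpus propagates the exception and usedDeviceCount() returns to zero, which is the behavior a unit test for this fix would check.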