[ 
https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6620:
-----------------------------
    Attachment: YARN-6620.011.patch

Thanks [~zhengchenyu] for your review. I have replied to your comments below: 

bq. When storeAssignedResources fails, there will be a leak, because 
postComplete will never be called. That means usedDevices won't be cleaned up, 
so cleanupAssignGpus should be added in the catch section.
Fixed, and added a unit test. 
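
For readers following the thread, here is a minimal sketch of the kind of fix described above. It is not the code from the patch: AssignmentStore, GpuAllocatorSketch, assignGpus and getUsedDeviceCount are hypothetical names made up for illustration, and only storeAssignedResources, usedDevices and cleanupAssignGpus are taken from the review comment. The idea is that when persisting the assignment fails, the devices are released in the catch block, since postComplete will not run for that container.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the NM state store; not the real YARN interface.
interface AssignmentStore {
  void storeAssignedResources(String containerId, List<Integer> devices)
      throws IOException;
}

// Simplified allocator sketch; names follow the review comment only.
class GpuAllocatorSketch {
  private final int totalDevices;
  // device index -> id of the container currently holding it
  private final Map<Integer, String> usedDevices = new TreeMap<>();

  GpuAllocatorSketch(int totalDevices) {
    this.totalDevices = totalDevices;
  }

  synchronized List<Integer> assignGpus(String containerId, int count,
      AssignmentStore store) throws IOException {
    List<Integer> assigned = new ArrayList<>();
    for (int d = 0; d < totalDevices && assigned.size() < count; d++) {
      if (!usedDevices.containsKey(d)) {
        usedDevices.put(d, containerId);
        assigned.add(d);
      }
    }
    if (assigned.size() < count) {
      // Not enough free devices: undo the partial assignment.
      cleanupAssignGpus(containerId);
      throw new IOException("Not enough free GPUs for " + containerId);
    }
    try {
      store.storeAssignedResources(containerId, assigned);
    } catch (IOException e) {
      // postComplete will never be called for this container, so the
      // devices must be released here or they stay in usedDevices forever.
      cleanupAssignGpus(containerId);
      throw e;
    }
    return assigned;
  }

  // Releases every device held by the given container.
  synchronized void cleanupAssignGpus(String containerId) {
    usedDevices.values().removeIf(containerId::equals);
  }

  synchronized int getUsedDeviceCount() {
    return usedDevices.size();
  }
}
```

With this shape, a failed store call leaves usedDevices empty for that container, so a later container can still be assigned the same devices.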

bq. I think NM recovery for GPU doesn't take effect, because 
RecoveredContainerState's ResourceMappings are not added to the recovered 
Container.
They are added to ContainerImpl in the constructor for a recovered container: 
{code}

  // constructor for a recovered container
  public ContainerImpl(Configuration conf, Dispatcher dispatcher,
      ContainerLaunchContext launchContext, Credentials creds,
      NodeManagerMetrics metrics,
      ContainerTokenIdentifier containerTokenIdentifier, Context context,
      RecoveredContainerState rcs) {
    this(conf, dispatcher, launchContext, creds, metrics,
        containerTokenIdentifier, context, rcs.getStartTime());
    // ....
    this.resourceMappings = rcs.getResourceMappings();
  }
{code} 
Please let me know if I missed anything. 

Attached ver.11 patch.


> [YARN-6223] NM Java side code changes to support isolate GPU devices by using 
> CGroups
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-6620
>                 URL: https://issues.apache.org/jira/browse/YARN-6620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-6620.001.patch, YARN-6620.002.patch, 
> YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, 
> YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, 
> YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch
>
>
> This JIRA plans to add support for:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups (Java side).
> 3) NM restart and recovery of allocated GPU devices



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
