[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756896#comment-16756896 ]
Sunil Govindan edited comment on YARN-9060 at 1/31/19 6:01 AM: --------------------------------------------------------------- Thanks [~tangzhankun] Few comments. # In below comments, it's better to have majorNumber minorNumber example given in same line rather than giving in a line above. {code:java} # #for instance, "8:16,8:32" # devices.denied-numbers=## Blacklisted devices not permitted to use. The format is comma separated "majorNumber:minorNumber". Leave it empty means default devices reported by device plugin are all allowed. \ No newline at end of file{code} # In {{getRegisterRequestInfo}} , could we set the "nvidia.com/gpu" as a static value. ? # Its better to place NvidiaCommandExecutor under test package. # Thanks for adding detailed logs. it certainly helpful. Could you also pls improve some logs in updateDockerRunCommand to add container id or other dynamic values as well. pls refer below logs which lacks some more dynamic information. Also pls see other methods as well to check the same. {code:java} LOG.warn("The container return null for device runtime spec"); LOG.debug("Try to update docker run command.");{code} # requestsDevice ==> requestedDevice # Is there any chance that getCleanupDockerVolumesCommand wont be called in some failure case, if there is such chance, i worry about the cached data structures which may cause some leaks. # In getDeviceType, i prefer to keep a new enum for more readability than keeping "c" or "b" was (Author: sunilg): Thanks [~tangzhankun] Few comments. # In below comments, it's better to have majorNumber minorNumber example given in same line rather than giving in a line above. {code:java} # #for instance, "8:16,8:32" # devices.denied-numbers=## Blacklisted devices not permitted to use. The format is comma separated "majorNumber:minorNumber". Leave it empty means default devices reported by device plugin are all allowed. \ No newline at end of file{code} # In {{getRegisterRequestInfo}} , could we set the "nvidia.com/gpu" as a static value. ? # Its better to place NvidiaCommandExecutor under test package. # Thanks for adding detailed logs. it certainly helpful. Could you also pls improve some logs in updateDockerRunCommand to add container id or other dynamic values as well. pls refer below logs which lacks some more dynamic information. Also pls see other methods as well to check the same. {code:java} LOG.warn("The container return null for device runtime spec"); LOG.debug("Try to update docker run command.");{code} # requestsDevice ==> requestedDevice # Is there any chance that getCleanupDockerVolumesCommand wont be called in some failure case, if there is such chance, i worry about the cached data structures which may cause some leaks. # In getDeviceType, i prefer to keep a new enum for more readability than keeping "c" or "b" > [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin > as an example > ------------------------------------------------------------------------------------------ > > Key: YARN-9060 > URL: https://issues.apache.org/jira/browse/YARN-9060 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Zhankun Tang > Assignee: Zhankun Tang > Priority: Major > Attachments: YARN-9060-trunk.001.patch, YARN-9060-trunk.002.patch, > YARN-9060-trunk.003.patch, YARN-9060-trunk.004.patch, > YARN-9060-trunk.005.patch, YARN-9060-trunk.006.patch, > YARN-9060-trunk.007.patch, YARN-9060-trunk.008.patch, > YARN-9060-trunk.009.patch, YARN-9060-trunk.010.patch, > YARN-9060-trunk.011.patch, YARN-9060-trunk.012.patch, > YARN-9060-trunk.013.patch, YARN-9060-trunk.014.patch > > > Due to the cgroups v1 implementation policy in linux kernel, we cannot update > the value of the device cgroups controller unless we have the root permission > ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]). > So we need to support this in container-executor for Java layer to invoke. > This Jira will have three parts: > # native c-e module > # Java layer code to isolate devices for container (docker and non-docker) > # A sample Nvidia GPU plugin -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org