[ 
https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756896#comment-16756896
 ] 

Sunil Govindan edited comment on YARN-9060 at 1/31/19 6:01 AM:
---------------------------------------------------------------

Thanks [~tangzhankun] 

Few comments.
 # In below comments, it's better to have majorNumber minorNumber example given 
in same line rather than giving in a line above.
{code:java}
# #for instance, "8:16,8:32"
# devices.denied-numbers=## Blacklisted devices not permitted to use. The 
format is comma separated "majorNumber:minorNumber". Leave it empty means 
default devices reported by device plugin are all allowed.
\ No newline at end of file{code}
 # In {{getRegisterRequestInfo}} , could we set the "nvidia.com/gpu" as a 
static value. ?
 # Its better to place NvidiaCommandExecutor under test package.
 # Thanks for adding detailed logs. it certainly helpful. Could you also pls 
improve some logs in updateDockerRunCommand to add container id or other 
dynamic values as well. pls refer below logs which lacks some more dynamic 
information. Also pls see other methods as well to check the same.
{code:java}
LOG.warn("The container return null for device runtime spec");
LOG.debug("Try to update docker run command.");{code}
 # requestsDevice ==> requestedDevice
 # Is there any chance that getCleanupDockerVolumesCommand wont be called in 
some failure case, if there is such chance, i worry about the cached data 
structures which may cause some leaks.
 # In getDeviceType, i prefer to keep a new enum for more readability than 
keeping "c" or "b"


was (Author: sunilg):
Thanks [~tangzhankun] 

Few comments.
 # In below comments, it's better to have majorNumber minorNumber example given 
in same line rather than giving in a line above.
{code:java}
# #for instance, "8:16,8:32"
# devices.denied-numbers=## Blacklisted devices not permitted to use. The 
format is comma separated "majorNumber:minorNumber". Leave it empty means 
default devices reported by device plugin are all allowed.
\ No newline at end of file{code}

 # In {{getRegisterRequestInfo}} , could we set the "nvidia.com/gpu" as a 
static value. ?
 # Its better to place NvidiaCommandExecutor under test package.
 # Thanks for adding detailed logs. it certainly helpful. Could you also pls 
improve some logs in updateDockerRunCommand to add container id or other 
dynamic values as well. pls refer below logs which lacks some more dynamic 
information. Also pls see other methods as well to check the same.
{code:java}
LOG.warn("The container return null for device runtime spec");
LOG.debug("Try to update docker run command.");{code}
 # requestsDevice ==> requestedDevice
 # Is there any chance that getCleanupDockerVolumesCommand wont be called in 
some failure case, if there is such chance, i worry about the cached data 
structures which may cause some leaks.
 # In getDeviceType, i prefer to keep a new enum for more readability than 
keeping "c" or "b"

> [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin 
> as an example
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-9060
>                 URL: https://issues.apache.org/jira/browse/YARN-9060
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Major
>         Attachments: YARN-9060-trunk.001.patch, YARN-9060-trunk.002.patch, 
> YARN-9060-trunk.003.patch, YARN-9060-trunk.004.patch, 
> YARN-9060-trunk.005.patch, YARN-9060-trunk.006.patch, 
> YARN-9060-trunk.007.patch, YARN-9060-trunk.008.patch, 
> YARN-9060-trunk.009.patch, YARN-9060-trunk.010.patch, 
> YARN-9060-trunk.011.patch, YARN-9060-trunk.012.patch, 
> YARN-9060-trunk.013.patch, YARN-9060-trunk.014.patch
>
>
> Due to the cgroups v1 implementation policy in linux kernel, we cannot update 
> the value of the device cgroups controller unless we have the root permission 
> ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]).
>  So we need to support this in container-executor for Java layer to invoke.
> This Jira will have three parts:
>  # native c-e module
>  # Java layer code to isolate devices for container (docker and non-docker)
>  # A sample Nvidia GPU plugin



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to