[ https://issues.apache.org/jira/browse/YARN-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105933#comment-17105933 ]
Zhankun Tang commented on YARN-10248:
-------------------------------------

[~jasstionzyf], do you mean that the existing test case "testAllocationWithoutAllowedGpus" fails, but that the failure is not related to our changes?

> when config allowed-gpu-devices , excluded GPUs still be visible to containers
> ------------------------------------------------------------------------------
>
> Key: YARN-10248
> URL: https://issues.apache.org/jira/browse/YARN-10248
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.2.1
> Reporter: zhao yufei
> Assignee: zhao yufei
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.2.1
>
> Attachments: YARN-10248-branch-3.2.001.path, YARN-10248-branch-3.2.001.path
>
>
> I have a server with two GPUs, and I want to use only one of them within the YARN cluster.
> According to the Hadoop documentation, I set the following configs:
> {code:java}
> <property>
>   <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
>   <value>0:1</value>
> </property>
> <property>
>   <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
>   <value>/etc/alternatives/x86_64-linux-gnu_nvidia_smi</value>
> </property>
> {code}
> Then I ran the following command to test:
> {code:java}
> yarn jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar \
>   -jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar -shell_command ' nvidia-smi & sleep 3 ' \
>   -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=1 \
>   -num_containers 1 -queue yufei -node_label_expression slaves
> {code}
> I expected the GPU with minor number 0 not to be visible to the container, but in the launched container nvidia-smi prints information for both GPUs.
> I checked the related source code and found that this is a bug.
> The problem is:
> when you specify allowed-gpu-devices, GpuDiscoverer populates the usable GPUs from it;
> then, when some of those GPUs are assigned to a container, it sets the denied GPUs for the container,
> but it never considers the GPUs that were excluded on the host.
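
The behaviour described in the report can be illustrated with a small set computation. This is only a sketch of one way to read the description, not the actual NodeManager code: the class `DeniedGpuSketch` and both method names are hypothetical, and GPUs are represented simply by their minor numbers. It assumes the host has GPUs with minor numbers 0 and 1 and that `allowed-gpu-devices` admits only minor number 1, as in the `0:1` setting above.

```java
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: these names are hypothetical and are not Hadoop YARN classes. */
final class DeniedGpuSketch {

    /**
     * Behaviour as described in the report: the denied set is derived from the
     * allowed (usable) GPUs only, so GPUs excluded by configuration never appear in it.
     */
    static Set<Integer> deniedFromAllowedOnly(Set<Integer> allowed, Set<Integer> assigned) {
        Set<Integer> denied = new TreeSet<>(allowed);
        denied.removeAll(assigned);
        return denied;
    }

    /**
     * Assumed intent of a fix: start from every GPU present on the host, so that
     * GPUs excluded by configuration are denied as well.
     */
    static Set<Integer> deniedFromAllOnHost(Set<Integer> allOnHost, Set<Integer> assigned) {
        Set<Integer> denied = new TreeSet<>(allOnHost);
        denied.removeAll(assigned);
        return denied;
    }

    public static void main(String[] args) {
        Set<Integer> allOnHost = Set.of(0, 1); // minor numbers physically present
        Set<Integer> allowed = Set.of(1);      // admitted by allowed-gpu-devices
        Set<Integer> assigned = Set.of(1);     // given to the container

        // Nothing is denied, so GPU 0 stays visible to the container.
        System.out.println(deniedFromAllowedOnly(allowed, assigned)); // prints []

        // GPU 0 is denied, which matches the reporter's expectation.
        System.out.println(deniedFromAllOnHost(allOnHost, assigned)); // prints [0]
    }
}
```

Under these assumptions the first computation yields an empty denied set, which would explain why nvidia-smi inside the container still lists both GPUs.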