[ 
https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722443#comment-16722443
 ] 

Zhankun Tang commented on YARN-9120:
------------------------------------

[~snemeth], I double-checked: if we remove "yarn.io/gpu" from the property 
"yarn.nodemanager.resource-plugins" while leaving the other GPU-related 
configuration in place, the server's GPU resources are not discovered or used, 
which means GPU is disabled. I also verified that an application requesting 
GPU fails, while an application that does not request GPU resources still runs.
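
As a rough sketch of the setup described above, the relevant part of 
yarn-site.xml on such a node might look like the fragment below. The empty 
plugin list is an assumption for illustration; the point is only that 
"yarn.io/gpu" is no longer listed while the other GPU settings stay untouched.

```xml
<!-- yarn-site.xml on a node where GPU should be disabled (illustrative sketch) -->
<property>
  <name>yarn.nodemanager.resource-plugins</name>
  <!-- "yarn.io/gpu" removed from the list; shown empty here as an assumption -->
  <value></value>
</property>
<property>
  <!-- Left in place; per the test above, GPUs are still not discovered
       because the GPU plugin is not listed -->
  <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
  <value>auto</value>
</property>
```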

[~pbacsko], there is probably no obvious benefit to adding a new "off" value 
to "yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices" compared to 
removing "yarn.io/gpu" from "yarn.nodemanager.resource-plugins". As I see it, 
both approaches require the admin to maintain a different yarn-site.xml on the 
affected servers.
I guess your point is about how YARN can manage configuration on a 
heterogeneous cluster? 
I'm not sure whether Ambari or any other tool can keep a different 
configuration for each node. This does not seem to be YARN's responsibility. 
[~rohithsharma], any idea?

> Need to have a way to turn off GPU auto-discovery in GpuDiscoverer
> ------------------------------------------------------------------
>
>                 Key: YARN-9120
>                 URL: https://issues.apache.org/jira/browse/YARN-9120
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>
> GpuDiscoverer.getGpusUsableByYarn either parses the user-defined GPU devices 
> or should have the value 'auto' (from property: 
> yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices)
> In some circumstances, users would want to exclude a node from scheduling, so 
> they should have an option to turn off auto-discovery.
> It's straightforward that this is possible by removing the GPU 
> resource-plugin from YARN's config along with GPU-related config in 
> container-executor.cfg, but doing that with a dedicated value for 
> yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices is a more 
> lightweight approach.
