[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660906#comment-16660906
 ] 

Weiwei Yang commented on YARN-8851:
-----------------------------------

Hi [~tangzhankun]

Thanks for the design doc and patch. I have some high-level comments:

1) From a user's perspective, what needs to be implemented? Is it just the 
following two?
 * DevicePlugin (required)
 * DevicePluginScheduler (optional)

2) It's good to see that you added an *examples* package; it will be useful 
for users to start with. However, instead of providing a fake implementation, 
can we implement a demo device plugin that can actually be configured and 
tested on a single-node cluster? This will give users a better sense of how to 
implement their own plugin. Further, it would be good if you could provide a 
sanity test suite to verify whether a device plugin is compatible.
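
To illustrate what I mean by a sanity test suite, here is a rough sketch. 
Everything in it is an assumption on my side: the {{DevicePlugin}} shape 
(a resource name plus a set of integer device ids), the {{check}} helper and 
the specific rules are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class PluginSanityCheckSketch {
    // Hypothetical, simplified plugin contract; the real interface is in the patch.
    interface DevicePlugin {
        String getResourceName();
        Set<Integer> getDeviceIds();
    }

    // Returns human-readable violations; an empty list means the plugin passed.
    static List<String> check(DevicePlugin plugin) {
        List<String> problems = new ArrayList<>();
        String name = plugin.getResourceName();
        if (name == null || name.isEmpty()) {
            problems.add("resource name must be non-empty");
        }
        // Ask twice so we can detect plugins that report unstable results.
        Set<Integer> first = plugin.getDeviceIds();
        Set<Integer> second = plugin.getDeviceIds();
        if (first == null || second == null) {
            problems.add("device set must not be null");
            return problems;
        }
        for (Integer id : first) {
            if (id == null || id < 0) {
                problems.add("invalid device id: " + id);
            }
        }
        if (!first.equals(second)) {
            problems.add("device set changed between two consecutive calls");
        }
        return problems;
    }

    // A trivial plugin that should pass every check above.
    static final class DemoPlugin implements DevicePlugin {
        @Override public String getResourceName() { return "example.com/device"; }
        @Override public Set<Integer> getDeviceIds() {
            return new TreeSet<>(Arrays.asList(0, 1));
        }
    }
}
```

A vendor could run something like {{PluginSanityCheckSketch.check(new DemoPlugin())}} 
in a unit test and fail the build if the returned list is not empty.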

3) Some high-level comments about the APIs in {{DevicePlugin}}
{code:java}
DeviceRegisterRequest register();
{code}
This is a bit confusing. A register() function is normally a two-sided call, 
e.g. a slave registers itself with a master. But here it simply returns a 
{{DeviceRegisterRequest}}, so it looks more like a {{getDeviceInfo()}} API to me.
{code:java}
Set<Device> getDevices();
{code}
Is this supposed to return the set of available devices? If so, would it be 
better to rename it to "getAvailableDevices"?
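
To show how the two renames would read together, here is a minimal sketch. 
The {{Device}} and {{DeviceInfo}} classes are hypothetical stand-ins that I 
made up for illustration (the real types in the patch carry more fields), and 
the resource name is a placeholder.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

final class ApiNamingSketch {
    // Hypothetical stand-in for the patch's Device class; only an id is modeled.
    static final class Device {
        private final int id;
        Device(int id) { this.id = id; }
        int getId() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof Device && ((Device) o).id == id;
        }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    // Hypothetical stand-in for what register() returns today.
    static final class DeviceInfo {
        private final String resourceName;
        DeviceInfo(String resourceName) { this.resourceName = resourceName; }
        String getResourceName() { return resourceName; }
    }

    // Suggested naming only: both methods are read-only queries, no handshake.
    interface DevicePlugin {
        DeviceInfo getDeviceInfo();
        Set<Device> getAvailableDevices();
    }

    // A fake plugin reporting two devices, just to exercise the interface.
    static final class FakeTwoDevicePlugin implements DevicePlugin {
        @Override public DeviceInfo getDeviceInfo() {
            return new DeviceInfo("example.com/device");
        }
        @Override public Set<Device> getAvailableDevices() {
            Set<Device> devices = new LinkedHashSet<>();
            devices.add(new Device(0));
            devices.add(new Device(1));
            return Collections.unmodifiableSet(devices);
        }
    }
}
```

With these names, neither call suggests a side effect on the NM, which is the 
point of the renaming.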

4) It is interesting to allow a customized {{DevicePluginScheduler}}, but how 
can failure recovery be done? Does that mean the user needs to implement all 
the logic for persisting and recovering allocated resources in the NM store? 
In that case, we are exposing too many YARN internals in a plugin framework.
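
One way to avoid this, sketched below under my own assumptions, is to keep the 
vendor's scheduler a pure "pick devices" function and let the framework own 
persistence and recovery. The {{FrameworkAllocator}} class, the {{Map}} used as 
a stand-in for the NM store, and the scheduler signature are all hypothetical 
and not from the patch.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RecoverySketch {
    // Hypothetical: the only thing a vendor implements. No persistence here.
    interface DevicePluginScheduler {
        Set<Integer> allocate(Set<Integer> freeDevices, int count);
    }

    // Hypothetical framework-side component that owns persistence and recovery.
    static final class FrameworkAllocator {
        private final Map<String, Set<Integer>> store; // stand-in for the NM store
        private final Set<Integer> free;
        private final DevicePluginScheduler scheduler;

        FrameworkAllocator(Set<Integer> allDevices,
                           Map<String, Set<Integer>> store,
                           DevicePluginScheduler scheduler) {
            this.store = store;
            this.scheduler = scheduler;
            this.free = new TreeSet<>(allDevices);
            // Recovery: replay persisted allocations so used devices are not free.
            for (Set<Integer> used : store.values()) {
                free.removeAll(used);
            }
        }

        Set<Integer> allocate(String containerId, int count) {
            // The scheduler only sees a copy of the free set.
            Set<Integer> chosen = scheduler.allocate(new TreeSet<>(free), count);
            if (chosen == null || chosen.size() != count || !free.containsAll(chosen)) {
                throw new IllegalStateException("scheduler returned an invalid device set");
            }
            Set<Integer> persisted = new TreeSet<>(chosen);
            store.put(containerId, persisted); // persist before handing devices out
            free.removeAll(persisted);
            return persisted;
        }

        Set<Integer> freeDevices() { return new TreeSet<>(free); }
    }

    // Example vendor scheduler: take the first 'count' free devices.
    static final DevicePluginScheduler FIRST_FIT = (freeDevices, count) -> {
        Set<Integer> picked = new LinkedHashSet<>();
        for (Integer d : freeDevices) {
            if (picked.size() == count) break;
            picked.add(d);
        }
        return picked;
    };
}
```

Constructing a second {{FrameworkAllocator}} over the same store simulates an 
NM restart: the previously allocated devices are excluded from the free set 
without the vendor's scheduler knowing anything about the store.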

5) {{DevicePluginAdapter}} doesn't look like an adapter; it looks more like a 
base class of {{ResourcePlugin}} to me. Please correct me if I misunderstood this.

6) It is confusing that DevicePluginAdapter has a reference to 
ResourcePluginManager; could you remove that? From what I can see, 
ResourcePluginManager manages all ResourcePlugins, and each ResourcePlugin can 
be instantiated by a DevicePluginAdapter.
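
The one-way dependency I have in mind looks roughly like the sketch below. 
The types are simplified stand-ins that reuse the names from the patch for 
readability; their methods (including {{register}} and {{get}} on the manager) 
are my own assumptions and do not reflect the actual YARN classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OwnershipSketch {
    // Vendor-provided plugin; hypothetical single-method shape.
    interface DevicePlugin {
        String getResourceName();
    }

    // Simplified stand-in for YARN's ResourcePlugin.
    interface ResourcePlugin {
        String getResourceName();
    }

    // Wraps a vendor DevicePlugin. Note: no field pointing back at the manager.
    static final class DevicePluginAdapter implements ResourcePlugin {
        private final DevicePlugin delegate;
        DevicePluginAdapter(DevicePlugin delegate) { this.delegate = delegate; }
        @Override public String getResourceName() { return delegate.getResourceName(); }
    }

    // Simplified stand-in for the manager: it knows the plugins, not vice versa.
    static final class ResourcePluginManager {
        private final Map<String, ResourcePlugin> plugins = new LinkedHashMap<>();

        void register(DevicePlugin vendorPlugin) {
            ResourcePlugin plugin = new DevicePluginAdapter(vendorPlugin);
            plugins.put(plugin.getResourceName(), plugin);
        }

        ResourcePlugin get(String resourceName) {
            return plugins.get(resourceName);
        }
    }
}
```

If the adapter really needs something from the manager, passing that one 
dependency into the constructor would keep the direction of ownership clear.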

Let me know if these make sense.

Thanks

> [Umbrella] A new pluggable device plugin framework to ease vendor plugin 
> development
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-8851
>                 URL: https://issues.apache.org/jira/browse/YARN-8851
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: yarn
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Major
>         Attachments: YARN-8851-WIP2-trunk.001.patch, 
> YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, 
> YARN-8851-WIP5-trunk.001.patch, YARN-8851-WIP6-trunk.001.patch, 
> YARN-8851-WIP7-trunk.001.patch, [YARN-8851] 
> YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] 
> YARN_New_Device_Plugin_Framework_Design_Proposal.pdf
>
>
> At present, we support GPU/FPGA devices in YARN in a native, tightly coupled 
> way. But it's difficult for a vendor to implement such a device plugin 
> because the developer needs deep knowledge of YARN internals. This also 
> burdens the community with maintaining both YARN core and vendor-specific code.
> Here we propose a new device plugin framework to ease vendor device plugin 
> development and provide a more flexible way to integrate with the YARN NM.


