[ https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660906#comment-16660906 ]
Weiwei Yang commented on YARN-8851:
-----------------------------------

Hi [~tangzhankun]

Thanks for the design doc and patch. I have some high-level comments.

1) From a user perspective, what needs to be implemented? Is it just the following two?
 * DevicePlugin (required)
 * DevicePluginScheduler (optional)

2) It's good to see you added an *examples* package; it will be useful for users to start with. However, instead of providing a fake implementation, can we implement a demo device plugin that can actually be configured and tested on a single-node cluster? This will give users a better sense of how to implement their own plugin. Further, it would be good if you could provide a sanity test suite to verify whether a device plugin is compatible.

3) Some high-level comments about the APIs in {{DevicePlugin}}
{code:java}
DeviceRegisterRequest register();
{code}
This is a bit confusing. A register() function is normally a two-sided call, e.g. a slave registers itself with a master. But here it simply returns a {{DeviceRegisterRequest}}, so it looks more like a {{getDeviceInfo()}} API to me.
{code:java}
Set<Device> getDevices();
{code}
Is this supposed to return the set of available devices? If so, would it be better to rename it to "getAvailableDevices"?

4) It is interesting to allow a customized {{DevicePluginScheduler}}, but how would failure recovery be done? Does that mean the user needs to implement all the logic for persisting and recovering allocated resources in the NM store? In that case, we are exposing too much of YARN's internals in a plugin framework.

5) {{DevicePluginAdapter}} doesn't look like an adapter; it looks more like a base class of {{ResourcePlugin}} to me. Please correct me if I misunderstood this.

6) It is confusing that DevicePluginAdapter has a reference to ResourcePluginManager. Could you remove that? From what I can see, ResourcePluginManager manages all ResourcePlugins, and each ResourcePlugin can be instantiated by a DevicePluginAdapter.

Let me know if these make sense.
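To make points 2) and 3) a bit more concrete, here is a rough, self-contained sketch of the renamed interface together with a very small sanity check. It is for illustration only: apart from the names {{DevicePlugin}}, {{getDeviceInfo}} and {{getAvailableDevices}} mentioned above, everything here ({{DeviceInfo}}, the fields of {{Device}}, {{DemoPlugin}}, {{checkCompatibility}}, the resource name) is a made-up placeholder and is not taken from the patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; these are NOT the types from the YARN-8851 patch.
final class DevicePluginSketch {

  /** Hypothetical stand-in for the data carried by DeviceRegisterRequest. */
  static final class DeviceInfo {
    final String resourceName;
    final String pluginVersion;

    DeviceInfo(String resourceName, String pluginVersion) {
      this.resourceName = resourceName;
      this.pluginVersion = pluginVersion;
    }
  }

  /** Hypothetical device handle; the real Device class may carry more fields. */
  static final class Device {
    final int id;

    Device(int id) {
      this.id = id;
    }
  }

  /** The two methods discussed in point 3, using the suggested names. */
  interface DevicePlugin {
    DeviceInfo getDeviceInfo();         // suggested in place of register()
    Set<Device> getAvailableDevices();  // suggested in place of getDevices()
  }

  /** Demo plugin that reports two made-up devices. */
  static final class DemoPlugin implements DevicePlugin {
    @Override
    public DeviceInfo getDeviceInfo() {
      return new DeviceInfo("example.com/demo-device", "v1");
    }

    @Override
    public Set<Device> getAvailableDevices() {
      Set<Device> devices = new LinkedHashSet<>();
      devices.add(new Device(0));
      devices.add(new Device(1));
      return devices;
    }
  }

  /** Minimal sanity check: returns a list of problems, empty if none found. */
  static List<String> checkCompatibility(DevicePlugin plugin) {
    List<String> problems = new ArrayList<>();
    DeviceInfo info = plugin.getDeviceInfo();
    if (info == null || info.resourceName == null || info.resourceName.isEmpty()) {
      problems.add("getDeviceInfo() must return a non-empty resource name");
    }
    Set<Device> devices = plugin.getAvailableDevices();
    if (devices == null) {
      problems.add("getAvailableDevices() must not return null");
    } else {
      Set<Integer> seenIds = new HashSet<>();
      for (Device d : devices) {
        if (d == null) {
          problems.add("getAvailableDevices() must not contain null");
        } else if (!seenIds.add(d.id)) {
          problems.add("duplicate device id: " + d.id);
        }
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    List<String> problems = checkCompatibility(new DemoPlugin());
    System.out.println(problems.isEmpty() ? "compatible" : "incompatible: " + problems);
  }
}
```

Running {{main}} against the demo plugin prints "compatible". A real sanity suite would presumably need to run against an actual NM and cover allocation and recovery as well; the checks above only show the shape of the idea.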
Thanks

> [Umbrella] A new pluggable device plugin framework to ease vendor plugin development
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-8851
>                 URL: https://issues.apache.org/jira/browse/YARN-8851
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: yarn
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Major
>         Attachments: YARN-8851-WIP2-trunk.001.patch, YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, YARN-8851-WIP5-trunk.001.patch, YARN-8851-WIP6-trunk.001.patch, YARN-8851-WIP7-trunk.001.patch, [YARN-8851] YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] YARN_New_Device_Plugin_Framework_Design_Proposal.pdf
>
>
> At present, we support GPU/FPGA device in YARN through a native, coupling way. But it's difficult for a vendor to implement such a device plugin because the developer needs much knowledge of YARN internals. And this brings burden to the community to maintain both YARN core and vendor-specific code.
> Here we propose a new device plugin framework to ease vendor device plugin development and provide a more flexible way to integrate with YARN NM.