[ https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654594#comment-16654594 ]
Zhankun Tang edited comment on YARN-8851 at 10/18/18 3:56 AM:
--------------------------------------------------------------

[~leftnoteasy], for question 6, I'll try to answer here; we can talk offline if it's still not clear.

The design principle I'm trying to follow is to keep the vendor completely agnostic of YARN internals: simpler for them, and better for YARN's device plugin ecosystem. Honestly, I'm not completely sure whether this will bring hard-to-control complexity on our side, but the idea is as follows. The vendor developer only needs to use the libraries YARN provides to describe the requirements of their devices. The *_DevicePlugin_* interface defines the hooks, which are the vendor's only chance to tell YARN what devices they have and how to use them. It is the only interface the vendor needs to know, and the specs can only be created with the library we provide.

Sorry that the *_DevicePluginAdapter_* name is confusing. This class acts as a bridge between the NM and the vendor plugin. When the NM wants to get devices, the DevicePluginAdapter delegates the request to the vendor plugin and returns the result. When the NM wants to use these devices, the DevicePluginAdapter allocates them, delegates to the vendor plugin to find out how to use them, and tells YARN in YARN's language. DevicePluginAdapter has a 1-to-1 relation with DevicePlugin: each DevicePlugin instance needs its own DevicePluginAdapter instance. So it's not a problem that the DevicePlugin interface does not resemble DevicePluginAdapter. The DevicePluginAdapter knows YARN internals well and should not be touched by the vendor. Maybe "DevicePluginWrapper" or "ResourcePluginAdapter" would be a more proper name?

For the device scheduler, I'm now using a shared device scheduler to handle all DevicePluginAdapters' allocation requests before container launch.
The various types of resources are allocated one by one in this shared scheduler, which is, in essence, the same as the current independent scheduler inside each GPU/FPGA plugin. As for whether we should accept a vendor's customized scheduler, it's a good idea. But from my experience, I'd guess a shared scheduler supporting FIFO and topology scheduling (topology can be described in _Device_; check the design doc) might be enough for most vendors in the long term?

> [Umbrella] A new pluggable device plugin framework to ease vendor plugin development
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-8851
>                 URL: https://issues.apache.org/jira/browse/YARN-8851
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: yarn
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Major
>         Attachments: YARN-8851-WIP2-trunk.001.patch,
> YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, [YARN-8851]
> YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851]
> YARN_New_Device_Plugin_Framework_Design_Proposal.pdf
>
>
> At present, we support GPU/FPGA devices in YARN in a native, tightly coupled
> way. But it's difficult for a vendor to implement such a device plugin
> because the developer needs deep knowledge of YARN internals. This also puts
> a burden on the community to maintain both YARN core and vendor-specific code.
> Here we propose a new device plugin framework to ease vendor device plugin
> development and provide a more flexible way to integrate with YARN NM.
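To make the DevicePlugin / DevicePluginAdapter split from the comment more concrete, here is a minimal Java sketch under a much simplified model. Only the names DevicePlugin, DevicePluginAdapter and Device come from the comment; every method and field, the DeviceRuntimeSpec builder, the `--device=` translation, the lowest-ID-first allocation and FakeGpuPlugin are hypothetical illustrations, not the YARN-8851 API. The vendor implements only DevicePlugin and can build specs only through the builder, while the adapter (one per plugin) owns discovery, allocation and the translation into runtime arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the DevicePlugin / DevicePluginAdapter split; not the YARN API. */
final class DevicePluginSketch {

  /** Vendor-visible device description (illustrative fields only). */
  static final class Device implements Comparable<Device> {
    final int id;
    final String devPath;
    Device(int id, String devPath) { this.id = id; this.devPath = devPath; }
    @Override public int compareTo(Device o) { return Integer.compare(id, o.id); }
  }

  /** Runtime spec; the private constructor forces code outside this class to use the builder. */
  static final class DeviceRuntimeSpec {
    final List<String> devicePaths;
    private DeviceRuntimeSpec(List<String> paths) {
      this.devicePaths = Collections.unmodifiableList(new ArrayList<>(paths));
    }
    static Builder newBuilder() { return new Builder(); }

    static final class Builder {
      private final List<String> paths = new ArrayList<>();
      Builder addDevice(String path) { paths.add(path); return this; }
      DeviceRuntimeSpec build() { return new DeviceRuntimeSpec(paths); }
    }
  }

  /** The only interface a vendor implements: the hooks described in the comment. */
  interface DevicePlugin {
    Set<Device> getDevices();                                  // tell YARN what devices exist
    DeviceRuntimeSpec onDevicesAllocated(Set<Device> devices); // tell YARN how to use them
  }

  /** NM-side bridge, one instance per DevicePlugin; vendors never see it. */
  static final class DevicePluginAdapter {
    private final DevicePlugin plugin;
    private final TreeSet<Device> free = new TreeSet<>();

    DevicePluginAdapter(DevicePlugin plugin) {
      this.plugin = plugin;
      free.addAll(plugin.getDevices()); // NM asks for devices; the adapter delegates to the vendor
    }

    /** Allocates {@code count} devices, asks the vendor how to use them, answers in "YARN's language". */
    List<String> allocate(int count) {
      if (count > free.size()) {
        throw new IllegalStateException("not enough free devices");
      }
      Set<Device> picked = new LinkedHashSet<>();
      for (int i = 0; i < count; i++) {
        picked.add(free.pollFirst()); // naive lowest-id-first pick (hypothetical policy)
      }
      DeviceRuntimeSpec spec = plugin.onDevicesAllocated(Collections.unmodifiableSet(picked));
      List<String> runtimeArgs = new ArrayList<>();
      for (String path : spec.devicePaths) {
        runtimeArgs.add("--device=" + path); // hypothetical translation for a Docker-style runtime
      }
      return runtimeArgs;
    }
  }

  /** A toy vendor plugin exposing two fake devices. */
  static final class FakeGpuPlugin implements DevicePlugin {
    @Override public Set<Device> getDevices() {
      return new TreeSet<>(Arrays.asList(new Device(0, "/dev/fake0"), new Device(1, "/dev/fake1")));
    }
    @Override public DeviceRuntimeSpec onDevicesAllocated(Set<Device> devices) {
      DeviceRuntimeSpec.Builder b = DeviceRuntimeSpec.newBuilder();
      for (Device d : devices) {
        b.addDevice(d.devPath);
      }
      return b.build();
    }
  }

  public static void main(String[] args) {
    DevicePluginAdapter adapter = new DevicePluginAdapter(new FakeGpuPlugin());
    System.out.println(adapter.allocate(1)); // prints [--device=/dev/fake0]
  }
}
```

Keeping the spec constructor private is one way to make "the specs can only be created with the library provided by us" enforceable: code outside the defining class has no path to a spec except the builder.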
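The shared device scheduler from the comment can be sketched in the same hedged spirit. This assumes each device carries a type and a single integer topology group (a stand-in for whatever topology _Device_ describes in the design doc); SharedDeviceSchedulerSketch, Request, submit, drain and the grouping policy are all hypothetical. Requests from every adapter go into one queue and are served strictly in FIFO order, one resource type at a time; with topology awareness on, the scheduler prefers devices that all sit in one group and otherwise falls back to the first free ones.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical shared device scheduler: FIFO with an optional topology preference. */
final class SharedDeviceSchedulerSketch {

  /** Illustrative device; "group" stands in for topology info such as a NUMA node. */
  static final class Device {
    final String type;
    final int id;
    final int group;
    Device(String type, int id, int group) { this.type = type; this.id = id; this.group = group; }
  }

  /** One allocation request forwarded by some DevicePluginAdapter. */
  static final class Request {
    final String containerId;
    final String type;
    final int count;
    Request(String containerId, String type, int count) {
      this.containerId = containerId; this.type = type; this.count = count;
    }
  }

  private final List<Device> free = new ArrayList<>();
  private final Deque<Request> queue = new ArrayDeque<>();
  private final boolean topologyAware;

  SharedDeviceSchedulerSketch(List<Device> devices, boolean topologyAware) {
    this.free.addAll(devices); // private copy; the caller's list is never modified
    this.topologyAware = topologyAware;
  }

  void submit(Request r) { queue.addLast(r); }

  /** Serves queued requests in FIFO order; stops at the first one that cannot be satisfied. */
  Map<String, List<Integer>> drain() {
    Map<String, List<Integer>> assigned = new TreeMap<>();
    while (!queue.isEmpty()) {
      Request r = queue.peekFirst();
      List<Device> picked = pick(r.type, r.count);
      if (picked == null) {
        break; // head-of-line blocking keeps strict FIFO order
      }
      queue.pollFirst();
      free.removeAll(picked); // identity equality: these are the same Device objects
      List<Integer> ids = new ArrayList<>();
      for (Device d : picked) {
        ids.add(d.id);
      }
      assigned.put(r.containerId + "/" + r.type, ids);
    }
    return assigned;
  }

  /** Returns {@code count} free devices of {@code type}, or null if there are not enough. */
  private List<Device> pick(String type, int count) {
    List<Device> candidates = new ArrayList<>();
    for (Device d : free) {
      if (d.type.equals(type)) {
        candidates.add(d);
      }
    }
    if (candidates.size() < count) {
      return null;
    }
    if (topologyAware) {
      // Prefer devices that all share one topology group (lowest group id first).
      Map<Integer, List<Device>> byGroup = new TreeMap<>();
      for (Device d : candidates) {
        byGroup.computeIfAbsent(d.group, g -> new ArrayList<>()).add(d);
      }
      for (List<Device> sameGroup : byGroup.values()) {
        if (sameGroup.size() >= count) {
          return new ArrayList<>(sameGroup.subList(0, count));
        }
      }
    }
    return new ArrayList<>(candidates.subList(0, count)); // plain FIFO fallback: first free devices
  }

  public static void main(String[] args) {
    List<Device> devices = Arrays.asList(
        new Device("gpu", 0, 0), new Device("gpu", 1, 1),
        new Device("gpu", 2, 1), new Device("fpga", 0, 0));
    SharedDeviceSchedulerSketch s = new SharedDeviceSchedulerSketch(devices, true);
    s.submit(new Request("c1", "gpu", 2));
    s.submit(new Request("c2", "gpu", 1));
    s.submit(new Request("c3", "fpga", 1));
    System.out.println(s.drain()); // prints {c1/gpu=[1, 2], c2/gpu=[0], c3/fpga=[0]}
  }
}
```

Stopping at the first unsatisfiable request keeps the order strictly FIFO; a real scheduler might instead skip ahead or reserve devices, which is a policy choice this sketch deliberately does not make.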