[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-11-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692020#comment-16692020
 ] 

Hudson commented on YARN-8851:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15462 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15462/])
YARN-8881. [YARN-8851] Add basic pluggable device plugin framework. (wangda: 
rev 63578036450f660d49ae204327efcd629d9dd137)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePluginManager.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/deviceplugin/DeviceRegisterRequest.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/package-info.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/FakeTestDevicePlugin2.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/package-info.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/TestDevicePluginAdapter.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/FakeTestDevicePlugin3.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/resource-types-pluggable-devices.xml
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/deviceplugin/package-info.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/deviceplugin/DevicePlugin.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/deviceplugin/MountDeviceSpec.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/deviceplugin/VolumeSpec.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/DevicePluginAdapter.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/deviceplugin/Device.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/FakeTestDevicePlugin4.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/deviceplugin/YarnRuntimeType.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/deviceplugin/DeviceRuntimeSpec.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/TestResourcePluginManager.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/deviceplugin/MountVolumeSpec.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/DeviceResourceUpdaterImpl.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/FakeTestDevicePlugin1.java


> [Umbrella] A new pluggable device plugin framework to ease vendor plugin 
> development
> 
>
> Key: YARN-8851
> URL: https://issues.apache.org/jira/browse/YARN-8851
> Project: Hadoop YARN
>  Issue 

[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-11-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677733#comment-16677733
 ] 

Hadoop QA commented on YARN-8851:
-

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 3m 47s | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 56s | trunk passed |
| +1 | compile | 7m 39s | trunk passed |
| +1 | checkstyle | 1m 17s | trunk passed |
| +1 | mvnsite | 2m 4s | trunk passed |
| -1 | shadedclient | 13m 37s | branch has errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 43s | trunk passed |
| +1 | javadoc | 1m 41s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 52s | the patch passed |
| +1 | compile | 7m 5s | the patch passed |
| +1 | javac | 7m 5s | the patch passed |
| -0 | checkstyle | 1m 16s | hadoop-yarn-project/hadoop-yarn: The patch generated 175 new + 240 unchanged - 3 fixed = 415 total (was 243) |
| +1 | mvnsite | 1m 55s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| -1 | shadedclient | 10m 42s | patch has errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 13s | the patch passed |
| +1 | javadoc | 1m 56s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 0m 44s | hadoop-yarn-api in the patch failed. |
| -1 | unit | 1m 2s | hadoop-yarn-common in the patch failed. |
| -1 | unit | 1m 35s | hadoop-yarn-server-nodemanager in the patch failed. |
| -1 | asflicense | 0m 41s | The patch generated 1 ASF License warnings. |
| | | 87m 17s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8851 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12947182/YARN-8851-trunk.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux c3679116819d 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-11-05 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676074#comment-16676074
 ] 

Zhankun Tang commented on YARN-8851:


{quote}1) Regarding NM_PLUGGABLE_DEVICE_FRAMEWORK_PREFER_CUSTOMIZED_SCHEDULER, 
should we just use the default scheduler if the device plugin doesn't provide 
its own customized scheduler? We should assume that a loaded device plugin runs 
"trusted" code, so we may not need to add extra protection here.
{quote}
Zhankun -> Agree.
{quote}2) DeviceSchedulerManager sounds like it "manages schedulers"; however, 
it handles how to map devices to containers, and the scheduler is just an 
implementation detail. How about calling it DeviceMappingManager?
{quote}
-

 
{quote}internalAssignDevices should be private, and it is a bit long; it might 
be better for future maintenance if you can break it down into multiple methods.
{quote}
Zhankun -> Good idea. Will do that.
{quote}I think we could turn this POC into subtasks and get them done piece by 
piece. It would be helpful if you could highlight the subtasks required.
{quote}
Zhankun -> YARN-8880, YARN-8881, YARN-8882, YARN-8883, and YARN-8885 are our 
highlighted Phase 1 subtasks.

Thanks for the review! [~leftnoteasy]

> [Umbrella] A new pluggable device plugin framework to ease vendor plugin 
> development
> 
>
> Key: YARN-8851
> URL: https://issues.apache.org/jira/browse/YARN-8851
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8851-WIP2-trunk.001.patch, 
> YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, 
> YARN-8851-WIP5-trunk.001.patch, YARN-8851-WIP6-trunk.001.patch, 
> YARN-8851-WIP7-trunk.001.patch, YARN-8851-WIP8-trunk.001.patch, 
> YARN-8851-WIP9-trunk.001.patch, YARN-8851-trunk.001.patch, [YARN-8851] 
> YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] 
> YARN_New_Device_Plugin_Framework_Design_Proposal-4.pdf, [YARN-8851] 
> YARN_New_Device_Plugin_Framework_Design_Proposal.pdf
>
>
> At present, we support GPU/FPGA device in YARN through a native, coupling 
> way. But it's difficult for a vendor to implement such a device plugin 
> because the developer needs much knowledge of YARN internals. And this brings 
> burden to the community to maintain both YARN core and vendor-specific code.
> Here we propose a new device plugin framework to ease vendor device plugin 
> development and provide a more flexible way to integrate with YARN NM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-11-05 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675606#comment-16675606
 ] 

Wangda Tan commented on YARN-8851:
--

Thanks [~tangzhankun] , 

1) Regarding NM_PLUGGABLE_DEVICE_FRAMEWORK_PREFER_CUSTOMIZED_SCHEDULER, should 
we just use the default scheduler if the device plugin doesn't provide its own 
customized scheduler? We should assume that a loaded device plugin runs 
"trusted" code, so we may not need to add extra protection here.


2) DeviceSchedulerManager sounds like it "manages schedulers"; however, it 
handles how to map devices to containers, and the scheduler is just an 
implementation detail. How about calling it DeviceMappingManager?
- internalAssignDevices should be private, and it is a bit long; it might be 
better for future maintenance if you can break it down into multiple methods.

I think we could turn this POC into subtasks and get them done piece by piece. 
It would be helpful if you could highlight the subtasks required.
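
The fallback suggested in point 1 could look roughly like the sketch below. It 
is an illustration only, not the patch's code: the nested DevicePluginScheduler 
interface, the use of plain strings as device IDs, and the "first N devices" 
default policy are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

class SchedulerFallbackSketch {
  // Hypothetical stand-in for the optional, plugin-provided scheduler.
  interface DevicePluginScheduler {
    Set<String> allocateDevices(Set<String> availableDevices, int count);
  }

  // Assumed default policy: take the first `count` devices in iteration order.
  static Set<String> defaultAllocate(Set<String> availableDevices, int count) {
    Set<String> picked = new LinkedHashSet<>();
    for (String device : availableDevices) {
      if (picked.size() >= count) {
        break;
      }
      picked.add(device);
    }
    return picked;
  }

  // Delegate to the plugin only when it actually implements the scheduler
  // interface; otherwise use the default, with no extra protection.
  static Set<String> allocate(Object plugin, Set<String> availableDevices,
      int count) {
    if (plugin instanceof DevicePluginScheduler) {
      return ((DevicePluginScheduler) plugin)
          .allocateDevices(availableDevices, count);
    }
    return defaultAllocate(availableDevices, count);
  }

  public static void main(String[] args) {
    Set<String> available =
        new TreeSet<>(Arrays.asList("dev0", "dev1", "dev2"));
    // A plugin without its own scheduler gets the default policy.
    System.out.println(allocate(new Object(), available, 2));
  }
}
```

With this shape, no "prefer customized scheduler" switch is needed: whether the 
plugin implements the scheduler interface decides which path is taken.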







[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675015#comment-16675015
 ] 

Hadoop QA commented on YARN-8851:
-

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 59s | trunk passed |
| +1 | compile | 8m 41s | trunk passed |
| +1 | checkstyle | 1m 31s | trunk passed |
| +1 | mvnsite | 2m 18s | trunk passed |
| +1 | shadedclient | 16m 11s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 51s | trunk passed |
| +1 | javadoc | 1m 55s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 51s | the patch passed |
| +1 | compile | 7m 54s | the patch passed |
| +1 | javac | 7m 54s | the patch passed |
| -0 | checkstyle | 1m 33s | hadoop-yarn-project/hadoop-yarn: The patch generated 186 new + 229 unchanged - 3 fixed = 415 total (was 232) |
| +1 | mvnsite | 2m 11s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 3s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 13m 0s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 1m 11s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 5 new + 0 unchanged - 0 fixed = 5 total (was 0) |
| +1 | javadoc | 1m 51s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 47s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 3m 21s | hadoop-yarn-common in the patch passed. |
| -1 | unit | 18m 39s | hadoop-yarn-server-nodemanager in the patch failed. |
| -1 | asflicense | 0m 38s | The patch generated 2 ASF License warnings. |
| | | 109m 42s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
|  |  Null pointer dereference of allocated in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.deviceframework.DeviceResourceDockerRuntimePluginImpl.getAllocatedDevices(Container,
 Set)  Dereferenced at 

[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-11-03 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673972#comment-16673972
 ] 

Zhankun Tang commented on YARN-8851:


[~leftnoteasy] [~csingh], thanks for the review. After an offline discussion 
with Wangda, we prefer to give the plugin the single interface below, which 
lets it prepare the devices and return a DeviceRuntimeSpec after the devices 
are allocated.

We can also improve it as time goes by.
{code:java}
/**
 * Asks how these devices should be prepared/used
 * before/when the container launches. A plugin can do some tasks on its own
 * or define them in DeviceRuntimeSpec to let the framework do them.
 * For instance, define a {@code VolumeSpec} to let the
 * framework create a volume before running the container.
 *
 * @param allocatedDevices A set of allocated {@link Device}.
 * @param yarnRuntime Indicates which runtime YARN will use.
 * Could be {@code docker} or {@code default}
 * in {@link DeviceRuntimeSpec} constants.
 * @return a {@link DeviceRuntimeSpec} describing the environment,
 * {@link VolumeSpec}, {@link MountVolumeSpec}, etc.
 */
DeviceRuntimeSpec onDeviceAllocated(Set<Device> allocatedDevices,
    String yarnRuntime);{code}
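
To make the contract above concrete, here is a minimal sketch of what a vendor 
implementation might return. The nested Device and DeviceRuntimeSpec classes 
are simplified stand-ins rather than the patch's classes, and the 
DEMO_VISIBLE_DEVICES variable and /dev/demoN paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeMap;

class OnDeviceAllocatedSketch {
  // Simplified stand-ins (assumptions); the real classes carry more fields.
  static final class Device {
    final int id;
    final String devPath;

    Device(int id, String devPath) {
      this.id = id;
      this.devPath = devPath;
    }
  }

  static final class DeviceRuntimeSpec {
    final Map<String, String> envs = new LinkedHashMap<>();
    final List<String> deviceMounts = new ArrayList<>();
  }

  static DeviceRuntimeSpec onDeviceAllocated(Set<Device> allocatedDevices,
      String yarnRuntime) {
    DeviceRuntimeSpec spec = new DeviceRuntimeSpec();
    if (!"docker".equals(yarnRuntime)) {
      // Default runtime: this sketch asks the framework for nothing extra.
      return spec;
    }
    // Sort by id so the result does not depend on the set's iteration order.
    TreeMap<Integer, String> pathsById = new TreeMap<>();
    for (Device device : allocatedDevices) {
      pathsById.put(device.id, device.devPath);
    }
    StringJoiner ids = new StringJoiner(",");
    for (Map.Entry<Integer, String> entry : pathsById.entrySet()) {
      ids.add(String.valueOf(entry.getKey()));
      spec.deviceMounts.add(entry.getValue());
    }
    // Hypothetical variable telling the container which devices it may use.
    spec.envs.put("DEMO_VISIBLE_DEVICES", ids.toString());
    return spec;
  }

  public static void main(String[] args) {
    Set<Device> allocated = new LinkedHashSet<>();
    allocated.add(new Device(1, "/dev/demo1"));
    allocated.add(new Device(0, "/dev/demo0"));
    DeviceRuntimeSpec spec = onDeviceAllocated(allocated, "docker");
    System.out.println(spec.envs + " " + spec.deviceMounts);
  }
}
```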

> [Umbrella] A new pluggable device plugin framework to ease vendor plugin 
> development
> 
>
> Key: YARN-8851
> URL: https://issues.apache.org/jira/browse/YARN-8851
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8851-WIP2-trunk.001.patch, 
> YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, 
> YARN-8851-WIP5-trunk.001.patch, YARN-8851-WIP6-trunk.001.patch, 
> YARN-8851-WIP7-trunk.001.patch, YARN-8851-WIP8-trunk.001.patch, 
> YARN-8851-WIP9-trunk.001.patch, [YARN-8851] 
> YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] 
> YARN_New_Device_Plugin_Framework_Design_Proposal.pdf
>
>
> At present, we support GPU/FPGA device in YARN through a native, coupling 
> way. But it's difficult for a vendor to implement such a device plugin 
> because the developer needs much knowledge of YARN internals. And this brings 
> burden to the community to maintain both YARN core and vendor-specific code.
> Here we propose a new device plugin framework to ease vendor device plugin 
> development and provide a more flexible way to integrate with YARN NM.






[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-29 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667341#comment-16667341
 ] 

Zhankun Tang commented on YARN-8851:


Updated the patch.
 # Add a sanity-check to fast fail an incompatible plugin
 # Add topology information in Device class
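
The comment does not say what the sanity check verifies. One plausible form, 
assumed here purely for illustration, is a reflection check that the loaded 
plugin class exposes every method of the expected interface, so that a plugin 
built against an older API fails at load time instead of at first use. The 
ExpectedApi interface and both plugin classes below are hypothetical.

```java
import java.lang.reflect.Method;

class PluginSanityCheckSketch {
  // Hypothetical "current" plugin API the framework expects.
  interface ExpectedApi {
    String getResourceName();

    int getDeviceCount();
  }

  // Exposes both expected methods.
  static final class CurrentPlugin {
    public String getResourceName() {
      return "example.com/demo";
    }

    public int getDeviceCount() {
      return 2;
    }
  }

  // Built against an older API: getDeviceCount() is missing.
  static final class StalePlugin {
    public String getResourceName() {
      return "example.com/demo";
    }
  }

  // True only if `actual` has a public method matching each expected one.
  static boolean isCompatible(Class<?> expected, Class<?> actual) {
    for (Method wanted : expected.getMethods()) {
      try {
        Method found =
            actual.getMethod(wanted.getName(), wanted.getParameterTypes());
        if (!wanted.getReturnType().isAssignableFrom(found.getReturnType())) {
          return false;
        }
      } catch (NoSuchMethodException e) {
        return false;
      }
    }
    return true;
  }

  // Fail fast at load time rather than when a missing method is first called.
  static void failFastIfIncompatible(Class<?> actual) {
    if (!isCompatible(ExpectedApi.class, actual)) {
      throw new IllegalStateException(
          "Incompatible plugin: " + actual.getName());
    }
  }

  public static void main(String[] args) {
    failFastIfIncompatible(CurrentPlugin.class);
    System.out.println(isCompatible(ExpectedApi.class, StalePlugin.class));
  }
}
```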

> [Umbrella] A new pluggable device plugin framework to ease vendor plugin 
> development
> 
>
> Key: YARN-8851
> URL: https://issues.apache.org/jira/browse/YARN-8851
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8851-WIP2-trunk.001.patch, 
> YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, 
> YARN-8851-WIP5-trunk.001.patch, YARN-8851-WIP6-trunk.001.patch, 
> YARN-8851-WIP7-trunk.001.patch, YARN-8851-WIP8-trunk.001.patch, [YARN-8851] 
> YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] 
> YARN_New_Device_Plugin_Framework_Design_Proposal.pdf
>
>
> At present, we support GPU/FPGA device in YARN through a native, coupling 
> way. But it's difficult for a vendor to implement such a device plugin 
> because the developer needs much knowledge of YARN internals. And this brings 
> burden to the community to maintain both YARN core and vendor-specific code.
> Here we propose a new device plugin framework to ease vendor device plugin 
> development and provide a more flexible way to integrate with YARN NM.






[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-27 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665981#comment-16665981
 ] 

Zhankun Tang commented on YARN-8851:


[~csingh],
{quote}However, the DevicePluginAPI can have a method
{code:java}
onDevicesAllocated(Set<Device> allocatedDevices)
{code}
Let's say the user does NOT want to provide a custom DevicePluginScheduler and 
uses the default one. This will be called back after the devices get allocated.
 It also seems to complete the current DevicePluginAPI, which has an 
{{onDevicesReleased(devices)}} method.
{quote}
Yeah, very good point. Letting the plugin do something after YARN allocates 
these devices is OK. But from our current GPU/FPGA use cases, we don't yet have 
an idea of what the plugin would do after we allocate the devices.

The name "onDeviceAllocated" was used in my prior patch to let the plugin do 
some preparation and provide the runtime spec.

I changed it from "preLaunchContainer" to "onDeviceAllocated", then to 
"onDeviceUse", and then to "getDeviceRuntimeSpec" based on feedback. But I 
admit that the method's meaning/name is still confusing.

Let's go over the interface again. The current status of my latest code:

 
{code:java}
import java.util.Set;

/**
 * A mandatory interface for a vendor plugin to implement.
 */
public interface DevicePlugin {
  /**
   * Called first when the device plugin framework wants to register.
   * @return DeviceRegisterRequest {@link DeviceRegisterRequest}
   */
  DeviceRegisterRequest getRegisterRequestInfo();

  /**
   * Called when updating the node resource.
   * @return a set of {@link Device}, {@link java.util.TreeSet} recommended
   */
  Set<Device> getDevices();

  /**
   * Asks how these devices should be prepared/used before/when the
   * container launches.
   * @param allocatedDevices A set of allocated {@link Device}.
   * Note that it could be null, which means no device is allocated.
   * Only {@code volumeClaims} in it will be handled to create the volume.
   * @param runtime Indicates which runtime the framework will use.
   * Could be {@code RUNTIME_CGROUPS} or {@code RUNTIME_DOCKER}
   * in {@link DeviceRuntimeSpec}.
   * @return a {@link DeviceRuntimeSpec} describing the environment,
   * {@link VolumeSpec}, {@link MountVolumeSpec}, etc.
   */
  // This was called onDeviceAllocated in prior patches.
  DeviceRuntimeSpec getDeviceRuntimeSpec(Set<Device> allocatedDevices,
      String runtime);

  /**
   * Called after the devices are released.
   */
  void onDevicesReleased(Set<Device> releasedDevices);
}{code}
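
For readers who want to see the four hooks exercised end to end, here is a 
minimal fake plugin written against a local copy of the interface. It is a 
sketch under assumptions: the nested Device, DeviceRegisterRequest and 
DeviceRuntimeSpec types are reduced stand-ins, and the resource name 
"example.com/fakedev" and the FAKE_DEVICE_COUNT variable are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FakeVendorPluginSketch {
  // Reduced stand-ins (assumptions) for the types named in the interface.
  static final class DeviceRegisterRequest {
    final String resourceName;
    final String pluginVersion;

    DeviceRegisterRequest(String resourceName, String pluginVersion) {
      this.resourceName = resourceName;
      this.pluginVersion = pluginVersion;
    }
  }

  static final class Device implements Comparable<Device> {
    final int id;

    Device(int id) {
      this.id = id;
    }

    @Override
    public int compareTo(Device other) {
      return Integer.compare(id, other.id);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Device && ((Device) o).id == id;
    }

    @Override
    public int hashCode() {
      return Integer.hashCode(id);
    }
  }

  static final class DeviceRuntimeSpec {
    final Map<String, String> envs = new TreeMap<>();
  }

  // Local copy of the interface shape listed above.
  interface DevicePlugin {
    DeviceRegisterRequest getRegisterRequestInfo();

    Set<Device> getDevices();

    DeviceRuntimeSpec getDeviceRuntimeSpec(Set<Device> allocatedDevices,
        String runtime);

    void onDevicesReleased(Set<Device> releasedDevices);
  }

  static final class FakePlugin implements DevicePlugin {
    int releasedCount; // lets a test observe the release hook

    @Override
    public DeviceRegisterRequest getRegisterRequestInfo() {
      return new DeviceRegisterRequest("example.com/fakedev", "v1");
    }

    @Override
    public Set<Device> getDevices() {
      Set<Device> devices = new TreeSet<>();
      devices.add(new Device(0));
      devices.add(new Device(1));
      return devices;
    }

    @Override
    public DeviceRuntimeSpec getDeviceRuntimeSpec(
        Set<Device> allocatedDevices, String runtime) {
      DeviceRuntimeSpec spec = new DeviceRuntimeSpec();
      int count = allocatedDevices == null ? 0 : allocatedDevices.size();
      spec.envs.put("FAKE_DEVICE_COUNT", String.valueOf(count));
      return spec;
    }

    @Override
    public void onDevicesReleased(Set<Device> releasedDevices) {
      releasedCount += releasedDevices.size();
    }
  }

  public static void main(String[] args) {
    FakePlugin plugin = new FakePlugin();
    System.out.println(plugin.getRegisterRequestInfo().resourceName);
    Set<Device> devices = plugin.getDevices();
    System.out.println(plugin.getDeviceRuntimeSpec(devices, "docker").envs);
    plugin.onDevicesReleased(devices);
    System.out.println(plugin.releasedCount);
  }
}
```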
 

 The "getRegisterRequestInfo" and "getDevices" methods are quite clear. 
"onDevicesReleased" is also clear: it is a hook called when the container 
finishes (the devices go back to YARN, and some plugins may do cleanup or a 
device reset).

But the name "_getDeviceRuntimeSpec_" is still a little confusing after I think 
about it again.

I should explain this in more detail. The intention of my original name 
"_onDeviceUse_" is to tell a plugin that YARN is going to *USE* the 
"allocatedDevices" (no matter who allocated them) and to ask how this runtime 
should use these devices (setting environment variables, volume/device mounts, 
volume creation).

The confusion comes when the allocation can be null.

"

If allocatedDevices is null and the runtime is Docker, the plugin can do some 
preparation itself or ask YARN to do it; for instance, the Nvidia GPU Docker 
case needs a Docker volume to be created, which requires permission.

If allocatedDevices is not null and the runtime is Docker, the plugin should 
tell YARN which devices and volumes to mount and which environment variables 
to set.

"

This explanation is confusing and indicates a limitation of YARN's internal 
plugin lifecycle management.
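
To show why the null convention is awkward, the two quoted cases can be 
sketched from the plugin's side as below. This is illustrative only: the 
"docker" string value, the demo_driver_volume name and the three-list 
DeviceRuntimeSpec stand-in are assumptions, not the patch's definitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class NullAllocationSketch {
  static final String RUNTIME_DOCKER = "docker";            // assumed value
  static final String DRIVER_VOLUME = "demo_driver_volume"; // hypothetical

  // Simplified stand-in for the spec returned to the framework.
  static final class DeviceRuntimeSpec {
    final List<String> volumesToCreate = new ArrayList<>();
    final List<String> volumesToMount = new ArrayList<>();
    final List<String> devicesToMount = new ArrayList<>();
  }

  static DeviceRuntimeSpec getDeviceRuntimeSpec(Set<String> allocatedDevices,
      String runtime) {
    DeviceRuntimeSpec spec = new DeviceRuntimeSpec();
    if (!RUNTIME_DOCKER.equals(runtime)) {
      return spec;
    }
    if (allocatedDevices == null) {
      // Case 1: null is overloaded to mean "preparation phase"; the plugin
      // has to guess that only a volume creation request is wanted.
      spec.volumesToCreate.add(DRIVER_VOLUME);
      return spec;
    }
    // Case 2: devices are known, so describe the mounts for the container.
    spec.volumesToMount.add(DRIVER_VOLUME);
    spec.devicesToMount.addAll(new TreeSet<>(allocatedDevices));
    return spec;
  }

  public static void main(String[] args) {
    DeviceRuntimeSpec prepare = getDeviceRuntimeSpec(null, RUNTIME_DOCKER);
    System.out.println(prepare.volumesToCreate);
    DeviceRuntimeSpec launch = getDeviceRuntimeSpec(
        new TreeSet<>(Arrays.asList("/dev/demo0")), RUNTIME_DOCKER);
    System.out.println(launch.devicesToMount);
  }
}
```

The null branch is exactly the implicit protocol the comment objects to: one 
parameter carries both the device set and the lifecycle phase.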

I shouldn't keep our internals simple at the cost of such a complex method 
parameter. I passed a null and expected the plugin to understand the intention 
and return a volume creation request. This is silly. :)

The details of the shortcomings in YARN's current internal lifecycle management 
are below. You can skip them and go directly to the end if this is too much 
detail. The internal "_DockerCommandPlugin_"'s "_getCreateDockerVolumeCommand_" 
is called in _DockerLinuxContainerRuntime_'s "_prepareContainer_". At this 
point, the devices haven't been allocated to the container yet (we could do the 
allocation in _DockerCommandPlugin_'s getCreateDockerVolumeCommand, but this is 
weird), so I pass a null value to the hook. Maybe the volume creation in this 
hook should be moved to launchContainer (before updating the Docker run 
command). In that case, we can allocate the devices in _ResourceHandlerChain_ 
before the _DockerCommandPlugin_'s method is invoked. The allocation passed to 
the vendor plugin won't be null. And the _DeviceRuntimeSpec_ can be shared and 
passed to DockerCommandPlugin to do the volume creation and docker run work 
later.

 

Based on the above discussion, I now prefer to keep the original "onDeviceUse" 
intention/scope unchanged but try to find a better name for it.

If we agree this API's scope, we should change YARN 

[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665382#comment-16665382
 ] 

Chandni Singh commented on YARN-8851:
-

Thanks [~tangzhankun]
{quote}
"DeviceRegisterRequest getDeviceResourceInfo" and "DeviceRegisterRequest 
getRegisterRequestInfo()". Maybe the latter one is more accurate since the 
"DeviceRegisterRequest" may contain more info besides the resource name & 
version we currently want?
{quote}
{{getRegisterRequestInfo()}} is good.


{quote}
 We have another interface "DevicePluginScheduler" to do this.
{quote}
{code}
Set<Device> allocateDevices(Set<Device> availableDevices, Integer count);
{code}
I saw the above API. It seems that if an implementation of a custom scheduler 
is provided, that implementation will allocate the devices, which is fine.

However, the DevicePluginAPI can have a method
{code}
onDevicesAllocated(Set<Device> allocatedDevices)
{code}
Let's say the user does NOT want to provide a custom DevicePluginScheduler and 
uses the default one. This will be called back after the devices get allocated.
It also seems to complete the current DevicePluginAPI, which has an 
{{onDevicesReleased(devices)}} method.
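
A rough sketch of the symmetry being proposed: the framework's default 
allocation runs first and then notifies the plugin, mirroring the release hook. 
The string device IDs, the "lowest IDs first" default policy and the 
RecordingPlugin class are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class AllocationCallbackSketch {
  // Hypothetical pair of lifecycle hooks on the plugin side.
  interface DevicePlugin {
    void onDevicesAllocated(Set<String> allocatedDevices);

    void onDevicesReleased(Set<String> releasedDevices);
  }

  // Records the callbacks it receives so the ordering can be inspected.
  static final class RecordingPlugin implements DevicePlugin {
    final List<String> events = new ArrayList<>();

    @Override
    public void onDevicesAllocated(Set<String> allocatedDevices) {
      events.add("allocated " + allocatedDevices);
    }

    @Override
    public void onDevicesReleased(Set<String> releasedDevices) {
      events.add("released " + releasedDevices);
    }
  }

  // Assumed default policy: lowest device IDs first, then notify the plugin.
  static Set<String> allocateAndNotify(DevicePlugin plugin,
      Set<String> availableDevices, int count) {
    Set<String> allocated = new TreeSet<>();
    Iterator<String> it = new TreeSet<>(availableDevices).iterator();
    while (allocated.size() < count && it.hasNext()) {
      allocated.add(it.next());
    }
    plugin.onDevicesAllocated(allocated);
    return allocated;
  }

  static void releaseAndNotify(DevicePlugin plugin, Set<String> allocated) {
    plugin.onDevicesReleased(allocated);
  }

  public static void main(String[] args) {
    RecordingPlugin plugin = new RecordingPlugin();
    Set<String> available =
        new TreeSet<>(Arrays.asList("dev2", "dev0", "dev1"));
    Set<String> allocated = allocateAndNotify(plugin, available, 2);
    releaseAndNotify(plugin, allocated);
    System.out.println(plugin.events);
  }
}
```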


> [Umbrella] A new pluggable device plugin framework to ease vendor plugin 
> development
> 
>
> Key: YARN-8851
> URL: https://issues.apache.org/jira/browse/YARN-8851
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8851-WIP2-trunk.001.patch, 
> YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, 
> YARN-8851-WIP5-trunk.001.patch, YARN-8851-WIP6-trunk.001.patch, 
> YARN-8851-WIP7-trunk.001.patch, [YARN-8851] 
> YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] 
> YARN_New_Device_Plugin_Framework_Design_Proposal.pdf
>
>
> At present, we support GPU/FPGA device in YARN through a native, coupling 
> way. But it's difficult for a vendor to implement such a device plugin 
> because the developer needs much knowledge of YARN internals. And this brings 
> burden to the community to maintain both YARN core and vendor-specific code.
> Here we propose a new device plugin framework to ease vendor device plugin 
> development and provide a more flexible way to integrate with YARN NM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-25 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664540#comment-16664540
 ] 

Zhankun Tang commented on YARN-8851:


[~csingh] , Thanks for the review!
{quote}1.
{code:java}
  DeviceRegisterRequest register();
{code}
This is misleading. {{register()}} would mean that the device plugin is 
registering itself. However, here we need some information from the device 
plugin. Maybe, it can be changed to something like
{code:java}
DeviceResourceInfo getDeviceResourceInfo()
{code}
{quote}
Zhankun-> Yeah. Weiwei also mentioned this problem. "getDeviceResourceInfo" is 
also very good. Now we have two names for it. :)

"DeviceRegisterRequest getDeviceResourceInfo" and "DeviceRegisterRequest 
getRegisterRequestInfo()". Maybe the latter one is more accurate since the 
"DeviceRegisterRequest" may contain more info besides resource name & version 
we currently want?
{quote}2.
{code:java}
 DeviceRuntimeSpec onDevicesUse(Set allocatedDevices, String runtime);
{code}
If this is to get the {{DeviceRuntimeSpec}}, then should it be called 
{{getDeviceRuntimeSpec()}} ?
{quote}
Zhankun-> That's a good idea.
{quote}3. Since we have callback for devices released, do we also need a 
callback for devices allocated?
 {{void onDevicesAllocated(Set allocatedDevices)}}
{quote}
Zhankun-> We have another interface "DevicePluginScheduler" to do this. One 
may ask why there are two interfaces: the intention is that the scheduler 
interface is optional, while the other one is mandatory.
{code:java}
/**
 * Called when allocating devices. The framework will do all device bookkeeping
 * and failure recovery, so this hook should only do scheduling based on the
 * available devices passed in. This method could be invoked multiple times.
 * @param availableDevices Devices allowed to be chosen from.
 * @param count Number of devices to be allocated.
 * @return a set of {@link Device}
 */
Set allocateDevices(Set availableDevices, Integer count);{code}
{quote}4. Just a suggestion about logging
 Use slf4j logging format since that's the framework we are using and it 
improves readability of logging stmts.
 eg. instead of {{LOG.info("Adapter of " + pluginClassName + " created. 
Initializing..");}} 
 we can use
{code:java}
LOG.info("Adapter of {} created. Initializing..", pluginClassName);{code}
{quote}
Zhankun -> Yeah. I also noted that we're using slf4j here in this 
"ResourcePluginManager" instead of log4j. Will change it.




[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-25 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664228#comment-16664228
 ] 

Chandni Singh commented on YARN-8851:
-

[~tangzhankun] Thanks for working on this. I have a few initial comments about 
the Device Plugin API.

1.
{code:java}
  DeviceRegisterRequest register();
{code}
This is misleading. {{register()}} would mean that the device plugin is 
registering itself. However, here we need some information from the device 
plugin. Maybe, it can be changed to something like
{code:java}
DeviceResourceInfo getDeviceResourceInfo()
{code}
2.
{code:java}
 DeviceRuntimeSpec onDevicesUse(Set allocatedDevices, String runtime);
{code}
If this is to get the {{DeviceRuntimeSpec}}, then should it be called 
{{getDeviceRuntimeSpec()}} ?

3. Since we have a callback for devices released, do we also need a callback for 
devices allocated?
 {{void onDevicesAllocated(Set allocatedDevices)}}

4. Just a suggestion about logging
 Use slf4j logging format since that's the framework we are using and it 
improves readability of logging stmts.
 eg. instead of {{LOG.info("Adapter of " + pluginClassName + " created. 
Initializing..");}} 
 we can use: {{LOG.info("Adapter of {} created. Initializing..", 
pluginClassName);}}




[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-25 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663495#comment-16663495
 ] 

Zhankun Tang commented on YARN-8851:


Sorry that I missed your comments. Thanks [~cheersyang] . :)




[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-23 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661657#comment-16661657
 ] 

Zhankun Tang commented on YARN-8851:


[~leftnoteasy], Thanks for the review. Answers below:
{quote}1) From a user perspective, what needs to be implemented? Is it just 
following two?

DevicePlugin (required)
 DevicePluginScheduler (optional)
{quote}
Zhankun-> Yeah. Just the following two.
{quote}2) It's good to see you added an examples package; it will be useful for 
users to start with. However, instead of providing a fake implementation, can we 
implement a demo device plugin that can actually be configured and tested on a 
single-node cluster? This will give users a better sense of how to implement 
their own plugin. Further, it would be good if you could provide a sanity test 
suite to verify whether a device plugin is compatible.
{quote}
Zhankun-> The fake device plugin can actually be configured and tested. The 
only problem I see is that the example is just a class, not a Maven project 
with a pom.xml. Should we add the pom.xml dependencies to the document and to 
the comments in the example device plugin code? 

For the sanity test suite, will do that.
{quote}3) Some high-level comments about the APIs in DevicePlugin

DeviceRegisterRequest register();
 This is a bit confusing. A register() function is normally a two-side call, 
e.g a slave registers itself to a master. But here it simply returns a 
DeviceRegisterRequest, it looks more like a getDeviceInfo() API to me.

Set getDevices();
 Is this supposed to return a set of available devices? If so, is it better to 
rename it to "getAvailableDevices"?
{quote}
Zhankun-> The DeviceRegisterRequest contains the name of the resource type that 
the plugin wants to register, and maybe other info in the future. How about 
"DeviceRegisterRequest getRegisterInfo()"?

Yeah. "getAvailableDevices" is more concrete. But I'm afraid that once we 
support monitoring the devices, this method would be called regularly. The name 
is also a little confusing for a plugin that has scheduling logic: it may be 
unclear what "available" means. Do I need to count the devices already in use? 
I guess we are actually asking for the allowed devices. How about "Set 
getAllowedDevices"?
{quote}4) It is interesting to allow a customized DevicePluginScheduler, but how 
can failure recovery be done? Does that mean the user needs to implement all the 
logic for persisting and recovering allocated resources in the NM store? In that 
case, we are exposing too many YARN internals in a plugin framework.
{quote}
Zhankun-> YARN will do the bookkeeping, persistence, and recovery of all 
allocations made by the customized device plugin scheduler. The 
DevicePluginScheduler should be stateless. Check the API description below; we 
ensure that the "availableDevices" set passed into the API is immutable. Calling 
the API won't affect YARN stability.

Here we ask the plugin this question: "hey, there are some available devices at 
hand, choose N for me".

The vendor plugin developer can inspect them and do customized scheduling based 
on topology, utilization, virtualization, or health status, following its own 
logic that we don't know.
{code:java}
/**
 * Called when allocating devices. The framework will do all device bookkeeping
 * and failure recovery, so this hook should only do scheduling based on the
 * available devices passed in. This method could be invoked multiple times.
 * @param availableDevices Devices allowed to be chosen from.
 * @param count Number of devices to be allocated.
 * @return a set of {@link Device}
 */
Set allocateDevices(Set availableDevices, Integer count);{code}
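
As a rough illustration of what a stateless scheduler could look like, the sketch below picks the first N devices in sorted order from an unmodifiable set. It is only an assumption-laden example: {{FirstFitDeviceScheduler}} is a hypothetical name, devices are plain string IDs rather than the framework's Device type, and the failure behavior (throwing when too many devices are requested) is a choice made here, not something the framework specifies.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stateless scheduler: no fields, so every call depends only on its arguments.
class FirstFitDeviceScheduler {
  public Set<String> allocateDevices(Set<String> availableDevices, Integer count) {
    if (count == null || count < 0 || count > availableDevices.size()) {
      throw new IllegalArgumentException(
          "Cannot allocate " + count + " of " + availableDevices.size() + " devices");
    }
    Set<String> chosen = new LinkedHashSet<>();
    // Iterate over a sorted copy so the input set is never modified.
    for (String device : new TreeSet<>(availableDevices)) {
      if (chosen.size() == count) {
        break;
      }
      chosen.add(device);
    }
    return chosen;
  }
}

class FirstFitDemo {
  public static void main(String[] args) {
    // The framework is described as passing an immutable set; emulate that here.
    Set<String> available = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList("gpu2", "gpu0", "gpu1")));
    System.out.println(new FirstFitDeviceScheduler().allocateDevices(available, 2));
  }
}
```

Because the scheduler keeps no state, the framework can call it repeatedly and recover after a restart without involving the plugin.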
{quote}5) DevicePluginAdapter doesn't look like an adapter; it looks more like a 
base class of ResourcePlugin to me. Please correct me if I misunderstood this.
{quote}
Zhankun-> I'm afraid not. One device plugin instance is wrapped with one 
DevicePluginAdapter to be integrated into the YARN ResourcePlugin handling 
process. From this angle, the DevicePluginAdapter adapts YARN's requirements to 
the plugin instance.

I haven't got a better name for it. The previous implementation of 
DevicePluginAdapter implemented 4 interfaces. Now it only implements 
ResourcePlugin. How about "DeviceResourceImpl"?
{quote}6) It is confusing that DevicePluginAdapter has a reference to 
ResourcePluginManager, could you remove that? From what I can see, 
ResourcePluginManager manages all ResourcePlugins, and each ResourcePlugins can 
be instanced by a DevicePluginAdapter.
{quote}
Zhankun-> Yeah, it's a leftover from the WIP patch. Will remove that. One thing 
to clarify is that the DevicePluginAdapter itself is actually a ResourcePlugin. 
It is added into ResourcePluginManager's pluginMap.


[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-23 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660906#comment-16660906
 ] 

Weiwei Yang commented on YARN-8851:
---

Hi [~tangzhankun]

Thanks for the design doc and patch. I have some high-level comments

1) From a user perspective, what needs to be implemented? Is it just the 
following two?
 * DevicePlugin (required)
 * DevicePluginScheduler (optional)

2) It's good to see you added an *examples* package; it will be useful for users 
to start with. However, instead of providing a fake implementation, can we 
implement a demo device plugin that can actually be configured and tested on a 
single-node cluster? This will give users a better sense of how to implement 
their own plugin. Further, it would be good if you could provide a sanity test 
suite to verify whether a device plugin is compatible.

3) Some high-level comments about the APIs in {{DevicePlugin}}
{code:java}
DeviceRegisterRequest register();
{code}
This is a bit confusing. A register() function is normally a two-side call, e.g 
a slave registers itself to a master. But here it simply returns a 
{{DeviceRegisterRequest}}, it looks more like a {{getDeviceInfo()}} API to me.
{code:java}
Set getDevices();
{code}
Is this supposed to return a set of available devices? If so, is it better to 
rename it to "getAvailableDevices"?

4) It is interesting to allow a customized {{DevicePluginScheduler}}, but how 
can failure recovery be done? Does that mean the user needs to implement all the 
logic for persisting and recovering allocated resources in the NM store? In that 
case, we are exposing too many YARN internals in a plugin framework.

5) {{DevicePluginAdapter}} doesn't look like an adapter; it looks more like a 
base class of {{ResourcePlugin}} to me. Please correct me if I misunderstood this.

6) It is confusing that DevicePluginAdapter has a reference to 
ResourcePluginManager, could you remove that? From what I can see, 
ResourcePluginManager manages all ResourcePlugins, and each ResourcePlugins can 
be instanced by a DevicePluginAdapter.

Let me know if these make sense.

Thanks




[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-23 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660804#comment-16660804
 ] 

Zhankun Tang commented on YARN-8851:


[~leftnoteasy], Agreed, and I updated the patch. Please review:
1. Changed one API of DevicePlugin:
Added a runtime parameter to the API, as below.
{code:java}
  /**
   * Asks how these devices should be prepared/used before container launch.
   * @param allocatedDevices A set of allocated {@link Device}.
   *     Note that it could be null, which means no device is allocated.
   *     Only {@code volumeClaims} in it will be handled to create volume.
   * @param runtime Indicates which runtime the framework will use.
   *     Could be {@code RUNTIME_CGROUPS} or {@code RUNTIME_DOCKER}
   *     in {@link DeviceRuntimeSpec}.
   * @return a {@link DeviceRuntimeSpec} description about environment,
   *     {@link VolumeSpec}, {@link MountVolumeSpec}, etc.
   */
  DeviceRuntimeSpec onDevicesUse(Set allocatedDevices, String runtime);
{code}
2. Added some code to show how to get DeviceRuntimeSpec and use it.
The above onDevicesUse is called in ResourceHandler's preStart and in all three 
of DockerCommandPlugin's methods. Because DockerCommandPlugin's 
getCreateDockerVolumeCommand method is called before ResourceHandler's preStart 
when the runtime is Docker, the allocatedDevices passed in would be null at that 
point. The code here lets the device plugin return a DeviceRuntimeSpec with only 
a VolumeSpec in it, which requires YARN to create the Docker volume. Or it can 
return an empty object if it can create the volume in its own way; then YARN 
does nothing for the Docker volume creation.

The above API might be a little complex. We could also add one more interface 
method like the one below, but I'm not quite sure whether having both 
"onDevicePreparation" and "onDeviceUse" would cause confusion.

In theory, we can change our internals to make the allocation earlier and 
visible to DockerCommandPlugin. In that way, the allocation will not be null 
and onDeviceUse becomes clear. So I didn't add one. 
{code:java}
VolumeSpec onDevicePreparation(String runtime)
{code}
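
To make the null-allocation case of onDevicesUse more concrete, here is a minimal sketch of how a vendor plugin might branch on it. Everything in it is illustrative: {{SketchRuntimeSpec}} is a simplified stand-in for DeviceRuntimeSpec, the string values of the runtime constants are assumed, and the volume and environment variable names are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical stand-in for DeviceRuntimeSpec.
class SketchRuntimeSpec {
  final Map<String, String> envs = new LinkedHashMap<>();
  final Set<String> volumeClaims = new LinkedHashSet<>();
}

class SketchPlugin {
  // The constant names come from the Javadoc above; the values are assumptions.
  static final String RUNTIME_DOCKER = "docker";
  static final String RUNTIME_CGROUPS = "cgroups";

  public SketchRuntimeSpec onDevicesUse(Set<String> allocatedDevices, String runtime) {
    SketchRuntimeSpec spec = new SketchRuntimeSpec();
    if (allocatedDevices == null) {
      // Called before allocation (Docker volume creation): only volume claims matter.
      if (RUNTIME_DOCKER.equals(runtime)) {
        spec.volumeClaims.add("vendor_driver_volume"); // made-up volume name
      }
      return spec; // an empty spec means YARN creates nothing
    }
    if (RUNTIME_DOCKER.equals(runtime)) {
      // Made-up environment variable telling the container which devices to use.
      spec.envs.put("VENDOR_VISIBLE_DEVICES", String.join(",", allocatedDevices));
    }
    return spec;
  }
}

class OnDevicesUseDemo {
  public static void main(String[] args) {
    SketchPlugin plugin = new SketchPlugin();
    System.out.println(plugin.onDevicesUse(null, SketchPlugin.RUNTIME_DOCKER).volumeClaims);
    Set<String> devices = new LinkedHashSet<>(Arrays.asList("0", "1"));
    System.out.println(plugin.onDevicesUse(devices, SketchPlugin.RUNTIME_DOCKER).envs);
  }
}
```

The sketch shows why the single method feels overloaded: the null branch and the allocated branch answer two different questions.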

Please let me know your thoughts. Thanks.




[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-22 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659747#comment-16659747
 ] 

Wangda Tan commented on YARN-8851:
--

[~tangzhankun],

Thanks for updating the patch, the latest patch looks much better now.

Two suggestions: 
 * The DevicePluginAdapter extends/implements 4 interfaces. Instead of doing 
that, is it possible to make the Adapter implement only the ResourcePlugin 
interface, and add several "sub-adapters" to implement ResourceHandler, 
DockerCommandPlugin, and NMResourceUpdaterPlugin? By doing this, we can get a 
more granular interface definition that is very close to the ResourcePlugin 
interface, so fewer changes to the integration code are required.
 * I can understand that most of the DevicePluginAdapter logic should be like 
the GPUResourcePlugin implementation, but some parts will come from 
DeviceRuntimeSpec. It would help to have a more concrete implementation to see 
whether our APIs are properly designed.

I haven't dug into the details of code logic, naming, etc. while we're trying 
to sort out the overall code structure.




[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-22 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658763#comment-16658763
 ] 

Zhankun Tang commented on YARN-8851:


[~leftnoteasy], [~sunilg], [~cheersyang]. Updated the patch for your review. 
The key changes are:
1. Added _DeviceRuntimeSpec_ related classes

2. Added another new interface class file "DevicePluginScheduler" for the 
vendor to implement, to provide their own scheduling logic in the method

_"Set allocateDevices(Set availableDevices, Integer count)"_.

The framework chooses the scheduling logic based on configuration. If 
"yarn.nodemanager.pluggable-device-framework.prefer-customized-scheduler" 
is set to true, the plugin's scheduling logic is used; otherwise, the internal 
scheduling logic is used.

3. Changed the method names of the current "DevicePlugin" interface:
"preLaunchContainer => OnDevicesAllocated 
postCompleteContainer => OnDeviceReleased"

4. Changed the name of "DeviceLocalScheduler" to "DeviceSchedulerManager". 

5. Added some unit tests to check basic workflow of DevicePlugin, 
DevicePluginAdapter, and DeviceSchedulerManager.
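
The scheduler switch in item 2 could be consumed roughly as in the sketch below. This is only an illustration of the selection rule, not the patch's code: a plain map stands in for the YARN configuration, {{SketchDeviceScheduler}} and {{SchedulerSelector}} are hypothetical names, and treating a missing key as false is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the DevicePluginScheduler interface (string device IDs).
interface SketchDeviceScheduler {
  Set<String> allocateDevices(Set<String> availableDevices, Integer count);
}

class SchedulerSelector {
  static final String PREFER_CUSTOM_KEY =
      "yarn.nodemanager.pluggable-device-framework.prefer-customized-scheduler";

  // Returns the plugin's scheduler only when the flag is true and the plugin provides one.
  public static SketchDeviceScheduler choose(Map<String, String> conf,
      SketchDeviceScheduler pluginScheduler, SketchDeviceScheduler internalScheduler) {
    // Assumption: an unset key behaves like "false".
    boolean preferCustom =
        Boolean.parseBoolean(conf.getOrDefault(PREFER_CUSTOM_KEY, "false"));
    if (preferCustom && pluginScheduler != null) {
      return pluginScheduler;
    }
    return internalScheduler;
  }
}

class SchedulerSelectorDemo {
  public static void main(String[] args) {
    SketchDeviceScheduler internal = (available, count) -> Collections.emptySet();
    SketchDeviceScheduler custom = (available, count) -> available;
    Map<String, String> conf = new HashMap<>();
    conf.put(SchedulerSelector.PREFER_CUSTOM_KEY, "true");
    System.out.println(SchedulerSelector.choose(conf, custom, internal) == custom);
  }
}
```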




[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-18 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656188#comment-16656188
 ] 

Zhankun Tang commented on YARN-8851:


[~leftnoteasy], many thanks for the offline discussion. Agreed that we remove 
the unnecessary APIVersion field, since we can throw an exception if the plugin 
is not compatible.

As for the Factory pattern for creating the device adapter or device plugin 
instance, we'll keep it as a future option in case we encounter huge complexity 
in the current design. So we'll go with the current DevicePlugin interface 
(keeping the adapter invisible to the vendor developer) and the shared device 
local scheduler, but try to leave an interface for vendors to insert their own 
device scheduling logic.




[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-17 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654594#comment-16654594
 ] 

Zhankun Tang commented on YARN-8851:


[~leftnoteasy] , For question 6, I'll try to answer here and we can talk 
offline if it's not clear to you.

The design principle I'm trying to follow here is to make the vendor 
completely agnostic to our YARN internals. Simpler for them, better for YARN's 
device plugin ecosystem. Actually, I'm not very sure whether this will bring 
huge, out-of-control complexity for us. But my idea is like this:

The vendor developer only needs to use the libraries YARN provides to describe 
the requirements related to their devices. The *_DevicePlugin_* interface 
defines the hooks, which are the only places where the vendor can tell YARN what 
devices they have and how to use them. It is the only interface that the vendor 
needs to know. And the specs can only be created with the library provided by 
us.

Sorry that the *_DevicePluginAdapter_* name is confusing. This class acts as a 
bridge between the NM and the vendor plugin. When the NM wants to get devices, 
the DevicePluginAdapter knows and delegates to the vendor plugin and gives back 
the result. When the NM wants to use these devices, the DevicePluginAdapter 
knows; it allocates devices, delegates to the vendor plugin to learn how to use 
them, and tells YARN in YARN's language. The DevicePluginAdapter has a 1-to-1 
relation with DevicePlugin: each DevicePlugin instance needs a 
DevicePluginAdapter instance to help it. So it's not a problem that the 
DevicePlugin interfaces are not similar to DevicePluginAdapter. The 
DevicePluginAdapter knows the NM well, and DevicePlugin is utilized by it.

Maybe "DevicePluginWrapper" or "ResourcePluginAdapter" is more proper name? 
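
The bridge role described above can be sketched as follows. This is a deliberately simplified illustration, not the patch's DevicePluginAdapter: {{SketchVendorPlugin}}, {{SketchPluginAdapter}} and {{FakeVendorPlugin}} are hypothetical names, devices are string IDs, the "how to use them" answer is reduced to a map of environment variables, and allocation is done inside the adapter rather than by the shared scheduler.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// What the vendor implements: no knowledge of NM internals is required.
interface SketchVendorPlugin {
  Set<String> getDevices();
  Map<String, String> onDevicesUse(Set<String> allocatedDevices);
}

// One adapter wraps exactly one plugin and does the bookkeeping on its behalf.
class SketchPluginAdapter {
  private final SketchVendorPlugin plugin;
  private final Set<String> inUse = new TreeSet<>();

  SketchPluginAdapter(SketchVendorPlugin plugin) {
    this.plugin = plugin;
  }

  // "NM wants to get devices": delegate to the vendor plugin.
  public Set<String> reportDevices() {
    return plugin.getDevices();
  }

  // "NM wants to use devices": allocate here, then ask the plugin how to use them.
  public Map<String, String> prepareContainer(int count) {
    Set<String> allocated = new TreeSet<>();
    for (String device : new TreeSet<>(plugin.getDevices())) {
      if (allocated.size() == count) {
        break;
      }
      if (!inUse.contains(device)) {
        allocated.add(device);
      }
    }
    if (allocated.size() < count) {
      throw new IllegalStateException("Not enough free devices");
    }
    inUse.addAll(allocated);
    return plugin.onDevicesUse(allocated);
  }
}

// Made-up plugin with three devices, used only to exercise the adapter.
class FakeVendorPlugin implements SketchVendorPlugin {
  @Override
  public Set<String> getDevices() {
    return new TreeSet<>(Arrays.asList("a", "b", "c"));
  }

  @Override
  public Map<String, String> onDevicesUse(Set<String> allocatedDevices) {
    return Collections.singletonMap("VENDOR_DEVICES", String.join(",", allocatedDevices));
  }
}

class AdapterDemo {
  public static void main(String[] args) {
    SketchPluginAdapter adapter = new SketchPluginAdapter(new FakeVendorPlugin());
    System.out.println(adapter.prepareContainer(2));
  }
}
```

The vendor class never sees which devices are already in use; that state stays on the adapter side.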

 

For the device scheduler, I'm now using a shared device scheduler to handle all 
DevicePluginAdapters' allocation requests before container launch. The various 
types of resources are allocated one by one in this shared scheduler, which is, 
in essence, the same as the current independent scheduler inside each GPU 
plugin/FPGA plugin.

Regarding whether we should accept a vendor's customized scheduler, it's a 
good idea. But from my experience, I guess a shared scheduler supporting FIFO 
and topology scheduling might be enough for most vendors in the long term? 

 

 




[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-17 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654557#comment-16654557
 ] 

Zhankun Tang commented on YARN-8851:


[~leftnoteasy] 

5) DeviceRuntimeSpec is empty, what do you plan to add?

{color:#d04437}Zhankun-->{color} It's a set of classes and is described more 
clearly in the UML figure of the design doc. In general, it is returned by the 
vendor plugin's implementation hook "OnDeviceAllocated" to describe the 
requirements for environment variables, volume creation, Docker command 
updates, etc. This DeviceRuntimeSpec will be translated to YARN internal 
operations by the "DevicePluginAdapter". For instance, GPUv1 might require a 
volume to be created before container launch; in the DeviceRuntimeSpec, a 
volumeClaim describes it and lets NM create it. Another example is that GPUv2 
needs additional environment variables when running a Docker container; this is 
described by "envs". And cgroups device isolation is described by 
"MountDeviceSpec".

The class is like this:
{code:java}
class DeviceRuntimeSpec {
  // environment variables needed before using devices
  Map envs;
  // volumes that need to be mounted before using devices
  Set volumeMounts;
  // devices that need to be mounted
  Set devices;
  // volumes to be created/deleted before using devices
  Set volumeClaim;
}
{code}
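
As a rough illustration of how an adapter might translate such a spec into Docker terms, here is a minimal sketch. The element types (plain strings), the `RuntimeSpecSketch` and `SpecTranslatorSketch` names, and the choice to emit `docker run` style arguments are all assumptions made for this example; the design doc's actual classes may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for DeviceRuntimeSpec; element types are assumed.
class RuntimeSpecSketch {
  final Map<String, String> envs = new LinkedHashMap<>();
  final Set<String> volumeMounts = new LinkedHashSet<>(); // "source:target"
  final Set<String> devices = new LinkedHashSet<>();      // host device paths
}

class SpecTranslatorSketch {
  // Translate the vendor's spec into "docker run" style arguments.
  static List<String> toDockerArgs(RuntimeSpecSketch spec) {
    List<String> args = new ArrayList<>();
    for (Map.Entry<String, String> e : spec.envs.entrySet()) {
      args.add("-e");
      args.add(e.getKey() + "=" + e.getValue());
    }
    for (String volume : spec.volumeMounts) {
      args.add("-v");
      args.add(volume);
    }
    for (String device : spec.devices) {
      args.add("--device");
      args.add(device);
    }
    return args;
  }
}
```

The vendor plugin only fills in the spec; it never builds the Docker command itself.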
 

 




[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-17 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654534#comment-16654534
 ] 

Zhankun Tang commented on YARN-8851:


[~leftnoteasy] Thanks for the review! Very helpful comments!

1. 
{code:java}
 
 // Check version for compatibility
 String pluginVersion = request.getVersion();
 if (!isVersionCompatible(pluginVersion)) {
   LOG.error("Class: " + pluginClassName + " version: " + pluginVersion +
       " is not compatible. Expected: " + DeviceConstants.version);
 }
{code}
What's the use case for this? My understanding is that version matching should 
happen when requests come to NM. And I'm not sure it is the best idea to limit 
the format of the version; maybe we should just treat it as an identifier in 
addition to the name?

{color:#FF}Zhankun -->{color} Sorry for the misleading name 
"pluginVersion". It should in fact be "APIVersion". Its format follows 
semantic versioning, which is "Major.Minor.Patch". A vendor plugin should report 
which DevicePlugin API version it is using.

Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.

When NM gets the request from the vendor plugin, this "APIVersion" is used to 
check whether the vendor plugin was developed against a compatible version of 
"org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin". For instance, if 
the NM uses "1.0.0" but the plugin's APIVersion is "0.1.0" (which means this 
vendor plugin was developed against the 0.1.0 APIs), we should reject this 
register request because the APIs it uses may be deprecated (major version 
0 < 1).
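
One way such a check could be written is sketched below. The rule used here (compatible only when the MAJOR numbers are equal) is an assumption for illustration, as are the `ApiVersionSketch` class and the two-argument `isVersionCompatible` signature; the patch may apply a different policy.

```java
// Hypothetical check: compatible only when the MAJOR versions match.
class ApiVersionSketch {
  // Extract MAJOR from a "MAJOR.MINOR.PATCH" string.
  static int major(String version) {
    String[] parts = version.trim().split("\\.");
    if (parts.length != 3) {
      throw new IllegalArgumentException(
          "expected MAJOR.MINOR.PATCH but got: " + version);
    }
    return Integer.parseInt(parts[0]);
  }

  // nmApiVersion: the API version NM was built with.
  // pluginApiVersion: the API version the vendor plugin reports.
  static boolean isVersionCompatible(String nmApiVersion,
      String pluginApiVersion) {
    return major(nmApiVersion) == major(pluginApiVersion);
  }
}
```

With this rule, a plugin reporting "0.1.0" is rejected by an NM on "1.0.0", while a plugin reporting "1.2.3" is accepted.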

And we can add a "pluginVersion" field for the plugin to indicate its own 
version. But I guess this is not that important to YARN.

2. 

Instead of adding two configs:
{code:java}
 @Private
 public static final String
     NM_RESOURCE_PLUGINS_ENABLE_PLUGGABLE_DEVICE_FRAMEWORK =
     NM_RESOURCE_PLUGINS + ".pluggable-device-framework.enable";

 @Private
 public static final String NM_RESOURCE_PLUGINS_PLUGGABLE_CLASS =
     NM_RESOURCE_PLUGINS + ".pluggable-class";
{code}
Maybe leaving the pluggable-class is sufficient?

{color:#FF}Zhankun -->{color} Ah ha, I think leaving only this one is OK for 
now. But I'm not sure whether there will be more configurations related to the 
device framework. So maybe leaving a switch here makes it easier for the 
administrator to turn the whole framework on or off?
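
To make the trade-off concrete, here is a small sketch of how the switch and the class list could interact. It uses `java.util.Properties` as a stand-in for the Hadoop configuration object; the `yarn.nodemanager.resource-plugins` prefix and the comma-separated value format are assumptions, and `PluggableConfigSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class PluggableConfigSketch {
  // Assumed prefix; the two suffixes are the ones quoted in the review.
  static final String PREFIX = "yarn.nodemanager.resource-plugins";
  static final String ENABLE = PREFIX + ".pluggable-device-framework.enable";
  static final String CLASSES = PREFIX + ".pluggable-class";

  // Return the plugin class names to load, or an empty list when the
  // framework switch is off (the default in this sketch).
  static List<String> pluginClasses(Properties conf) {
    List<String> names = new ArrayList<>();
    if (!Boolean.parseBoolean(conf.getProperty(ENABLE, "false"))) {
      return names; // framework disabled: ignore any configured classes
    }
    for (String name : conf.getProperty(CLASSES, "").split(",")) {
      if (!name.trim().isEmpty()) {
        names.add(name.trim());
      }
    }
    return names;
  }
}
```

With the switch, an administrator can disable the framework without deleting the class list; without it, an empty class list would have to serve as the "off" state.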

3.

Set getAndWatch(), 
I'm not sure what "Watch" means. Should it be just getDevices?

{color:#FF}Zhankun–>{color} Good idea.

4. It looks like you are trying to make DevicePlugin agnostic to the Container 
itself; maybe we should change the names:
preLaunchContainer => allocateDevices 
postCompleteContainer => releaseDevices?

{color:#FF}Zhankun-->{color} Yeah. This name is confusing. How about 
this? Since we want the vendor plugin developer to know 
these two are hooks which will be invoked by NM (more accurately, by the 
DevicePluginAdapter).

 

 




[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

2018-10-17 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654192#comment-16654192
 ] 

Wangda Tan commented on YARN-8851:
--

Thanks [~tangzhankun], these are mostly high-level comments. Item #6 is the 
most important and fundamental to the feature. 

1) Regarding version compatibility:
{code:java}
 
 // Check version for compatibility
 String pluginVersion = request.getVersion();
 if (!isVersionCompatible(pluginVersion)) {
   LOG.error("Class: " + pluginClassName + " version: " + pluginVersion +
       " is not compatible. Expected: " + DeviceConstants.version);
 }
{code}
What's the use case for this? My understanding is that version matching should 
happen when requests come to NM. And I'm not sure it is the best idea to limit 
the format of the version; maybe we should just treat it as an identifier in 
addition to the name?

2) Instead of adding two configs:
{code:java}
 @Private
 public static final String
     NM_RESOURCE_PLUGINS_ENABLE_PLUGGABLE_DEVICE_FRAMEWORK =
     NM_RESOURCE_PLUGINS + ".pluggable-device-framework.enable";

 @Private
 public static final String NM_RESOURCE_PLUGINS_PLUGGABLE_CLASS =
     NM_RESOURCE_PLUGINS + ".pluggable-class";
{code}
Maybe leaving the pluggable-class is sufficient?

3) Set getAndWatch(), 
 I'm not sure what "Watch" means. Should it be just getDevices?

4) It looks like you are trying to make DevicePlugin agnostic to the Container 
itself; maybe we should change the names:
 preLaunchContainer => allocateDevices 
 postCompleteContainer => releaseDevices?

5) DeviceRuntimeSpec is empty, what do you plan to add?

6) The purpose of {{DevicePluginAdapter}} is to handle all resource plugins. 
However, the DevicePlugin interface and DevicePluginAdapter do not quite 
match, so it is very likely that we will need customized logic for 
DevicePluginAdapter. For example, how to manipulate the Docker command could be 
quite different for GPU and FPGA. So instead of only making a pluggable 
interface for DevicePlugin itself, should we use the Factory pattern to make 
all required interfaces pluggable?

What I meant is,
 Change:
{code:java}
.pluggable-class
{code}
To {{.pluggable-factory-class}}. The device provider should then provide a 
factory method which returns {{DevicePluginAdapter}} and {{DevicePlugin}} 
instances.

I also feel it would be better to make the scheduler part of the factory, 
given that how to allocate resources could differ between devices.

So the Factory interface could have the following methods.
{code:java}
 
interface DevicePluginFactory {
  DevicePlugin getDevicePlugin();
  DevicePluginAdapter getDevicePluginAdapter();
  DevicePluginScheduler getDevicePluginScheduler();
}
{code}
Or, if you think DevicePlugin/DevicePluginScheduler should be internal 
implementation details of getDevicePluginAdapter, we can leave only 
getDevicePluginAdapter, and maybe rename it to getDevicePlugin().
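
A minimal sketch of how NM could load such a factory from the configured class name, assuming the factory has a public no-argument constructor. All type names ending in `Sketch`, and `ExampleVendorFactory`, are hypothetical; the adapter getter is omitted to keep the example short.

```java
// Hypothetical shapes; the real interfaces are still under discussion.
interface DevicePluginSketch {
  String name();
}

interface DeviceSchedulerSketch {
}

interface DevicePluginFactorySketch {
  DevicePluginSketch getDevicePlugin();
  DeviceSchedulerSketch getDevicePluginScheduler();
}

class FactoryLoaderSketch {
  // Instantiate the factory named by a ".pluggable-factory-class" value.
  static DevicePluginFactorySketch load(String className)
      throws ReflectiveOperationException {
    Class<?> clazz = Class.forName(className);
    // asSubclass fails fast with ClassCastException when the configured
    // class does not implement the factory interface.
    return clazz.asSubclass(DevicePluginFactorySketch.class)
        .getDeclaredConstructor()
        .newInstance();
  }
}

// Example vendor factory supplying its own plugin and scheduler.
class ExampleVendorFactory implements DevicePluginFactorySketch {
  public ExampleVendorFactory() {
  }

  public DevicePluginSketch getDevicePlugin() {
    return () -> "example-device";
  }

  public DeviceSchedulerSketch getDevicePluginScheduler() {
    return new DeviceSchedulerSketch() { };
  }
}
```

The configuration then names one class per vendor, and each vendor decides which pieces to customize behind the factory.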

And I think it is going to be fine to leave a common implementation of 
PluginAdapter inside NM, but the DevicePlugin interface should be at least 
close to the PluginAdapter interface; otherwise it is very hard to bridge the 
two interfaces.
