[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-08-28 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595944#comment-16595944
 ] 

Wangda Tan commented on YARN-8220:
--

Thanks [~sunilg], I think we should close this JIRA. 

> Running Tensorflow on YARN with GPU and Docker - Examples
> -
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Critical
> Attachments: YARN-8220.001.patch, YARN-8220.002.patch, 
> YARN-8220.003.patch, YARN-8220.004.patch
>
>
> Tensorflow could be run on YARN and could leverage YARN's distributed 
> features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-08-28 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595942#comment-16595942
 ] 

Sunil Govindan commented on YARN-8220:
--

Hi [~leftnoteasy],

Now that Submarine is in, I don't think it is very important for this work to 
go in for Tensorflow. If you agree, I can cancel the patch and close this issue.




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-25 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523020#comment-16523020
 ] 

genericqa commented on YARN-8220:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 29m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} pylint {color} | {color:orange}  0m 
17s{color} | {color:orange} The patch generated 387 new + 0 unchanged - 0 fixed 
= 387 total (was 0) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 24 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
5s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 29m 12s{color} 
| {color:red} hadoop-yarn-applications in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
23s{color} | {color:green} hadoop-yarn-deep-learning-frameworks in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 56s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShell |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8220 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929097/YARN-8220.004.patch |
| Optional Tests |  asflicense  mvnsite  xml  compile  javac  javadoc  
mvninstall  unit  shadedclient  pylint  |
| uname | Linux 66762fc11f91 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided

[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-25 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522985#comment-16522985
 ] 

genericqa commented on YARN-8220:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} pylint {color} | {color:orange}  0m 
15s{color} | {color:orange} The patch generated 387 new + 0 unchanged - 0 fixed 
= 387 total (was 0) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 24 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
5s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
19s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 29m 31s{color} 
| {color:red} hadoop-yarn-applications in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
20s{color} | {color:green} hadoop-yarn-deep-learning-frameworks in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}135m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8220 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929091/YARN-8220.003.patch |
| Optional Tests |  asflicense  mvnsite  xml  compile  javac  javadoc  
mvninstall  unit  shadedclient  pylint  |
| uname | Linux 00045d11878b 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c687a66 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| pylint 

[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-25 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522897#comment-16522897
 ] 

Wangda Tan commented on YARN-8220:
--

Attached ver.4 patch: removed the duplicated contents inside the Dockerfiles 
and made them build from base images.




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-25 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522855#comment-16522855
 ] 

Wangda Tan commented on YARN-8220:
--

Attached ver.3 patch: added several fixes to the submit-tf-job.py helper 
script and added TensorBoard to the example launch spec. Thanks to 
[~yanboliang] for the offline suggestions and help with these changes.




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-11 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509112#comment-16509112
 ] 

Eric Yang commented on YARN-8220:
-

[~leftnoteasy] If JAVA_HOME, CLASSPATH, and HDFS_HOME are not changeable, there 
is no reason to expose them. Instead, the ENTRYPOINT shell script can hard-code 
the predefined values to make the Docker image self-contained.
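
A minimal sketch of that idea, written in Python rather than as a shell script. The file name entrypoint.py, the helper names, and the paths are all hypothetical and not taken from the patch. The wrapper forces fixed values into the environment and then execs whatever command it is given:

```python
import os
import sys

# Hypothetical fixed values; a real image would use its own install paths.
FIXED_ENV = {
    "JAVA_HOME": "/usr/lib/jvm/java-8-openjdk",
    "HDFS_HOME": "/hadoop-3.1.0",
}


def build_env(inherited):
    """Return a copy of the inherited environment with fixed values forced in."""
    env = dict(inherited)
    env.update(FIXED_ENV)  # values baked into the image always win
    return env


def main(argv):
    if argv:
        # Replace this process with the requested command.
        os.execvpe(argv[0], argv, build_env(os.environ))
    # No command given: just show what would be set.
    for name, value in sorted(FIXED_ENV.items()):
        print(f"{name}={value}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

With a wrapper like this as the ENTRYPOINT, the service spec would not need to carry these variables at all.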




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-11 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509094#comment-16509094
 ] 

Wangda Tan commented on YARN-8220:
--

Discussed this with [~eyang] and did some tests:

Currently, the YARN NM passes the JAVA_HOME, HDFS_HOME, and CLASSPATH 
environment variables when launching a Docker container, whether or not 
ENTRYPOINT is used. These values overwrite the ones defined inside the 
Dockerfile (using {{ENV}}). For a Docker container, it doesn't make sense to 
pass JAVA_HOME, HDFS_HOME, etc., because the Docker image either has its own 
Java/Hadoop installation or has one mounted at exactly the same directory as on 
the host machine.

I just filed YARN-8417 to revisit this behavior.

Once the above change is done, we won't need to pre-set common configs inside 
the service spec or presetup.sh; everything can be done cleanly inside the 
Dockerfile.

For this patch:

Considering the size of this patch, I suggest getting it merged before 
YARN-8417. We can keep improving it (for example by using ENV/ENTRYPOINT) 
after YARN-8417 and as feedback comes in from others.

Really appreciate the valuable input from [~eyang]!
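
The override described in this comment can be pictured with a small Python model. This is a simplified illustration, not NodeManager code; the function name and the JAVA_HOME paths are hypothetical. Variables passed at launch time replace same-named {{ENV}} values from the image:

```python
# Simplified model (an assumption, not NodeManager code): variables passed
# at container launch replace same-named ENV values baked into the image.
def effective_env(image_env, nm_env):
    """Merge image ENV with launch-time variables; launch-time values win."""
    merged = dict(image_env)
    merged.update(nm_env)
    return merged


# Hypothetical paths, for illustration only.
IMAGE_ENV = {
    "JAVA_HOME": "/usr/lib/jvm/java-8-openjdk",
    "HADOOP_HOME": "/hadoop-3.1.0",
}
NM_ENV = {"JAVA_HOME": "/usr/java/default"}

if __name__ == "__main__":
    for name, value in sorted(effective_env(IMAGE_ENV, NM_ENV).items()):
        print(f"{name}={value}")
```

In this model the image's JAVA_HOME is lost even though the image carries its own Java installation, which is the behavior YARN-8417 was filed to revisit.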




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-11 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508408#comment-16508408
 ] 

Eric Yang commented on YARN-8220:
-

[~leftnoteasy] Based on RunTensorflowJobUsingNativeServiceSpec.md, the spec can 
be changed to:

{code}
{
  "name": "single-node-tensorflow",
  "version": "1.0.0",
  "components": [
    {
      "artifact" : {
        "id" : ,
        "type" : "DOCKER"
      },
      "name": "worker",
      "dependencies": [],
      "resource": {
        "cpus": 1,
        "memory": "4096",
        "additional" : {
          "yarn.io/gpu" : {
            "value" : 2
          }
        }
      },
      "launch_command": 
"--data-dir=hdfs://default/tmp/cifar-10-data,--job-dir=hdfs://default/tmp/cifar-10-jobdir,--num-gpus=1,--train-batch-size=16,--train-steps=4",
      "number_of_containers": 1,
      "run_privileged_container": false,
      "configuration": {
        "env": {
          "HADOOP_HOME": "/hadoop-3.1.0",
          "HADOOP_HDFS_HOME": "",
          "HADOOP_YARN_HOME": "",
          "HADOOP_CONF_DIR": "/etc/hadoop/conf",
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
        }
      }
    }
  ],
  "kerberos_principal" : {
    "principal_name" : "test-u...@example.com",
    "keytab" : "file:///etc/security/keytabs/test-user.headless.keytab"
  }
}
{code}

JAVA_HOME, LD_LIBRARY_PATH, and CLASSPATH can be defined in /etc/profile.d or 
in the Dockerfile to avoid having to specify them externally. Likewise, 
{{cd /test/cifar10_estimator}} can be replaced with a WORKDIR directive in the 
Dockerfile. The Dockerfile would define:

{code}
WORKDIR /test/models/tutorials/image/cifar10_estimator 
ENTRYPOINT ["/usr/bin/python", "cifar10_main.py"]
{code}

This would help with readability of the configurations.
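
To illustrate how the spec and the Dockerfile above fit together, the Python sketch below splits the launch_command into the argument list that would follow the ENTRYPOINT. It assumes, as the YARN Docker runtime documentation describes it, that with YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE set to true the launch_command is treated as comma-separated ENTRYPOINT parameters. The helper name is hypothetical and this is not YARN's implementation:

```python
# Hypothetical helper, not part of YARN: models how a comma-separated
# launch_command could map onto ENTRYPOINT arguments.
ENTRYPOINT = ["/usr/bin/python", "cifar10_main.py"]

LAUNCH_COMMAND = (
    "--data-dir=hdfs://default/tmp/cifar-10-data,"
    "--job-dir=hdfs://default/tmp/cifar-10-jobdir,"
    "--num-gpus=1,--train-batch-size=16,--train-steps=4"
)


def effective_argv(entrypoint, launch_command):
    """Return the ENTRYPOINT followed by the comma-separated parameters."""
    params = [p for p in launch_command.split(",") if p]
    return list(entrypoint) + params


if __name__ == "__main__":
    print(" ".join(effective_argv(ENTRYPOINT, LAUNCH_COMMAND)))
```

One consequence of this convention is that an individual parameter cannot itself contain a comma.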




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507556#comment-16507556
 ] 

genericqa commented on YARN-8220:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} pylint {color} | {color:orange}  0m 
17s{color} | {color:orange} The patch generated 351 new + 0 unchanged - 0 fixed 
= 351 total (was 0) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 24 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
5s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  4s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
23s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 28m 24s{color} 
| {color:red} hadoop-yarn-applications in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
24s{color} | {color:green} hadoop-yarn-deep-learning-frameworks in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8220 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12927230/YARN-8220.002.patch |
| Optional Tests |  asflicense  mvnsite  xml  compile  javac  javadoc  
mvninstall  unit  shadedclient  pylint  |
| uname | Linux 1b1ea5c85c75 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ccfb816 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| pylin

[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-10 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507553#comment-16507553
 ] 

Wangda Tan commented on YARN-8220:
--

[~eyang],

Fair enough. Could you give some examples of how to use ENTRYPOINT (to expose 
multiple environment variables) and pass launch_command at the same time? Are 
any configs needed?




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-10 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507542#comment-16507542
 ] 

Eric Yang commented on YARN-8220:
-

[~leftnoteasy] The launch_command contains environment variables and Hadoop 
versions, which are better handled elsewhere. As soon as the Hadoop version 
changes, the launch_command is outdated, and manual work is required to clean 
up the messy code. I don't think it is a good idea to ignore the ENTRYPOINT 
advice.




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-10 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507523#comment-16507523
 ] 

Wangda Tan commented on YARN-8220:
--

Attached ver.2 patch, which fixes the warnings reported by Jenkins.

Addressed the {{git clone}} suggestion from [~eyang]; the scripts are now 
embedded inside the project.

As for the entry point, it is a good feature, but I don't think it best suits 
the training example. We can consider using it when we add the Zeppelin + TF 
or Tensorflow Serving example. Sounds good, [~eyang]?




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-02 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499160#comment-16499160
 ] 

genericqa commented on YARN-8220:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 33s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} pylint {color} | {color:orange}  0m  4s{color} | {color:orange} The patch generated 204 new + 0 unchanged - 0 fixed = 204 total (was 0) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 28 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  5s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 22s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 28m  3s{color} | {color:red} hadoop-yarn-applications in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 23s{color} | {color:green} hadoop-yarn-deep-learning-frameworks in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 40s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 31s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8220 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12926048/YARN-8220.001.patch |
| Optional Tests |  asflicense  mvnsite  xml  compile  javac  javadoc  mvninstall  unit  shadedclient  pylint  |
| uname | Linux 70e9f3b9366c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git rev

[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-02 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499130#comment-16499130
 ] 

Wangda Tan commented on YARN-8220:
--

Reopened Jira to trigger Jenkins.

> Running Tensorflow on YARN with GPU and Docker - Examples
> -
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Critical
> Attachments: YARN-8220.001.patch
>
>
> Tensorflow could be run on YARN and could leverage YARN's distributed 
> features.
> This spec fill will help to run Tensorflow on yarn with GPU/docker



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-02 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498929#comment-16498929
 ] 

Eric Yang commented on YARN-8220:
-

[~sunilg] ENTRYPOINT could be anything, even bash.  If you want a more flexible 
image, you can use python as the ENTRYPOINT and specify cifar10_main.py as a 
parameter.  It is mostly a mirror image of what you can do at a shell prompt, 
except that pipes and output redirection are not supported by default.  There is 
one less child process (bash) between the node manager and the end-user 
application, which keeps it more efficient and secure, and avoids some race 
conditions between the first program's logging and the mounting of the external 
log directory in some scenarios.
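
To make the ENTRYPOINT/parameter split concrete, here is a minimal Python sketch of how Docker assembles the final command for an exec-form image: arguments given at launch replace CMD and are appended to ENTRYPOINT. The function name {{container_argv}} and the sample values are illustrative only and are not part of the patch.

```python
from typing import List, Optional


def container_argv(entrypoint: List[str],
                   cmd: List[str],
                   runtime_args: Optional[List[str]] = None) -> List[str]:
    """Model Docker's exec-form rule: ENTRYPOINT + (runtime args or CMD)."""
    # Arguments supplied at launch replace the image's CMD entirely;
    # without them, CMD provides the default parameters.
    tail = runtime_args if runtime_args else cmd
    return list(entrypoint) + list(tail)


if __name__ == "__main__":
    # A flexible image: python is the ENTRYPOINT, the script is a parameter.
    print(container_argv(["/usr/bin/python"], ["cifar10_main.py"]))
    # Launch-time arguments override the default CMD.
    print(container_argv(["/usr/bin/python"],
                         ["cifar10_main.py"],
                         ["cifar10_main.py", "--train-steps=1"]))
```

With python alone as the ENTRYPOINT, the same image can run a different script just by changing the parameters, which is the flexibility described above.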




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498882#comment-16498882
 ] 

Sunil Govindan commented on YARN-8220:
--

Hi [~eyang]

One quick doubt

bq.ENTRYPOINT, and CMD in Dockerfile

This means that the ENTRYPOINT and other CMDs are to be specified in the 
Dockerfile. This means we need different Dockerfiles to run different TF 
workloads, which may be inconvenient, correct? We could have changed the jobs 
in the Yarnfile itself. Pls correct me if I am wrong.




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498365#comment-16498365
 ] 

Eric Yang commented on YARN-8220:
-

[~leftnoteasy] {quote}
Entry point is a nice feature for a static command (for example, the default TF 
docker image, which starts a notebook by default: 
https://github.com/tensorflow/tensorflow/tree/r1.8/tensorflow/tools/docker). 
For a training program, since users need to do a lot of hyperparameter tuning, 
they will keep updating such parameters to make it work.{quote}

Additional parameters can be passed to ENTRYPOINT via CMD, which is the same as 
specifying them in launch_command if ENTRYPOINT is in use.  The Yarnfile is 
shortened to:

{code}
{
  ..
  "launch_command": "--train-steps=1,--train-batch-size=16"
  ..
}
{code}

Or any parameters that have not been specified in the ENTRYPOINT+CMD combination.
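
As a rough illustration, the comma-separated launch_command above could be generated from a dictionary of hyperparameters, so that tuning a run means changing a value rather than rebuilding the image. This is a sketch only: {{build_launch_command}} is a hypothetical helper, and the flag names are borrowed from the cifar10 example in this thread.

```python
import json
from typing import Dict, Union


def build_launch_command(params: Dict[str, Union[int, str, bool]]) -> str:
    """Join hyperparameters into a comma-separated ENTRYPOINT argument list."""
    args = []
    for name, value in params.items():
        if value is True:
            # Boolean switches such as --sync carry no value.
            args.append(f"--{name}")
        elif value is False:
            # A disabled switch is simply left out.
            continue
        else:
            args.append(f"--{name}={value}")
    return ",".join(args)


if __name__ == "__main__":
    fragment = {"launch_command": build_launch_command(
        {"train-steps": 1, "train-batch-size": 16, "sync": True})}
    print(json.dumps(fragment, indent=2))
```

The resulting string would go into the component's launch_command field; anything not listed falls back to the defaults baked into the image.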




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498270#comment-16498270
 ] 

Wangda Tan commented on YARN-8220:
--

Thanks [~eyang] for your comments,

For your comments:
bq. 1. Avoid using bash style launch command 
Entry point is a nice feature for a static command (for example, the default TF 
docker image, which starts a notebook by default: 
https://github.com/tensorflow/tensorflow/tree/r1.8/tensorflow/tools/docker). 
For a training program, since users need to do a lot of hyperparameter tuning, 
they will keep updating such parameters to make it work. 

bq. 2. It might be good to show case some yarnfile features:
We intentionally want to avoid making users specify this; specifying such 
mounts is a burden for them. Inside submit_tf.py, we use the feature you 
mentioned. 

bq. 3. Downloading source code from individual github contributors might be 
risky and prone to break
This is a good suggestion; I will check whether it is possible to commit the 
example code to a subfolder of this example.




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498199#comment-16498199
 ] 

Eric Yang commented on YARN-8220:
-

[~sunilg] Thank you for the patch. A couple of suggestions:

1. Avoid using a bash-style launch command.  Although it works, using 
ENTRYPOINT and CMD in the Dockerfile greatly improves security and 
readability.  For example:

{code}
WORKDIR /test/models/tutorials/image/cifar10_estimator
ENTRYPOINT ["/usr/bin/python", "cifar10_main.py"]
# Only the last CMD in a Dockerfile takes effect, so all default
# arguments are listed in a single CMD.
CMD ["--data-dir=hdfs:///tmp/cifar-10-data", \
     "--job-dir=hdfs:///tmp/cifar-10-jobdir", \
     "--train-steps=1", \
     "--eval-batch-size=16", \
     "--train-batch-size=16", \
     "--sync", \
     "--num-gpus=2"]
{code}

This simplifies the Yarnfile and prevents the script from running in the wrong 
directory if the working directory doesn't exist.

2. It might be good to showcase some Yarnfile features:

{code}
{
  ..
  "configuration": {
    "env": {
      "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS": "/etc/hadoop/conf:/etc/hadoop/conf:ro",
      "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
    }
  }
  ..
}
{code}

This helps to showcase how to mount configuration files from host disks and 
how to use ENTRYPOINT support.
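
For illustration, the env block above could be produced by a small helper rather than written by hand. This is a minimal sketch under assumptions: {{docker_env}} is a hypothetical function, and it only accepts the {{ro}} and {{rw}} mount modes even though a cluster may allow others.

```python
import json
from typing import Dict, List, Tuple


def docker_env(mounts: List[Tuple[str, str, str]],
               use_entrypoint: bool = True) -> Dict[str, str]:
    """Build the Yarnfile env map for Docker mounts and ENTRYPOINT support."""
    for _src, _dst, mode in mounts:
        # This sketch deliberately restricts itself to the two basic modes.
        if mode not in ("ro", "rw"):
            raise ValueError(f"unsupported mount mode: {mode}")
    env = {
        # Each mount is source:destination:mode; several mounts are
        # joined with commas.
        "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS": ",".join(
            f"{src}:{dst}:{mode}" for src, dst, mode in mounts),
    }
    if use_entrypoint:
        # Setting this to "true" is what the example above uses to turn on
        # ENTRYPOINT support.
        env["YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE"] = "true"
    return env


if __name__ == "__main__":
    env = docker_env([("/etc/hadoop/conf", "/etc/hadoop/conf", "ro")])
    print(json.dumps({"configuration": {"env": env}}, indent=2))
```

Keeping the mounts as tuples makes it harder to mistype the colon-separated string, and the mode check catches a bad entry before the service is submitted.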

3. Downloading source code from individual GitHub contributors is risky and 
prone to breakage.  If the source is small enough and donated to Apache, it 
would be better to host it locally.




[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497708#comment-16497708
 ] 

Sunil Govindan commented on YARN-8220:
--

Attaching the v1 patch. It mainly covers the scripts, examples, Dockerfiles, 
etc. that help to run Tensorflow on YARN (Distributed/Standalone).

Thank you very much [~leftnoteasy] for helping out to integrate TF in YARN with 
GPU/Docker.

 

Details of this work:
 # A script that auto-generates the native service spec file for Tensorflow 
jobs and submits the service to YARN. This helps to run TF jobs on YARN without 
extra complexity. A detailed example is available in the doc.
 # Support for running the latest Tensorflow 1.8 and CUDA 9 on YARN.
 # Distributed Tensorflow support. Users can enable this by passing the 
{{--distributed}} option to the script; multiple *worker* instances then run on 
different nodes and leverage the resources in YARN.
 # Dockerfiles are provided for various cases (GPU/CPU, different Tensorflow 
versions, etc.).
 # Various tests were done across TF versions, GPU setups, etc., and the 
results are published as part of the document in the patch.
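
For the distributed case, each Tensorflow process needs to know its peers. The following is a minimal sketch, assuming the instances are reachable through YARN registry DNS names of the form {{<instance>.<service>.<user>.<domain>}} and that the cluster layout is handed over in the {{TF_CONFIG}} environment variable read by Tensorflow Estimators. The helper name, the port, and the {{ps}} component are assumptions; the script in the patch may do this differently.

```python
import json
from typing import Dict, List


def tf_config(service: str, user: str, domain: str,
              num_workers: int, num_ps: int,
              task_type: str, task_index: int,
              port: int = 8000) -> str:
    """Return a TF_CONFIG JSON string for one task of the cluster."""
    def hosts(component: str, count: int) -> List[str]:
        # Assumed naming: instances are numbered from 0, e.g. worker-0.
        return [f"{component}-{i}.{service}.{user}.{domain}:{port}"
                for i in range(count)]

    cluster: Dict[str, List[str]] = {
        "worker": hosts("worker", num_workers),
        "ps": hosts("ps", num_ps),
    }
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    })


if __name__ == "__main__":
    # Values taken from the example command in this comment.
    print(tf_config("distributed-tf-gpu", "tf-user", "tensorflow.site",
                    num_workers=2, num_ps=1,
                    task_type="worker", task_index=0))
```

Every container would receive the same cluster map and differ only in the task entry, which is why stable, predictable host names matter here.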

Example:
{code}
python submit_tf_job.py --remote_conf_path hdfs:///tf-job-conf \
  --input_spec example_tf_job_spec.json \
  --docker_image gpu.cuda_9.0.tf_1.8.0 \
  --job_name distributed-tf-gpu \
  --user tf-user --domain tensorflow.site \
  --distributed --kerberos
{code}
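
What a spec generator of this kind might emit can be sketched as follows. This is not the {{submit_tf_job.py}} from the patch: {{build_service_spec}} is a hypothetical function, the field layout follows the YARN Services API to the best of my understanding, and the CPU and memory sizes are made-up defaults.

```python
import json
from typing import Any, Dict


def build_service_spec(job_name: str, docker_image: str,
                       launch_command: str,
                       num_workers: int = 1,
                       gpus_per_worker: int = 1) -> Dict[str, Any]:
    """Assemble a native-service spec with one GPU-backed worker component."""
    worker = {
        "name": "worker",
        "number_of_containers": num_workers,
        "artifact": {"id": docker_image, "type": "DOCKER"},
        "launch_command": launch_command,
        "resource": {
            "cpus": 1,          # assumed default, not from the patch
            "memory": "4096",   # in MB, assumed default
            # GPUs are requested as an additional resource type.
            "additional": {"yarn.io/gpu": {"value": gpus_per_worker}},
        },
    }
    return {"name": job_name, "version": "1.0.0", "components": [worker]}


if __name__ == "__main__":
    spec = build_service_spec("distributed-tf-gpu", "gpu.cuda_9.0.tf_1.8.0",
                              "--train-steps=1,--train-batch-size=16",
                              num_workers=2, gpus_per_worker=2)
    print(json.dumps(spec, indent=2))
```

Generating the spec from a few arguments keeps the per-job input small, which matches the goal of hiding the spec file from the user.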
cc [~vinodkv] [~rohithsharma]
