[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595944#comment-16595944 ]

Wangda Tan commented on YARN-8220:
----------------------------------

Thanks [~sunilg], I think we should close this JIRA.

> Running Tensorflow on YARN with GPU and Docker - Examples
> ---------------------------------------------------------
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn-native-services
> Reporter: Sunil Govindan
> Assignee: Sunil Govindan
> Priority: Critical
> Attachments: YARN-8220.001.patch, YARN-8220.002.patch, YARN-8220.003.patch, YARN-8220.004.patch
>
>
> Tensorflow could be run on YARN and could leverage YARN's distributed features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595942#comment-16595942 ]

Sunil Govindan commented on YARN-8220:
--------------------------------------

Hi [~leftnoteasy],

Now that Submarine is in, I don't think it is very important for this TensorFlow work to go in. If you agree, I can cancel the patch and close this issue.
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523020#comment-16523020 ]

genericqa commented on YARN-8220:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 36s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for branch |
| +1 | mvninstall | 27m 8s | trunk passed |
| +1 | compile | 28m 13s | trunk passed |
| +1 | mvnsite | 1m 18s | trunk passed |
| +1 | shadedclient | 11m 4s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 53s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 19s | the patch passed |
| +1 | compile | 29m 5s | the patch passed |
| +1 | javac | 29m 5s | the patch passed |
| +1 | mvnsite | 2m 37s | the patch passed |
| -0 | pylint | 0m 17s | The patch generated 387 new + 0 unchanged - 0 fixed = 387 total (was 0) |
| -1 | whitespace | 0m 0s | The patch has 24 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | xml | 0m 5s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 11m 15s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 14s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 22s | hadoop-project in the patch passed. |
| -1 | unit | 29m 12s | hadoop-yarn-applications in the patch failed. |
| +1 | unit | 0m 23s | hadoop-yarn-deep-learning-frameworks in the patch passed. |
| +1 | asflicense | 0m 40s | The patch does not generate ASF License warnings. |
| | | 147m 56s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8220 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929097/YARN-8220.004.patch |
| Optional Tests | asflicense mvnsite xml compile javac javadoc mvninstall unit shadedclient pylint |
| uname | Linux 66762fc11f91 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522985#comment-16522985 ]

genericqa commented on YARN-8220:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 41s | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 45s | trunk passed |
| +1 | compile | 26m 31s | trunk passed |
| +1 | mvnsite | 1m 2s | trunk passed |
| +1 | shadedclient | 9m 7s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 38s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 10s | the patch passed |
| +1 | compile | 26m 1s | the patch passed |
| +1 | javac | 26m 1s | the patch passed |
| +1 | mvnsite | 1m 20s | the patch passed |
| -0 | pylint | 0m 15s | The patch generated 387 new + 0 unchanged - 0 fixed = 387 total (was 0) |
| -1 | whitespace | 0m 0s | The patch has 24 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | xml | 0m 5s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 5s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 0s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 19s | hadoop-project in the patch passed. |
| -1 | unit | 29m 31s | hadoop-yarn-applications in the patch failed. |
| +1 | unit | 0m 20s | hadoop-yarn-deep-learning-frameworks in the patch passed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 135m 11s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8220 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929091/YARN-8220.003.patch |
| Optional Tests | asflicense mvnsite xml compile javac javadoc mvninstall unit shadedclient pylint |
| uname | Linux 00045d11878b 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c687a66 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| pylint
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522897#comment-16522897 ]

Wangda Tan commented on YARN-8220:
----------------------------------

Attached the ver.4 patch. It removes the duplicated contents inside the Dockerfiles and builds them from base images instead.
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522855#comment-16522855 ]

Wangda Tan commented on YARN-8220:
----------------------------------

Attached the ver.3 patch, which adds several fixes to the submit-tf-job.py helper script and adds TensorBoard to the example launch spec.

Thanks to [~yanboliang] for the offline suggestions and help with these changes.
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509112#comment-16509112 ]

Eric Yang commented on YARN-8220:
---------------------------------

[~leftnoteasy] If JAVA_HOME, CLASSPATH, and HDFS_HOME are not changeable, there is no reason to expose them. Instead, the ENTRYPOINT shell script can hard-code the predefined values to make the Docker image self-contained.
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509094#comment-16509094 ]

Wangda Tan commented on YARN-8220:
----------------------------------

Discussed this with [~eyang] and did some tests.

Currently, the YARN NM passes the JAVA_HOME, HDFS_HOME, and CLASSPATH environment variables before launching a Docker container, whether or not ENTRY_POINT is used. This overwrites the environment variables defined inside the Dockerfile (using {{ENV}}). For a Docker container, it doesn't make sense to pass JAVA_HOME, HDFS_HOME, etc., because the Docker image has a separate Java/Hadoop either installed or mounted at exactly the same directory as on the host machine. I just filed YARN-8417 to revisit this behavior.

Once that change is done, we won't need to pre-set common configs inside the service spec or presetup.sh; everything can be done cleanly inside the Dockerfile.

For this patch: considering its size, I suggest getting it merged before YARN-8417. We can keep improving it (for example, by using ENV/ENTRY_POINT) after YARN-8417 and feedback from others.

I really appreciate the valuable input from [~eyang]!
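The override described in this comment can be pictured with a small sketch. It is only a model of precedence, not YARN or Docker code: it assumes that variables supplied when a container is started replace same-named `ENV` values baked into the image, which is how `docker run -e` behaves. The function name and all paths are hypothetical.

```python
def effective_env(image_env, runtime_env):
    """Return the environment a container process would see.

    Models the precedence discussed above: variables passed at launch
    time replace same-named ENV values defined in the image.
    """
    merged = dict(image_env)    # start from the Dockerfile ENV values
    merged.update(runtime_env)  # launch-time values win on conflicts
    return merged


# Hypothetical values, for illustration only.
image_env = {
    "JAVA_HOME": "/usr/lib/jvm/java-8-openjdk-amd64",  # set via ENV in the image
    "HADOOP_CONF_DIR": "/etc/hadoop/conf",
}
nm_env = {"JAVA_HOME": "/usr/java/default"}  # host path passed by the NodeManager

env = effective_env(image_env, nm_env)
print(env["JAVA_HOME"])
```

In this model the image's own JAVA_HOME is lost even though the host path may not exist inside the container, which is the concern behind YARN-8417.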
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508408#comment-16508408 ]

Eric Yang commented on YARN-8220:
---------------------------------

[~leftnoteasy] Based on RunTensorflowJobUsingNativeServiceSpec.md, the code can be changed to:

{code}
{
  "name": "single-node-tensorflow",
  "version": "1.0.0",
  "components": [
    {
      "artifact" : {
        "id" : ,
        "type" : "DOCKER"
      },
      "name": "worker",
      "dependencies": [],
      "resource": {
        "cpus": 1,
        "memory": "4096",
        "additional" : {
          "yarn.io/gpu" : {
            "value" : 2
          }
        }
      },
      "launch_command": "--data-dir=hdfs://default/tmp/cifar-10-data,--job-dir=hdfs://default/tmp/cifar-10-jobdir,--num-gpus=1,--train-batch-size=16,--train-steps=4",
      "number_of_containers": 1,
      "run_privileged_container": false,
      "configuration": {
        "env": {
          "HADOOP_HOME": "/hadoop-3.1.0",
          "HADOOP_HDFS_HOME": "",
          "HADOOP_YARN_HOME": "",
          "HADOOP_CONF_DIR": "/etc/hadoop/conf",
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
        }
      }
    }
  ],
  "kerberos_principal" : {
    "principal_name" : "test-u...@example.com",
    "keytab" : "file:///etc/security/keytabs/test-user.headless.keytab"
  }
}
{code}

JAVA_HOME, LD_LIBRARY_PATH, and CLASSPATH can be variables that are defined in /etc/profile.d or in the Dockerfile, to avoid having to specify them externally. Likewise, {{cd /test/cifar10_estimator}} can be replaced with a WORKDIR directive in the Dockerfile. The Dockerfile defines:

{code}
WORKDIR /test/models/tutorials/image/cifar10_estimator
ENTRYPOINT ["/usr/bin/python", "cifar10_main.py"]
{code}

This would help with the readability of the configurations.
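To make the interplay between the ENTRYPOINT and launch_command in the comment above concrete, here is a minimal sketch of the command line the container would end up running. It assumes, as the comma-separated launch_command in the spec suggests, that each comma-delimited item becomes one argument appended to the image's ENTRYPOINT when YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE is "true". The helper name `container_argv` is hypothetical and is not part of YARN.

```python
# ENTRYPOINT taken from the Dockerfile snippet in the comment above.
ENTRYPOINT = ["/usr/bin/python", "cifar10_main.py"]


def container_argv(entrypoint, launch_command):
    """Combine an image ENTRYPOINT with a comma-separated launch_command.

    Assumption: each comma-delimited item is passed as one argument.
    """
    args = [item.strip() for item in launch_command.split(",") if item.strip()]
    return list(entrypoint) + args


launch_command = (
    "--data-dir=hdfs://default/tmp/cifar-10-data,"
    "--job-dir=hdfs://default/tmp/cifar-10-jobdir,"
    "--num-gpus=1,--train-batch-size=16,--train-steps=4"
)

argv = container_argv(ENTRYPOINT, launch_command)
print(" ".join(argv))
```

Under that assumption the spec carries only the job parameters, while the interpreter path, script name, and working directory live in the image.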
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507556#comment-16507556 ]

genericqa commented on YARN-8220:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 34s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 20s | trunk passed |
| +1 | compile | 28m 34s | trunk passed |
| +1 | mvnsite | 1m 19s | trunk passed |
| +1 | shadedclient | 10m 55s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 52s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 18s | the patch passed |
| +1 | compile | 27m 29s | the patch passed |
| +1 | javac | 27m 29s | the patch passed |
| +1 | mvnsite | 2m 6s | the patch passed |
| -0 | pylint | 0m 17s | The patch generated 351 new + 0 unchanged - 0 fixed = 351 total (was 0) |
| -1 | whitespace | 0m 0s | The patch has 24 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | xml | 0m 5s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 11m 4s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 16s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 23s | hadoop-project in the patch passed. |
| -1 | unit | 28m 24s | hadoop-yarn-applications in the patch failed. |
| +1 | unit | 0m 24s | hadoop-yarn-deep-learning-frameworks in the patch passed. |
| +1 | asflicense | 0m 38s | The patch does not generate ASF License warnings. |
| | | 144m 16s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8220 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927230/YARN-8220.002.patch |
| Optional Tests | asflicense mvnsite xml compile javac javadoc mvninstall unit shadedclient pylint |
| uname | Linux 1b1ea5c85c75 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ccfb816 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| pylin
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507553#comment-16507553 ]

Wangda Tan commented on YARN-8220:
----------------------------------

[~eyang], fair enough. Could you give some examples of how to use ENTRYPOINT (to expose multiple environment variables) and pass launch_command at the same time? Are any configs needed?
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507542#comment-16507542 ]

Eric Yang commented on YARN-8220:
---------------------------------

[~leftnoteasy] The launch_command contains environment variables and Hadoop versions, and there are better ways to handle those. As soon as the Hadoop version changes, the launch_command is outdated, and manual care is required to clean up the messy code. I don't think it is a good idea to ignore the ENTRYPOINT advice.
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507523#comment-16507523 ]

Wangda Tan commented on YARN-8220:
----------------------------------

Attached the ver.2 patch, which fixes the warnings reported by Jenkins.

Addressed the {{git clone}} suggestion from [~eyang]; the scripts are now embedded inside the project.

As for the entry point, it is a good feature, but I think it may not be the best fit for the training example. We can consider using it when we add the Zeppelin + TF or TensorFlow Serving example. Sounds good, [~eyang]?
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499160#comment-16499160 ]

genericqa commented on YARN-8220:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 30s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for branch |
| +1 | mvninstall | 25m 38s | trunk passed |
| +1 | compile | 28m 42s | trunk passed |
| +1 | mvnsite | 1m 18s | trunk passed |
| +1 | shadedclient | 10m 33s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 51s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 17s | the patch passed |
| +1 | compile | 27m 25s | the patch passed |
| +1 | javac | 27m 25s | the patch passed |
| +1 | mvnsite | 1m 45s | the patch passed |
| -0 | pylint | 0m 4s | The patch generated 204 new + 0 unchanged - 0 fixed = 204 total (was 0) |
| -1 | whitespace | 0m 0s | The patch has 28 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | xml | 0m 5s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 24s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 15s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 22s | hadoop-project in the patch passed. |
| -1 | unit | 28m 3s | hadoop-yarn-applications in the patch failed. |
| +1 | unit | 0m 23s | hadoop-yarn-deep-learning-frameworks in the patch passed. |
| -1 | asflicense | 0m 40s | The patch generated 2 ASF License warnings. |
| | | 141m 31s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8220 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12926048/YARN-8220.001.patch |
| Optional Tests | asflicense mvnsite xml compile javac javadoc mvninstall unit shadedclient pylint |
| uname | Linux 70e9f3b9366c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git rev
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499130#comment-16499130 ]

Wangda Tan commented on YARN-8220:
----------------------------------

Reopened Jira to trigger Jenkins.

> Running Tensorflow on YARN with GPU and Docker - Examples
> ---------------------------------------------------------
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn-native-services
> Reporter: Sunil Govindan
> Assignee: Sunil Govindan
> Priority: Critical
> Attachments: YARN-8220.001.patch
>
> Tensorflow could be run on YARN and could leverage YARN's distributed features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498929#comment-16498929 ]

Eric Yang commented on YARN-8220:
---------------------------------

[~sunilg] ENTRYPOINT could be anything, even bash. If you want a more flexible image, you can use python as the ENTRYPOINT and specify cifar10_main.py as a parameter. It mostly mirrors what you can do at a shell prompt, except that pipes and output redirection are not supported by default. There is one less child process (bash) between the node manager and the end-user application, which keeps it more efficient and secure, and avoids some race conditions between the first program's logging and the mounting of the external log directory in some scenarios.
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498882#comment-16498882 ]

Sunil Govindan commented on YARN-8220:
--------------------------------------

Hi [~eyang]

One quick doubt:

bq. ENTRYPOINT, and CMD in Dockerfile

This means that the ENTRYPOINT and the CMDs have to be specified in the Dockerfile, so we would need different Dockerfiles to run different TF workloads, which may be inconvenient, correct? We could otherwise have changed jobs in the Yarnfile itself. Please correct me if I am wrong.
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498365#comment-16498365 ]

Eric Yang commented on YARN-8220:
---------------------------------

[~leftnoteasy]
{quote}
Entry point is a nice feature for static commands (for example, the default TF docker image, which starts a notebook by default: https://github.com/tensorflow/tensorflow/tree/r1.8/tensorflow/tools/docker). For a training program, since the user needs to do a lot of hyperparameter tuning, the user will update such parameters to make it work.
{quote}
Additional parameters can be passed to ENTRYPOINT via CMD, which is the same as specifying them in launch_command when ENTRYPOINT is in use. The Yarnfile is shortened to:
{code}
{
  ..
  "launch_command":"--train-steps=1,--train-batch-size=16"
  ..
}
{code}
Or any parameters that have not been specified in the ENTRYPOINT+CMD combination.
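To make the comma-separated launch_command form above concrete, here is a minimal Python sketch that builds such a string from a dictionary of hyperparameters. `build_launch_command` is a hypothetical helper, not part of the patch, and the flag names are assumed to be those from the Dockerfile example earlier in this thread.

```python
from typing import Dict


def build_launch_command(params: Dict[str, object]) -> str:
    """Join parameters into the comma-separated "--name=value" form
    shown in the Yarnfile example. Hypothetical helper for illustration."""
    return ",".join(f"--{name}={value}" for name, value in params.items())


# Only the values being tuned are listed; everything else would come
# from the image's ENTRYPOINT and default CMD.
launch_command = build_launch_command({"train-steps": 1, "train-batch-size": 16})

# Fragment of a Yarnfile component, with the other fields elided.
yarnfile_fragment = {"launch_command": launch_command}
```

With this shape, changing a hyperparameter only touches the dictionary passed to the helper, so the Docker image itself would not need to be rebuilt.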
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498270#comment-16498270 ]

Wangda Tan commented on YARN-8220:
----------------------------------

Thanks [~eyang] for your comments. My replies:

bq. 1. Avoid using bash style launch command

Entry point is a nice feature for static commands (for example, the default TF docker image, which starts a notebook by default: https://github.com/tensorflow/tensorflow/tree/r1.8/tensorflow/tools/docker). For a training program, since the user needs to do a lot of hyperparameter tuning, the user will update such parameters to make it work.

bq. 2. It might be good to showcase some yarnfile features:

We intentionally want to avoid having the user specify this; it is a burden for the user to specify such mounts. Inside submit_tf.py, we use the feature you mentioned.

bq. 3. Downloading source code from individual github contributors might be risky and prone to break

This is a good suggestion; I will check whether it is possible to commit the example code to a sub-folder of this example.
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498199#comment-16498199 ]

Eric Yang commented on YARN-8220:
---------------------------------

[~sunilg] Thank you for the patch. A couple of suggestions:

1. Avoid using a bash-style launch command. Although it kind of works, using ENTRYPOINT and CMD in the Dockerfile greatly improves security and readability. For example:
{code}
WORKDIR /test/models/tutorials/image/cifar10_estimator
ENTRYPOINT ["/usr/bin/python", "cifar10_main.py"]
# Only the last CMD in a Dockerfile takes effect, so all default arguments go in a single CMD.
CMD ["--data-dir=hdfs:///tmp/cifar-10-data", "--job-dir=hdfs:///tmp/cifar-10-jobdir", "--train-steps=1", "--eval-batch-size=16", "--train-batch-size=16", "--sync", "--num-gpus=2"]
{code}
This simplifies the yarnfile and prevents the script from running in the wrong directory if the working directory doesn't exist.

2. It might be good to showcase some yarnfile features:
{code}
{
  ..
  "configuration": {
    "env": {
      "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS":"/etc/hadoop/conf:/etc/hadoop/conf:ro",
      "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
    }
  }
  ..
}
{code}
This helps to showcase how to mount configuration files from host disks and how to use ENTRYPOINT support.

3. Downloading source code from individual github contributors might be risky and prone to break. If the source is small enough and donated to Apache, it would be better to host it locally.
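The ENTRYPOINT/CMD behaviour suggested in point 1 can be sketched in a few lines of Python. This is only a model of Docker's exec-form rule (arguments supplied at run time replace the default CMD and are appended to ENTRYPOINT), not code from the patch; `resolve_argv` is a hypothetical name, and the shortened argument lists are for illustration.

```python
from typing import List, Optional


def resolve_argv(entrypoint: List[str],
                 cmd: List[str],
                 runtime_args: Optional[List[str]] = None) -> List[str]:
    """Model of Docker's exec-form rule: ENTRYPOINT is always kept,
    and run-time arguments (if given) replace the default CMD."""
    args = cmd if runtime_args is None else runtime_args
    return list(entrypoint) + list(args)


ENTRYPOINT = ["/usr/bin/python", "cifar10_main.py"]
CMD = ["--train-steps=1", "--sync"]

# No override: the defaults baked into the image are used.
default_argv = resolve_argv(ENTRYPOINT, CMD)

# Override: only the arguments change; the program being run stays fixed.
tuned_argv = resolve_argv(ENTRYPOINT, CMD, ["--train-steps=100"])
```

Note that in this model an override replaces the whole default argument list rather than merging with it, so any defaults that are still wanted would have to be repeated in the override.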
[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497708#comment-16497708 ]

Sunil Govindan commented on YARN-8220:
--------------------------------------

Attaching v1 patch.

This patch mainly covers the scripts, examples, Dockerfiles etc. which will help to run Tensorflow on YARN (Distributed/Standalone). Thank you very much [~leftnoteasy] for helping to integrate TF in YARN with GPU/Docker.

Details of this work:
# A script to auto-generate the native service spec file for Tensorflow jobs and automatically submit the service to YARN. This lets users run TF jobs on YARN without writing the spec by hand. A detailed example is available in the doc.
# Support for running Tensorflow 1.8 (the latest release) with CUDA 9 on YARN.
# Distributed Tensorflow support. Users can enable this by passing the {{--distributed}} option to the script; multiple *worker* instances can then run on different nodes and use the resources in YARN.
# Dockerfiles are provided for various cases (GPU/CPU, different Tensorflow versions etc.).
# Various tests were run across TF versions, GPU setups etc., and the results are published as part of the document in the patch.

Example:
{code:java}
python submit_tf_job.py --remote_conf_path hdfs:///tf-job-conf --input_spec example_tf_job_spec.json --docker_image gpu.cuda_9.0.tf_1.8.0 --job_name distributed-tf-gpu --user tf-user --domain tensorflow.site --distributed --kerberos
{code}

cc [~vinodkv] [~rohithsharma]
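As a rough illustration of what generating a native service spec might involve, the sketch below builds a minimal spec as a Python dict and serialises it to JSON. It is not the script from the patch: `build_tf_spec` and its parameters are hypothetical, the field names are assumed to follow the YARN service spec format and should be checked against the Hadoop documentation, and the cpu and memory values are placeholders.

```python
import json


def build_tf_spec(job_name, docker_image, workers, gpus_per_worker, launch_command):
    """Hypothetical helper: return a minimal service spec as a dict.
    Field names are assumptions based on the YARN service spec format."""
    worker = {
        "name": "worker",
        "number_of_containers": workers,
        "artifact": {"id": docker_image, "type": "DOCKER"},
        "launch_command": launch_command,
        "resource": {
            "cpus": 1,          # placeholder value
            "memory": "4096",   # placeholder value, in MB
            "additional": {
                "yarn.io/gpu": {"value": gpus_per_worker, "unit": ""},
            },
        },
    }
    return {"name": job_name, "version": "1.0.0", "components": [worker]}


spec = build_tf_spec(
    job_name="distributed-tf-gpu",
    docker_image="gpu.cuda_9.0.tf_1.8.0",
    workers=2,
    gpus_per_worker=2,
    launch_command="--train-steps=1,--train-batch-size=16",
)
spec_json = json.dumps(spec, indent=2)
```

A real generator would presumably also add parameter-server components and Kerberos settings for the distributed and secure cases; those are left out here because their exact shape is not described in this thread.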