[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508408#comment-16508408 ]

Eric Yang commented on YARN-8220:
---------------------------------

[~leftnoteasy] Based on RunTensorflowJobUsingNativeServiceSpec.md, the service
spec can be changed to:

{code}
{
    "name": "single-node-tensorflow",
    "version": "1.0.0",
    "components": [
        {
            "artifact": {
                "id": "<docker-image-name>",
                "type": "DOCKER"
            },
            "name": "worker",
            "dependencies": [],
            "resource": {
                "cpus": 1,
                "memory": "4096",
                "additional": {
                    "yarn.io/gpu": {
                        "value": 2
                    }
                }
            },
            "launch_command": "--data-dir=hdfs://default/tmp/cifar-10-data,--job-dir=hdfs://default/tmp/cifar-10-jobdir,--num-gpus=1,--train-batch-size=16,--train-steps=40000",
            "number_of_containers": 1,
            "run_privileged_container": false,
            "configuration": {
                "env": {
                    "HADOOP_HOME": "/hadoop-3.1.0",
                    "HADOOP_HDFS_HOME": "",
                    "HADOOP_YARN_HOME": "",
                    "HADOOP_CONF_DIR": "/etc/hadoop/conf",
                    "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
                }
            }
        }
    ],
    "kerberos_principal": {
        "principal_name": "test-u...@example.com",
        "keytab": "file:///etc/security/keytabs/test-user.headless.keytab"
    }
}
{code}
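To keep the comma-delimited {{launch_command}} manageable, the spec could also be generated by a small script. Below is a minimal Python sketch under stated assumptions. {{make_tf_service_spec}} is a hypothetical helper, not part of YARN. The spec it emits is abbreviated: the Hadoop env entries and {{kerberos_principal}} are omitted. It relies on the YARN service API rule that, for an image with an ENTRY_POINT, {{launch_command}} is delimited by commas instead of spaces.

{code:python}
import json


def make_tf_service_spec(image, data_dir, job_dir, gpus, extra_args=()):
    """Hypothetical helper (not a YARN API): build an abbreviated service
    spec shaped like the one above, for an image whose ENTRYPOINT runs
    cifar10_main.py."""
    # Derive --num-gpus from the same value as the yarn.io/gpu request so the
    # script uses exactly the GPUs the container asks for (an assumption).
    args = [
        "--data-dir=" + data_dir,
        "--job-dir=" + job_dir,
        "--num-gpus=" + str(gpus),
        *extra_args,
    ]
    for arg in args:
        # launch_command is comma-delimited for ENTRYPOINT images, so an
        # argument that itself contains a comma would be split apart.
        if "," in arg:
            raise ValueError("argument contains a comma: " + repr(arg))
    return {
        "name": "single-node-tensorflow",
        "version": "1.0.0",
        "components": [
            {
                "artifact": {"id": image, "type": "DOCKER"},
                "name": "worker",
                "dependencies": [],
                "resource": {
                    "cpus": 1,
                    "memory": "4096",
                    "additional": {"yarn.io/gpu": {"value": gpus}},
                },
                "launch_command": ",".join(args),
                "number_of_containers": 1,
                "run_privileged_container": False,
                "configuration": {
                    "env": {"YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"}
                },
            }
        ],
    }


if __name__ == "__main__":
    spec = make_tf_service_spec(
        "<docker-image-name>",
        "hdfs://default/tmp/cifar-10-data",
        "hdfs://default/tmp/cifar-10-jobdir",
        gpus=1,
        extra_args=["--train-batch-size=16", "--train-steps=40000"],
    )
    print(json.dumps(spec, indent=2))
{code}

Tying {{--num-gpus}} to the {{yarn.io/gpu}} value is a choice made by this sketch, not something YARN enforces. The printed JSON could then be submitted with {{yarn app -launch <service-name> <spec-file>}}.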

JAVA_HOME, LD_LIBRARY_PATH, and CLASSPATH can be defined in /etc/profile.d or
in the Dockerfile, so they do not have to be specified externally.  Likewise,
{{cd /test/cifar10_estimator}} can be replaced with a WORKDIR directive in the
Dockerfile.  For example, the Dockerfile would define:

{code}
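# Replaces the explicit "cd" before the script is launched
WORKDIR /test/models/tutorials/image/cifar10_estimator
# Arguments from the comma-delimited launch_command are appended to this entrypoint
ENTRYPOINT ["/usr/bin/python", "cifar10_main.py"]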
{code}

This would improve the readability of the configurations.
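Combining the Dockerfile and the spec, the following Python sketch models, in simplified form, what the container ends up running. It is an assumption for illustration, not YARN's actual code. The working directory comes from WORKDIR. The argument vector is the exec-form ENTRYPOINT followed by {{launch_command}} split on commas. Environment variables from the spec override ENV defaults baked into the image. {{effective_invocation}} and the image ENV values are hypothetical.

{code:python}
def effective_invocation(entrypoint, workdir, image_env, launch_command, spec_env):
    """Simplified model (assumed semantics) of what an ENTRYPOINT image runs.

    entrypoint, workdir and image_env stand in for the image's exec-form
    ENTRYPOINT, WORKDIR and ENV; launch_command and spec_env come from the
    YARN service spec.
    """
    # The comma-delimited launch_command becomes individual arguments, and
    # Docker appends run-time arguments to an exec-form ENTRYPOINT.
    args = launch_command.split(",") if launch_command else []
    argv = list(entrypoint) + args
    # Variables passed at run time override ENV defaults from the image.
    env = dict(image_env)
    env.update(spec_env)
    return workdir, argv, env


if __name__ == "__main__":
    workdir, argv, env = effective_invocation(
        entrypoint=["/usr/bin/python", "cifar10_main.py"],
        workdir="/test/models/tutorials/image/cifar10_estimator",
        # Hypothetical values baked into the image with ENV directives.
        image_env={"JAVA_HOME": "/usr/lib/jvm/java-8-openjdk-amd64",
                   "HADOOP_CONF_DIR": "/opt/hadoop/etc/hadoop"},
        launch_command="--num-gpus=1,--train-steps=40000",
        spec_env={"HADOOP_CONF_DIR": "/etc/hadoop/conf"},
    )
    print(workdir)
    print(argv)
    print(env)
{code}

This model only covers Dockerfile ENV. Scripts in /etc/profile.d are normally sourced by login shells, and an exec-form ENTRYPOINT does not start one, so the script may not see variables defined only there.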

> Running Tensorflow on YARN with GPU and Docker - Examples
> ---------------------------------------------------------
>
>                 Key: YARN-8220
>                 URL: https://issues.apache.org/jira/browse/YARN-8220
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn-native-services
>            Reporter: Sunil Govindan
>            Assignee: Sunil Govindan
>            Priority: Critical
>         Attachments: YARN-8220.001.patch, YARN-8220.002.patch
>
>
> Tensorflow could be run on YARN and could leverage YARN's distributed
> features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
