[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497708#comment-16497708
 ] 

Sunil Govindan commented on YARN-8220:
--------------------------------------

Attaching v1 patch. It mainly covers the scripts, examples, and Dockerfiles 
that help run TensorFlow on YARN (distributed or standalone).

Thank you very much [~leftnoteasy] for helping to integrate TF into YARN with 
GPU/Docker.

 

Details of this work:
 # A script that auto-generates the native service spec file for TensorFlow 
jobs and submits the service to YARN automatically. This helps run TF jobs on 
YARN without added complexity. A detailed example is available in the doc.
 # Support for running the latest TensorFlow 1.8 with CUDA 9 on YARN.
 # Distributed TensorFlow support. Users can enable it by passing the 
{{--distributed}} option to the script; multiple *worker* instances can then 
run on different nodes and leverage the resources in YARN.
 # Dockerfiles are provided for various cases (GPU/CPU, different TensorFlow 
versions).
 # Various tests were run across TF versions, GPU configurations, etc., and the 
results are published in the document included in the patch.
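
For the distributed case, each TensorFlow task usually learns about its peers 
through the {{TF_CONFIG}} environment variable. The sketch below shows one way 
such a value could be assembled from the job name, user, and domain. It is an 
illustration only, not the code in the patch: the function name 
{{build_tf_config}}, the port, the {{worker}}/{{ps}} component names, and the 
assumption that instances resolve as {{<component>-<index>.<job>.<user>.<domain>}} 
through YARN registry DNS are all hypothetical here.

```python
import json


def build_tf_config(job_name, user, domain, num_workers, num_ps,
                    task_type, task_index, port=8000):
    """Build a TF_CONFIG dict for one task of a distributed job.

    Assumptions (not taken from the patch): hostnames follow
    <component>-<index>.<job>.<user>.<domain>, and every task
    listens on the same hypothetical port.
    """
    def hosts(component, count):
        # One "host:port" entry per component instance.
        return [
            "{}-{}.{}.{}.{}:{}".format(component, i, job_name, user, domain, port)
            for i in range(count)
        ]

    cluster = {"worker": hosts("worker", num_workers)}
    if num_ps > 0:
        cluster["ps"] = hosts("ps", num_ps)
    return {"cluster": cluster,
            "task": {"type": task_type, "index": task_index}}


if __name__ == "__main__":
    cfg = build_tf_config("distributed-tf-gpu", "tf-user", "tensorflow.site",
                          num_workers=2, num_ps=1,
                          task_type="worker", task_index=0)
    # The JSON string is what would be exported as TF_CONFIG in the container.
    print(json.dumps(cfg, indent=2))
```

Each container would export the JSON string with its own task type and index, 
so the only per-instance difference is the {{task}} entry.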

Example:
{code:bash}
python submit_tf_job.py \
  --remote_conf_path hdfs:///tf-job-conf \
  --input_spec example_tf_job_spec.json \
  --docker_image gpu.cuda_9.0.tf_1.8.0 \
  --job_name distributed-tf-gpu \
  --user tf-user \
  --domain tensorflow.site \
  --distributed \
  --kerberos
{code}
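
To give a rough idea of what an auto-generated service spec could contain, 
here is a minimal sketch that builds a YARN native service spec as a Python 
dict. It is not the logic of {{submit_tf_job.py}}: the function name 
{{build_service_spec}}, the component names, the resource sizes, the version 
string, and the launch command are hypothetical placeholders, and the field 
layout is assumed to follow the YARN service spec format.

```python
import json


def build_service_spec(job_name, docker_image, distributed=False,
                       num_workers=2, gpus_per_worker=1,
                       launch_command="python train.py"):
    """Return a YARN native service spec (as a dict) for a TF job.

    All sizes, component names, and the launch command are
    hypothetical placeholders chosen for illustration.
    """
    def component(name, count, gpus):
        resource = {"cpus": 1, "memory": "4096"}
        if gpus > 0:
            # GPU request expressed as an additional resource type (assumed layout).
            resource["additional"] = {"yarn.io/gpu": {"value": gpus, "unit": ""}}
        return {
            "name": name,
            "number_of_containers": count,
            "artifact": {"id": docker_image, "type": "DOCKER"},
            "launch_command": launch_command,
            "resource": resource,
        }

    # Standalone mode runs a single worker; distributed mode adds a
    # parameter-server component that requests no GPU.
    workers = num_workers if distributed else 1
    components = [component("worker", workers, gpus_per_worker)]
    if distributed:
        components.append(component("ps", 1, 0))
    return {"name": job_name, "version": "1.0.0", "components": components}


if __name__ == "__main__":
    spec = build_service_spec("distributed-tf-gpu", "gpu.cuda_9.0.tf_1.8.0",
                              distributed=True)
    print(json.dumps(spec, indent=2))
```

The resulting JSON could then be written to a file and submitted as a YARN 
service; how the real script handles submission, Kerberos, and the remote 
configuration path is described in the doc attached to the patch.
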
cc [~vinodkv] [~rohithsharma]

> Running Tensorflow on YARN with GPU and Docker - Examples
> ---------------------------------------------------------
>
>                 Key: YARN-8220
>                 URL: https://issues.apache.org/jira/browse/YARN-8220
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn-native-services
>            Reporter: Sunil Govindan
>            Assignee: Sunil Govindan
>            Priority: Critical
>         Attachments: YARN-8220.001.patch
>
>
> TensorFlow can run on YARN and leverage YARN's distributed features.
> This spec file will help run TensorFlow on YARN with GPU/Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
