[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431333#comment-16431333 ]
Wangda Tan commented on YARN-8135:
----------------------------------

I'm currently working on a design doc and a prototype, and will share more details in the next several days.

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8135
>                 URL: https://issues.apache.org/jira/browse/YARN-8135
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Major
>         Attachments: image-2018-04-09-14-35-16-778.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
>  - Allow jobs to easily access data/models in HDFS and other storage systems.
>  - Can launch services to serve TensorFlow/MXNet models.
>  - Support running distributed TensorFlow jobs with simple configs.
>  - Support running user-specified Docker images.
>  - Support specifying GPU and other resources.
>  - Support launching TensorBoard if the user requests it.
>  - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because a submarine is the only vehicle that can take humans to deep places. B-)
> Compared to other projects:
> !image-2018-04-09-14-35-16-778.png!
> *Notes:*
> * GPU isolation in the XLearning project is achieved through a patched YARN, which differs from the community’s GPU isolation solution.
> ** XLearning needs a few modifications to read ClusterSpec from env.
> *References:*
>  - TensorFlowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark
>  - TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN
>  - Spark Deep Learning (Databricks): https://github.com/databricks/spark-deep-learning
>  - XLearning (Qihoo360): https://github.com/Qihoo360/XLearning
>  - Kubeflow (Google): https://github.com/kubeflow/kubeflow
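
The note in the description about reading ClusterSpec from env can be illustrated with a small sketch. It assumes the cluster layout is passed in the `TF_CONFIG` environment variable as JSON with `cluster` and `task` keys, which is the convention TensorFlow's distributed tooling uses; the issue does not say which variable Submarine or XLearning would actually read. The helper name `read_cluster_spec` and the host names are hypothetical, chosen only for illustration.

```python
import json
import os


def read_cluster_spec(env=None):
    """Return (cluster, task_type, task_index) parsed from TF_CONFIG.

    Assumption: the launcher exports TF_CONFIG as a JSON string with
    "cluster" and "task" keys. This is not taken from the issue text.
    """
    env = os.environ if env is None else env
    raw = env.get("TF_CONFIG")
    if not raw:
        # No cluster description present: treat as a single-process job.
        return {}, None, None
    config = json.loads(raw)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    return cluster, task.get("type"), task.get("index")


# Hypothetical layout with one parameter server and two workers.
example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {
            "ps": ["ps0.example.com:2222"],
            "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
        },
        "task": {"type": "worker", "index": 1},
    })
}

cluster, task_type, task_index = read_cluster_spec(example_env)
print(task_type, task_index, cluster["worker"][task_index])
```

Because the role and index come from the environment rather than from command-line flags, the training script itself can stay unmodified while the launcher decides the layout, which is the property the first goal in the description asks for.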