Wangda Tan created YARN-8135:
--------------------------------

Summary: Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
Key: YARN-8135
URL: https://issues.apache.org/jira/browse/YARN-8135
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Wangda Tan
Assignee: Wangda Tan
Attachments: image-2018-04-09-14-35-16-778.png
Description:

*Goals:*
- Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
- Allow jobs to easily access data/models in HDFS and other storage systems.
- Support launching services to serve TensorFlow/MXNet models.
- Support running distributed TensorFlow jobs with simple configs.
- Support running user-specified Docker images.
- Support specifying GPUs and other resources.
- Support launching TensorBoard if the user requests it.
- Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
- Because a submarine is the only vehicle that can take humans to deep places. B-)

Comparison with other projects:
!image-2018-04-09-14-35-16-778.png!

*Notes:*
* GPU isolation in the XLearning project is achieved with a patched YARN, which differs from the community's GPU isolation solution.
** XLearning needs a few modifications to read the ClusterSpec from the environment.

*References:*
- TensorFlowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark
- TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN
- Spark Deep Learning (Databricks): https://github.com/databricks/spark-deep-learning
- XLearning (Qihoo360): https://github.com/Qihoo360/XLearning
- Kubeflow (Google): https://github.com/kubeflow/kubeflow

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)