[ https://issues.apache.org/jira/browse/SUBMARINE-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zac Zhou updated SUBMARINE-58:
------------------------------
    Status: Patch Available  (was: Open)

A Submarine job can be submitted with the uber (fat) jar as follows:

/home/hadoop/java-current/bin/java \
  -cp /home/hadoop/hadoop-current/etc/hadoop/:/home/hadoop/zq/hadoop-submarine-standalone-0.2.0-SNAPSHOT-with-all-dependencies.jar \
  org.apache.hadoop.yarn.submarine.client.cli.Cli job run \
  --name distributed-tf-gpu-ml4 \
  --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre \
  --env DOCKER_HADOOP_HDFS_HOME=/hadoop-current \
  --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
  --env PYTHONPATH="./submarine_algorithm:$PYTHONPATH" \
  --env TZ="Asia/Shanghai" \
  --input_path hdfs://hadoop-cluster/tmp/cifar-10-data \
  --checkpoint_path hdfs://hadoop-cluster/user/hadoop/tf-distributed-checkpoint-ml4 \
  --saved_model_path hdfs://hadoop-cluster/user/hadoop/tf-distributed-saved-model-ml4 \
  --num_ps 0 \
  --ps_resources memory=4G,vcores=2,gpu=0 \
  --ps_launch_cmd "python cifar10_main.py --data-dir=hdfs://hadoop-cluster/tmp/cifar-10-data --job-dir=%checkpoint_path% --num-gpus=0" \
  --ps_docker_image *.*.*.*:5000/tensorflow1.13.1-hadoop3.1.2-cpu:1.0.0 \
  --worker_docker_image *.*.*.*:5000/tensorflow1.13.1-hadoop3.1.2-gpu:1.0.0 \
  --worker_resources memory=4G,vcores=2,gpu=1 \
  --verbose \
  --num_workers 1 \
  --worker_launch_cmd "python cifar10_main.py --data-dir=hdfs://hadoop-cluster/tmp/cifar-10-data --job-dir=%checkpoint_path% --train-steps=500 --eval-batch-size=16 --train-batch-size=16 --num-gpus=1" \
  --localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/cifar10.py:." \
  --localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/cifar10_main.py:." \
  --localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/cifar10_model.py:." \
  --localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/cifar10_utils.py:." \
  --localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/generate_cifar10_tfrecords.py:." \
  --localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/model_base.py:." \
  --localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator:./submarine_algorithm" \
  --wait_job_finish \
  --tensorboard \
  --tensorboard_docker_image *.*.*.*:5000/tensorflow1.13.1-hadoop3.1.2-cpu:1.0.0 \
  --keytab /home/hadoop/hadoop.keytab \
  --principal hadoop/admin \
  --distribute_keytab
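
With the fat jar, the client classpath shrinks to two entries: the Hadoop configuration directory (which supplies the cluster's *-site.xml files) and the single jar that bundles the client's dependencies. The bash sketch below wraps that invocation so it can be reused. It is illustrative only and not part of the attached patch: the variables JAVA_HOME, HADOOP_CONF_DIR and SUBMARINE_JAR and the function names submarine_classpath and submarine are hypothetical, with defaults copied from the paths in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the uber-jar invocation (not from the patch).
# JAVA_HOME, HADOOP_CONF_DIR and SUBMARINE_JAR are assumed variables; the
# defaults mirror the paths used in the example command.
JAVA_HOME="${JAVA_HOME:-/home/hadoop/java-current}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/home/hadoop/hadoop-current/etc/hadoop}"
SUBMARINE_JAR="${SUBMARINE_JAR:-/home/hadoop/zq/hadoop-submarine-standalone-0.2.0-SNAPSHOT-with-all-dependencies.jar}"
SUBMARINE_MAIN=org.apache.hadoop.yarn.submarine.client.cli.Cli

# Print "<conf_dir>/:<jar>", failing early if either path is missing so a
# bad path is reported before the JVM starts.
submarine_classpath() {
  local conf_dir="$1" jar="$2"
  if [[ ! -d "$conf_dir" ]]; then
    echo "config dir not found: $conf_dir" >&2
    return 1
  fi
  if [[ ! -f "$jar" ]]; then
    echo "fat jar not found: $jar" >&2
    return 1
  fi
  # Strip any trailing slash, then add exactly one.
  printf '%s/:%s\n' "${conf_dir%/}" "$jar"
}

# Forward all arguments (e.g. "job run --name ...") to the Submarine CLI.
submarine() {
  local cp
  cp="$(submarine_classpath "$HADOOP_CONF_DIR" "$SUBMARINE_JAR")" || return 1
  "$JAVA_HOME/bin/java" -cp "$cp" "$SUBMARINE_MAIN" "$@"
}
```

After sourcing the file, the submission above becomes `submarine job run --name distributed-tf-gpu-ml4 ...` with the same options. Checking the paths in the shell first gives a clearer error than a ClassNotFoundException from the JVM.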

> Submarine client needs to generate fat jar
> ------------------------------------------
>
>                 Key: SUBMARINE-58
>                 URL: https://issues.apache.org/jira/browse/SUBMARINE-58
>             Project: Hadoop Submarine
>          Issue Type: Improvement
>            Reporter: Xun Liu
>            Assignee: Zac Zhou
>            Priority: Major
>         Attachments: SUBMARINE-58.001.patch
>
>
> When a job is submitted using the submarine client alone, missing package
> dependencies cause execution to fail. If the submarine client provided a fat
> jar, many development and usage issues would be avoided.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
