[ https://issues.apache.org/jira/browse/SUBMARINE-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zac Zhou updated SUBMARINE-58:
------------------------------
    Status: Patch Available  (was: Open)

A submarine job can be submitted using the uber jar like this:

/home/hadoop/java-current/bin/java -cp /home/hadoop/hadoop-current/etc/hadoop/:/home/hadoop/zq/hadoop-submarine-standalone-0.2.0-SNAPSHOT-with-all-dependencies.jar \
org.apache.hadoop.yarn.submarine.client.cli.Cli job run \
--env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre \
--env DOCKER_HADOOP_HDFS_HOME=/hadoop-current --name distributed-tf-gpu-ml4 \
--env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
--env PYTHONPATH="./submarine_algorithm:$PYTHONPATH" \
--env TZ="Asia/Shanghai" \
--input_path hdfs://hadoop-cluster/tmp/cifar-10-data \
--checkpoint_path hdfs://hadoop-cluster/user/hadoop/tf-distributed-checkpoint-ml4 \
--saved_model_path hdfs://hadoop-cluster/user/hadoop/tf-distributed-saved-model-ml4 \
--num_ps 0 \
--ps_resources memory=4G,vcores=2,gpu=0 \
--ps_launch_cmd "python cifar10_main.py --data-dir=hdfs://hadoop-cluster/tmp/cifar-10-data --job-dir=%checkpoint_path% --num-gpus=0" \
--ps_docker_image *.*.*.*:5000/tensorflow1.13.1-hadoop3.1.2-cpu:1.0.0 \
--worker_docker_image *.*.*.*:5000/tensorflow1.13.1-hadoop3.1.2-gpu:1.0.0 \
--worker_resources memory=4G,vcores=2,gpu=1 --verbose \
--num_workers 1 \
--worker_launch_cmd "python cifar10_main.py --data-dir=hdfs://hadoop-cluster/tmp/cifar-10-data --job-dir=%checkpoint_path% --train-steps=500 --eval-batch-size=16 --train-batch-size=16 --num-gpus=1" \
--localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/cifar10.py:." \
--localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/cifar10_main.py:." \
--localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/cifar10_model.py:." \
--localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/cifar10_utils.py:." \
--localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/generate_cifar10_tfrecords.py:." \
--localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator/model_base.py:." \
--localization "hdfs://hadoop-cluster/user/hadoop/estimator-model/1.10/cifar10_estimator:./submarine_algorithm" \
--wait_job_finish \
--tensorboard \
--tensorboard_docker_image *.*.*.*:5000/tensorflow1.13.1-hadoop3.1.2-cpu:1.0.0 \
--keytab /home/hadoop/hadoop.keytab \
--principal hadoop/admin \
--distribute_keytab

> Submarine client needs to generate fat jar
> ------------------------------------------
>
>                 Key: SUBMARINE-58
>                 URL: https://issues.apache.org/jira/browse/SUBMARINE-58
>             Project: Hadoop Submarine
>          Issue Type: Improvement
>            Reporter: Xun Liu
>            Assignee: Zac Zhou
>            Priority: Major
>         Attachments: SUBMARINE-58.001.patch
>
>
> When a job is submitted using the submarine client alone, package
> dependency problems cause execution to fail. If the submarine client
> provided a fat jar, many development and usage issues would be avoided.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)