[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591040#comment-16591040 ]
Zac Zhou edited comment on YARN-8698 at 8/24/18 1:47 AM:
---------------------------------------------------------

Hi [~tangzhankun], yes, I set "DOCKER_HADOOP_HDFS_HOME" to /hadoop-3.1.0, which is the Hadoop home directory in the docker image. Inside docker, DOCKER_HADOOP_HDFS_HOME takes effect, but it is not enough. You can reproduce this even without a docker environment.

When only HADOOP_HDFS_HOME is specified, the classpath resolves correctly:
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export HADOOP_HDFS_HOME=/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ echo $HADOOP_HDFS_HOME
/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
/home/hadoop/yarn-submarine/etc/hadoop:/home/hadoop/yarn-submarine/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/commons-lang3-3.7.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/paranamer-2.3.jar:
{code}
But if you specify a wrong HADOOP_COMMON_HOME alongside a correct HADOOP_HDFS_HOME, it fails:
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export HADOOP_COMMON_HOME=/home/hadoop
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
Error: Could not find or load main class org.apache.hadoop.util.Classpath
{code}

> [Submarine] Failed to add hadoop dependencies in docker container when
> submitting a submarine job
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8698
>                 URL: https://issues.apache.org/jira/browse/YARN-8698
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zac Zhou
>            Assignee: Zac Zhou
>            Priority: Major
>        Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error occurs:
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
> INFO:tensorflow:Done calling model_fn.
> INFO:tensorflow:Create CheckpointSaverHook.
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError)
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError)
>
> This error may be related to the hadoop classpath.
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
> export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
> export HADOOP_HDFS_HOME=/hadoop-3.1.0
> export HADOOP_CONF_DIR=$WORK_DIR

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
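The interaction described above comes down to shell default-value expansion: launch_container.sh uses `${VAR:-default}`, which falls back to /home/hadoop/yarn-submarine only when the variable is unset or empty, so values exported earlier by run-PRIMARY_WORKER.sh win. A minimal sketch of that behavior (the paths are taken from the quoted scripts; this is illustrative, not the actual launch scripts):

```shell
#!/bin/sh
# Sketch of how ${VAR:-default} behaves for the variables exported by
# run-PRIMARY_WORKER.sh before launch_container.sh applies its defaults.
unset HADOOP_COMMON_HOME                # never set by the worker script
export HADOOP_HDFS_HOME=/hadoop-3.1.0   # set by run-PRIMARY_WORKER.sh
export HADOOP_YARN_HOME=                # exported, but empty

# ${VAR:-default} substitutes the default only when VAR is unset or empty:
echo "${HADOOP_HDFS_HOME:-/home/hadoop/yarn-submarine}"   # existing value kept
echo "${HADOOP_YARN_HOME:-/home/hadoop/yarn-submarine}"   # empty -> default used
echo "${HADOOP_COMMON_HOME:-/home/hadoop/yarn-submarine}" # unset -> default used
```

So inside the container HADOOP_HDFS_HOME keeps the image path /hadoop-3.1.0, while HADOOP_COMMON_HOME falls back to /home/hadoop/yarn-submarine, a path that need not exist in the docker image, which is consistent with the NoClassDefFoundError above.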