[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589597#comment-16589597 ]
Zac Zhou commented on YARN-8698:
--------------------------------

Thanks a lot, [~leftnoteasy] :)

Hi [~tangzhankun],

I think this issue is related to the hadoop classpath. The hadoop path on the nodemanager is different from the one in the docker container.
launch_container.sh sets HADOOP_COMMON_HOME to a path which doesn't exist in the docker container, so run-PRIMARY_WORKER.sh fails to execute the command:
export CLASSPATH=`$HADOOP_HDFS_HOME/bin/hadoop classpath --glob`
and the classpath can't be generated correctly.

I validated this issue with the following steps:
# Move the hadoop package to some path, like A.
# Set HADOOP_COMMON_HOME to some other path, like B, which is not the hadoop package location:
export HADOOP_COMMON_HOME=B
# Execute the command:
${A}/bin/hadoop classpath --glob

We will get the following error:
Error: Could not find or load main class org.apache.hadoop.util.Classpath

If any more info is needed, feel free to let me know~

Thanks

> [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8698
>                 URL: https://issues.apache.org/jira/browse/YARN-8698
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zac Zhou
>            Assignee: Zac Zhou
>            Priority: Major
>         Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error occurs:
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
> INFO:tensorflow:Done calling model_fn.
> INFO:tensorflow:Create CheckpointSaverHook.
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError)
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError)
>
> This error may be related to the hadoop classpath.
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
> export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
> export HADOOP_HDFS_HOME=/hadoop-3.1.0
> export HADOOP_CONF_DIR=$WORK_DIR
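
For illustration only: the mismatch described in the comment (HADOOP_COMMON_HOME inherited from the nodemanager pointing at a path that does not exist inside the container) could be guarded against in the worker script before `hadoop classpath --glob` is run. The sketch below is not the YARN-8698 patch. The function names `find_hadoop_home` and `fix_hadoop_common_home` are hypothetical, and it assumes that a directory containing an executable `bin/hadoop` is a usable install. It only covers the case where the inherited path is missing, not the case where it exists but holds no hadoop package.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print the first candidate
# directory that contains an executable bin/hadoop, or fail if none does.
find_hadoop_home() {
  for dir in "$@"; do
    if [ -n "$dir" ] && [ -x "$dir/bin/hadoop" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Hypothetical guard: if HADOOP_COMMON_HOME is set but the directory is
# missing in this container, point it at the install passed as $1.
# A value that refers to an existing directory is left untouched.
fix_hadoop_common_home() {
  if [ -n "${HADOOP_COMMON_HOME:-}" ] && [ ! -d "$HADOOP_COMMON_HOME" ]; then
    HADOOP_COMMON_HOME=$1
    export HADOOP_COMMON_HOME
  fi
}

# Possible use inside the container (the second candidate path is the one
# quoted from run-PRIMARY_WORKER.sh in the issue):
#   home=$(find_hadoop_home "$HADOOP_HDFS_HOME" /hadoop-3.1.0) || exit 1
#   fix_hadoop_common_home "$home"
#   export CLASSPATH=$("$home/bin/hadoop" classpath --glob)
```

Overriding only a missing path keeps any deliberate setting inside the image intact; whether that is the right policy for submarine jobs is a separate question for the patch review.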