[ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589597#comment-16589597
 ] 

Zac Zhou commented on YARN-8698:
--------------------------------

Thanks a lot, [~leftnoteasy] :)

Hi [~tangzhankun], I think this issue is related to the Hadoop classpath.

The Hadoop path on the NodeManager is different from the one in the Docker container.

launch_container.sh sets HADOOP_COMMON_HOME to a path which doesn't 
exist in the Docker container.

As a result, run-PRIMARY_WORKER.sh fails to execute the command:

export CLASSPATH=`$HADOOP_HDFS_HOME/bin/hadoop classpath --glob`

so the classpath can't be generated correctly.
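
As a minimal sketch of how to spot this mismatch from inside the container, the helper below lists Hadoop home variables that are set but point at a directory that does not exist. The function name missing_hadoop_homes is hypothetical and is not part of launch_container.sh or run-PRIMARY_WORKER.sh; it only checks the variables, it does not fix them.

```shell
# Hypothetical helper (not part of Hadoop or Submarine): print every Hadoop
# home variable that is set but whose directory is missing, and return 1 if
# at least one such variable was found.
missing_hadoop_homes() {
    local name value status=0
    for name in HADOOP_COMMON_HOME HADOOP_HDFS_HOME HADOOP_YARN_HOME HADOOP_HOME; do
        # Indirect lookup of the variable called $name; unset counts as empty.
        eval "value=\${${name}:-}"
        if [ -n "${value}" ] && [ ! -d "${value}" ]; then
            echo "${name}=${value}"
            status=1
        fi
    done
    return "${status}"
}
```

Calling missing_hadoop_homes at the top of the worker script, before the classpath is built, would show whether a home directory inherited from the NodeManager is absent in the container.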

I validated this issue with the following steps:
 # Move the Hadoop package to some path, like A.
 # Set HADOOP_COMMON_HOME to some other path, like B, which is not the Hadoop 
package location: export HADOOP_COMMON_HOME=B
 # Execute the command: ${A}/bin/hadoop classpath --glob

We will get the following error:

 Error: Could not find or load main class org.apache.hadoop.util.Classpath
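
The reproduction can be sketched as a small shell function. The name repro_classpath_failure and both arguments are placeholders for illustration: the first stands for path A (where the Hadoop package really lives) and the second for path B (a directory that is not a Hadoop package location).

```shell
# Hypothetical reproduction helper; the paths passed in are placeholders.
repro_classpath_failure() {
    local hadoop_pkg="$1"   # "A": actual location of the Hadoop package
    local bogus_home="$2"   # "B": a path that does not hold a Hadoop package

    if [ ! -x "${hadoop_pkg}/bin/hadoop" ]; then
        echo "no hadoop launcher at ${hadoop_pkg}/bin/hadoop" >&2
        return 2
    fi

    # Use a subshell so the misleading HADOOP_COMMON_HOME does not leak
    # into the calling shell.
    (
        export HADOOP_COMMON_HOME="${bogus_home}"
        "${hadoop_pkg}/bin/hadoop" classpath --glob
    )
}
```

For example, repro_classpath_failure /path/to/A /path/to/B runs the same command as the last step with HADOOP_COMMON_HOME pointing at B.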

If any more info is needed, feel free to let me know~

Thanks

 

> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8698
>                 URL: https://issues.apache.org/jira/browse/YARN-8698
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zac Zhou
>            Assignee: Zac Zhou
>            Priority: Major
>         Attachments: YARN-8698.001.patch
>
>
> When a standalone Submarine TF job is submitted, the following error occurs:
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to the Hadoop classpath.
> The Hadoop environment variables in launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh looks like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   



