[jira] [Comment Edited] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594738#comment-16594738 ]

Zhankun Tang edited comment on YARN-8698 at 8/28/18 9:27 AM:
-

[~yuan_zac] Thanks! I've reproduced your issue and double-checked the patch. It resolves the HDFS connection failure caused by a wrong HADOOP_COMMON_HOME value, which produced an invalid classpath and left the TF job hanging there. Per my testing, both leaving it empty and setting it to the correct value work.

{code:java}
fw.append("export HADOOP_COMMON_HOME=\n");
{code}

[~sunilg] or [~leftnoteasy] Could you help merge the patch? Thanks!

was (Author: tangzhankun):
[~yuan_zac] Thanks! I've reproduced your issue and double-checked the patch. It resolves the HDFS connection failure caused by a wrong HADOOP_COMMON_HOME value, which produced an invalid classpath. Per my testing, both leaving it empty and setting it to the correct value work.

{code:java}
fw.append("export HADOOP_COMMON_HOME=\n");
{code}

[~sunilg] or [~leftnoteasy] Could you help merge the patch? Thanks!

> [Submarine] Failed to add hadoop dependencies in docker container when
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zac Zhou
> Assignee: Zac Zhou
> Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error occurs:
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
> INFO:tensorflow:Done calling model_fn.
> INFO:tensorflow:Create CheckpointSaverHook.
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0,
> kerbTicketCachePath=(NULL), userName=(NULL)) error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError)
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0,
> kerbTicketCachePath=(NULL), userName=(NULL)) error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError)
>
> This error may be related to the hadoop classpath.
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
> export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
> export HADOOP_HDFS_HOME=/hadoop-3.1.0
> export HADOOP_CONF_DIR=$WORK_DIR

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
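[Editor's note] The one-line patch above writes an explicit empty export into the generated run-PRIMARY_WORKER.sh. A minimal sketch of why that works, as a plain shell simulation (the paths are the ones quoted in this issue; nothing here runs real Hadoop, and the fallback-to-HADOOP_HOME behavior of the Hadoop wrapper scripts is assumed):

```shell
#!/bin/sh
# The container process inherits the host-side HADOOP_COMMON_HOME set by
# launch_container.sh; that path does not exist inside the docker image.
export HADOOP_COMMON_HOME=/home/hadoop/yarn-submarine

# The patched run-PRIMARY_WORKER.sh emits an explicit empty export,
# overriding the inherited (wrong) value so Hadoop's own scripts can
# fall back to deriving it from HADOOP_HOME inside the image.
export HADOOP_COMMON_HOME=
export HADOOP_HDFS_HOME=/hadoop-3.1.0   # Hadoop home inside the image

if [ -z "$HADOOP_COMMON_HOME" ]; then
  echo "HADOOP_COMMON_HOME is empty; Hadoop scripts derive it themselves"
else
  echo "HADOOP_COMMON_HOME=$HADOOP_COMMON_HOME (stale host path)"
fi
```

Exporting the variable to the correct in-image path would work equally well, as the testing above confirms; the empty export is simply the option that needs no knowledge of the image layout.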
[jira] [Comment Edited] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594738#comment-16594738 ]

Zhankun Tang edited comment on YARN-8698 at 8/28/18 9:23 AM:
-

[~yuan_zac] Thanks! I've reproduced your issue and double-checked the patch. It resolves the HDFS connection failure caused by a wrong HADOOP_COMMON_HOME value, which produced an invalid classpath. Per my testing, both leaving it empty and setting it to the correct value work.

{code:java}
fw.append("export HADOOP_COMMON_HOME=\n");
{code}

[~sunilg] or [~leftnoteasy] Could you help merge the patch? Thanks!

was (Author: tangzhankun):
[~yuan_zac] I've reproduced your issue and double-checked the patch. It resolves the HDFS connection failure caused by a wrong HADOOP_COMMON_HOME value, which produced an invalid classpath. Per my testing, both leaving it empty and setting it to the correct value work.

{code:java}
fw.append("export HADOOP_COMMON_HOME=\n");
{code}

[~sunilg] or [~leftnoteasy] Could you help merge the patch? Thanks!
[jira] [Comment Edited] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591243#comment-16591243 ]

Zac Zhou edited comment on YARN-8698 at 8/24/18 7:27 AM:
-

In my case, launch_container.sh specified HADOOP_COMMON_HOME like this:

export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}

I think the HADOOP_COMMON_HOME set in launch_container.sh would take effect in run-PRIMARY_WORKER.sh.

was (Author: yuan_zac):
In my case, launch_container.sh specified HADOOP_COMMON_HOME like this:

export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}

I think the HADOOP_COMMON_HOME set in launch_container.sh will take effect in run-PRIMARY_WORKER.sh.
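[Editor's note] The point above hinges on shell default-value expansion: because launch_container.sh uses the `${VAR:-default}` form, it only supplies a fallback, so any HADOOP_COMMON_HOME already present in the environment wins and carries through to run-PRIMARY_WORKER.sh unless that script overrides it. A small demonstration of the expansion rule:

```shell
#!/bin/sh
# ${VAR:-default} substitutes the default only when VAR is unset or
# empty; an inherited value takes precedence over the default.
unset HADOOP_COMMON_HOME
first=${HADOOP_COMMON_HOME:-/home/hadoop/yarn-submarine}   # default used

HADOOP_COMMON_HOME=/opt/other-hadoop
second=${HADOOP_COMMON_HOME:-/home/hadoop/yarn-submarine}  # inherited value wins

echo "without an inherited value: $first"
echo "with an inherited value:    $second"
```

This is why run-PRIMARY_WORKER.sh must assign the variable unconditionally (even to the empty string) rather than rely on another `:-` default.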
[jira] [Comment Edited] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591040#comment-16591040 ]

Zac Zhou edited comment on YARN-8698 at 8/24/18 1:55 AM:
-

Hi [~tangzhankun], yeah, I specified "DOCKER_HADOOP_HDFS_HOME" as /hadoop-3.1.0, which is the hadoop home directory in the docker image. In docker, DOCKER_HADOOP_HDFS_HOME takes effect, but it is not enough. I think you can test it even without a docker env. When you specify just HADOOP_HDFS_HOME, it works well as follows:

{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export HADOOP_HDFS_HOME=/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ echo $HADOOP_HDFS_HOME
/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
/home/hadoop/yarn-submarine/etc/hadoop:/home/hadoop/yarn-submarine/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/commons-lang3-3.7.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/paranamer-2.3.jar:
{code}

But if a wrong HADOOP_COMMON_HOME is specified alongside a correct HADOOP_HDFS_HOME, it fails:

{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export HADOOP_COMMON_HOME=/home/hadoop
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
Error: Could not find or load main class org.apache.hadoop.util.Classpath
{code}

was (Author: yuan_zac):
Hi [~tangzhankun], yeah, I specified "DOCKER_HADOOP_HDFS_HOME" as /hadoop-3.1.0, which is the hadoop home directory in the docker image. In docker, DOCKER_HADOOP_HDFS_HOME takes effect, but it is not enough. I think you can test it even without a docker env. When you specify just HADOOP_HDFS_HOME, it works well as follows:

{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export HADOOP_HDFS_HOME=/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ echo $HADOOP_HDFS_HOME
/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
/home/hadoop/yarn-submarine/etc/hadoop:/home/hadoop/yarn-submarine/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/commons-lang3-3.7.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/paranamer-2.3.jar:
{code}

But if you specify a wrong HADOOP_COMMON_HOME with a correct HADOOP_HDFS_HOME, it will fail:

{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export HADOOP_COMMON_HOME=/home/hadoop
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
Error: Could not find or load main class org.apache.hadoop.util.Classpath
{code}
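[Editor's note] The contrast in the transcripts above can be reproduced without a Hadoop install at all: the classpath machinery globs jars under directories rooted at HADOOP_COMMON_HOME, so pointing it at a directory without the expected layout leaves the JVM nothing to load. A toy stand-in (the `build_classpath` helper is hypothetical and merely mimics the layout-dependent glob; it is not the real Hadoop script logic):

```shell
#!/bin/sh
# Create a fake "good" Hadoop home with the standard jar layout.
good=$(mktemp -d)
mkdir -p "$good/share/hadoop/common/lib"
touch "$good/share/hadoop/common/lib/hadoop-common-3.1.0.jar"

# Hypothetical helper: collect jars under $1 the way the classpath
# scripts do; an unmatched glob contributes nothing.
build_classpath() {
  for jar in "$1"/share/hadoop/common/lib/*.jar; do
    [ -e "$jar" ] && printf '%s:' "$jar"
  done
}

cp_good=$(build_classpath "$good")
cp_bad=$(build_classpath /home/hadoop)  # wrong home: layout absent

echo "good classpath: $cp_good"
echo "bad classpath: <$cp_bad>"  # empty -> NoClassDefFoundError at runtime
rm -rf "$good"
```

An empty classpath at JVM startup is exactly what produces both the "Could not find or load main class" error on the CLI and the NoClassDefFoundError seen from libhdfs in the job logs.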
[jira] [Comment Edited] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591040#comment-16591040 ]

Zac Zhou edited comment on YARN-8698 at 8/24/18 1:47 AM:
-

Hi [~tangzhankun], yeah, I specified "DOCKER_HADOOP_HDFS_HOME" as /hadoop-3.1.0, which is the hadoop home directory in the docker image. In docker, DOCKER_HADOOP_HDFS_HOME takes effect, but it is not enough. I think you can test it even without a docker env. When you specify just HADOOP_HDFS_HOME, it works well as follows:

{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export HADOOP_HDFS_HOME=/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ echo $HADOOP_HDFS_HOME
/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
/home/hadoop/yarn-submarine/etc/hadoop:/home/hadoop/yarn-submarine/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/commons-lang3-3.7.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/paranamer-2.3.jar:
{code}

But if you specify a wrong HADOOP_COMMON_HOME with a correct HADOOP_HDFS_HOME, it will fail:

{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export HADOOP_COMMON_HOME=/home/hadoop
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
Error: Could not find or load main class org.apache.hadoop.util.Classpath
{code}

was (Author: yuan_zac):
Hi [~tangzhankun], yeah, I specified "DOCKER_HADOOP_HDFS_HOME" as /hadoop-3.1.0, which is the hadoop home directory in the docker image. In docker, DOCKER_HADOOP_HDFS_HOME takes effect, but it is not enough. I think you can test it even without a docker env. When you specify just HADOOP_HDFS_HOME, it works well as follows:

{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export HADOOP_HDFS_HOME=/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ echo $HADOOP_HDFS_HOME
/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
/home/hadoop/yarn-submarine/etc/hadoop:/home/hadoop/yarn-submarine/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/commons-lang3-3.7.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/paranamer-2.3.jar:
{code}

But if you specify a wrong HADOOP_COMMON_HOME with a correct HADOOP_HDFS_HOME, it failed:

{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export HADOOP_COMMON_HOME=/home/hadoop
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
Error: Could not find or load main class org.apache.hadoop.util.Classpath
{code}