It looks like the interpreter process cannot connect to the Zeppelin server process. I suspect this is due to a network issue. Can you check whether the nodes in the YARN cluster can reach the Zeppelin server host?
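For a quick check, you can test TCP reachability from one of the NodeManager hosts to the address the interpreter tried to reach (the intpEventServerAddress from your log, 172.17.0.1:45128), e.g. with `nc -vz 172.17.0.1 45128`, or with a small Python snippet like the one below. The host and port here are just the ones from your log; substitute your actual Zeppelin server address.

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, host unreachable, and timeouts.
        return False

# Example: run from a YARN NodeManager host, using the
# intpEventServerAddress shown in the application master log, e.g.:
#   print(can_connect("172.17.0.1", 45128))
```

If this returns False from the YARN nodes but True from the Zeppelin host itself, the problem is likely firewall rules or the Zeppelin server binding to an address (such as a Docker bridge IP) that is not routable from the cluster.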
On Sun, Apr 7, 2019 at 3:31 PM, Y. Ethan Guo <guoyi...@uber.com> wrote:

> Hi Jeff,
>
> Given this PR is merged, I'm trying to see if I can run yarn cluster mode
> from the master build. I built Zeppelin master from this commit:
>
>     commit 3655c12b875884410224eca5d6155287d51916ac
>     Author: Jongyoul Lee <jongy...@gmail.com>
>     Date:   Mon Apr 1 15:37:57 2019 +0900
>
>         [MINOR] Refactor CronJob class (#3335)
>
> While I can successfully run the Spark interpreter in yarn client mode,
> I'm having trouble making yarn cluster mode work. Specifically, while
> the interpreter job was accepted by yarn, it failed after 1-2 minutes
> because of the exception below. Do you have any idea why this is happening?
>
>     DEBUG [2019-04-07 06:57:00,314] ({main} Logging.scala[logDebug]:58) - Created SSL options for fs: SSLOptions{enabled=false, keyStore=None, keyStorePassword=None, trustStore=None, trustStorePassword=None, protocol=None, enabledAlgorithms=Set()}
>     INFO [2019-04-07 06:57:00,323] ({main} Logging.scala[logInfo]:54) - Starting the user application in a separate Thread
>     INFO [2019-04-07 06:57:00,350] ({main} Logging.scala[logInfo]:54) - Waiting for spark context initialization...
>     INFO [2019-04-07 06:57:00,403] ({Driver} RemoteInterpreterServer.java[<init>]:148) - Starting remote interpreter server on port 0, intpEventServerAddress: 172.17.0.1:45128
>     ERROR [2019-04-07 06:57:00,408] ({Driver} Logging.scala[logError]:91) - User class threw exception: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
>     org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
>         at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
>         at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:154)
>         at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:139)
>         at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:285)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
>     Caused by: java.net.ConnectException: Connection refused (Connection refused)
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:589)
>         at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
>         ... 8 more
>
> Thanks,
> - Ethan
>
> On Wed, Feb 27, 2019 at 4:24 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Here's the PR:
>> https://github.com/apache/zeppelin/pull/3308
>>
>> On Thu, Feb 28, 2019 at 2:50 AM, Y. Ethan Guo <guoyi...@uber.com> wrote:
>>
>>> Hi All,
>>>
>>> I'm trying to use the new yarn cluster mode feature to run Spark
>>> 2.4.0 jobs on Zeppelin 0.8.1. I've set the SPARK_HOME,
>>> SPARK_SUBMIT_OPTIONS, and HADOOP_CONF_DIR env variables in zeppelin-env.sh
>>> so that the Spark interpreter can be started in the cluster. I used
>>> `--jars` in SPARK_SUBMIT_OPTIONS to add local jars. However, when I tried
>>> to import a class from the jars in a Spark paragraph, the interpreter
>>> complained that it cannot find the package and class ("<console>:23: error:
>>> object ... is not a member of package ..."). It looks like the jars are not
>>> properly imported.
>>>
>>> I followed the instructions here
>>> <https://zeppelin.apache.org/docs/0.8.1/interpreter/spark.html#2-loading-spark-properties>
>>> to add the jars, but it seems that this isn't working in cluster mode.
>>> This issue seems to be related to this bug:
>>> https://jira.apache.org/jira/browse/ZEPPELIN-3986. Is there any update
>>> on fixing it? What is the right way to add local jars in yarn cluster mode?
>>> Any help is much appreciated.
>>>
>>> Here's the SPARK_SUBMIT_OPTIONS I used (packages and jars paths omitted):
>>>
>>>     export SPARK_SUBMIT_OPTIONS="--driver-memory 12G --packages ... --jars ... --repositories https://repository.cloudera.com/artifactory/public/,https://repository.cloudera.com/content/repositories/releases/,http://repo.spring.io/plugins-release/"
>>>
>>> Thanks,
>>> - Ethan
>>>
>>> --
>>> Best,
>>> - Ethan
>>
>> --
>> Best Regards
>>
>> Jeff Zhang

-- 
Best Regards

Jeff Zhang