I'm partially hitting this issue in 0.9.0-SNAPSHOT with Spark interpreters that have other names. I'm not sure the ZEPPELIN-3986 issue is completely resolved. I'm using multiple Spark interpreters with different Spark confs that share the same SPARK_SUBMIT_OPTIONS, including a `--jars` option, and it seems that only one of them is working. In any case, shall we follow up on the ticket and see how to fix it?
Thanks,
- Ethan

On Mon, Apr 8, 2019 at 1:34 AM Jeff Zhang <zjf...@gmail.com> wrote:

> Hi Ethan,
>
> This behavior is not expected. Maybe you are hitting this issue, which is
> fixed in 0.8.2:
> https://jira.apache.org/jira/browse/ZEPPELIN-3986
>
> Y. Ethan Guo <guoyi...@uber.com> wrote on Mon, Apr 8, 2019 at 4:26 PM:
>
>> Hi Jeff, Dave,
>>
>> Thanks for the suggestion. I was able to successfully run the Spark
>> interpreter in yarn cluster mode on another machine running Zeppelin. The
>> previous problem was probably due to network issues.
>>
>> I have two observations:
>> (1) I'm able to use the "--jars" option in SPARK_SUBMIT_OPTIONS in the
>> "spark" interpreter with yarn cluster mode configured. I verified that the
>> jars are pushed to the driver and executors by successfully running a job
>> that uses some classes in the jars. However, if I create a new "spark_abc"
>> interpreter under the spark interpreter group, this new interpreter doesn't
>> seem to pick up SPARK_SUBMIT_OPTIONS and the jars option, leading to errors
>> about not being able to access packages/classes in the jars.
>>
>> (2) Once I restart the Spark interpreters in the interpreter settings,
>> the corresponding Spark jobs in the yarn cluster first transition from the
>> "RUNNING" state to the "ACCEPTED" state, and then end up in the "FAILED"
>> state.
>>
>> I'm wondering whether the above behaviors are expected and whether they
>> are known limitations of the current 0.9.0-SNAPSHOT version.
>>
>> Thanks,
>> - Ethan
>>
>> On Sun, Apr 7, 2019 at 9:59 AM Dave Boyd <db...@incadencecorp.com> wrote:
>>
>>> From the connection refused message I wonder if it is an SSL error. I
>>> note that none of the SSL information (truststore, keystore, etc.) is set.
>>> I would think the YARN cluster requires some form of authentication.
>>> On 4/7/19 9:27 AM, Jeff Zhang wrote:
>>>
>>> It looks like the interpreter process cannot connect to the Zeppelin server
>>> process.
>>> I guess it is due to some network issue. Can you check whether the
>>> node in the yarn cluster can connect to the Zeppelin server host?
>>>
>>> Y. Ethan Guo <guoyi...@uber.com> wrote on Sun, Apr 7, 2019 at 3:31 PM:
>>>
>>>> Hi Jeff,
>>>>
>>>> Given this PR is merged, I'm trying to see if I can run yarn cluster
>>>> mode from a master build. I built Zeppelin master from this commit:
>>>>
>>>> commit 3655c12b875884410224eca5d6155287d51916ac
>>>> Author: Jongyoul Lee <jongy...@gmail.com>
>>>> Date: Mon Apr 1 15:37:57 2019 +0900
>>>> [MINOR] Refactor CronJob class (#3335)
>>>>
>>>> While I can successfully run the Spark interpreter in yarn client mode,
>>>> I'm having trouble getting yarn cluster mode to work. Specifically, while
>>>> the interpreter job was accepted in yarn, the job failed after 1-2 minutes
>>>> because of the exception below. Do you have any idea why this
>>>> is happening?
>>>>
>>>> DEBUG [2019-04-07 06:57:00,314] ({main} Logging.scala[logDebug]:58) - Created SSL options for fs: SSLOptions{enabled=false, keyStore=None, keyStorePassword=None, trustStore=None, trustStorePassword=None, protocol=None, enabledAlgorithms=Set()}
>>>> INFO [2019-04-07 06:57:00,323] ({main} Logging.scala[logInfo]:54) - Starting the user application in a separate Thread
>>>> INFO [2019-04-07 06:57:00,350] ({main} Logging.scala[logInfo]:54) - Waiting for spark context initialization...
>>>> INFO [2019-04-07 06:57:00,403] ({Driver} RemoteInterpreterServer.java[<init>]:148) - Starting remote interpreter server on port 0, intpEventServerAddress: 172.17.0.1:45128
>>>> ERROR [2019-04-07 06:57:00,408] ({Driver} Logging.scala[logError]:91) - User class threw exception: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
>>>> org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
>>>>     at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
>>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:154)
>>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:139)
>>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:285)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
>>>> Caused by: java.net.ConnectException: Connection refused (Connection refused)
>>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>     at java.net.Socket.connect(Socket.java:589)
>>>>     at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
>>>>     ... 8 more
>>>>
>>>> Thanks,
>>>> - Ethan
>>>>
>>>> On Wed, Feb 27, 2019 at 4:24 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>>> Here's the PR
>>>>> https://github.com/apache/zeppelin/pull/3308
>>>>>
>>>>> Y. Ethan Guo <guoyi...@uber.com> wrote on Thu, Feb 28, 2019 at 2:50 AM:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm trying to use the new yarn cluster mode feature to run Spark
>>>>>> 2.4.0 jobs on Zeppelin 0.8.1. I've set the SPARK_HOME,
>>>>>> SPARK_SUBMIT_OPTIONS, and HADOOP_CONF_DIR env variables in
>>>>>> zeppelin-env.sh
>>>>>> so that the Spark interpreter can be started in the cluster. I used
>>>>>> `--jars` in SPARK_SUBMIT_OPTIONS to add local jars. However, when I tried
>>>>>> to import a class from the jars in a Spark paragraph, the interpreter
>>>>>> complained that it cannot find the package and class ("<console>:23:
>>>>>> error:
>>>>>> object ... is not a member of package ..."). It looks like the jars are
>>>>>> not properly imported.
>>>>>>
>>>>>> I followed the instructions here
>>>>>> <https://zeppelin.apache.org/docs/0.8.1/interpreter/spark.html#2-loading-spark-properties>
>>>>>> to add the jars, but it seems that it's not working in cluster mode.
>>>>>> This issue seems to be related to this bug:
>>>>>> https://jira.apache.org/jira/browse/ZEPPELIN-3986. Is there any
>>>>>> update on fixing it? What is the right way to add local jars in yarn
>>>>>> cluster mode? Any help or update is much appreciated.
>>>>>>
>>>>>>
>>>>>> Here's the SPARK_SUBMIT_OPTIONS I used (packages and jars paths
>>>>>> omitted):
>>>>>>
>>>>>> export SPARK_SUBMIT_OPTIONS="--driver-memory 12G --packages ...
>>>>>> --jars ...
>>>>>> --repositories
>>>>>> https://repository.cloudera.com/artifactory/public/,https://repository.cloudera.com/content/repositories/releases/,http://repo.spring.io/plugins-release/
>>>>>> "
>>>>>>
>>>>>> Thanks,
>>>>>> - Ethan
>>>>>> --
>>>>>> Best,
>>>>>> - Ethan
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>>>>
>>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>> --
>>> ========= mailto:db...@incadencecorp.com <db...@incadencecorp.com> ============
>>> David W. Boyd
>>> VP, Data Solutions
>>> 10432 Balls Ford, Suite 240
>>> Manassas, VA 20109
>>> office: +1-703-552-2862
>>> cell: +1-703-402-7908
>>> ============== http://www.incadencecorp.com/ ============
>>> ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
>>> Chair ANSI/INCITS TC Big Data
>>> Co-chair NIST Big Data Public Working Group Reference Architecture
>>> First Robotic Mentor - FRC, FTC - www.iliterobotics.org
>>> Board Member- USSTEM Foundation - www.usstem.org
>>>
>
> --
> Best Regards
>
> Jeff Zhang
>
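
P.S. For anyone who hits the "Connection refused" error quoted in this thread: one way to act on Jeff's suggestion is to test, from a node in the yarn cluster, whether a TCP connection to the Zeppelin server can be opened at all. Below is a minimal Python sketch using only the standard library. The helper name `can_connect` is my own, not part of Zeppelin, and the address in the usage comment is simply the intpEventServerAddress value from the log in this thread; read the actual host and port from your own interpreter log. A successful connection only shows that the port is reachable, not that the Thrift handshake will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it is not part of Zeppelin or Spark.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers connection refused, unreachable host,
        # name-resolution failure, and timeout.
        return False


# Example usage, run on a yarn node (address taken from the log in this thread,
# replace it with the intpEventServerAddress from your own log):
#   print(can_connect("172.17.0.1", 45128))
```

If this returns False from the yarn node but True on the Zeppelin host itself, that would point to the advertised address not being reachable from the cluster rather than to a problem in the interpreter.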