It is supposed to be fixed in 0.9.0-SNAPSHOT as well. If you hit this issue
in master, then it is a bug; please file a ticket and describe the details.
Thanks
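In the meantime, a possible workaround, just a sketch I haven't verified
against the multiple-interpreter case: instead of sharing `--jars` through
SPARK_SUBMIT_OPTIONS, you could set the jars per interpreter via spark
properties in the interpreter settings (the paths and maven coordinates
below are placeholders):

    # In the properties of each interpreter under the spark group
    # (e.g. your "spark_abc"), instead of the shared env variable:
    spark.jars            /path/to/lib-a.jar,/path/to/lib-b.jar
    spark.jars.packages   com.example:example-lib:1.0.0

That way each interpreter carries its own jar list and doesn't depend on the
shared SPARK_SUBMIT_OPTIONS.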
On Mon, Apr 8, 2019 at 4:42 PM, Y. Ethan Guo <guoyi...@uber.com> wrote:

> I'm partially hitting this issue in 0.9.0-SNAPSHOT for Spark interpreters
> with other names, so I'm not sure the ZEPPELIN-3986 issue is completely
> resolved. I'm using multiple spark interpreters with different spark confs
> which share the same SPARK_SUBMIT_OPTIONS, including a `--jars` option. It
> seems that only one of them is working. Anyway, shall we follow up on the
> ticket and see how to fix it?
>
> Thanks,
> - Ethan
>
> On Mon, Apr 8, 2019 at 1:34 AM Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Hi Ethan,
>>
>> This behavior is not expected. Maybe you are hitting this issue, which
>> is fixed in 0.8.2:
>> https://jira.apache.org/jira/browse/ZEPPELIN-3986
>>
>> On Mon, Apr 8, 2019 at 4:26 PM, Y. Ethan Guo <guoyi...@uber.com> wrote:
>>
>>> Hi Jeff, Dave,
>>>
>>> Thanks for the suggestion. I was able to successfully run the Spark
>>> interpreter in yarn cluster mode on another machine running Zeppelin.
>>> The previous problem was probably due to network issues.
>>>
>>> I have two observations:
>>>
>>> (1) I'm able to use the "--jars" option in SPARK_SUBMIT_OPTIONS in the
>>> "spark" interpreter with yarn cluster mode configured. I verified that
>>> the jars are pushed to the driver and executors by successfully running
>>> a job that uses some classes in the jars. However, if I create a new
>>> "spark_abc" interpreter under the spark interpreter group, this new
>>> interpreter doesn't seem to pick up SPARK_SUBMIT_OPTIONS and the jars
>>> option, leading to errors about not being able to access
>>> packages/classes in the jars.
>>>
>>> (2) Once I restart the spark interpreters in the interpreter settings,
>>> the corresponding Spark jobs in the yarn cluster first transition from
>>> the "RUNNING" state to the "ACCEPTED" state, and then end up in the
>>> "FAILED" state.
>>>
>>> I'm wondering whether the above behaviors are expected and known
>>> limitations of the current 0.9.0-SNAPSHOT version.
>>>
>>> Thanks,
>>> - Ethan
>>>
>>> On Sun, Apr 7, 2019 at 9:59 AM Dave Boyd <db...@incadencecorp.com>
>>> wrote:
>>>
>>>> From the "connection refused" message, I wonder if it is an SSL error.
>>>> I note that none of the SSL information (truststore, keystore, etc.)
>>>> is set, and I would think the YARN cluster requires some form of
>>>> authentication.
>>>>
>>>> On 4/7/19 9:27 AM, Jeff Zhang wrote:
>>>>
>>>> It looks like the interpreter process cannot connect to the zeppelin
>>>> server process. I guess it is due to some network issue. Can you check
>>>> whether the node in the yarn cluster can connect to the zeppelin
>>>> server host?
>>>>
>>>> On Sun, Apr 7, 2019 at 3:31 PM, Y. Ethan Guo <guoyi...@uber.com> wrote:
>>>>
>>>>> Hi Jeff,
>>>>>
>>>>> Given that this PR is merged, I'm trying to see if I can run yarn
>>>>> cluster mode from a master build. I built Zeppelin master from this
>>>>> commit:
>>>>>
>>>>> commit 3655c12b875884410224eca5d6155287d51916ac
>>>>> Author: Jongyoul Lee <jongy...@gmail.com>
>>>>> Date:   Mon Apr 1 15:37:57 2019 +0900
>>>>>
>>>>>     [MINOR] Refactor CronJob class (#3335)
>>>>>
>>>>> While I can successfully run the Spark interpreter in yarn client
>>>>> mode, I'm having trouble making yarn cluster mode work. Specifically,
>>>>> while the interpreter job was accepted in yarn, the job failed after
>>>>> 1-2 minutes because of the exception below. Do you have any idea why
>>>>> this is happening?
>>>>>
>>>>> DEBUG [2019-04-07 06:57:00,314] ({main} Logging.scala[logDebug]:58) -
>>>>> Created SSL options for fs: SSLOptions{enabled=false, keyStore=None,
>>>>> keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>>>> protocol=None, enabledAlgorithms=Set()}
>>>>> INFO [2019-04-07 06:57:00,323] ({main} Logging.scala[logInfo]:54) -
>>>>> Starting the user application in a separate Thread
>>>>> INFO [2019-04-07 06:57:00,350] ({main} Logging.scala[logInfo]:54) -
>>>>> Waiting for spark context initialization...
>>>>> INFO [2019-04-07 06:57:00,403] ({Driver}
>>>>> RemoteInterpreterServer.java[<init>]:148) - Starting remote
>>>>> interpreter server on port 0, intpEventServerAddress: 172.17.0.1:45128
>>>>> ERROR [2019-04-07 06:57:00,408] ({Driver} Logging.scala[logError]:91)
>>>>> - User class threw exception:
>>>>> org.apache.thrift.transport.TTransportException:
>>>>> java.net.ConnectException: Connection refused (Connection refused)
>>>>> org.apache.thrift.transport.TTransportException:
>>>>> java.net.ConnectException: Connection refused (Connection refused)
>>>>>     at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
>>>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:154)
>>>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:139)
>>>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:285)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
>>>>> Caused by: java.net.ConnectException: Connection refused (Connection
>>>>> refused)
>>>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>     at java.net.Socket.connect(Socket.java:589)
>>>>>     at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
>>>>>     ... 8 more
>>>>>
>>>>> Thanks,
>>>>> - Ethan
>>>>>
>>>>> On Wed, Feb 27, 2019 at 4:24 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>
>>>>>> Here's the PR:
>>>>>> https://github.com/apache/zeppelin/pull/3308
>>>>>>
>>>>>> On Thu, Feb 28, 2019 at 2:50 AM, Y. Ethan Guo <guoyi...@uber.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I'm trying to use the new yarn cluster mode feature to run Spark
>>>>>>> 2.4.0 jobs on Zeppelin 0.8.1. I've set the SPARK_HOME,
>>>>>>> SPARK_SUBMIT_OPTIONS, and HADOOP_CONF_DIR env variables in
>>>>>>> zeppelin-env.sh so that the Spark interpreter can be started in the
>>>>>>> cluster. I used `--jars` in SPARK_SUBMIT_OPTIONS to add local jars.
>>>>>>> However, when I tried to import a class from the jars in a Spark
>>>>>>> paragraph, the interpreter complained that it cannot find the
>>>>>>> package and class ("<console>:23: error: object ... is not a member
>>>>>>> of package ..."). It looks like the jars are not properly imported.
>>>>>>>
>>>>>>> I followed the instructions here
>>>>>>> <https://zeppelin.apache.org/docs/0.8.1/interpreter/spark.html#2-loading-spark-properties>
>>>>>>> to add the jars, but it seems that this does not work in cluster
>>>>>>> mode. The issue seems to be related to this bug:
>>>>>>> https://jira.apache.org/jira/browse/ZEPPELIN-3986. Is there any
>>>>>>> update on fixing it? What is the right way to add local jars in
>>>>>>> yarn cluster mode? Any help and updates are much appreciated.
>>>>>>>
>>>>>>> Here's the SPARK_SUBMIT_OPTIONS I used (packages and jars paths
>>>>>>> omitted):
>>>>>>>
>>>>>>> export SPARK_SUBMIT_OPTIONS="--driver-memory 12G --packages ...
>>>>>>> --jars ... --repositories
>>>>>>> https://repository.cloudera.com/artifactory/public/,https://repository.cloudera.com/content/repositories/releases/,http://repo.spring.io/plugins-release/"
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Ethan
>>>>>>>
>>>>>>> --
>>>>>>> Best,
>>>>>>> - Ethan
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>>
>>>>>> Jeff Zhang
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>>
>>>> --
>>>> ========= db...@incadencecorp.com ============
>>>> David W. Boyd
>>>> VP, Data Solutions
>>>> 10432 Balls Ford, Suite 240
>>>> Manassas, VA 20109
>>>> office: +1-703-552-2862
>>>> cell: +1-703-402-7908
>>>> ============== http://www.incadencecorp.com/ ============
>>>> ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
>>>> Chair ANSI/INCITS TC Big Data
>>>> Co-chair NIST Big Data Public Working Group Reference Architecture
>>>> First Robotic Mentor - FRC, FTC - www.iliterobotics.org
>>>> Board Member - USSTEM Foundation - www.usstem.org
>>
>> --
>> Best Regards
>>
>> Jeff Zhang

--
Best Regards

Jeff Zhang