I'm still partially hitting this issue in 0.9.0-SNAPSHOT with Spark
interpreters under other names, so I'm not sure the ZEPPELIN-3986 issue is
completely resolved.  I'm using multiple Spark interpreters with different
Spark confs that share the same SPARK_SUBMIT_OPTIONS, including a `--jars`
option, and it seems that only one of them picks up the jars.  Shall we
follow up on the ticket and see how to fix it?
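
For reference, here's a minimal sketch of the setup (interpreter names and
jar paths are placeholders, not my actual values):

# zeppelin-env.sh -- shared by every interpreter in the spark group
export SPARK_HOME=/path/to/spark
export SPARK_SUBMIT_OPTIONS="--jars /path/to/common-lib.jar"

# Interpreter settings, both created under the spark interpreter group:
#   spark        -> classes from common-lib.jar resolve fine
#   spark_conf2  -> same SPARK_SUBMIT_OPTIONS, but the jars are not picked up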

Thanks,
- Ethan

On Mon, Apr 8, 2019 at 1:34 AM Jeff Zhang <zjf...@gmail.com> wrote:

> Hi Ethan,
>
> This behavior is not expected. Maybe you are hitting this issue, which is
> fixed in 0.8.2:
> https://jira.apache.org/jira/browse/ZEPPELIN-3986
>
>
> On Mon, Apr 8, 2019 at 4:26 PM Y. Ethan Guo <guoyi...@uber.com> wrote:
>
>> Hi Jeff, Dave,
>>
>> Thanks for the suggestion.  I was able to successfully run the Spark
>> interpreter in yarn cluster mode on another machine running Zeppelin.  The
>> previous problem was probably due to network issues.
>>
>> I have two observations:
>> (1) I'm able to use the "--jars" option in SPARK_SUBMIT_OPTIONS with the
>> "spark" interpreter configured in yarn cluster mode.  I verified that the
>> jars are pushed to the driver and executors by successfully running a job
>> that uses classes from those jars.  However, if I create a new "spark_abc"
>> interpreter under the spark interpreter group, the new interpreter doesn't
>> seem to pick up SPARK_SUBMIT_OPTIONS and the --jars option, leading to
>> errors about not being able to access packages/classes from the jars (see
>> the workaround sketch below).
>>
>> (2) Once I restart the Spark interpreters in the interpreter settings,
>> the corresponding Spark jobs in the yarn cluster first transition from the
>> "RUNNING" state back to the "ACCEPTED" state, and then end up in the
>> "FAILED" state.
>>
>> I'm wondering whether the above behaviors are expected and whether they
>> are known limitations of the current 0.9.0-SNAPSHOT version.
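>>
>> One workaround I plan to try is setting the jars as standard Spark
>> properties on the "spark_abc" interpreter itself instead of relying on
>> SPARK_SUBMIT_OPTIONS; whether Zeppelin passes these through in yarn
>> cluster mode is something I still need to verify (the jar paths and
>> package coordinates below are placeholders):
>>
>> # In the spark_abc interpreter setting (Interpreter page):
>> #   spark.jars          = /path/to/lib-a.jar,/path/to/lib-b.jar
>> #   spark.jars.packages = com.example:some-artifact:1.0.0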
>>
>> Thanks,
>> - Ethan
>>
>> On Sun, Apr 7, 2019 at 9:59 AM Dave Boyd <db...@incadencecorp.com> wrote:
>>
>>> From the connection-refused message, I wonder if it is an SSL error.  I
>>> note that none of the SSL information (truststore, keystore, etc.) is set.
>>> I would think the YARN cluster requires some form of authentication.
>>> On 4/7/19 9:27 AM, Jeff Zhang wrote:
>>>
>>> It looks like the interpreter process cannot connect to the Zeppelin
>>> server process. I guess it is due to some network issue. Can you check
>>> whether the node in the yarn cluster can connect to the Zeppelin server
>>> host?
>>>
>>> On Sun, Apr 7, 2019 at 3:31 PM Y. Ethan Guo <guoyi...@uber.com> wrote:
>>>
>>>> Hi Jeff,
>>>>
>>>> Given that this PR is merged, I'm trying to see whether I can run yarn
>>>> cluster mode from a master build.  I built Zeppelin master from this commit:
>>>>
>>>> commit 3655c12b875884410224eca5d6155287d51916ac
>>>> Author: Jongyoul Lee <jongy...@gmail.com>
>>>> Date:   Mon Apr 1 15:37:57 2019 +0900
>>>>     [MINOR] Refactor CronJob class (#3335)
>>>>
>>>> While I can successfully run the Spark interpreter in yarn client mode,
>>>> I'm having trouble getting yarn cluster mode to work.  Specifically, the
>>>> interpreter job is accepted in yarn, but it then fails after 1-2 minutes
>>>> because of the exception below.  Do you have any idea why this is
>>>> happening?
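>>>>
>>>> For context, the relevant configuration is roughly the following (paths
>>>> are placeholders; I'm assuming the master / deploy-mode properties from
>>>> the Zeppelin Spark interpreter docs also apply to this master build):
>>>>
>>>> # zeppelin-env.sh
>>>> export SPARK_HOME=/path/to/spark
>>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>
>>>> # spark interpreter setting
>>>> #   master                  = yarn
>>>> #   spark.submit.deployMode = cluster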
>>>>
>>>> DEBUG [2019-04-07 06:57:00,314] ({main} Logging.scala[logDebug]:58) -
>>>> Created SSL options for fs: SSLOptions{enabled=false, keyStore=None,
>>>> keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>>> protocol=None, enabledAlgorithms=Set()}
>>>>  INFO [2019-04-07 06:57:00,323] ({main} Logging.scala[logInfo]:54) -
>>>> Starting the user application in a separate Thread
>>>>  INFO [2019-04-07 06:57:00,350] ({main} Logging.scala[logInfo]:54) -
>>>> Waiting for spark context initialization...
>>>>  INFO [2019-04-07 06:57:00,403] ({Driver}
>>>> RemoteInterpreterServer.java[<init>]:148) - Starting remote interpreter
>>>> server on port 0, intpEventServerAddress: 172.17.0.1:45128
>>>> ERROR [2019-04-07 06:57:00,408] ({Driver} Logging.scala[logError]:91) -
>>>> User class threw exception:
>>>> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
>>>> Connection refused (Connection refused)
>>>> org.apache.thrift.transport.TTransportException:
>>>> java.net.ConnectException: Connection refused (Connection refused)
>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:154)
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:139)
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:285)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>> at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>>> at
>>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
>>>> Caused by: java.net.ConnectException: Connection refused (Connection
>>>> refused)
>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>> at
>>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>>> at
>>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>>> at
>>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>> at java.net.Socket.connect(Socket.java:589)
>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
>>>> ... 8 more
>>>>
>>>> Thanks,
>>>> - Ethan
>>>>
>>>> On Wed, Feb 27, 2019 at 4:24 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>>> Here's the PR
>>>>> https://github.com/apache/zeppelin/pull/3308
>>>>>
>>>>> On Thu, Feb 28, 2019 at 2:50 AM Y. Ethan Guo <guoyi...@uber.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm trying to use the new yarn cluster mode feature to run Spark 2.4.0
>>>>>> jobs on Zeppelin 0.8.1. I've set the SPARK_HOME, SPARK_SUBMIT_OPTIONS,
>>>>>> and HADOOP_CONF_DIR env variables in zeppelin-env.sh so that the Spark
>>>>>> interpreter can be started in the cluster. I used `--jars` in
>>>>>> SPARK_SUBMIT_OPTIONS to add local jars. However, when I tried to import
>>>>>> a class from the jars in a Spark paragraph, the interpreter complained
>>>>>> that it could not find the package and class ("<console>:23: error:
>>>>>> object ... is not a member of package ..."). It looks like the jars are
>>>>>> not properly added to the classpath.
>>>>>>
>>>>>> I followed the instructions here
>>>>>> <https://zeppelin.apache.org/docs/0.8.1/interpreter/spark.html#2-loading-spark-properties>
>>>>>> to add the jars, but it seems they are not picked up in cluster mode.
>>>>>> This issue seems to be related to this bug:
>>>>>> https://jira.apache.org/jira/browse/ZEPPELIN-3986.  Is there any update
>>>>>> on fixing it? What is the right way to add local jars in yarn cluster
>>>>>> mode? Any help or update is much appreciated.
>>>>>>
>>>>>>
>>>>>> Here's the SPARK_SUBMIT_OPTIONS I used (package and jar paths omitted):
>>>>>>
>>>>>> export SPARK_SUBMIT_OPTIONS="--driver-memory 12G --packages ...
>>>>>> --jars ... --repositories
>>>>>> https://repository.cloudera.com/artifactory/public/,https://repository.cloudera.com/content/repositories/releases/,http://repo.spring.io/plugins-release/
>>>>>> "
>>>>>>
>>>>>> Thanks,
>>>>>> - Ethan
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>>>>
>>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>> --
>>> ========= mailto:db...@incadencecorp.com ============
>>> David W. Boyd
>>> VP,  Data Solutions
>>> 10432 Balls Ford, Suite 240
>>> Manassas, VA 20109
>>> office:   +1-703-552-2862
>>> cell:     +1-703-402-7908
>>> ============== http://www.incadencecorp.com/ ============
>>> ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
>>> Chair ANSI/INCITS TC Big Data
>>> Co-chair NIST Big Data Public Working Group Reference Architecture
>>> First Robotic Mentor - FRC, FTC - www.iliterobotics.org
>>> Board Member- USSTEM Foundation - www.usstem.org
>>>
>>>
>>>
>>>
>>>
>
> --
> Best Regards
>
> Jeff Zhang
>
