Hi Jeff, Dave,

Thanks for the suggestion.  I was able to successfully run the Spark
interpreter in yarn cluster mode on another machine running Zeppelin.  The
previous problem was probably due to network issues.

I have two observations:
(1) I'm able to use the "--jars" option in SPARK_SUBMIT_OPTIONS with the
"spark" interpreter configured in yarn cluster mode.  I verified that the
jars are pushed to the driver and executors by successfully running a job
that uses classes from those jars.  However, if I create a new "spark_abc"
interpreter under the spark interpreter group, the new interpreter doesn't
seem to pick up SPARK_SUBMIT_OPTIONS and the "--jars" option, leading to
errors about not being able to access packages/classes in the jars.
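
For reference, here is a minimal sketch of the two configurations I'm
comparing (the jar path is a placeholder, and I'm assuming that a
spark.jars property in the interpreter settings is the per-interpreter
equivalent):

# zeppelin-env.sh: global, picked up by the default "spark" interpreter
export SPARK_SUBMIT_OPTIONS="--jars /path/to/my-lib.jar"

# Interpreter settings for "spark_abc": the property I'd expect to have
# the same effect, set in the interpreter UI
spark.jars    /path/to/my-lib.jar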

(2) Once I restart the Spark interpreters in the interpreter settings, the
corresponding Spark jobs in the yarn cluster first transition from the
"RUNNING" state to the "ACCEPTED" state, and then end up in the "FAILED"
state.

I'm wondering whether the above behaviors are expected and whether they are
known limitations of the current 0.9.0-SNAPSHOT version.

Thanks,
- Ethan

On Sun, Apr 7, 2019 at 9:59 AM Dave Boyd <db...@incadencecorp.com> wrote:

> From the connection refused message, I wonder if it is an SSL error.  I
> note that none of the SSL information (truststore, keystore, etc.) is
> set.  I would think the YARN cluster requires some form of
> authentication.
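>
> If SSL were in play, I would expect to see settings along these lines in
> spark-defaults.conf (the values below are placeholders, not taken from
> your setup):
>
> spark.ssl.enabled              true
> spark.ssl.keyStore             /path/to/keystore.jks
> spark.ssl.keyStorePassword     *****
> spark.ssl.trustStore           /path/to/truststore.jks
> spark.ssl.trustStorePassword   *****
>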
> On 4/7/19 9:27 AM, Jeff Zhang wrote:
>
> It looks like the interpreter process cannot connect to the Zeppelin
> server process.  I guess it is due to a network issue.  Can you check
> whether the node in the yarn cluster can connect to the Zeppelin server
> host?
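>
> For example, from one of the yarn nodes (the address and port below are
> the intpEventServerAddress from your log):
>
> # should report success if the node can reach the zeppelin server
> nc -vz 172.17.0.1 45128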
>
> Y. Ethan Guo <guoyi...@uber.com> wrote on Sun, Apr 7, 2019 at 3:31 PM:
>
>> Hi Jeff,
>>
>> Given this PR is merged, I'm trying to see if I can run yarn cluster mode
>> from master build.  I built Zeppelin master from this commit:
>>
>> commit 3655c12b875884410224eca5d6155287d51916ac
>> Author: Jongyoul Lee <jongy...@gmail.com>
>> Date:   Mon Apr 1 15:37:57 2019 +0900
>>     [MINOR] Refactor CronJob class (#3335)
>>
>> While I can successfully run the Spark interpreter in yarn client mode,
>> I'm having trouble making yarn cluster mode work.  Specifically, while
>> the interpreter job was accepted in yarn, it failed after 1-2 minutes
>> because of the exception below.  Do you have any idea why this is
>> happening?
>>
>> DEBUG [2019-04-07 06:57:00,314] ({main} Logging.scala[logDebug]:58) -
>> Created SSL options for fs: SSLOptions{enabled=false, keyStore=None,
>> keyStorePassword=None, trustStore=None, trustStorePassword=None,
>> protocol=None, enabledAlgorithms=Set()}
>>  INFO [2019-04-07 06:57:00,323] ({main} Logging.scala[logInfo]:54) -
>> Starting the user application in a separate Thread
>>  INFO [2019-04-07 06:57:00,350] ({main} Logging.scala[logInfo]:54) -
>> Waiting for spark context initialization...
>>  INFO [2019-04-07 06:57:00,403] ({Driver}
>> RemoteInterpreterServer.java[<init>]:148) - Starting remote interpreter
>> server on port 0, intpEventServerAddress: 172.17.0.1:45128
>> ERROR [2019-04-07 06:57:00,408] ({Driver} Logging.scala[logError]:91) -
>> User class threw exception:
>> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
>> Connection refused (Connection refused)
>> org.apache.thrift.transport.TTransportException:
>> java.net.ConnectException: Connection refused (Connection refused)
>> at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
>> at
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:154)
>> at
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:139)
>> at
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:285)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
>> Caused by: java.net.ConnectException: Connection refused (Connection
>> refused)
>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>> at
>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>> at
>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>> at
>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>> at java.net.Socket.connect(Socket.java:589)
>> at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
>> ... 8 more
>>
>> Thanks,
>> - Ethan
>>
>> On Wed, Feb 27, 2019 at 4:24 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Here's the PR
>>> https://github.com/apache/zeppelin/pull/3308
>>>
>>> Y. Ethan Guo <guoyi...@uber.com> wrote on Thu, Feb 28, 2019 at 2:50 AM:
>>>
>>>> Hi All,
>>>>
>>>> I'm trying to use the new yarn cluster mode feature to run Spark
>>>> 2.4.0 jobs on Zeppelin 0.8.1.  I've set the SPARK_HOME,
>>>> SPARK_SUBMIT_OPTIONS, and HADOOP_CONF_DIR env variables in
>>>> zeppelin-env.sh so that the Spark interpreter can be started in the
>>>> cluster.  I used `--jars` in SPARK_SUBMIT_OPTIONS to add local jars.
>>>> However, when I tried to import a class from those jars in a Spark
>>>> paragraph, the interpreter complained that it could not find the
>>>> package and class ("<console>:23: error: object ... is not a member
>>>> of package ...").  It looks like the jars are not properly loaded.
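>>>>
>>>> The failing paragraph is essentially the following (the package and
>>>> class names are placeholders standing in for classes from my jars):
>>>>
>>>> %spark
>>>> import com.example.mylib.MyClass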
>>>>
>>>> I followed the instructions here
>>>> <https://zeppelin.apache.org/docs/0.8.1/interpreter/spark.html#2-loading-spark-properties>
>>>> to add the jars, but this doesn't seem to work in yarn cluster mode.
>>>> The issue appears to be related to this bug:
>>>> https://jira.apache.org/jira/browse/ZEPPELIN-3986.  Is there any
>>>> update on fixing it?  What is the right way to add local jars in yarn
>>>> cluster mode?  Any help is much appreciated.
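>>>>
>>>> As a possible workaround, I'm considering the dynamic dependency
>>>> loader from the docs, though I'm not sure whether it works in yarn
>>>> cluster mode either (the jar path is a placeholder):
>>>>
>>>> %spark.dep
>>>> z.reset()
>>>> z.load("/path/to/my-lib.jar")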
>>>>
>>>>
>>>> Here's the SPARK_SUBMIT_OPTIONS I used (packages and jar paths
>>>> omitted):
>>>>
>>>> export SPARK_SUBMIT_OPTIONS="--driver-memory 12G --packages ... --jars
>>>> ... --repositories
>>>> https://repository.cloudera.com/artifactory/public/,https://repository.cloudera.com/content/repositories/releases/,http://repo.spring.io/plugins-release/
>>>> "
>>>>
>>>> Thanks,
>>>> - Ethan
>>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>
> --
> ========= db...@incadencecorp.com ============
> David W. Boyd
> VP,  Data Solutions
> 10432 Balls Ford, Suite 240
> Manassas, VA 20109
> office:   +1-703-552-2862
> cell:     +1-703-402-7908
> ============== http://www.incadencecorp.com/ ============
> ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
> Chair ANSI/INCITS TC Big Data
> Co-chair NIST Big Data Public Working Group Reference Architecture
> First Robotic Mentor - FRC, FTC - www.iliterobotics.org
> Board Member- USSTEM Foundation - www.usstem.org
>
