Re: Installation of Zeppelin at separate machine to connect to Spark Yarn cluster

Eugene Sun, 13 Sep 2015 10:01:43 -0700

Hi all,

That was rough! I spent around 16 hours getting things to work because of
the lack of decent documentation, and I reported two reproducible bugs to
Jira along the way. I still don't understand how this setup manages to
operate, but it does work. Maybe somebody can shed some light here.


To make Zeppelin work remotely with Amazon EMR + Spark I did the following:


   1. Update the EMR_MASTER EC2 security groups to accept incoming requests
   on all ports, so the cluster can communicate with Zeppelin (this should
   be a specific port, but I don't yet know which)
   2. Copy directory EMR_MASTER:/etc/hadoop/conf to
   MY_STANDALONE_SERVER:/home/zeppelin/hadoop-conf.
   3. zeppelin/conf/zeppelin-env.sh should contain:
   export MASTER=yarn-client
   export HADOOP_CONF_DIR=/home/zeppelin/hadoop-conf
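
Steps 2 and 3 can be sketched roughly as the shell snippet below. It is only
a sketch under assumptions: EMR_MASTER stands for the master node's hostname,
the "hadoop" login user and the key path are guesses about the cluster's SSH
setup, and both functions are hypothetical helpers, not part of Zeppelin or
EMR. The list of expected files is also my assumption about what a YARN
client needs.

```shell
# Step 2 (hypothetical helper): copy the Hadoop client config from the EMR master.
# Assumes SSH login as "hadoop" with a key at ~/.ssh/emr.pem; adjust both as needed.
copy_hadoop_conf() {
  scp -i ~/.ssh/emr.pem -r "hadoop@$1:/etc/hadoop/conf" /home/zeppelin/hadoop-conf
}

# Hypothetical sanity check: confirm the copied directory contains the
# site files a YARN client is assumed to need.
check_hadoop_conf() {
  dir="$1"
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
  return 0
}

# Step 3: the lines that go into zeppelin/conf/zeppelin-env.sh.
# SPARK_HOME is deliberately left unset here.
export MASTER=yarn-client
export HADOOP_CONF_DIR=/home/zeppelin/hadoop-conf
```

Usage would be along the lines of: copy_hadoop_conf EMR_MASTER, then
check_hadoop_conf "$HADOOP_CONF_DIR" before starting Zeppelin.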


I wasted most of the time because I had set the SPARK_HOME env variable in
zeppelin/conf/zeppelin-env.sh, since it is marked "(required)"; it caused
all sorts of jar hell problems (missing libs).


My questions:

   - As far as I understand, Yarn simply executes jars across the cluster
   and does not know anything about Spark, right?
   - If so, when I point SPARK_HOME at my Spark installation and set
   MASTER=yarn-client, I suppose Zeppelin should be sending the Spark app
   jars to Yarn, correct?
   - Then I don't get how to add dependency libs to my code (for example
   the AWS S3 lib from Maven) given the following bugs:
   https://issues.apache.org/jira/browse/ZEPPELIN-301,
   https://issues.apache.org/jira/browse/ZEPPELIN-302. What exactly should
   I add to SPARK_HOME/conf/spark-defaults.conf so that my code is supplied
   with its dependency jars?






2015-09-11 3:24 GMT+04:00 moon soo Lee <m...@apache.org>:

> I haven't tried this on EMR with a separate machine, but theoretically you
> can try the following,
> if you build Zeppelin from the current master branch:
>
> 1. Copy the SPARK_HOME and HADOOP_HOME directories from the EMR cluster to
> your separate machine, keeping the same paths.
> 2. Make sure your SPARK_HOME/bin/spark-shell works from your separate
> machine.
> 3. Then, export SPARK_HOME and MASTER in your conf/zeppelin-env.sh file.
> 4. Enjoy
>
> Hope this helps
>
> Best,
> moon
>
>
> On Thu, Sep 10, 2015 at 1:46 PM ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>
>> 1. Hadoop client machine: it needs to have ACLs open to submit Hadoop
>> jobs and to read/write to HDFS. The information is contained in the
>> *-site.xml files found in the conf directory; these XML files contain all
>> the details of the cluster that you wish to communicate with.
>> 2. You can then use the wiki to install Zeppelin and connect to this YARN
>> cluster.
>>
>>
>>
>>
>> On Thu, Sep 10, 2015 at 12:12 PM, Eugene <blackorange...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I have an Amazon EMR Hadoop v2.6 cluster with Spark 1.4.1 and the Yarn
>>> resource manager.
>>> I want to deploy Zeppelin on a separate machine so that I can turn off
>>> the EMR cluster when there are no jobs running.
>>>
>>> I tried following the instructions from here
>>> https://zeppelin.incubator.apache.org/docs/install/yarn_install.html
>>> without much success.
>>>
>>> In particular, I don't understand how Hadoop should be present on the
>>> client machine.
>>> The EMR cluster has Hadoop installed on it, and it has a config directory.
>>> Do I need to copy this config directory to the machine where Zeppelin is
>>> installed and reference it from the Zeppelin config?
>>>
>>>
>>>    1. I installed Zeppelin and built it according to the link.
>>>    2. I installed Spark 1.4.1 with embedded Hadoop and referenced it in
>>>    the Zeppelin config.
>>>    3. I copied yarn-site.xml to the ~/hadoop-conf folder on the Zeppelin
>>>    machine and referenced ~/hadoop-conf as HADOOP_CONF_DIR
>>>    4. I use MASTER=yarn-client mode
>>>
>>>
>>> I got several different errors in the logs:
>>>
>>>    - org.apache.spark.SparkException: Yarn application has already
>>>    ended! It might have been killed or unable to launch application master.
>>>    - org.apache.thrift.transport.TTransportException
>>>    - org.apache.thrift.transport.TTransportException:
>>>    java.net.SocketException: Broken pipe
>>>
>>>
>>> Can somebody demystify the steps for connecting Zeppelin to an existing
>>> Yarn cluster from a different machine?
>>>
>>> --
>>>
>>>
>>> Best regards,
>>> Eugene.
>>>
>>
>>
>>
>> --
>> Deepak
>>
>>


-- 


Best regards,
Eugene.
