Hi all. Rough! I spent around 16 hours making things work due to the lack of decent documentation, and I reported two reproducible bugs to Jira along the way. I still don't understand how this manages to operate, but it does work. Maybe somebody can shed some light here.
To make Zeppelin work remotely with Amazon EMR + Spark I did the following:

1. Updated the EMR_MASTER EC2 security group to accept incoming requests on all ports, so it can communicate with Zeppelin (it should be a specific port, I just don't know which one yet).
2. Copied the directory EMR_MASTER:/etc/hadoop/conf to MY_STANDALONE_SERVER:/home/zeppelin/hadoop-conf.
3. Made zeppelin/conf/zeppelin-env.sh contain (full file in the P.S. at the bottom):

export MASTER=yarn-client
export HADOOP_CONF_DIR=/home/zeppelin/hadoop-conf

Most of the time was wasted because I had specified the SPARK_HOME env variable in zeppelin/conf/zeppelin-env.sh, since it's marked "(required)"; it caused all sorts of jar-hell problems (missing libs).

My questions:

- As far as I understand, Yarn simply executes jars over the cluster; it does not know anything about Spark, right?
- If so, when I specify SPARK_HOME pointing to my Spark installation and set MASTER=yarn-client, I'd assume Zeppelin should be sending the Spark app jars to Yarn, correct?
- Then I don't get how to add dependency libs to my code (for example the AWS S3 lib from Maven), because of the following bugs:
https://issues.apache.org/jira/browse/ZEPPELIN-301
https://issues.apache.org/jira/browse/ZEPPELIN-302
What exactly should I add to SPARK_HOME/conf/spark-defaults.conf so that my code is supplied with the dependency jars?

2015-09-11 3:24 GMT+04:00 moon soo Lee <m...@apache.org>:

> I didn't try it in EMR with a separate machine, but theoretically you can,
> if you build Zeppelin from the current master branch:
>
> 1. Copy the SPARK_HOME and HADOOP_HOME directories from the EMR cluster to
> your separate machine, keeping the same paths.
> 2. Make sure SPARK_HOME/bin/spark-shell works from your separate machine.
> 3. Then export SPARK_HOME and MASTER in your conf/zeppelin-env.sh file.
> 4. Enjoy.
>
> Hope this helps.
>
> Best,
> moon
>
>
> On Thu, Sep 10, 2015 at 1:46 PM ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>
>> 1. Hadoop client machine: it needs to have the ACL open to submit Hadoop
>> jobs and read/write HDFS. That information is contained in the site*.xml
>> files found in the conf directory; these xml files contain all the
>> details of the cluster you wish to communicate with.
>> 2. You can then use the wiki to install Zeppelin and connect to this
>> YARN cluster.
>>
>> On Thu, Sep 10, 2015 at 12:12 PM, Eugene <blackorange...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I have an Amazon EMR Hadoop v2.6 cluster with Spark 1.4.1, with the
>>> Yarn resource manager.
>>> I want to deploy Zeppelin on a separate machine, to allow turning the
>>> EMR cluster off when there are no jobs running.
>>>
>>> I tried following the instructions from
>>> https://zeppelin.incubator.apache.org/docs/install/yarn_install.html
>>> without much success.
>>>
>>> In particular, I don't understand how hadoop should be present on the
>>> client machine.
>>> The EMR cluster has hadoop installed on itself, with a config directory.
>>> Do I need to copy this config directory to the machine where Zeppelin
>>> is installed and reference it from the Zeppelin config?
>>>
>>> 1. I installed and built Zeppelin according to the link.
>>> 2. I installed Spark 1.4.1 with embedded hadoop and referenced it in
>>> the Zeppelin config.
>>> 3. I copied yarn-site.xml to the ~/hadoop-conf folder on the Zeppelin
>>> machine and referenced ~/hadoop-conf as HADOOP_CONF_DIR.
>>> 4. I used MASTER=yarn-client mode.
>>>
>>> I got various errors in the logs:
>>>
>>> - org.apache.spark.SparkException: Yarn application has already ended!
>>> It might have been killed or unable to launch application master.
>>> - org.apache.thrift.transport.TTransportException
>>> - org.apache.thrift.transport.TTransportException:
>>> java.net.SocketException: Broken pipe
>>>
>>> Can somebody demystify the steps for how Zeppelin should connect to an
>>> existing Yarn cluster from a different machine?
>>>
>>> --
>>> Best regards,
>>> Eugene.
>>
>> --
>> Deepak

--
Best regards,
Eugene.
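
P.S. For anyone who hits the same problems, here is a minimal sketch of the
conf/zeppelin-env.sh that ended up working for me; the paths are from my
setup, so adjust them to yours:

    # zeppelin/conf/zeppelin-env.sh
    # Run the Spark interpreter against the remote EMR Yarn cluster
    export MASTER=yarn-client
    # Copy of EMR_MASTER:/etc/hadoop/conf, so the embedded Spark can find
    # the Yarn ResourceManager and HDFS addresses
    export HADOOP_CONF_DIR=/home/zeppelin/hadoop-conf
    # Deliberately NOT set, despite being marked "(required)" in the docs;
    # setting it is what caused the jar-hell / missing-lib errors for me:
    # export SPARK_HOME=...

And regarding my own spark-defaults.conf question, my current untested guess
(assuming the standard spark.jars property, which takes a comma-separated
list of local jars to put on the driver and executor classpaths; the jar
path below is just a placeholder) is:

    # SPARK_HOME/conf/spark-defaults.conf -- untested guess; the jar path
    # is a placeholder for whatever dependency you need (e.g. the AWS S3
    # lib from Maven):
    spark.jars /home/zeppelin/libs/aws-java-sdk.jar

I haven't verified that this survives the ZEPPELIN-301/302 bugs linked
above, so treat it as a starting point only.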