Hi all,

I have an Amazon EMR cluster (Hadoop 2.6) running Spark 1.4.1, with YARN as
the resource manager.
I want to deploy Zeppelin on a separate machine so that the EMR cluster can
be turned off when no jobs are running.

I tried following the instructions here
https://zeppelin.incubator.apache.org/docs/install/yarn_install.html
without much success.

In particular, I don't understand how Hadoop should be present on the client
machine.
The EMR cluster has Hadoop installed, including its config directory. Do I
need to copy this config directory to the machine where Zeppelin is installed
and reference it from the Zeppelin config?


   1. I installed Zeppelin and built it according to the link above.
   2. I installed Spark 1.4.1 with embedded Hadoop and referenced it in the
   Zeppelin config.
   3. I copied yarn-site.xml to the ~/hadoop-conf folder on the Zeppelin
   machine and set HADOOP_CONF_DIR to ~/hadoop-conf.
   4. I use MASTER=yarn-client mode.
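
For reference, steps 2 to 4 amount to roughly the following settings. This is
a sketch rather than a copy of my file: I am assuming they live in
conf/zeppelin-env.sh, and the SPARK_HOME path is a placeholder for wherever
Spark 1.4.1 is unpacked.

```shell
# Sketch of conf/zeppelin-env.sh on the Zeppelin machine (file placement is an assumption).

# Step 4: run Spark against YARN in client mode.
export MASTER=yarn-client

# Step 2: local Spark 1.4.1 install; this path is a placeholder.
export SPARK_HOME=/opt/spark-1.4.1-bin-hadoop2.6

# Step 3: directory holding the yarn-site.xml copied from the EMR cluster.
export HADOOP_CONF_DIR="$HOME/hadoop-conf"
```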


I get several different errors in the logs:

   - org.apache.spark.SparkException: Yarn application has already ended!
   It might have been killed or unable to launch application master.
   - org.apache.thrift.transport.TTransportException
   - org.apache.thrift.transport.TTransportException:
   java.net.SocketException: Broken pipe


Can somebody explain the steps needed for Zeppelin to connect to an existing
YARN cluster from a different machine?

-- 


Best regards,
Eugene.
