Hi all, I have an Amazon EMR cluster (Hadoop 2.6) with Spark 1.4.1, using YARN as the resource manager. I want to deploy Zeppelin on a separate machine so that the EMR cluster can be turned off when no jobs are running.
I tried following the instructions at https://zeppelin.incubator.apache.org/docs/install/yarn_install.html without much success. In particular, I don't understand how Hadoop should be present on the client machine. The EMR cluster has Hadoop installed, including its config directory. Do I need to copy this config directory to the machine where Zeppelin is installed and reference it from the Zeppelin config?

1. I installed Zeppelin and built it according to the link.
2. I installed Spark 1.4.1 with embedded Hadoop and referenced it in the Zeppelin config.
3. I copied yarn-site.xml to the ~/hadoop-conf folder on the Zeppelin machine and set HADOOP_CONF_DIR to ~/hadoop-conf.
4. I use MASTER=yarn-client mode.

I got different errors in the logs:

- org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
- org.apache.thrift.transport.TTransportException
- org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe

Can somebody demystify the steps for connecting Zeppelin to an existing YARN cluster from a different machine?

--
Best regards, Eugene.
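
P.S. As a rough sketch, steps 2 to 4 above would look something like this in conf/zeppelin-env.sh. The Spark path is a hypothetical placeholder, and using SPARK_HOME to reference the Spark install is an assumption; only HADOOP_CONF_DIR and MASTER are taken from the steps as described.

```shell
# Sketch of conf/zeppelin-env.sh on the Zeppelin machine (not the EMR cluster).

# Assumption: placeholder path to the local Spark 1.4.1 build with embedded Hadoop.
export SPARK_HOME="/path/to/spark-1.4.1-bin-hadoop"

# Directory holding the yarn-site.xml copied from the EMR cluster (step 3).
export HADOOP_CONF_DIR="$HOME/hadoop-conf"

# Run the Spark driver on this machine and the executors on YARN (step 4).
export MASTER="yarn-client"
```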
