Has anyone else encountered that problem? I removed *--driver-class-path "${CLASSPATH}"* from the bin/interpreter.sh script and now it starts the SparkContext as expected. The problem is that it no longer picks up my local hive-site.xml, which points to an external metastore, and tries to use the local one instead :(
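A sketch of a possible workaround, not a tested fix from this thread: with --driver-class-path removed, hive-site.xml is no longer on the driver classpath, so Spark falls back to a local metastore. Spark also picks up hive-site.xml from its own conf directory, so copying the file there may restore the external metastore. The default paths and the function name are assumptions about an EMR-like layout.

```shell
# Sketch: put hive-site.xml where Spark will find it without relying on
# --driver-class-path. Default paths are assumptions about an EMR layout.
link_hive_site() {
  hive_conf="${1:-/etc/hive/conf}"
  spark_conf="${2:-/usr/lib/spark/conf}"
  if [ -f "${hive_conf}/hive-site.xml" ]; then
    cp "${hive_conf}/hive-site.xml" "${spark_conf}/hive-site.xml"
  else
    echo "no hive-site.xml under ${hive_conf}" >&2
    return 1
  fi
}
```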
On Fri, Sep 18, 2015 at 4:14 PM, Eugene <blackorange...@gmail.com> wrote:

> Hi Anders,
>
> I also had the error you mention, and overcame it by:
>
> 1. using the Spark installation from Zeppelin
> 2. altering conf/interpreter.json with properties such as
>    "spark.executor.instances", "spark.executor.cores", and
>    "spark.default.parallelism" from spark-defaults.conf, parsing that
>    file using parts of your gist.
>
> The code looks like this:
>
> cd ~/zeppelin/conf/
> SPARK_DEFAULTS=~/emr-spark-defaults.conf
> SPARK_EXECUTOR_INSTANCES=$(grep spark.executor.instances $SPARK_DEFAULTS | awk '{print $2}')
> SPARK_EXECUTOR_CORES=$(grep spark.executor.cores $SPARK_DEFAULTS | awk '{print $2}')
> SPARK_EXECUTOR_MEMORY=$(grep spark.executor.memory $SPARK_DEFAULTS | awk '{print $2}')
> SPARK_DEFAULT_PARALLELISM=$(grep spark.default.parallelism $SPARK_DEFAULTS | awk '{print $2}')
> cat interpreter.json | jq \
>   ".interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.instances\" = \"${SPARK_EXECUTOR_INSTANCES}\" |
>    .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.cores\" = \"${SPARK_EXECUTOR_CORES}\" |
>    .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.memory\" = \"${SPARK_EXECUTOR_MEMORY}\" |
>    .interpreterSettings.\"2B188AQ5T\".properties.\"spark.default.parallelism\" = \"${SPARK_DEFAULT_PARALLELISM}\"" > interpreter.json_
> cat interpreter.json_ > interpreter.json
> rm interpreter.json_
>
> 2015-09-18 17:05 GMT+04:00 Anders Hammar <anders.ham...@gmail.com>:
>
>> Hi,
>>
>> Thank you, Phil, for updating my script to support the latest version
>> of EMR. I have edited my gist so that it includes some of your updates
>> plus some other additional changes.
>>
>> https://gist.github.com/andershammar/224e1077021d0ea376dd
>>
>> While on the subject, has anyone been able to get Zeppelin to work
>> with Amazon's Spark installation on Amazon EMR 4.x (by exporting
>> SPARK_HOME and HADOOP_HOME instead)?
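As an aside on Eugene's script above: the four grep|awk pipelines repeat the same pattern, so they could be collapsed into one helper. This is just a sketch; `conf_value` is a name invented here, and it assumes the whitespace-separated key/value format EMR writes to spark-defaults.conf.

```shell
# Sketch: one helper instead of four grep|awk pipelines. Matching on the
# whole first field avoids a short key accidentally matching a longer one.
conf_value() {
  awk -v k="$1" '$1 == k { print $2; exit }' "$2"
}

# e.g.:
# SPARK_EXECUTOR_CORES=$(conf_value spark.executor.cores "$SPARK_DEFAULTS")
```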
>> When I try this, I get the following exception:
>>
>> org.apache.spark.SparkException: Found both spark.driver.extraClassPath
>> and SPARK_CLASSPATH. Use only the former.
>>   at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:444)
>>   at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:442)
>>   at scala.collection.immutable.List.foreach(List.scala:318)
>>   at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:442)
>>   at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:430)
>>   at scala.Option.foreach(Option.scala:236)
>>   at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:430)
>>   ...
>>
>> From a quick look at it, the problem seems to be that the Amazon
>> installation of Spark uses SPARK_CLASSPATH to add additional libraries
>> (/etc/spark/conf/spark-env.sh) while Zeppelin uses "spark-submit
>> --driver-class-path" (zeppelin/bin/interpreter.sh).
>>
>> Any ideas?
>>
>> Best regards,
>> Anders
>>
>> On Wed, Sep 9, 2015 at 5:09 PM, Eugene <blackorange...@gmail.com> wrote:
>>
>>> Here's a bit shorter alternative, too:
>>>
>>> https://gist.github.com/snowindy/008f3e8b878a23c00679
>>>
>>> 2015-09-09 18:58 GMT+04:00 shahab <shahab.mok...@gmail.com>:
>>>
>>>> Thanks Phil, it works. Great job and well done!
>>>>
>>>> best,
>>>> /Shahab
>>>>
>>>> On Mon, Sep 7, 2015 at 6:32 PM, Phil Wills <otherp...@gmail.com> wrote:
>>>>
>>>>> Anders' script is a bit out of date if you're using the latest
>>>>> version of EMR. Here's my fork:
>>>>>
>>>>> https://gist.github.com/philwills/71539f833f57338236b5
>>>>>
>>>>> which worked OK for me fairly recently.
>>>>>
>>>>> Phil
>>>>>
>>>>> On Mon, 7 Sep 2015 at 10:01 shahab <shahab.mok...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to use Zeppelin to work with Spark on Amazon EMR.
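One hedged way around the "Found both" validation error Anders quotes above (a sketch, not something confirmed in the thread): drop SPARK_CLASSPATH and fold its value into spark.driver.extraClassPath before launching, so SparkConf only ever sees the latter. The function name and the idea of wiring it into zeppelin/bin/interpreter.sh are assumptions.

```shell
# Sketch: merge SPARK_CLASSPATH into a single --conf flag so that
# SparkConf.validateSettings never sees both settings at once.
merge_driver_classpath() {
  extra="$1"   # what would have been passed via --driver-class-path
  if [ -n "${SPARK_CLASSPATH:-}" ]; then
    extra="${extra:+${extra}:}${SPARK_CLASSPATH}"
    unset SPARK_CLASSPATH    # avoid "Found both ... Use only the former."
  fi
  printf '%s\n' "--conf spark.driver.extraClassPath=${extra}"
}
```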
>>>>>> I used the script provided by Anders
>>>>>> (https://gist.github.com/andershammar/224e1077021d0ea376dd) to set
>>>>>> up Zeppelin. Zeppelin can connect to Spark, but when I run the
>>>>>> tutorials I get the following error:
>>>>>>
>>>>>> ...FileNotFoundException: File
>>>>>> file:/home/hadoop/zeppelin/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0-incubating-SNAPSHOT.jar
>>>>>> does not exist
>>>>>>
>>>>>> However, the above file does exist at that path on the master node.
>>>>>>
>>>>>> I would appreciate it if anyone has experience to share on how to
>>>>>> set up Zeppelin with EMR.
>>>>>>
>>>>>> best,
>>>>>> /Shahab
>>>
>>>
>>> --
>>>
>>> Best regards,
>>> Eugene.
>>
>
> --
>
> Best regards,
> Eugene.
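A guess at the cause of Shahab's FileNotFoundException, not something confirmed in the thread: the jar is referenced with a local file: URI, so on a YARN cluster every node needs it at the same path, not just the master. A sketch of copying it out follows; the host list, the hadoop user, and the scp transport are all assumptions about the cluster.

```shell
# Sketch: copy the dependencies jar to each worker at the same local path.
# COPY_CMD can be overridden (e.g. for a dry run); it defaults to scp.
distribute_jar() {
  jar="$1"; shift
  copy="${COPY_CMD:-scp}"
  [ -f "$jar" ] || { echo "missing on master: $jar" >&2; return 1; }
  for host in "$@"; do
    "$copy" "$jar" "hadoop@${host}:${jar}" || return 1
  done
}

# e.g. (hostnames are illustrative):
# distribute_jar /home/hadoop/zeppelin/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0-incubating-SNAPSHOT.jar \
#   ip-10-0-0-1 ip-10-0-0-2
```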