Has anyone else encountered that problem? I removed *--driver-class-path "${CLASSPATH}"* from the bin/interpreter.sh script and now it starts the SparkContext as expected. The problem is that it no longer picks up my local hive-site.xml, which points to an external metastore, and tries to use the local one instead :(
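A sketch of a possible workaround, not a tested fix from this thread: with --driver-class-path removed, hive-site.xml is no longer on the driver classpath, so Spark falls back to a local metastore. Spark also picks up hive-site.xml from its own conf directory, so copying the file there may restore the external metastore. The default paths and the function name are assumptions about an EMR-like layout.

```shell
# Sketch: put hive-site.xml where Spark will find it without relying on
# --driver-class-path. Default paths are assumptions about an EMR layout.
link_hive_site() {
  hive_conf="${1:-/etc/hive/conf}"
  spark_conf="${2:-/usr/lib/spark/conf}"
  if [ -f "${hive_conf}/hive-site.xml" ]; then
    cp "${hive_conf}/hive-site.xml" "${spark_conf}/hive-site.xml"
  else
    echo "no hive-site.xml under ${hive_conf}" >&2
    return 1
  fi
}
```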
On Fri, Sep 18, 2015 at 4:14 PM, Eugene <blackorange...@gmail.com> wrote:

> Hi Anders,
>
> I also had the error you mention, and overcame it by:
>
> 1. using the Spark installation from Zeppelin
> 2. altering conf/interpreter.json with properties such as
>    "spark.executor.instances", "spark.executor.cores", and
>    "spark.default.parallelism" from spark-defaults.conf, parsing that
>    file using parts of your gist.
>
> The code looks like this:
>
> cd ~/zeppelin/conf/
> SPARK_DEFAULTS=~/emr-spark-defaults.conf
> SPARK_EXECUTOR_INSTANCES=$(grep spark.executor.instances $SPARK_DEFAULTS | awk '{print $2}')
> SPARK_EXECUTOR_CORES=$(grep spark.executor.cores $SPARK_DEFAULTS | awk '{print $2}')
> SPARK_EXECUTOR_MEMORY=$(grep spark.executor.memory $SPARK_DEFAULTS | awk '{print $2}')
> SPARK_DEFAULT_PARALLELISM=$(grep spark.default.parallelism $SPARK_DEFAULTS | awk '{print $2}')
> cat interpreter.json | jq \
>   ".interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.instances\" = \"${SPARK_EXECUTOR_INSTANCES}\" |
>    .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.cores\" = \"${SPARK_EXECUTOR_CORES}\" |
>    .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.memory\" = \"${SPARK_EXECUTOR_MEMORY}\" |
>    .interpreterSettings.\"2B188AQ5T\".properties.\"spark.default.parallelism\" = \"${SPARK_DEFAULT_PARALLELISM}\"" > interpreter.json_
> cat interpreter.json_ > interpreter.json
> rm interpreter.json_
>
> 2015-09-18 17:05 GMT+04:00 Anders Hammar <anders.ham...@gmail.com>:
>
>> Hi,
>>
>> Thank you, Phil, for updating my script to support the latest version
>> of EMR. I have edited my gist so that it includes some of your updates
>> plus some other additional changes.
>>
>> https://gist.github.com/andershammar/224e1077021d0ea376dd
>>
>> While on the subject, has anyone been able to get Zeppelin to work
>> with Amazon's Spark installation on Amazon EMR 4.x (by exporting
>> SPARK_HOME and HADOOP_HOME instead)?
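As an aside on Eugene's script above: the four grep|awk pipelines repeat the same pattern, so they could be collapsed into one helper. This is just a sketch; `conf_value` is a name invented here, and it assumes the whitespace-separated key/value format EMR writes to spark-defaults.conf.

```shell
# Sketch: one helper instead of four grep|awk pipelines. Matching on the
# whole first field avoids a short key accidentally matching a longer one.
conf_value() {
  awk -v k="$1" '$1 == k { print $2; exit }' "$2"
}

# e.g.:
# SPARK_EXECUTOR_CORES=$(conf_value spark.executor.cores "$SPARK_DEFAULTS")
```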
>> When I try this, I get the following exception:
>>
>> org.apache.spark.SparkException: Found both spark.driver.extraClassPath
>> and SPARK_CLASSPATH. Use only the former.
>>   at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:444)
>>   at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:442)
>>   at scala.collection.immutable.List.foreach(List.scala:318)
>>   at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:442)
>>   at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:430)
>>   at scala.Option.foreach(Option.scala:236)
>>   at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:430)
>>   ...
>>
>> From a quick look at it, the problem seems to be that the Amazon
>> installation of Spark uses SPARK_CLASSPATH to add additional libraries
>> (/etc/spark/conf/spark-env.sh) while Zeppelin uses "spark-submit
>> --driver-class-path" (zeppelin/bin/interpreter.sh).
>>
>> Any ideas?
>>
>> Best regards,
>> Anders
>>
>> On Wed, Sep 9, 2015 at 5:09 PM, Eugene <blackorange...@gmail.com> wrote:
>>
>>> Here's a bit shorter alternative, too:
>>>
>>> https://gist.github.com/snowindy/008f3e8b878a23c00679
>>>
>>> 2015-09-09 18:58 GMT+04:00 shahab <shahab.mok...@gmail.com>:
>>>
>>>> Thanks Phil, it works. Great job and well done!
>>>>
>>>> best,
>>>> /Shahab
>>>>
>>>> On Mon, Sep 7, 2015 at 6:32 PM, Phil Wills <otherp...@gmail.com> wrote:
>>>>
>>>>> Anders' script is a bit out of date if you're using the latest
>>>>> version of EMR. Here's my fork:
>>>>>
>>>>> https://gist.github.com/philwills/71539f833f57338236b5
>>>>>
>>>>> which worked OK for me fairly recently.
>>>>>
>>>>> Phil
>>>>>
>>>>> On Mon, 7 Sep 2015 at 10:01 shahab <shahab.mok...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to use Zeppelin to work with Spark on Amazon EMR.
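One hedged way around the "Found both" validation error Anders quotes above (a sketch, not something confirmed in the thread): drop SPARK_CLASSPATH and fold its value into spark.driver.extraClassPath before launching, so SparkConf only ever sees the latter. The function name and the idea of wiring it into zeppelin/bin/interpreter.sh are assumptions.

```shell
# Sketch: merge SPARK_CLASSPATH into a single --conf flag so that
# SparkConf.validateSettings never sees both settings at once.
merge_driver_classpath() {
  extra="$1"   # what would have been passed via --driver-class-path
  if [ -n "${SPARK_CLASSPATH:-}" ]; then
    extra="${extra:+${extra}:}${SPARK_CLASSPATH}"
    unset SPARK_CLASSPATH    # avoid "Found both ... Use only the former."
  fi
  printf '%s\n' "--conf spark.driver.extraClassPath=${extra}"
}
```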
>>>>>> I used the script provided by Anders
>>>>>> (https://gist.github.com/andershammar/224e1077021d0ea376dd) to set
>>>>>> up Zeppelin. Zeppelin can connect to Spark, but when I run the
>>>>>> tutorials I get the following error:
>>>>>>
>>>>>> ...FileNotFoundException: File
>>>>>> file:/home/hadoop/zeppelin/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0-incubating-SNAPSHOT.jar
>>>>>> does not exist
>>>>>>
>>>>>> However, the above file does exist at that path on the master node.
>>>>>>
>>>>>> I would appreciate it if anyone has experience to share on how to
>>>>>> set up Zeppelin with EMR.
>>>>>>
>>>>>> best,
>>>>>> /Shahab
>>>
>>>
>>> --
>>>
>>> Best regards,
>>> Eugene.
>>
>
> --
>
> Best regards,
> Eugene.
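A guess at the cause of Shahab's FileNotFoundException, not something confirmed in the thread: the jar is referenced with a local file: URI, so on a YARN cluster every node needs it at the same path, not just the master. A sketch of copying it out follows; the host list, the hadoop user, and the scp transport are all assumptions about the cluster.

```shell
# Sketch: copy the dependencies jar to each worker at the same local path.
# COPY_CMD can be overridden (e.g. for a dry run); it defaults to scp.
distribute_jar() {
  jar="$1"; shift
  copy="${COPY_CMD:-scp}"
  [ -f "$jar" ] || { echo "missing on master: $jar" >&2; return 1; }
  for host in "$@"; do
    "$copy" "$jar" "hadoop@${host}:${jar}" || return 1
  done
}

# e.g. (hostnames are illustrative):
# distribute_jar /home/hadoop/zeppelin/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0-incubating-SNAPSHOT.jar \
#   ip-10-0-0-1 ip-10-0-0-2
```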