That's what we will have to do. It's hard to explain to users, though,
that in Zeppelin you can assign HiveContext to a variable only once;
we didn't have this problem in Jupyter. Is this hard to fix?
Created https://issues.apache.org/jira/browse/ZEPPELIN-1728

If somebody forgets about this rule, the only fix is to restart the
Zeppelin server, which is super inconvenient.
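
For now we can only tell users to guard the assignment, something like
this (a rough sketch; it assumes Zeppelin's pyspark interpreter, which
injects 'sc', and that paragraph globals persist across runs):

from pyspark.sql import HiveContext

if 'sqlCtx' not in globals():    # construct the HiveContext at most once
    sqlCtx = HiveContext(sc)     # 'sc' is the SparkContext Zeppelin injects

sqlCtx.sql('show databases').show()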

Thanks.



-- 
Ruslan Dautkhanov

On Tue, Nov 29, 2016 at 12:54 PM, Felix Cheung <felixcheun...@hotmail.com>
wrote:

> Can you reuse the HiveContext instead of making new ones with
> HiveContext(sc)?
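>
> For example (a minimal sketch; 'sqlContext' is the instance the pyspark
> interpreter pre-creates, and the table name is just a placeholder):
>
> sqlCtx = sqlContext  # reuse the existing context, never HiveContext(sc)
> sqlCtx.sql('select * from some_db.some_table').show()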
>
>
> ------------------------------
> *From:* Ruslan Dautkhanov <dautkha...@gmail.com>
> *Sent:* Sunday, November 27, 2016 8:07:41 AM
> *To:* users
> *Subject:* Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"
>
> Also, once HiveContext(sc) has been assigned to a variable at least
> twice, the only way to get rid of this problem is to restart Zeppelin :-(
>
>
> --
> Ruslan Dautkhanov
>
> On Sun, Nov 27, 2016 at 9:00 AM, Ruslan Dautkhanov <dautkha...@gmail.com>
> wrote:
>
>> I found a pattern when this happens.
>>
>> When I run
>> sqlCtx = HiveContext(sc)
>>
>> it works as expected.
>>
>> The second and any subsequent run gives the exception stack I reported
>> earlier in this thread:
>>
>> > sqlCtx = HiveContext(sc)
>> > sqlCtx.sql('select * from marketview.spend_dim')
>>
>> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
>> Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-6752406810533348793.py", line 267, in <module>
>>     raise Exception(traceback.format_exc())
>> Exception: Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-6752406810533348793.py", line 265, in <module>
>>     exec(code)
>>   File "<stdin>", line 2, in <module>
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 683, in _ssql_ctx
>>     self._scala_HiveContext = self._get_hive_ctx()
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 692, in _get_hive_ctx
>>     return self._jvm.HiveContext(self._jsc.sc())
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
>>     answer, self._gateway_client, None, self._fqn)
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
>>     return f(*a, **kw)
>>
>>
>> The key to reproducing this issue: assign HiveContext(sc) to a
>> variable more than once, and use that variable between the assignments.
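>>
>> In code, the failing sequence is (same interpreter session, two
>> paragraphs):
>>
>> sqlCtx = HiveContext(sc)                          # paragraph 1: works
>> sqlCtx.sql('select * from marketview.spend_dim')  # works
>>
>> sqlCtx = HiveContext(sc)                          # paragraph 2
>> sqlCtx.sql('select * from marketview.spend_dim')  # -> "You must build Spark with Hive"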
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Mon, Nov 21, 2016 at 2:52 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
>> wrote:
>>
>>> Getting:
>>> You must *build Spark with Hive*. Export 'SPARK_HIVE=true'
>>> See the full stack [2] below.
>>>
>>> I'm using the Spark 1.6 that comes with CDH 5.8.3,
>>> so it's definitely compiled with Hive.
>>> We use Jupyter notebooks without problems in the same environment.
>>>
>>> Using Zeppelin 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz from
>>> apache.org.
>>>
>>> Is Zeppelin compiled with Hive too? I guess so.
>>> Not sure what else is missing.
>>>
>>> Tried to play with ZEPPELIN_SPARK_USEHIVECONTEXT, but it does not
>>> make a difference.
>>>
>>>
>>> [1]
>>> $ cat zeppelin-env.sh
>>> export JAVA_HOME=/usr/java/java7
>>> export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
>>> export SPARK_SUBMIT_OPTIONS="--principal xxxx --keytab yyy --conf spark.driver.memory=7g --conf spark.executor.cores=2 --conf spark.executor.memory=8g"
>>> export SPARK_APP_NAME="Zeppelin notebook"
>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>> export HIVE_CONF_DIR=/etc/hive/conf
>>> export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
>>> export PYSPARK_PYTHON="/opt/cloudera/parcels/Anaconda/bin/python2"
>>> export PYTHONPATH="/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip"
>>> export MASTER="yarn-client"
>>> export ZEPPELIN_SPARK_USEHIVECONTEXT=true
>>>
>>>
>>>
>>>
>>> [2]
>>>
>>> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
>>> Traceback (most recent call last):
>>>   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 267, in <module>
>>>     raise Exception(traceback.format_exc())
>>> Exception: Traceback (most recent call last):
>>>   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 265, in <module>
>>>     exec(code)
>>>   File "<stdin>", line 9, in <module>
>>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
>>>
>>> [3]
>>> Also have the correct symlinks in zeppelin_home/conf for:
>>> - hive-site.xml
>>> - hdfs-site.xml
>>> - core-site.xml
>>> - yarn-site.xml
>>>
>>>
>>>
>>> Thank you,
>>> Ruslan Dautkhanov
>>>
>>
>>
>
