That's what we will have to do. It's hard to explain to users, though, that in Zeppelin you can assign HiveContext to a variable only once; we didn't have this problem in Jupyter. Is this hard to fix? Created https://issues.apache.org/jira/browse/ZEPPELIN-1728
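For reference, the workaround amounts to what Felix suggests below: create the HiveContext once per interpreter session and reuse it everywhere. A minimal sketch of what we could tell users to put at the top of a pyspark paragraph (the NameError guard is our own convention, not a Zeppelin API; sc is the SparkContext the pyspark interpreter provides):

    # Instantiate HiveContext at most once per interpreter session;
    # any later paragraph picks up the existing instance instead of
    # calling HiveContext(sc) again.
    from pyspark.sql import HiveContext

    try:
        sqlCtx  # already defined earlier in this session, so reuse it
    except NameError:
        sqlCtx = HiveContext(sc)  # first (and only) instantiation

    sqlCtx.sql('select * from marketview.spend_dim').show()

That keeps notebooks working across paragraphs, but users still have to know to do it, which is why I filed the JIRA above.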
If somebody forgets about this rule, the only fix is to restart the Zeppelin server, which is super inconvenient. Thanks.

--
Ruslan Dautkhanov

On Tue, Nov 29, 2016 at 12:54 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

> Can you reuse the HiveContext instead of making new ones with
> HiveContext(sc)?
>
> ------------------------------
> *From:* Ruslan Dautkhanov <dautkha...@gmail.com>
> *Sent:* Sunday, November 27, 2016 8:07:41 AM
> *To:* users
> *Subject:* Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"
>
> Also, to get rid of this problem (once HiveContext(sc) was assigned at
> least twice to a variable), the only fix is to restart Zeppelin :-(
>
> --
> Ruslan Dautkhanov
>
> On Sun, Nov 27, 2016 at 9:00 AM, Ruslan Dautkhanov <dautkha...@gmail.com>
> wrote:
>
>> I found a pattern for when this happens.
>>
>> The first time I run
>>
>>     sqlCtx = HiveContext(sc)
>>
>> it works as expected. The second time, and any time after that, it gives
>> the exception stack I reported in this email chain:
>>
>>     sqlCtx = HiveContext(sc)
>>     sqlCtx.sql('select * from marketview.spend_dim')
>>
>> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run
>> build/sbt assembly
>> Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-6752406810533348793.py", line 267, in <module>
>>     raise Exception(traceback.format_exc())
>> Exception: Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-6752406810533348793.py", line 265, in <module>
>>     exec(code)
>>   File "<stdin>", line 2, in <module>
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 683, in _ssql_ctx
>>     self._scala_HiveContext = self._get_hive_ctx()
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 692, in _get_hive_ctx
>>     return self._jvm.HiveContext(self._jsc.sc())
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
>>     answer, self._gateway_client, None, self._fqn)
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
>>     return f(*a, **kw)
>>
>> The key piece to reproduce this issue: assign HiveContext(sc) to a
>> variable more than once, and use that variable between the assignments.
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Mon, Nov 21, 2016 at 2:52 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
>> wrote:
>>
>>> Getting
>>>
>>>     You must *build Spark with Hive*. Export 'SPARK_HIVE=true'
>>>
>>> See the full stack [2] below.
>>>
>>> I'm using the Spark 1.6 that comes with CDH 5.8.3, so it's definitely
>>> compiled with Hive. We use Jupyter notebooks without problems in the
>>> same environment.
>>>
>>> Using Zeppelin 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz from
>>> apache.org.
>>>
>>> Is Zeppelin compiled with Hive too? I guess so. Not sure what else is
>>> missing.
>>>
>>> Tried to play with ZEPPELIN_SPARK_USEHIVECONTEXT but it does not make a
>>> difference.
>>>
>>> [1]
>>> $ cat zeppelin-env.sh
>>> export JAVA_HOME=/usr/java/java7
>>> export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
>>> export SPARK_SUBMIT_OPTIONS="--principal xxxx --keytab yyy --conf spark.driver.memory=7g --conf spark.executor.cores=2 --conf spark.executor.memory=8g"
>>> export SPARK_APP_NAME="Zeppelin notebook"
>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>> export HIVE_CONF_DIR=/etc/hive/conf
>>> export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
>>> export PYSPARK_PYTHON="/opt/cloudera/parcels/Anaconda/bin/python2"
>>> export PYTHONPATH="/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip"
>>> export MASTER="yarn-client"
>>> export ZEPPELIN_SPARK_USEHIVECONTEXT=true
>>>
>>> [2]
>>> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run
>>> build/sbt assembly
>>> Traceback (most recent call last):
>>>   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 267, in <module>
>>>     raise Exception(traceback.format_exc())
>>> Exception: Traceback (most recent call last):
>>>   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 265, in <module>
>>>     exec(code)
>>>   File "<stdin>", line 9, in <module>
>>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
>>>
>>> [3]
>>> Also have the correct symlinks in zeppelin_home/conf for:
>>> - hive-site.xml
>>> - hdfs-site.xml
>>> - core-site.xml
>>> - yarn-site.xml
>>>
>>> Thank you,
>>> Ruslan Dautkhanov