Can you reuse the HiveContext instead of making new ones with HiveContext(sc)?
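For example, a minimal sketch of that pattern (the table name is taken from this thread; sc is the SparkContext that Zeppelin injects into the pyspark interpreter):

    from pyspark.sql import HiveContext

    # Create the HiveContext once, then keep reusing the same object in
    # every later paragraph instead of calling HiveContext(sc) again.
    try:
        sqlCtx  # already defined by an earlier paragraph?
    except NameError:
        sqlCtx = HiveContext(sc)

    sqlCtx.sql('select * from marketview.spend_dim').show()

Also, since zeppelin-env.sh below sets ZEPPELIN_SPARK_USEHIVECONTEXT=true, the interpreter's pre-created sqlContext variable should already be a HiveContext and can be reused the same way.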
________________________________
From: Ruslan Dautkhanov <[email protected]>
Sent: Sunday, November 27, 2016 8:07:41 AM
To: users
Subject: Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"

Also: once HiveContext(sc) has been assigned to a variable at least twice, the only way to get rid of this problem is to restart Zeppelin :-(

--
Ruslan Dautkhanov

On Sun, Nov 27, 2016 at 9:00 AM, Ruslan Dautkhanov <[email protected]> wrote:

I found a pattern for when this happens. The first time I run

> sqlCtx = HiveContext(sc)

it works as expected. The second and every subsequent time gives the exception stack I reported in this email chain:

> sqlCtx = HiveContext(sc)
> sqlCtx.sql('select * from marketview.spend_dim')

You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-6752406810533348793.py", line 267, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-6752406810533348793.py", line 265, in <module>
    exec(code)
  File "<stdin>", line 2, in <module>
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 683, in _ssql_ctx
    self._scala_HiveContext = self._get_hive_ctx()
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 692, in _get_hive_ctx
    return self._jvm.HiveContext(self._jsc.sc())
  File "/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)

The key piece to reproduce this issue: assign HiveContext(sc) to a variable more than once, and use that variable between assignments.

--
Ruslan Dautkhanov
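Condensed, the reported repro looks like this (illustrative; run inside Zeppelin's pyspark interpreter, where each assignment would be its own paragraph — outside Zeppelin the same sequence may well work):

    from pyspark.sql import HiveContext

    sqlCtx = HiveContext(sc)                          # first assignment: works
    sqlCtx.sql('select * from marketview.spend_dim')  # use between assignments
    sqlCtx = HiveContext(sc)                          # second assignment
    sqlCtx.sql('select * from marketview.spend_dim')  # now raises the
                                                      # "build Spark with Hive" error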
On Mon, Nov 21, 2016 at 2:52 PM, Ruslan Dautkhanov <[email protected]> wrote:

Getting

    You must build Spark with Hive. Export 'SPARK_HIVE=true'

See the full stack [2] below.

I'm using the Spark 1.6 that comes with CDH 5.8.3, so it's definitely compiled with Hive, and we use Jupyter notebooks without problems in the same environment. Zeppelin is 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz from apache.org. Is Zeppelin compiled with Hive too? I guess so. Not sure what else is missing. I tried to play with ZEPPELIN_SPARK_USEHIVECONTEXT but it does not make a difference.

[1]
$ cat zeppelin-env.sh
export JAVA_HOME=/usr/java/java7
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_SUBMIT_OPTIONS="--principal xxxx --keytab yyy --conf spark.driver.memory=7g --conf spark.executor.cores=2 --conf spark.executor.memory=8g"
export SPARK_APP_NAME="Zeppelin notebook"
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_CONF_DIR=/etc/hive/conf
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export PYSPARK_PYTHON="/opt/cloudera/parcels/Anaconda/bin/python2"
export PYTHONPATH="/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip"
export MASTER="yarn-client"
export ZEPPELIN_SPARK_USEHIVECONTEXT=true

[2]
You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 267, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 265, in <module>
    exec(code)
  File "<stdin>", line 9, in <module>
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql

[3]
Also have the correct symlinks in zeppelin_home/conf for:
- hive-site.xml
- hdfs-site.xml
- core-site.xml
- yarn-site.xml

Thank you,
Ruslan Dautkhanov
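One way to narrow down whether Hive support is actually missing from the Spark build or the failure is on the Zeppelin side (a sketch, using the same SPARK_HOME as in [1] above):

    # From a shell, start the same Spark that Zeppelin points at:
    #   /opt/cloudera/parcels/CDH/lib/spark/bin/pyspark
    from pyspark.sql import HiveContext

    sqlCtx = HiveContext(sc)             # sc is predefined in the pyspark shell
    sqlCtx.sql('show databases').show()  # works only if this build has Hive support

If that succeeds (as the working Jupyter notebooks in the same environment suggest it will), the CDH Spark assembly does include Hive. Note from the [2] traceback that pyspark raises the "You must build Spark with Hive" message whenever constructing the JVM HiveContext in _get_hive_ctx() fails for any reason, so the message itself does not necessarily mean a missing build flag.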
