My point is that I suspect CDH also didn't compile Spark with Hive; you can run spark-shell to verify that.
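For example, in a stock Spark 1.6 shell the REPL pre-creates a sqlContext variable, and its concrete type tells you how the build was done. A quick check from the pyspark shell would look something like this (the equivalent check in spark-shell is sqlContext.getClass.getName):

    # Run in the pyspark shell that ships with CDH. In a Hive-enabled
    # Spark 1.6 build the pre-created sqlContext is a HiveContext;
    # a non-Hive build falls back to a plain SQLContext.
    >>> type(sqlContext)
    <class 'pyspark.sql.context.HiveContext'>   # Hive support compiled in

    # Touching the metastore also verifies hive-site.xml is picked up:
    >>> sqlContext.sql("show databases").show()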
On Fri, Nov 25, 2016 at 1:48 AM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:

> Yep, CDH doesn't have Spark compiled with the Thrift server.
> My understanding is that Zeppelin uses the spark-shell REPL and not the
> Spark thrift server.
>
> Thank you.
>
> --
> Ruslan Dautkhanov
>
> On Thu, Nov 24, 2016 at 1:57 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
> > AFAIK, the Spark that ships with CDH doesn't support the Spark thrift
> > server, so it is possible it is not compiled with Hive. Can you run
> > spark-shell to verify that? If it is built with Hive, a HiveContext
> > will be created in spark-shell.
> >
> > On Thu, Nov 24, 2016 at 3:30 PM, Ruslan Dautkhanov
> > <dautkha...@gmail.com> wrote:
> >
> > > I can't reproduce this in %spark, nor in %sql.
> > > It seems to be %pyspark-specific.
> > > Also, it runs fine the first time I start Zeppelin, then it shows
> > > this error:
> > >
> > > You must build Spark with Hive. Export 'SPARK_HIVE=true' and run
> > > build/sbt assembly
> > >
> > > sqlc = HiveContext(sc)
> > > sqlc.sql("select count(*) from hivedb.someTable")
> > >
> > > It runs fine only one time, then:
> > >
> > > You must build Spark with Hive. Export 'SPARK_HIVE=true' and run
> > > build/sbt assembly
> > > Traceback (most recent call last):
> > >   File "/tmp/zeppelin_pyspark-8000586427786928449.py", line 267, in <module>
> > >     raise Exception(traceback.format_exc())
> > > Exception: Traceback (most recent call last):
> > >   File "/tmp/zeppelin_pyspark-8000586427786928449.py", line 265, in <module>
> > >     exec(code)
> > >   File "<stdin>", line 2, in <module>
> > >   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
> > >     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
> > >   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 683, in _ssql_ctx
> > >     self._scala_HiveContext = self._get_hive_ctx()
> > >   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 692, in _get_hive_ctx
> > >     return self._jvm.HiveContext(self._jsc.sc())
> > >   File "/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
> > >     answer, self._gateway_client, None, self._fqn)
> > >   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
> > >     return f(*a, **kw)
> > >
> > > I don't see more details in the logs than the above error stack.
> > >
> > > --
> > > Ruslan Dautkhanov
> > >
> > > On Wed, Nov 23, 2016 at 7:02 AM, Felix Cheung
> > > <felixcheun...@hotmail.com> wrote:
> > >
> > > > Hmm, SPARK_HOME is set, so it should pick up the right Spark.
> > > >
> > > > Does this work with the Scala Spark interpreter instead of
> > > > pyspark? If it doesn't, is there more info in the log?
> > > >
> > > > ------------------------------
> > > > *From:* Ruslan Dautkhanov <dautkha...@gmail.com>
> > > > *Sent:* Monday, November 21, 2016 1:52:36 PM
> > > > *To:* users@zeppelin.apache.org
> > > > *Subject:* "You must build Spark with Hive. Export 'SPARK_HIVE=true'"
> > > >
> > > > Getting:
> > > > You must *build Spark with Hive*. Export 'SPARK_HIVE=true'
> > > > See the full stack in [2] below.
> > > >
> > > > I'm using the Spark 1.6 that comes with CDH 5.8.3,
> > > > so it's definitely compiled with Hive.
> > > > We use Jupyter notebooks without problems in the same environment.
> > > >
> > > > Using Zeppelin 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz
> > > > from apache.org.
> > > >
> > > > Is Zeppelin compiled with Hive too? I guess so.
> > > > Not sure what else is missing.
> > > >
> > > > Tried to play with ZEPPELIN_SPARK_USEHIVECONTEXT but it does not
> > > > make a difference.
> > > > [1]
> > > > $ cat zeppelin-env.sh
> > > > export JAVA_HOME=/usr/java/java7
> > > > export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
> > > > export SPARK_SUBMIT_OPTIONS="--principal xxxx --keytab yyy --conf spark.driver.memory=7g --conf spark.executor.cores=2 --conf spark.executor.memory=8g"
> > > > export SPARK_APP_NAME="Zeppelin notebook"
> > > > export HADOOP_CONF_DIR=/etc/hadoop/conf
> > > > export HIVE_CONF_DIR=/etc/hive/conf
> > > > export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
> > > > export PYSPARK_PYTHON="/opt/cloudera/parcels/Anaconda/bin/python2"
> > > > export PYTHONPATH="/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip"
> > > > export MASTER="yarn-client"
> > > > export ZEPPELIN_SPARK_USEHIVECONTEXT=true
> > > >
> > > > [2]
> > > > You must build Spark with Hive. Export 'SPARK_HIVE=true' and run
> > > > build/sbt assembly
> > > > Traceback (most recent call last):
> > > >   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 267, in <module>
> > > >     raise Exception(traceback.format_exc())
> > > > Exception: Traceback (most recent call last):
> > > >   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 265, in <module>
> > > >     exec(code)
> > > >   File "<stdin>", line 9, in <module>
> > > >   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
> > > >
> > > > [3]
> > > > Also have correct symlinks in zeppelin_home/conf for:
> > > > - hive-site.xml
> > > > - hdfs-site.xml
> > > > - core-site.xml
> > > > - yarn-site.xml
> > > >
> > > > Thank you,
> > > > Ruslan Dautkhanov
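P.S. On the "runs fine only one time" symptom quoted above: Zeppelin's %pyspark interpreter already injects a sqlContext into each paragraph (a HiveContext when ZEPPELIN_SPARK_USEHIVECONTEXT=true), so constructing a fresh HiveContext(sc) on every run creates a second context against the same metastore, which I suspect is what fails on the second attempt. A minimal sketch of the reuse pattern, using the hivedb.someTable name from the example above:

    %pyspark
    # Reuse the HiveContext that Zeppelin pre-creates instead of building
    # a new one per paragraph run; repeated HiveContext(sc) construction
    # may fail once the first instance holds the metastore connection.
    df = sqlContext.sql("select count(*) from hivedb.someTable")
    df.show()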