Hello,
I’m trying to install Zeppelin (0.7.2) on my CDH cluster, but I cannot connect
the SQL queries and graphical representations of the %sql interpreter to my
Hive data. More surprisingly, I can’t find any good source on the internet
(Apache Zeppelin documentation or Stack Overflow) that gives a practical
answer on how to do this.
Most of the time, the data comes from compressed Hive tables rather than plain
HDFS text files, so using a Hive context is far more convenient than a plain
Spark SQL context.
The following:
%spark
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val result = hc.sql("select * from hivedb.hivetable")
result.registerTempTable("myTest")
works, but no myTest table is available in a subsequent %sql paragraph:
%sql
select * from myTest
org.apache.spark.sql.AnalysisException: Table not found: myTest;
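What I was hoping for is something like the following (a sketch of my intent,
assuming zeppelin.spark.useHiveContext makes the interpreter-provided
sqlContext a HiveContext, so that tables registered on it are visible to %sql):

```scala
%spark
// Register the Hive query result on the interpreter-provided sqlContext,
// since %sql only sees temp tables registered on that shared context.
// This assumes the built-in sqlContext is already a HiveContext.
val result = sqlContext.sql("select * from hivedb.hivetable")
result.registerTempTable("myTest")
```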
However, the following:
%pyspark
result = sqlContext.read.text("hdfs://cluster/test.txt")
result.registerTempTable("mySqlTest")
works, as the %sql interpreter is “plugged” into the sqlContext. But

result = sqlContext.sql("select * from hivedb.hivetable")

does not work, as the sqlContext is not a Hive context.
I have set zeppelin.spark.useHiveContext to true, but it seems to have no
effect (admittedly, this was more of a wild guess, since the documentation
does not give much detail on interpreter parameters and context configuration).
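For what it’s worth, here is a quick way to check whether that setting took
effect (a sketch; I assume the interpreter exposes its shared context as
sqlContext in %spark):

```scala
%spark
// Print the runtime class of the interpreter-provided context.
// If zeppelin.spark.useHiveContext took effect, this should print
// org.apache.spark.sql.hive.HiveContext rather than a plain SQLContext.
println(sqlContext.getClass.getName)
```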
Can you direct me towards how to configure the context used by the %sql
interpreter?
Best regards,
Arnaud
PS: %spark and %sql interpreter configuration:
master yarn-client
spark.app.name Zeppelin
spark.cores.max
spark.executor.memory 5g
zeppelin.R.cmd R
zeppelin.R.image.width 100%
zeppelin.R.knitr true
zeppelin.R.render.options out.format = 'html', comment = NA, echo = FALSE,
results = 'asis', message = F, warning = F
zeppelin.dep.additionalRemoteRepository
spark-packages,http://dl.bintray.com/spark-packages/maven,false;
zeppelin.dep.localrepo local-repo
zeppelin.interpreter.localRepo /opt/zeppelin/local-repo/2CYVF45A9
zeppelin.interpreter.output.limit 102400
zeppelin.pyspark.python /usr/bin/pyspark
zeppelin.spark.concurrentSQL true
zeppelin.spark.importImplicit true
zeppelin.spark.maxResult 1000
zeppelin.spark.printREPLOutput true
zeppelin.spark.sql.stacktrace true
zeppelin.spark.useHiveContext true
________________________________
The integrity of this message cannot be guaranteed on the Internet. The company
that sent this message cannot therefore be held liable for its content nor
attachments. Any unauthorized use or dissemination is prohibited. If you are
not the intended recipient of this message, then please delete it and notify
the sender.