Hello,
I’m trying to install Zeppelin (0.7.2) on my CDH cluster, but I cannot connect
the SQL queries and graphical representations of the %sql interpreter to my
Hive data. More surprisingly, I can’t find any good source on the internet
(Apache Zeppelin documentation or Stack Overflow) that gives a practical
answer on how to do this.
Most of the time, the data comes from compressed Hive tables rather than plain
HDFS text files, so using a Hive context is far more convenient than a plain
Spark SQL context.
The following:
%spark
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val result = hc.sql("select * from hivedb.hivetable")
result.registerTempTable("myTest")
works, but no myTest table is available in a subsequent %sql paragraph:
%sql
select * from myTest
org.apache.spark.sql.AnalysisException: Table not found: myTest;
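What I was hoping for is something like the following (a sketch of my intent,
assuming zeppelin.spark.useHiveContext makes the interpreter-provided
sqlContext a HiveContext, so that tables registered on it are visible to %sql):

```scala
%spark
// Register the Hive query result on the interpreter-provided sqlContext,
// since %sql only sees temp tables registered on that shared context.
// This assumes the built-in sqlContext is already a HiveContext.
val result = sqlContext.sql("select * from hivedb.hivetable")
result.registerTempTable("myTest")
```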
However, the following:
%pyspark
result = sqlContext.read.text("hdfs://cluster/test.txt")
result.registerTempTable("mySqlTest")
works, as the %sql interpreter is “plugged” into the sqlContext. But

result = sqlContext.sql("select * from hivedb.hivetable")

does not work, as the sqlContext is not a Hive context.
I have set zeppelin.spark.useHiveContext to true, but it seems to have no
effect (admittedly, this was more of a wild guess, since the documentation
does not give much detail on interpreter parameters and context configuration).
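For what it’s worth, here is a quick way to check whether that setting took
effect (a sketch; I assume the interpreter exposes its shared context as
sqlContext in %spark):

```scala
%spark
// Print the runtime class of the interpreter-provided context.
// If zeppelin.spark.useHiveContext took effect, this should print
// org.apache.spark.sql.hive.HiveContext rather than a plain SQLContext.
println(sqlContext.getClass.getName)
```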
Can you direct me towards how to configure the context used by the %sql
interpreter?
Best regards,
Arnaud
PS: %spark and %sql interpreter configuration:
master yarn-client
spark.app.name Zeppelin
spark.cores.max
spark.executor.memory 5g
zeppelin.R.cmd R
zeppelin.R.image.width 100%
zeppelin.R.knitr true
zeppelin.R.render.options out.format = 'html', comment = NA, echo = FALSE,
results = 'asis', message = F, warning = F
zeppelin.dep.additionalRemoteRepository
spark-packages,http://dl.bintray.com/spark-packages/maven,false;
zeppelin.dep.localrepo local-repo
zeppelin.interpreter.localRepo /opt/zeppelin/local-repo/2CYVF45A9
zeppelin.interpreter.output.limit 102400
zeppelin.pyspark.python /usr/bin/pyspark
zeppelin.spark.concurrentSQL true
zeppelin.spark.importImplicit true
zeppelin.spark.maxResult 1000
zeppelin.spark.printREPLOutput true
zeppelin.spark.sql.stacktrace true
zeppelin.spark.useHiveContext true
________________________________
The integrity of this message cannot be guaranteed on the Internet. The company
that sent this message cannot therefore be held liable for its content nor
attachments. Any unauthorized use or dissemination is prohibited. If you are
not the intended recipient of this message, then please delete it and notify
the sender.