Thanks a lot for the advice! I found out why the standalone HiveContext
would not work: it was trying to deploy a Derby db and the user had no
rights to create the directory where the db is stored:

Caused by: java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
        ... 129 more
Caused by: java.sql.SQLException: Directory /usr/share/spark-notebook/metastore_db cannot be created.

Now the new issue is that we can't start more than one context at the
same time. I think we will need to set up a proper metastore.
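A minimal sketch of one way around both problems, assuming the default
embedded Derby metastore and sc as the SparkContext (as in spark-shell);
the /tmp location is illustrative and this is an untested sketch, not a
recipe:

    // Derby resolves the relative database name "metastore_db" against
    // derby.system.home, which defaults to the JVM's working directory.
    // Giving each session its own writable directory avoids the
    // permission error and gives every user a private metastore_db,
    // sidestepping the single-process limit of embedded Derby.
    // The /tmp prefix is an illustrative assumption.
    System.setProperty("derby.system.home", s"/tmp/derby-${sys.props("user.name")}")

    // Must run before the HiveContext first touches the metastore.
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)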
-kind regards, Gerard.

On Thu, May 26, 2016 at 3:06 PM, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:

> To use HiveContext, which is basically an SQL API within Spark, without
> a proper Hive setup does not make sense. It is a superset of Spark's
> SQLContext.
>
> In addition, simple things like registerTempTable may not work.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 26 May 2016 at 13:01, Silvio Fiorito <silvio.fior...@granturing.com>
> wrote:
>
>> Hi Gerard,
>>
>> I've never had an issue using the HiveContext without a hive-site.xml
>> configured. However, one issue you may have is if multiple users are
>> starting the HiveContext from the same path, they'll all be trying to
>> store the default Derby metastore in the same location. Also, if you
>> want them to be able to persist permanent table metadata for SparkSQL
>> then you'll want to set up a true metastore.
>>
>> The other thing it could be is Hive dependency collisions from the
>> classpath, but that shouldn't be an issue since you said it's
>> standalone (not a Hadoop distro, right?).
>>
>> Thanks,
>> Silvio
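A sketch of the kind of call Silvio's "persist permanent table metadata"
point refers to; the path, format, and table name are illustrative
assumptions:

    // With the default embedded Derby metastore, this table definition
    // lands in the local metastore_db directory, so other users and
    // future sessions cannot see it; a shared ("true") metastore is
    // what makes saveAsTable durable across sessions.
    val events = hiveContext.read.json("/data/events.json")
    events.write.format("parquet").saveAsTable("events")

    // Any later session pointing at the same metastore can then do:
    val eventsAgain = hiveContext.table("events")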
>> From: Gerard Maas <gerard.m...@gmail.com>
>> Date: Thursday, May 26, 2016 at 5:28 AM
>> To: spark users <user@spark.apache.org>
>> Subject: HiveContext standalone => without a Hive metastore
>>
>> Hi,
>>
>> I'm helping some folks set up an analytics cluster with Spark. They
>> want to use the HiveContext to enable the window functions on
>> DataFrames (*), but they don't have any Hive installation, nor do they
>> need one at the moment (if it's not necessary for this feature).
>>
>> When we try to create a Hive context, we get the following error:
>>
>> > val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
>> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>         at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>
>> Is my HiveContext failing because it wants to connect to an
>> unconfigured Hive metastore?
>>
>> Is there a way to instantiate a HiveContext for the sake of window
>> support without an underlying Hive deployment?
>>
>> The docs are explicit in saying that this should be the case: [1]
>>
>> "To use a HiveContext, you do not need to have an existing Hive setup,
>> and all of the data sources available to a SQLContext are still
>> available. HiveContext is only packaged separately to avoid including
>> all of Hive's dependencies in the default Spark build."
>>
>> So what is the right way to address this issue? How do we instantiate
>> a HiveContext with Spark running on an HDFS cluster without Hive
>> deployed?
>>
>> Thanks a lot!
>>
>> -Gerard.
>>
>> (*) The need for a HiveContext to use window functions is pretty
>> obscure. The only documentation of this seems to be a runtime
>> exception: "org.apache.spark.sql.AnalysisException: Could not resolve
>> window function 'max'. Note that, using window functions currently
>> requires a HiveContext;"
>>
>> [1] http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
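For completeness, a minimal sketch of the window-function usage the
footnote refers to, assuming Spark 1.4+ and an existing HiveContext
named hiveContext; the DataFrame contents and column names are
illustrative:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    // Toy data; with a plain SQLContext the .over(...) call below fails
    // with the AnalysisException quoted above, while a HiveContext
    // accepts it even when no Hive deployment exists.
    val df = hiveContext.createDataFrame(Seq(
      ("a", 1), ("a", 3), ("b", 2)
    )).toDF("key", "value")

    // Attach the per-key maximum to every row of that key's partition.
    val result = df.withColumn(
      "max_per_key", max(col("value")).over(Window.partitionBy(col("key"))))
    result.show()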