Michael, Mich, Silvio, thanks!

The "own directory" part is exactly the issue. We are running Spark
Notebook, which uses the same directory per server (i.e. for all
notebooks), so it prevents us from running two notebooks that use a
HiveContext at the same time. I'll look into a proper Hive installation,
and I'm glad to know that this dependency is gone in 2.0. Looking forward
to it :-)

-kr, Gerard.

On Thu, May 26, 2016 at 10:55 PM, Michael Armbrust <mich...@databricks.com> wrote:

> You can also just make sure that each user is using their own directory.
> A rough example can be found in TestHive.
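>
> A rough sketch of that approach (untested here; it mirrors the
> ConnectionURL override TestHive uses, assumes "sc" is the notebook's
> SparkContext, and the path is illustrative):
>
>   import org.apache.spark.sql.hive.HiveContext
>
>   // Give each session its own embedded Derby metastore instead of the
>   // default ./metastore_db, so concurrent notebooks don't collide.
>   // Set this before anything touches the metastore.
>   val hc = new HiveContext(sc)
>   hc.setConf("javax.jdo.option.ConnectionURL",
>     s"jdbc:derby:;databaseName=/tmp/metastore-${sys.props("user.name")};create=true")
>
> A proper shared metastore is the same idea, with this property (plus
> driver and credentials) pointing at a real RDBMS instead of Derby.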
>
> Note: in Spark 2.0 there should be no need to use HiveContext unless you
> need to talk to a metastore.
>
> On Thu, May 26, 2016 at 1:36 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Well, make sure that you set up a reasonable RDBMS as the metastore. Ours
>> is Oracle, but you can get away with others. Check the supported list in:
>>
>> hduser@rhes564:: :/usr/lib/hive/scripts/metastore/upgrade> ltr
>> total 40
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 postgres
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mysql
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mssql
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 derby
>> drwxr-xr-x 3 hduser hadoop 4096 May 20 18:44 oracle
>>
>> You have a few good ones in the list. In general the base tables (without
>> transactional support) number around 55 (Hive 2) and don't take much
>> space (depending on the volume of tables). I attached an E-R diagram.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> On 26 May 2016 at 19:09, Gerard Maas <gerard.m...@gmail.com> wrote:
>>
>>> Thanks a lot for the advice!
>>>
>>> I found out why the standalone HiveContext would not work: it was trying
>>> to deploy a Derby db, and the user had no rights to create the directory
>>> where the db is stored:
>>>
>>> Caused by: java.sql.SQLException: Failed to create database
>>> 'metastore_db', see the next exception for details.
>>> at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>>> at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
>>> ... 129 more
>>> Caused by: java.sql.SQLException: Directory
>>> /usr/share/spark-notebook/metastore_db cannot be created.
>>>
>>> Now the new issue is that we can't start more than one context at the
>>> same time. I think we will need to set up a proper metastore.
>>>
>>> -kind regards, Gerard.
>>>
>>> On Thu, May 26, 2016 at 3:06 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Using HiveContext, which is basically a SQL API within Spark, without a
>>>> proper Hive setup does not make sense. It is a superset of Spark's
>>>> SQLContext.
>>>>
>>>> In addition, simple things like registerTempTable may not work.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn:
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>> On 26 May 2016 at 13:01, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
>>>>
>>>>> Hi Gerard,
>>>>>
>>>>> I've never had an issue using the HiveContext without a hive-site.xml
>>>>> configured. However, one issue you may have is that if multiple users
>>>>> are starting the HiveContext from the same path, they'll all be trying
>>>>> to store the default Derby metastore in the same location. Also, if you
>>>>> want them to be able to persist permanent table metadata for Spark SQL,
>>>>> then you'll want to set up a true metastore.
>>>>>
>>>>> The other thing it could be is Hive dependency collisions on the
>>>>> classpath, but that shouldn't be an issue since you said it's
>>>>> standalone (not a Hadoop distro, right?).
>>>>>
>>>>> Thanks,
>>>>> Silvio
>>>>>
>>>>> From: Gerard Maas <gerard.m...@gmail.com>
>>>>> Date: Thursday, May 26, 2016 at 5:28 AM
>>>>> To: spark users <user@spark.apache.org>
>>>>> Subject: HiveContext standalone => without a Hive metastore
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm helping some folks set up an analytics cluster with Spark. They
>>>>> want to use the HiveContext to enable the window functions on
>>>>> DataFrames (*), but they don't have any Hive installation, nor do they
>>>>> need one at the moment (if it's not necessary for this feature).
>>>>>
>>>>> When we try to create a Hive context, we get the following error:
>>>>>
>>>>> > val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
>>>>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
>>>>> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>>>> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>>>>
>>>>> Is my HiveContext failing b/c it wants to connect to an unconfigured
>>>>> Hive metastore?
>>>>>
>>>>> Is there a way to instantiate a HiveContext for the sake of window
>>>>> support without an underlying Hive deployment?
>>>>>
>>>>> The docs are explicit in saying that this should be the case: [1]
>>>>>
>>>>> "To use a HiveContext, you do not need to have an existing Hive setup,
>>>>> and all of the data sources available to a SQLContext are still
>>>>> available. HiveContext is only packaged separately to avoid including
>>>>> all of Hive's dependencies in the default Spark build."
>>>>>
>>>>> So what is the right way to address this issue? How do I instantiate a
>>>>> HiveContext with Spark running on an HDFS cluster without Hive
>>>>> deployed?
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>> -Gerard.
>>>>>
>>>>> (*) The need for a HiveContext to use window functions is pretty
>>>>> obscure. The only documentation of this seems to be a runtime
>>>>> exception: "org.apache.spark.sql.AnalysisException: Could not resolve
>>>>> window function 'max'. Note that, using window functions currently
>>>>> requires a HiveContext;"
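>>>>>
>>>>> For reference, the usage in question is along these lines (a minimal
>>>>> sketch, assuming Spark 1.x, where df is an existing DataFrame with
>>>>> "dept" and "salary" columns; the names are illustrative):
>>>>>
>>>>>   import org.apache.spark.sql.expressions.Window
>>>>>   import org.apache.spark.sql.functions.max
>>>>>
>>>>>   // Attach the per-department max salary to every row. Under a
>>>>>   // HiveContext this runs; under a plain SQLContext in 1.x it fails
>>>>>   // with the AnalysisException quoted above.
>>>>>   val byDept = Window.partitionBy("dept")
>>>>>   val withMax = df.withColumn("max_salary", max("salary").over(byDept))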
>>>>>
>>>>> [1] http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started

>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org