Going from memory… Derby is/was Cloudscape, which IBM acquired from Informix, which in turn had bought the company way back when. (Since IBM released it under the Apache license, Sun Microsystems took it and created JavaDB…)
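On the stand-alone vs. network modes mentioned next: a hedged sketch of starting Derby's network server for multi-user access. The `DERBY_HOME` variable, host, and port below are illustrative assumptions, not a verified setup:

```shell
# Start Derby's network server (multi-user mode) on its default port, 1527.
# DERBY_HOME is an assumed variable pointing at a local Derby installation.
java -cp "$DERBY_HOME/lib/derbynet.jar:$DERBY_HOME/lib/derby.jar" \
  org.apache.derby.drda.NetworkServerControl start -h 0.0.0.0 -p 1527

# Clients would then use the network JDBC URL instead of the embedded one, e.g.:
#   jdbc:derby://localhost:1527/metastore_db;create=true
```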
I believe there is a networking function, so you can bring it up either in stand-alone mode or in network mode, which allows simultaneous network connections (multi-user). If not, you can always go with MySQL.

HTH

> On May 26, 2016, at 1:36 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Well, make sure that you set up a reasonable RDBMS as the metastore. Ours is
> Oracle, but you can get away with others. Check the supported list in
>
> hduser@rhes564:: :/usr/lib/hive/scripts/metastore/upgrade> ltr
> total 40
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 postgres
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mysql
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mssql
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 derby
> drwxr-xr-x 3 hduser hadoop 4096 May 20 18:44 oracle
>
> You have a few good ones in the list. In general the base tables (without
> transactional support) number around 55 (Hive 2) and don't take much space
> (depending on the volume of tables). I attached an E-R diagram.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 26 May 2016 at 19:09, Gerard Maas <gerard.m...@gmail.com> wrote:
>
> Thanks a lot for the advice!
>
> I found out why the standalone HiveContext would not work: it was trying to
> deploy a Derby db, and the user had no rights to create the dir where the db
> is stored:
>
> Caused by: java.sql.SQLException: Failed to create database 'metastore_db',
> see the next exception for details.
>     at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>     at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
>     ... 129 more
> Caused by: java.sql.SQLException: Directory
> /usr/share/spark-notebook/metastore_db cannot be created.
>
> Now, the new issue is that we can't start more than one context at the same
> time. I think we will need to set up a proper metastore.
>
> -kind regards, Gerard.
>
> On Thu, May 26, 2016 at 3:06 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> To use HiveContext, which is basically a SQL API within Spark, without a
> proper Hive setup does not make sense. It is a superset of Spark's SQLContext.
>
> In addition, simple things like registerTempTable may not work.
>
> HTH
>
> Dr Mich Talebzadeh
>
> On 26 May 2016 at 13:01, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
>
> Hi Gerard,
>
> I've never had an issue using the HiveContext without a hive-site.xml
> configured. However, one issue you may have is that if multiple users are
> starting the HiveContext from the same path, they'll all be trying to store
> the default Derby metastore in the same location. Also, if you want them to
> be able to persist permanent table metadata for Spark SQL, then you'll want
> to set up a true metastore.
>
> The other thing it could be is Hive dependency collisions from the
> classpath, but that shouldn't be an issue since you said it's standalone
> (not a Hadoop distro, right?).
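To illustrate the point about the shared default Derby location: the embedded metastore path comes from the Derby JDBC URL in the Hive configuration, so one work-around is a minimal hive-site.xml on Spark's classpath pointing each user at a directory they can actually write to. A sketch only; the path below is a made-up example:

```xml
<?xml version="1.0"?>
<!-- Minimal hive-site.xml sketch: redirect the embedded Derby metastore to a
     user-writable directory. The path is illustrative, not a recommendation. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/home/gerard/metastore_db;create=true</value>
  </property>
</configuration>
```

This avoids the "Directory ... cannot be created" failure and the collision between users sharing one path, though it still allows only one active connection per Derby directory at a time.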
> Thanks,
> Silvio
>
> From: Gerard Maas <gerard.m...@gmail.com>
> Date: Thursday, May 26, 2016 at 5:28 AM
> To: spark users <user@spark.apache.org>
> Subject: HiveContext standalone => without a Hive metastore
>
> Hi,
>
> I'm helping some folks set up an analytics cluster with Spark.
> They want to use the HiveContext to enable the window functions on
> DataFrames(*), but they don't have any Hive installation, nor do they need
> one at the moment (if it is not necessary for this feature).
>
> When we try to create a Hive context, we get the following error:
>
> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
>
> java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>
> Is my HiveContext failing b/c it wants to connect to an unconfigured Hive
> Metastore?
>
> Is there a way to instantiate a HiveContext for the sake of window support
> without an underlying Hive deployment?
>
> The docs are explicit in saying that this should be the case: [1]
>
> "To use a HiveContext, you do not need to have an existing Hive setup, and
> all of the data sources available to a SQLContext are still available.
> HiveContext is only packaged separately to avoid including all of Hive's
> dependencies in the default Spark build."
>
> So what is the right way to address this issue? How do I instantiate a
> HiveContext with Spark running on an HDFS cluster without Hive deployed?
>
> Thanks a lot!
>
> -Gerard.
>
> (*) The need for a HiveContext to use window functions is pretty obscure. The
> only documentation of this seems to be a runtime exception:
> "org.apache.spark.sql.AnalysisException: Could not resolve window function
> 'max'.
> Note that, using window functions currently requires a HiveContext;"
>
> [1] http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
>
> <Hive2_base_tables.pdf>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
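To close the loop on Gerard's footnote: once the HiveContext can be created, a window function over a DataFrame looks roughly like the sketch below (Spark 1.x API, run inside a Spark shell or application with a SparkContext `sc`; the column names and data are invented for illustration):

```scala
// Sketch only: assumes a running Spark 1.x application with a SparkContext `sc`
// and the Hive classes on the classpath. No actual Hive deployment is needed.
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.max

val sqlContext = new HiveContext(sc)
import sqlContext.implicits._

val df = Seq(("a", 1), ("a", 3), ("b", 2)).toDF("key", "value")

// max(value) over a per-key window -- the feature that, in Spark 1.x,
// raises AnalysisException under a plain SQLContext but works here.
val w = Window.partitionBy("key")
df.withColumn("max_per_key", max($"value").over(w)).show()
```

Without a configured metastore this still spins up the embedded Derby database in the current directory, so the permission and single-user caveats discussed above apply.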