Michael, Mich, Silvio,

Thanks!

The "own directory" part is the issue: we are running Spark Notebook, which
uses the same dir per server (i.e. for all notebooks), so this prevents us
from running two notebooks that use HiveContext at the same time.
I'll look into a proper Hive installation, and I'm glad to know that this
dependency is gone in 2.0.
Looking forward to 2.1 :-)

-kr, Gerard.


On Thu, May 26, 2016 at 10:55 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> You can also just make sure that each user is using their own directory.
> A rough example can be found in TestHive.
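>
> Something like this rough, untested sketch of that idea gives each user
> their own Derby metastore (the /tmp paths are placeholders, and it assumes
> the Derby connection URL takes effect when set before the metastore is
> first used):
>
>   val user = sys.props("user.name")
>   val hc = new org.apache.spark.sql.hive.HiveContext(sparkContext)
>   // keep both the Derby metastore and the warehouse in per-user dirs
>   hc.setConf("javax.jdo.option.ConnectionURL",
>     s"jdbc:derby:;databaseName=/tmp/metastore_$user;create=true")
>   hc.setConf("hive.metastore.warehouse.dir", s"/tmp/warehouse_$user")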
>
> Note: in Spark 2.0 there should be no need to use HiveContext unless you
> need to talk to a metastore.
>
> On Thu, May 26, 2016 at 1:36 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Well, make sure that you set up a reasonable RDBMS as the metastore. Ours
>> is Oracle, but you can get away with others. Check the supported list in:
>>
>> hduser@rhes564:: :/usr/lib/hive/scripts/metastore/upgrade> ltr
>> total 40
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 postgres
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mysql
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mssql
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 derby
>> drwxr-xr-x 3 hduser hadoop 4096 May 20 18:44 oracle
>>
>> You have a few good ones in the list. In general the base tables (without
>> transactional support) number around 55 (Hive 2) and don't take much space
>> (depending on the volume of tables). I attached an E-R diagram.
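>>
>> For reference, the JDBC settings go in hive-site.xml. A rough sketch for a
>> MySQL-backed metastore looks like this (host, database name and
>> credentials are placeholders, and the JDBC driver jar must be on the
>> classpath):
>>
>>   <property>
>>     <name>javax.jdo.option.ConnectionURL</name>
>>     <value>jdbc:mysql://dbhost:3306/metastore?createDatabaseIfNotExist=true</value>
>>   </property>
>>   <property>
>>     <name>javax.jdo.option.ConnectionDriverName</name>
>>     <value>com.mysql.jdbc.Driver</value>
>>   </property>
>>   <property>
>>     <name>javax.jdo.option.ConnectionUserName</name>
>>     <value>hiveuser</value>
>>   </property>
>>   <property>
>>     <name>javax.jdo.option.ConnectionPassword</name>
>>     <value>hivepass</value>
>>   </property>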
>>
>> HTH
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 26 May 2016 at 19:09, Gerard Maas <gerard.m...@gmail.com> wrote:
>>
>>> Thanks a lot for the advice!
>>>
>>> I found out why the standalone HiveContext would not work: it was
>>> trying to deploy a Derby db and the user had no rights to create the dir
>>> where the db is stored:
>>>
>>> Caused by: java.sql.SQLException: Failed to create database
>>> 'metastore_db', see the next exception for details.
>>>
>>>        at
>>> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
>>> Source)
>>>
>>>        at
>>> org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
>>> Source)
>>>
>>>        ... 129 more
>>>
>>> Caused by: java.sql.SQLException: Directory
>>> /usr/share/spark-notebook/metastore_db cannot be created.
>>>
>>>
>>> Now, the new issue is that we can't start more than one context at the
>>> same time. I think we will need to set up a proper metastore.
>>>
>>>
>>> -kind regards, Gerard.
>>>
>>>
>>>
>>>
>>> On Thu, May 26, 2016 at 3:06 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Using HiveContext, which is basically a SQL API within Spark, without a
>>>> proper Hive setup does not make sense. It is a superset of Spark's
>>>> SQLContext.
>>>>
>>>> In addition, simple things like registerTempTable may not work.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn:
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 26 May 2016 at 13:01, Silvio Fiorito <silvio.fior...@granturing.com>
>>>> wrote:
>>>>
>>>>> Hi Gerard,
>>>>>
>>>>>
>>>>>
>>>>> I’ve never had an issue using the HiveContext without a hive-site.xml
>>>>> configured. However, one issue you may have is if multiple users are
>>>>> starting the HiveContext from the same path, they’ll all be trying to
>>>>> store the default Derby metastore in the same location. Also, if you want
>>>>> them to be able to persist permanent table metadata for SparkSQL then
>>>>> you’ll want to set up a true metastore.
>>>>>
>>>>>
>>>>>
>>>>> The other thing it could be is Hive dependency collisions from the
>>>>> classpath, but that shouldn’t be an issue since you said it’s standalone
>>>>> (not a Hadoop distro right?).
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Silvio
>>>>>
>>>>>
>>>>>
>>>>> *From: *Gerard Maas <gerard.m...@gmail.com>
>>>>> *Date: *Thursday, May 26, 2016 at 5:28 AM
>>>>> *To: *spark users <user@spark.apache.org>
>>>>> *Subject: *HiveContext standalone => without a Hive metastore
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> I'm helping some folks set up an analytics cluster with Spark.
>>>>>
>>>>> They want to use the HiveContext to enable the Window functions on
>>>>> DataFrames (*), but they don't have any Hive installation, nor do they
>>>>> need one at the moment (unless it's necessary for this feature).
>>>>>
>>>>>
>>>>>
>>>>> When we try to create a Hive context, we get the following error:
>>>>>
>>>>>
>>>>>
>>>>> > val sqlContext = new
>>>>> org.apache.spark.sql.hive.HiveContext(sparkContext)
>>>>>
>>>>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
>>>>> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>>>>
>>>>>        at
>>>>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>>>>
>>>>>
>>>>>
>>>>> Is my HiveContext failing b/c it wants to connect to an unconfigured
>>>>> Hive Metastore?
>>>>>
>>>>>
>>>>>
>>>>> Is there a way to instantiate a HiveContext for the sake of Window
>>>>> support without an underlying Hive deployment?
>>>>>
>>>>>
>>>>>
>>>>> The docs are explicit in saying that this should be the case: [1]
>>>>>
>>>>>
>>>>>
>>>>> "To use a HiveContext, you do not need to have an existing Hive
>>>>> setup, and all of the data sources available to aSQLContext are still
>>>>> available. HiveContext is only packaged separately to avoid including
>>>>> all of Hive’s dependencies in the default Spark build."
>>>>>
>>>>>
>>>>>
>>>>> So what is the right way to address this issue? How do I instantiate a
>>>>> HiveContext with Spark running on an HDFS cluster without Hive deployed?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>>
>>>>>
>>>>> -Gerard.
>>>>>
>>>>>
>>>>>
>>>>> (*) The need for a HiveContext to use Window functions is pretty
>>>>> obscure. The only documentation of this seems to be a runtime exception: 
>>>>> "org.apache.spark.sql.AnalysisException:
>>>>> Could not resolve window function 'max'. Note that, using window functions
>>>>> currently requires a HiveContext;"
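>>>>>
>>>>> For completeness, this is the kind of snippet that triggers it; a
>>>>> minimal example (assuming Spark 1.6, with made-up data) that fails with
>>>>> the above AnalysisException on a plain SQLContext but works on a
>>>>> HiveContext:
>>>>>
>>>>> import org.apache.spark.sql.expressions.Window
>>>>> import org.apache.spark.sql.functions.row_number
>>>>>
>>>>> val df = sqlContext.createDataFrame(
>>>>>   Seq(("a", 1), ("a", 2), ("b", 3))).toDF("key", "value")
>>>>> // rank rows within each key; this requires window function support
>>>>> val w = Window.partitionBy("key").orderBy("value")
>>>>> df.withColumn("rn", row_number().over(w)).show()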
>>>>>
>>>>>
>>>>>
>>>>> [1]
>>>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>
