Re: Questions about Hive

Tim Robertson Mon, 17 Sep 2012 00:16:24 -0700

I don't think Hive is intended for web request scoped operations... that
would be a rather unusual case from my understanding.


HBase sounds more like the Hadoop equivalent that you might be looking for,
but you need to look at your search patterns to see if HBase is a good fit
(you need to manage your own indexes again).

Cheers,
Tim


On Mon, Sep 17, 2012 at 8:07 AM, Something Something <
mailinglist...@gmail.com> wrote:

> Thank you both for the answers.  We are trying to find out if Hive can be
> used as a replacement for Netezza, but if there are no indexes then I don't
> see how it will beat Netezza in terms of performance.  Sounds like it
> certainly can't be used to do a quick lookup from a webapp - like Netezza
> can.
>
> If performance isn't a concern, then I guess it could be a useful tool.
> Will try it out & see how it works out.  Thanks.
>
>
>
> On Sun, Sep 16, 2012 at 10:51 PM, Tim Robertson <timrobertson...@gmail.com
> > wrote:
>
>>> Note:  I am a newbie to Hive.
>>>
>>> Can someone please answer the following questions?
>>>
>>> 1)  Does Hive provide APIs (like HBase does) that can be used to
>>> retrieve data from the tables in Hive from a Java program?  I heard
>>> somewhere that the data can be accessed with JDBC (style) APIs.  True?
>>>
>>
>> True.
>> https://cwiki.apache.org/Hive/hiveclient.html#HiveClient-JDBC
>>
>>
>>> 2)  I don't see how I can add indexes on the tables, so does that mean a
>>> query such as the following will trigger a MR job that will search files on
>>> HDFS sequentially?
>>>
>>> hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';
>>>
>>>
>> There are some index implementations in Hive, but it is not as simple as
>> in a traditional DB.
>> E.g. search Jira to see some of the work:
>> https://issues.apache.org/jira/browse/HIVE-417
>>
>> You are correct that the above would do a full table scan.
>>
>>> 3)  Has anyone compared performance of Hive against other NOSQL databases
>>> such as HBase, MongoDB.  I understand it's not exactly apples to apples
>>> comparison, but still...
>>>
>>
>> I think you misunderstand what Hive is.  It is basically a SQL-to-MR
>> translation engine, which has adapters for the input source.  By default it
>> uses simple files on HDFS, but there are (e.g.) HBase adapters, so you
>> can use it to run SQL on HBase tables, for example (which works great).
>> Regarding performance, a Hive query over HBase does the same work as
>> a normal HBase MR scan, so it performs the same.
>>
>>
>>>
>>> Thanks.
>>
>>
>>
>
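
For question 1 in the quoted thread (reading Hive tables from a Java
program over JDBC), a minimal sketch might look like the following. It is
an illustration under assumptions, not the exact client from the linked
wiki page: it assumes a HiveServer2 endpoint (URL prefix jdbc:hive2://)
with the hive-jdbc driver jar on the classpath, and the host, port,
database and empty credentials are placeholders. Older HiveServer1
setups used jdbc:hive:// with the driver class
org.apache.hadoop.hive.jdbc.HiveDriver instead. The class name
HiveJdbcSketch and the helper jdbcUrl are made up for this example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class HiveJdbcSketch {

    // Hypothetical helper: builds a HiveServer2-style JDBC URL.
    // Host, port and database are placeholders for your own cluster.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        String url = jdbcUrl("localhost", 10000, "default");
        // The query from the thread; Hive compiles it into a MapReduce job.
        String query = "SELECT a.foo FROM invites a WHERE a.ds='2008-08-15'";
        System.out.println("Connecting to " + url);

        // Assumes the Hive JDBC driver is on the classpath; without it,
        // getConnection fails and the catch block reports the error.
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            System.err.println("Could not query Hive: " + e.getMessage());
        }
    }
}
```

As the thread points out, each such query is translated into a MapReduce
job, so this pattern suits batch-style reporting rather than quick lookups
inside a web request.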
