Re: Questions about Hive

Tim Robertson Sun, 16 Sep 2012 22:51:33 -0700

>
> Note:  I am a newbie to Hive.
>
> Can someone please answer the following questions?
>
> 1)  Does Hive provide APIs (like HBase does) that can be used to retrieve
> data from the tables in Hive from a Java program?  I heard somewhere that
> the data can be accessed with JDBC (style) APIs.  True?
>


True.
https://cwiki.apache.org/Hive/hiveclient.html#HiveClient-JDBC
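
For illustration, here is a minimal sketch of what that looks like from Java, using only the standard java.sql API. It assumes a HiveServer2 endpoint and the Hive JDBC driver jar on the classpath; the host, port, database, and empty credentials are placeholders, and the class and method names are made up for this example. Older HiveServer1 setups use a jdbc:hive:// URL instead.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of Hive itself.
final class HiveJdbcSketch {

    // Builds a HiveServer2-style JDBC URL. Host, port and database are placeholders.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Runs the query from this thread and collects the first column as strings.
    static List<String> selectFoo(Connection conn) throws SQLException {
        String sql = "SELECT a.foo FROM invites a WHERE a.ds='2008-08-15'";
        List<String> values = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Assumed endpoint; adjust to wherever your Hive server is listening.
        String url = jdbcUrl("localhost", 10000, "default");
        try (Connection conn = DriverManager.getConnection(url, "", "")) {
            for (String foo : selectFoo(conn)) {
                System.out.println(foo);
            }
        }
    }
}
```

Behind the scenes the query is still compiled to MR jobs, so expect batch-style latency rather than the per-row lookups you get from the HBase client API.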


> 2)  I don't see how I can add indexes on the tables, so does that mean a
> query such as the following will trigger a MR job that will search files on
> HDFS sequentially?
>
> hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';
>
>
There are some index implementations in Hive, but indexing is not as simple
as in a traditional database.
For example, search Jira to see some of the work:
https://issues.apache.org/jira/browse/HIVE-417

You are correct that the above would do a full table scan.

> 3)  Has anyone compared performance of Hive against other NOSQL databases
> such as HBase, MongoDB.  I understand it's not exactly apples to apples
> comparison, but still...
>

I think you misunderstand what Hive is.  It is basically a SQL-to-MR
translation engine with adapters for the input source.  By default it
uses simple files on HDFS, but there are adapters for other sources, e.g.
HBase, so you can use it to run SQL on HBase tables (which works great).
Regarding performance: a Hive query over HBase does the same work as a
normal HBase MR scan, so the performance is the same.
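
To make the adapter idea concrete, here is a rough sketch that builds the kind of DDL used to expose an existing HBase table to Hive through the HBase storage handler. The class and method names, the Hive table name, the HBase table name, the two STRING columns, and the column mapping are all assumptions for illustration. The storage handler class and the two property keys are the ones used by Hive's HBase integration.

```java
// Hypothetical helper class; it only builds the DDL string and does not run it.
final class HBaseTableDdlSketch {

    // Returns DDL that maps a Hive table onto an existing HBase table.
    // columnMapping pairs Hive columns with HBase ones, e.g. ":key,cf:val".
    static String externalTableDdl(String hiveTable, String hbaseTable, String columnMapping) {
        return "CREATE EXTERNAL TABLE " + hiveTable + " (key STRING, value STRING)\n"
             + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
             + "WITH SERDEPROPERTIES (\"hbase.columns.mapping\" = \"" + columnMapping + "\")\n"
             + "TBLPROPERTIES (\"hbase.table.name\" = \"" + hbaseTable + "\")";
    }

    public static void main(String[] args) {
        // Placeholder names: Hive table hbase_invites over HBase table invites.
        System.out.println(externalTableDdl("hbase_invites", "invites", ":key,cf:val"));
    }
}
```

Once a table like this exists, ordinary SELECT statements against it run as MR scans over HBase.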


>
> Thanks.
