Thank you both for the answers. We are trying to find out if Hive can be used as a replacement for Netezza, but if there are no indexes then I don't see how it will beat Netezza in terms of performance. It sounds like it certainly can't be used to do a quick lookup from a webapp the way Netezza can.
If performance isn't a concern, then I guess it could be a useful tool. I will try it out and see how it works out. Thanks.

On Sun, Sep 16, 2012 at 10:51 PM, Tim Robertson <timrobertson...@gmail.com> wrote:

>> Note: I am a newbie to Hive.
>>
>> Can someone please answer the following questions?
>>
>> 1) Does Hive provide APIs (like HBase does) that can be used to retrieve
>> data from the tables in Hive from a Java program? I heard somewhere that
>> the data can be accessed with JDBC (style) APIs. True?
>
> True.
> https://cwiki.apache.org/Hive/hiveclient.html#HiveClient-JDBC
>
>> 2) I don't see how I can add indexes on the tables, so does that mean a
>> query such as the following will trigger a MR job that will search files on
>> HDFS sequentially?
>>
>> hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';
>
> There are some index implementations in Hive, but it is not as simple as a
> traditional db.
> E.g. search Jira and see some of the work:
> https://issues.apache.org/jira/browse/HIVE-417
>
> You are correct that the above would do a full table scan.
>
>> 3) Has anyone compared performance of Hive against other NoSQL databases
>> such as HBase, MongoDB? I understand it's not exactly an apples to apples
>> comparison, but still...
>
> I think you misunderstand what Hive is. It is basically a SQL to MR
> translation engine, which has adapters for the input source. By default it
> uses simple files on the HDFS, but there are (e.g.) HBase adapters, so you
> can use it to run SQL on HBase tables for example (which works great).
> Regarding performance, on the HBase scans, the operation is the same as
> running a normal HBase MR scan, so the performance is the same.
>
>> Thanks.
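
For reference, the JDBC route from answer 1 might look roughly like the sketch below. It uses only the standard java.sql API and assumes a Hive JDBC driver is on the classpath and a Hive server is reachable. The class name, method names, host, port and database are made up for illustration; the "jdbc:hive2://" URL prefix assumes a HiveServer2 endpoint (the older HiveServer1 client used "jdbc:hive://"). The query is the one quoted above, with the ds value turned into a parameter.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; none of these names come from Hive itself.
class HiveInvitesLookup {

    // Same query as in the thread, with the ds value as a bind parameter.
    static final String QUERY = "SELECT a.foo FROM invites a WHERE a.ds = ?";

    // Builds a HiveServer2-style JDBC URL (assumption: HiveServer2 endpoint).
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Runs the query on an already-open connection and collects column foo.
    // Each call is still executed by Hive as a job, not as an indexed lookup.
    static List<String> fooForDay(Connection conn, String ds) throws SQLException {
        List<String> values = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(QUERY)) {
            stmt.setString(1, ds);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    values.add(rs.getString(1));
                }
            }
        }
        return values;
    }
}
```

A caller would open the connection with DriverManager.getConnection(HiveInvitesLookup.jdbcUrl("localhost", 10000, "default")) and pass it to fooForDay(conn, "2008-08-15"). Since every call goes through a full Hive query, this fits batch-style access better than a quick webapp lookup.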