On Sep 11, 2012, at 7:00 AM, bharath vissapragada wrote:

> Hey,
>
> Hive does all kinds of parsing, metadata lookups, query tree building and
> other work before executing the query. I'm not sure whether all of that was included in those
> 36 seconds!
>
> Also, Hive builds a scan object (and mappers too) with ranges based on
> predicates on the key column, rather than issuing a direct "get" call as the
> HBase shell does. This might incur some overhead too!
Since Hive does this in a MapReduce job, it definitely incurs overhead. It does not run directly against HBase as you might wish it did here.

Alan.

>
> On Tue, Sep 11, 2012 at 7:10 PM, Shengjie Min <kelvin....@gmail.com> wrote:
> Hi,
>
> I am trying to get Hive working on top of my HBase table, following the guide
> below:
> https://cwiki.apache.org/Hive/hbaseintegration.html
>
> CREATE EXTERNAL TABLE hive_hbase_test (key string, a string, b string, c string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:a,cf:b,cf:c")
> TBLPROPERTIES ("hbase.table.name"="test");
>
> This Hive table creation makes my mapping look roughly like this:
>
> hive_hbase_test VS test
> Hive key      - HBase row_key
> Hive column a - HBase cf:a
> Hive column b - HBase cf:b
> Hive column c - HBase cf:c
>
> From my understanding of how HBaseStorageHandler works, it's supposed to take
> advantage of the HBase row_key index as much as possible. So I would expect:
>
> 1. If you run a Hive query against the row key, like "select * from
> hive_hbase_test where key='blabla'", it would use the HBase row_key
> index, which gives you a very quick, nearly real-time response just like HBase
> does.
>
> 2. Of course, if you run a Hive query against a column, like "select * from
> hive_hbase_test where a='blabla'", it queries against a specific column, so it
> probably runs as a MapReduce job because there is nothing on the HBase side
> that can be used.
>
> From my test, query 1 doesn't seem fast at all; it still takes ages:
>
> select * from hive_hbase_test where key='blabla'   36 secs
> vs
> get 'test', 'blabla'                               less than 1 sec
>
> That is still a huge difference.
>
> Has anybody tried this before? Is there any way I can do some sort of query plan
> analysis on a Hive query? Or am I not mapping the Hive table to the HBase
> table correctly?
>
> --
> All the best,
> Shengjie Min
>
> --
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v
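On the question of query plan analysis: Hive's EXPLAIN statement prints the plan Hive builds for a query without executing it, so it is a reasonable first thing to try here. A minimal sketch, using the table and key from this thread:

```sql
-- Print the stages Hive plans to run for the row-key lookup (the query is not executed).
EXPLAIN
SELECT * FROM hive_hbase_test WHERE key = 'blabla';

-- EXTENDED prints the same plan with more detail about each operator.
EXPLAIN EXTENDED
SELECT * FROM hive_hbase_test WHERE key = 'blabla';
```

If the plan lists a Map Reduce stage for this lookup, that would be consistent with Alan's explanation: the key predicate can narrow the HBase scan, but the job setup around it still costs far more than a single shell get. The exact plan layout differs between Hive versions, so treat this as a starting point rather than a definitive diagnosis.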