On Sep 11, 2012, at 7:00 AM, bharath vissapragada wrote:

> Hey,
> 
> Hive does all kinds of parsing, metadata lookups, query tree building, and 
> so on before executing the query. I'm not sure whether all of this was 
> included in those 36 seconds!
> 
> Also, Hive builds a scan object (and mappers too) with ranges based on the 
> predicates on the key column, rather than issuing a direct "get" call as the 
> HBase shell does. This might incur some overhead too!

Since Hive does this in a MapReduce job it definitely incurs overhead.  It does 
not run directly against HBase as you might wish it did here.

Alan.
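
On the query plan question in the original message quoted below: Hive's
EXPLAIN statement prints the plan for a query without running it. A minimal
sketch, reusing the table and key from that message (the output format
differs between Hive versions, so none is shown here):

```sql
-- Show the stages Hive would run for the row-key lookup.
EXPLAIN
SELECT * FROM hive_hbase_test WHERE key = 'blabla';

-- EXPLAIN EXTENDED prints additional detail about the plan.
EXPLAIN EXTENDED
SELECT * FROM hive_hbase_test WHERE key = 'blabla';
```

If the plan lists a map-reduce stage, the job startup cost is paid no matter
how few rows the scan touches.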

> 
> On Tue, Sep 11, 2012 at 7:10 PM, Shengjie Min <kelvin....@gmail.com> wrote:
> Hi,
> 
> I am trying to get hive working on top of my hbase table following the guide 
> below:
> https://cwiki.apache.org/Hive/hbaseintegration.html
> 
> CREATE EXTERNAL TABLE hive_hbase_test (key string, a string, b string, c string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:a,cf:b,cf:c")
> TBLPROPERTIES ("hbase.table.name"="test");
> 
> This Hive table definition makes my mapping look roughly like this:
> 
> hive_hbase_test  vs  test
> Hive key       -  HBase row_key
> Hive column a  -  HBase cf:a
> Hive column b  -  HBase cf:b
> Hive column c  -  HBase cf:c
> 
> From my understanding of how HBaseStorageHandler works, it's supposed to take 
> advantage of the HBase row_key index as much as possible. So I would expect: 
> 
> 1. If you run a Hive query against the row key, like "select * from 
> hive_hbase_test where key='blabla'", it would use the HBase row_key index, 
> which gives you a very quick, nearly real-time response just like HBase 
> does.
> 
> 2. Of course, if you run a Hive query against a column, like "select * from 
> hive_hbase_test where a='blabla'", it probably uses MapReduce, because 
> there is nothing on the HBase side that can be utilized.
> 
> In my test, query 1 doesn't seem fast at all. It still takes ages:
> select * from hive_hbase_test where key='blabla'   36 secs
> vs
> get 'test', 'blabla'      less than 1 sec
> That is still a huge difference.
> 
> Has anybody tried this before? Is there any way I can do some sort of query 
> plan analysis on a Hive query? Or am I not mapping the Hive table to the 
> HBase table correctly?
> 
> -- 
> All the best,
> Shengjie Min
> 
> 
> 
> 
> -- 
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v
