Hi, I am trying to get hive working on top of my hbase table following the guide below: https://cwiki.apache.org/Hive/hbaseintegration.html

    CREATE EXTERNAL TABLE hive_hbase_test (key string, a string, b string, c string)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:a,cf:b,cf:c")
    TBLPROPERTIES ("hbase.table.name" = "test");

This Hive table definition makes my mapping look roughly like this:

    Hive (hive_hbase_test)    HBase (test)
    key                       row_key
    column a                  cf:a
    column b                  cf:b
    column c                  cf:c

From my understanding of how HBaseStorageHandler works, it is supposed to take advantage of the HBase row_key index as much as possible. So I would expect:

1. If you run a Hive query against the row key, like "select * from hive_hbase_test where key='blabla'", it would use the HBase row_key index and give a very quick, nearly real-time response, just like HBase does.

2. Of course, if you run a Hive query against a column, like "select * from hive_hbase_test where a='blabla'", it probably uses MapReduce, because there is nothing on the HBase side that can be used for the lookup.

From my test, query 1 doesn't seem fast at all; it still takes ages:

    select * from hive_hbase_test where key='blabla'    36 secs
    get 'test', 'blabla'                                less than 1 sec

That is still a huge difference.

Has anybody tried this before? Is there any way I can do some sort of query plan analysis on a Hive query? Or am I not mapping the Hive table to the HBase table correctly?
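
On the query plan question, Hive's EXPLAIN statement looks like the first thing to try. A minimal sketch against the table above; I am assuming the plan output will make it visible whether the query runs as a MapReduce stage and whether the key filter shows up at the table scan, which I have not confirmed for my Hive version:

```sql
-- Print the execution plan without running the query.
EXPLAIN
SELECT * FROM hive_hbase_test WHERE key = 'blabla';

-- EXPLAIN EXTENDED prints additional detail about each operator in the plan.
EXPLAIN EXTENDED
SELECT * FROM hive_hbase_test WHERE key = 'blabla';
```

If the plan shows a MapReduce stage for the row key lookup, job startup overhead alone might account for a good part of the 36 secs, independent of how the mapping is defined.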