I found a partial answer in this post: http://hortonworks.com/blog/hbase-via-hive-part-1/
From a performance">
"From a performance perspective, there are things Hive can do today (ie, not dependent on data types) to take advantage of HBase. There’s also the possibility of an HBase-aware Hive to make use of HBase tables as intermediate storage location (HIVE-3565 <https://issues.apache.org/jira/browse/HIVE-3565>), facilitating map-side joins against dimension tables loaded into HBase. Hive could make use of HBase’s natural indexed structure (HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>, HIVE-3727 <https://issues.apache.org/jira/browse/HIVE-3727>), potentially saving huge scans. Currently, the user doesn’t have (any?) control over the scans which are executed. Configuration on a per-job, or at least per-table basis should be enabled (HIVE-1233 <https://issues.apache.org/jira/browse/HIVE-1233>). That would enable an HBase-savvy user to provide Hive with hints regarding how it should interact with HBase. Support for simple split sampling of HBase tables (HIVE-3399 <https://issues.apache.org/jira/browse/HIVE-3399>) could also be easily done because HBase manages table partitions already."

On Thu, Jul 24, 2014 at 2:03 PM, Yang <teddyyyy...@gmail.com> wrote:

> if I do a join of a table based on txt file and a table based on HBase,
> and say the latter is very large, is HIVE smart enough to utilize the HBase
> table's index to do the join, instead of implementing this as a regular map
> reduce job, where each table is scanned fully, bucketed on join keys, and
> then the matching items found out through the reducer?
>
> thanks
> Yang
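From a performance">

To make the two strategies in the question concrete, here is a toy Python sketch. It is not how Hive or HBase actually execute anything: the function names are made up for illustration, and a plain dict stands in for HBase's row-key lookup. It only shows why an index-driven join could read far fewer rows of the large table than a regular reduce-side join.

```python
from collections import defaultdict


def reduce_side_join(small_rows, big_rows):
    """Toy reduce-side join: read both inputs fully, bucket by key, then match.

    Returns the joined rows and the number of rows read from the large input.
    """
    buckets = defaultdict(lambda: ([], []))
    for key, value in small_rows:
        buckets[key][0].append(value)
    for key, value in big_rows:  # every row of the large table is read
        buckets[key][1].append(value)

    joined = []
    for key, (left, right) in buckets.items():  # the "reducer" step
        for l in left:
            for r in right:
                joined.append((key, l, r))
    return sorted(joined), len(big_rows)


def lookup_join(small_rows, keyed_store):
    """Toy index-driven join: scan only the small input, look up each key.

    keyed_store is a dict standing in for a store indexed by row key
    (an assumption for illustration, not an HBase client API).
    Returns the joined rows and the number of lookups against the large side.
    """
    joined = []
    lookups = 0
    for key, value in small_rows:
        lookups += 1
        match = keyed_store.get(key)  # point lookup instead of a full scan
        if match is not None:
            joined.append((key, value, match))
    return sorted(joined), lookups


# Hypothetical data: a two-row text table and a 1000-row keyed table.
small = [("u1", "click"), ("u3", "view")]
big_store = {f"u{i}": f"name{i}" for i in range(1000)}

scan_result, rows_scanned = reduce_side_join(small, list(big_store.items()))
lookup_result, rows_looked_up = lookup_join(small, big_store)
print(scan_result == lookup_result, rows_scanned, rows_looked_up)
```

Both functions produce the same joined rows; the difference is only in how much of the large side gets touched, which is the saving the excerpt describes as "potentially saving huge scans".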