Hi folks,

I am using the HBase integration feature of Hive (
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration) to load
the TPC-H tables into HBase, with Hive 0.13 and HBase 0.98.6.

The load itself works well. However, as documented here:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-KeyUniqueness

the key uniqueness requirement prevents me from loading all 'lineitem' rows,
because the 'lineitem' table uses (L_ORDERKEY, L_LINENUMBER) as its compound
primary key. If I map only L_ORDERKEY to the HBase row key, every row that
shares an order key with an earlier row overwrites it, so many rows are lost.
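
To make the collision concrete, a query along these lines against the source
Hive table compares the total row count with the number of distinct order
keys. This is only a sketch: it assumes the source table is the 'lineitem'
table used in the insert statement further down and that its key column is
named l_orderkey, as in my DDL.

```sql
-- Assumption: 'lineitem' is the source Hive table and l_orderkey is its
-- order-key column. The gap between the two counts is the number of rows
-- that cannot survive when l_orderkey alone is the HBase row key.
select count(*)                   as total_rows,
       count(distinct l_orderkey) as distinct_orderkeys
from lineitem;
```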

Any suggestions? Someone on this list must have gone through this already. :-)
Thanks

BTW, here is my Hive DDL:

create table hbase_lineitem (
  l_orderkey bigint, l_partkey bigint, l_suppkey int, l_linenumber bigint,
  l_quantity double, l_extendedprice double, l_discount double, l_tax double,
  l_returnflag string, l_linestatus string, l_shipdate string,
  l_commitdate string, l_receiptdate string, l_shipinstruct string,
  l_shipmode string, l_comment string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  -- No whitespace between mapping entries: it would become part of the column name.
  "hbase.columns.mapping" = ":key,l_partkey:val,l_suppkey:val,l_linenumber:val,l_quantity:val,l_extendedprice:val,l_discount:val,l_tax:val,l_returnflag:val,l_linestatus:val,l_shipdate:val,l_commitdate:val,l_receiptdate:val,l_shipinstruct:val,l_shipmode:val,l_comment:val"
)
TBLPROPERTIES ("hbase.table.name" = "lineitem");


insert overwrite table hbase_lineitem select * from lineitem;

Demai
