Hello,
There is a middle ground between sequential keys (hot-spotting risk) and
fully hashed MD5 keys (heavy scans):
* you can use composite keys with a field that segregates the data
(hostname, product name, metric name), like OpenTSDB
* or use a salt with a limited number of values (for example,
substr(md5(rowid),0,1) gives 16 values),
so that a scan is a combination of 16 filters, one on each salt value.
You can base your code on HBaseWD by Sematext:
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
https://github.com/sematext/HBaseWD
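
To illustrate the salt approach described above, here is a minimal sketch in Python. It is not HBaseWD's API and does not talk to HBase: the table is simulated by a sorted list of keys, and the names salted_key, salted_ranges and range_scan, as well as the "|" separator, are assumptions made up for this example.

```python
import hashlib
import heapq
from bisect import bisect_left

# The 16 possible values of the first hex character of an MD5 digest.
SALTS = "0123456789abcdef"


def salted_key(rowid: str) -> str:
    """Prefix the row id with substr(md5(rowid), 0, 1), i.e. one of 16 salts."""
    salt = hashlib.md5(rowid.encode("utf-8")).hexdigest()[0]
    return f"{salt}|{rowid}"  # "|" is a hypothetical separator


def salted_ranges(start: str, stop: str):
    """Build one (start, stop) key range per salt; stop is exclusive."""
    return [(f"{s}|{start}", f"{s}|{stop}") for s in SALTS]


def range_scan(sorted_keys, start: str, stop: str):
    """Run the 16 sub-scans on a sorted key list and merge them by row id."""
    parts = []
    for lo, hi in salted_ranges(start, stop):
        i, j = bisect_left(sorted_keys, lo), bisect_left(sorted_keys, hi)
        # Strip the salt so results can be merged in original row id order.
        parts.append([k.split("|", 1)[1] for k in sorted_keys[i:j]])
    return list(heapq.merge(*parts))


if __name__ == "__main__":
    # Sequential date row ids end up spread over the salt prefixes.
    rows = [f"2012-12-{d:02d}" for d in range(1, 31)]
    table = sorted(salted_key(r) for r in rows)  # stands in for HBase key order
    print(range_scan(table, "2012-12-10", "2012-12-15"))
```

Writes for consecutive dates land under different salt prefixes, while a date-range read still works as 16 small range scans whose results are merged. With a real table each sub-range would be its own scan, which is the kind of work HBaseWD takes care of.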
Cheers,
2012/12/18 bigdata <[email protected]>
> Many articles say that using an MD5 hash as the rowkey, or as part of it, is
> a good way to balance the records stored across different parts. But what if
> I want to search sequential rowkeys, for example when the rowkey is, or
> contains, a date? Once the date is hashed with MD5, I cannot use a rowkey
> filter to scan a range of dates in one pass. How can I balance this?
> Thanks.
>
>
--
Damien HARDY