Re: random versus ordered input strategies

Fernando Padilla Wed, 15 Jun 2011 12:29:11 -0700

I'm sorry to butt in, but I have a question about that "split region inhalf" growth strategy. If it's too off-topic, you can ignore it..

I know that people always complain about using HBase for monotonouslyincreasing inputs, and recommend to use a random or hashed key strategyto distribute the load across servers.

I've always wondered if you can change the "split region in half"strategy to "create new region". If such a strategy could be useful forpeople that want to have monotonously increasing inputs. It won't be asefficient as full randomly distributed keys, but might be a good enoughcompromise to keep code complexity down.

From your example: If you have an almost full region with keys 1-10.And I go to add key 11. Right not it would split in half: 1-5, 6-11;And have to incur the overhead of creating new regions and doing lots ofdata copying to create all of the new region data files, etc. And toadd further insult once that new region is close to full, we have 1-5,6-16; once 17 is added, then it will go through a region split onceagain: 1-5, 6-10, 11-16, leaving half filled regions.

What if instead, you had a nearly full region, keys 1-10. Then add key11. And because it's in a "monotonously increasing inputs" mode, itwould leave region 1-10 alone, and simply create a brand new region,11-11 (hopefully in a different region server).

I know that someone must have thought about this already, and must havegood reasons to not follow it, but I've never seen this discussed in thearchitecture docs, nor in the email list.. so just wondering... :)






On 6/15/11 10:58 AM, Chris Tarnas wrote:

There are a few ways:

1) Dynamically as data added. You start with one region and all data goes 
there. When a region grows to big, it gets split in half. So if a region had 
keys 1-10 we now have 1-5 and 5-10.

Re: random versus ordered input strategies

Reply via email to