Sub-region management is at an experimental stage. We will get a better idea once HBASE-7667 gets an in-depth review and more cluster-level testing is done.
You can watch HBASE-7667 so that you get updates.

Cheers

On Sun, Feb 10, 2013 at 9:56 PM, Joarder KAMAL <joard...@gmail.com> wrote:

> Thanks Lars for explaining the reasons for hotspotting and key design
> techniques.
> Just wondering, is it possible to alter the key design (e.g. from
> sequential keys to salted keys) at run time in a production system? What
> are the impacts?
>
> To Ted,
> Thanks a lot for pointing out [HBASE-7667]. Interesting idea indeed. And
> Matt Corgan explained the trade-offs between having fewer and more regions.
> He also pointed out how a large number of regions can impact the compaction
> process. Although I am not an expert on the HBase system, what do you think
> about how to find an optimal number of stripes or sub-regions for each
> region? Actually I didn't get the idea of having fixed-boundary stripes.
>
> Thanks again.
> HBase community is really great !!
>
> Regards,
> Joarder Kamal
>
> On 11 February 2013 16:14, lars hofhansl <la...@apache.org> wrote:
>
> > The most common cause for hotspotting is inserting rows with
> > monotonically increasing row keys.
> > In that case only the last region will get the writes and no amount of
> > splitting will fix that (only one region server will hold the last
> > region of the table regardless of how small it is).
> > There are ways around this. If you generate keys make sure they are not
> > monotonically increasing. For example if you do not care about the sort
> > order of the keys w.r.t. each other you could reverse the bytes before
> > you use them as row key. Another option is to prefix the key with a hash
> > of the key (but then you lose the ability to do range scans across keys).
> >
> > If you still need to scan rows according to their sort order you can
> > "salt" (as some call it) the key by prefixing it with one of a limited
> > number of random single digits (maybe 5-10 different numbers). Could
> > also do a mod of the key. Each scan then has to issue multiple scans in
> > parallel, one for each of the possible prefix numbers.
> > (In fact that is a pretty effective way to avoid hotspotting and to
> > parallelize your scans, but it needs some client-side logic to reconcile
> > the parallel scans).
> >
> > Another reason for hotspotting is inserting new versions of a small'ish
> > set of row keys. In that case splitting might help, because it will
> > decrease the likelihood of all those keys falling into the same region.
> >
> > -- Lars
> >
> > ________________________________
> > From: Joarder KAMAL <joard...@gmail.com>
> > To: user@hbase.apache.org; d...@hbase.apache.org
> > Sent: Sunday, February 10, 2013 6:17 PM
> > Subject: HBase Region/Table Hotspotting
> >
> > This is my first email in the group. I have a fairly general and
> > open-ended question and hope to get some reasoning from the HBase user
> > community.
> > I am a very basic HBase user and still learning. My intention is to use
> > HBase in one of our research projects. Recently I was looking through
> > Lars George's book "HBase - The Definitive Guide" and two particular
> > topics caught my eye. One is 'Region and Table Hotspotting' and the
> > other is 'Region Auto-Sharding and Merging'.
> >
> > *Scenario:*
> > If a hotspot is created in a particular region or in a table (having
> > multiple regions) due to a sudden workload change, then one may split the
> > region into smaller pieces and distribute them across a number of
> > available physical machines in the cluster. This process should require
> > large data transfers between different machines in the cluster and incur
> > a performance cost. One may also change the 'key' definition and manage
> > the regions. But I am not sure how effective or logical it is to change
> > key designs on a production system.
> >
> > *Questions:*
> >
> > 1. How often do you face Region or Table Hotspotting in HBase
> >    production systems?
> > 2. If a hotspot is created, how quickly is it automatically cleared
> >    (assuming a sudden workload change)?
> > 3. How often does this kind of situation happen - a hotspot is detected
> >    and vanishes before any action is taken? Or do hotspots stay for a
> >    longer period of time?
> > 4. Or if the hotspot stays, how is it handled (in general) in
> >    production systems?
> > 5. How is the large data transfer cost minimized or avoided when
> >    re-sharding regions within a cluster in a single data center or
> >    across a WAN?
> > 6. Is hotspotting in an HBase cluster really a (big!) issue nowadays
> >    for OLAP workloads and real-time analytics?
> >
> > Further directions to more information about region/table hotspotting
> > are most welcome.
> >
> > Many thanks in advance.
> >
> > Regards,
> > Joarder Kamal
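
As a footnote to the salting approach Lars describes in the quoted message, here is a minimal Python sketch of the idea. It is not HBase client code: a plain dict stands in for the table, and every name in it (NUM_BUCKETS, salt_for, salted_key, scan_bucket, salted_scan) is hypothetical. It assumes the salt is derived from a hash of the key taken modulo the bucket count (the "mod of the key" variant) rather than chosen at random, so that a given key always lands in the same bucket. The per-bucket scans run one after another here; a real client would presumably issue them in parallel and then merge the results the same way.

```python
import hashlib
import heapq

NUM_BUCKETS = 8  # assumption; the thread suggests roughly 5-10 prefixes


def salt_for(key: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Pick a bucket deterministically from a hash of the key."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def salted_key(key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with its one-byte bucket number."""
    return bytes([salt_for(key, num_buckets)]) + key


def scan_bucket(table, bucket, start, stop):
    """Stand-in for one range scan restricted to a single salt prefix."""
    prefix = bytes([bucket])
    lo, hi = prefix + start, prefix + stop
    for row in sorted(table):
        if lo <= row < hi:
            yield row[1:], table[row]  # strip the salt byte again


def salted_scan(table, start, stop, num_buckets=NUM_BUCKETS):
    """Run one scan per prefix and merge them back into key order."""
    scans = [scan_bucket(table, b, start, stop) for b in range(num_buckets)]
    return list(heapq.merge(*scans, key=lambda kv: kv[0]))


# Monotonically increasing keys, which would otherwise all hit one region.
table = {salted_key(b"row-%04d" % i): i for i in range(100)}
rows = salted_scan(table, b"row-0010", b"row-0020")
print([key for key, _ in rows])
```

Each per-bucket scan already returns rows in sorted order, so the client-side reconciliation reduces to a k-way merge. The cost is that every logical range scan turns into NUM_BUCKETS physical scans, which is why the number of prefixes is kept small.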