Re: HBase Region/Table Hotspotting

Ted Yu Mon, 11 Feb 2013 06:56:12 -0800

Sub-region management is still at an experimental stage.
We will have a better idea once HBASE-7667 gets an in-depth review and more
cluster-level testing is done.


You can watch HBASE-7667 so that you get updates.

Cheers

On Sun, Feb 10, 2013 at 9:56 PM, Joarder KAMAL <joard...@gmail.com> wrote:

> Thanks Lars for explaining the reasons for hotspotting and key design
> techniques.
> Just wondering, is it possible to alter the key design (e.g. from sequential
> keys to salted keys) at run time in a production system? What are the
> impacts?
>
> To Ted,
> Thanks a lot for pointing out [HBASE-7667]. Interesting idea indeed. And
> Matt Corgan explained the trade-offs between having fewer and more regions.
> He also pointed out how a large number of regions can impact the compaction
> process. Although I am not an expert on HBase, what do you think
> about how to find an optimal number of stripes or sub-regions for each
> region? Actually, I didn't get the idea of having fixed-boundary stripes.
>
> Thanks again.
> The HBase community is really great!!
>
>
>
> Regards,
> Joarder Kamal
>
>
>
> On 11 February 2013 16:14, lars hofhansl <la...@apache.org> wrote:
>
> > The most common cause of hotspotting is inserting rows with monotonically
> > increasing row keys.
> > In that case only the last region will get the writes and no amount of
> > splitting will fix that (only one region server will hold the last region
> > of the table, regardless of how small it is).
> > There are ways around this. If you generate keys, make sure they are not
> > monotonically increasing. For example, if you do not care about the sort
> > order of the keys w.r.t. each other, you could reverse the bytes before
> > you use them as the row key. Another option is to prefix the key with a
> > hash of the key (but then you lose the ability to do range scans across
> > keys).
> >
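A minimal sketch of the two key transformations described above, in plain Python with no HBase client involved. The function names, the 8-byte big-endian encoding, and the 4-byte SHA-256 prefix are assumptions chosen for illustration; the thread does not prescribe any of them.

```python
import hashlib
import struct


def reversed_key(seq_id: int) -> bytes:
    """Encode a sequential id as 8 big-endian bytes, then reverse them.

    After reversal the fastest-changing byte comes first, so consecutive
    ids no longer sort next to each other.
    """
    return struct.pack(">Q", seq_id)[::-1]


def hash_prefixed_key(key: bytes, prefix_len: int = 4) -> bytes:
    """Prefix the key with the first bytes of a hash of the key itself.

    The prefix is deterministic, so point lookups still work, but
    neighbouring keys are scattered and range scans are lost.
    """
    return hashlib.sha256(key).digest()[:prefix_len] + key


# Ids 255 and 256 are neighbours, but their reversed keys are far apart.
print(reversed_key(255).hex(), reversed_key(256).hex())
print(hash_prefixed_key(b"row-1").hex())
```

Both functions are pure, so a reader can recompute the stored row key from the logical key whenever it needs to do a get.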
> > If you still need to scan rows according to their sort order you can
> > "salt" (as some call it) the key by prefixing it with one of a limited
> > number of random single digits (maybe 5-10 different numbers). You could
> > also use a mod of the key. Each logical scan then has to issue multiple
> > scans in parallel, one for each of the possible prefix numbers.
> > (In fact that is a pretty effective way to avoid hotspotting and to
> > parallelize your scans, but it needs some client-side logic to reconcile
> > the parallel scans).
> >
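The salting scheme and the client-side reconciliation can be sketched as follows. This is an illustration only: the table is simulated by a sorted Python list, the bucket count of 8 and the `bucket-id` key layout are assumptions, and the per-bucket scans run one after another here, whereas a real client would issue them concurrently.

```python
import heapq
from bisect import bisect_left

NUM_BUCKETS = 8  # assumed; the post suggests roughly 5-10 prefixes
WIDTH = 20       # zero-pad ids so byte order matches numeric order


def salted_key(seq_id: int) -> bytes:
    """Prefix the id with (id mod NUM_BUCKETS) to spread sequential writes."""
    return f"{seq_id % NUM_BUCKETS}-{seq_id:0{WIDTH}d}".encode()


def unsalt(row_key: bytes) -> int:
    """Recover the original id from a salted row key."""
    return int(row_key.split(b"-", 1)[1])


def scan_bucket(sorted_rows, bucket, start_id, stop_id):
    """Range scan [start_id, stop_id) restricted to one salt prefix."""
    lo = f"{bucket}-{start_id:0{WIDTH}d}".encode()
    hi = f"{bucket}-{stop_id:0{WIDTH}d}".encode()
    return sorted_rows[bisect_left(sorted_rows, lo):bisect_left(sorted_rows, hi)]


def scan_range(sorted_rows, start_id, stop_id):
    """Issue one scan per prefix, then merge the results back into id order."""
    per_bucket = [scan_bucket(sorted_rows, b, start_id, stop_id)
                  for b in range(NUM_BUCKETS)]
    return [unsalt(k) for k in heapq.merge(*per_bucket, key=unsalt)]


# Simulated table: row keys kept in sorted order, as a region would hold them.
rows = sorted(salted_key(i) for i in range(100))
print(scan_range(rows, 10, 20))
```

Using a mod of the key rather than a random digit keeps the prefix recomputable, so single-row gets do not have to probe every bucket.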
> > Another reason for hotspotting is inserting new versions of a smallish
> > set of row keys. In that case splitting might help, because it will
> > decrease the likelihood of all those keys falling into the same region.
> >
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: Joarder KAMAL <joard...@gmail.com>
> > To: user@hbase.apache.org; d...@hbase.apache.org
> > Sent: Sunday, February 10, 2013 6:17 PM
> > Subject: HBase Region/Table Hotspotting
> >
> > This is my first email to the group. I have a fairly general and
> > open-ended question, but I hope to get some insight from the HBase user
> > community.
> > I am a very basic HBase user and still learning. I intend to use HBase
> > in one of our research projects. Recently I was looking through Lars
> > George's book "HBase - The Definitive Guide" and two particular topics
> > caught my eye. One is 'Region and Table Hotspotting' and the other is
> > 'Region Auto-Sharding and Merging'.
> >
> > *Scenario: *
> > If a hotspot is created in a particular region or in a table (having
> > multiple regions) due to a sudden workload change, then one may split the
> > region into smaller pieces and distribute them to a number of
> > available physical machines in the cluster. This process would require
> > large data transfers between different machines in the cluster and incur a
> > performance cost. One may also change the 'key' definition and manage the
> > regions. But I am not sure how effective or sensible it is to change key
> > designs on a production system.
> >
> > *Questions:*
> >
> >    1. How often do you face region or table hotspotting in HBase
> >    production systems?
> >    2. If a hotspot is created, how quickly is it automatically cleared
> >    (assuming a sudden workload change)?
> >    3. How often does this kind of situation happen: a hotspot is detected
> >    and vanishes before any action is taken? Or do hotspots stay for a
> >    longer period of time?
> >    4. If the hotspot stays, how is it handled (in general) in
> >    production systems?
> >    5. How is the cost of large data transfers minimized or avoided when
> >    re-sharding regions within a cluster in a single data center or across
> >    a WAN?
> >    6. Is hotspotting in an HBase cluster really a (big!) issue nowadays
> >    for OLAP workloads and real-time analytics?
> >
> >
> > Further pointers to more information about region/table hotspotting are
> > most welcome.
> >
> > Many thanks in advance.
> >
> > Regards,
> > Joarder Kamal
> >
>
