The funny thing about tuning... What works for one situation may not work well 
for others.
Using the old recommendation of never exceeding 1,000 regions per RS, keeping
it low (around 100-200), monitoring tables, and changing the region size on a
table-by-table basis, we are doing OK.
(Of course there are other nasty bugs that kill us... but that's a different
thread...)
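
For reference, the per-table override can be done through the admin API as
well as the shell. A minimal, illustrative sketch against the 0.90-era Java
client ("mytable" and the 4 GB threshold are placeholders, not
recommendations):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PerTableRegionSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            byte[] table = Bytes.toBytes("mytable"); // placeholder name

            // Override the global hbase.hregion.max.filesize for this
            // table only; here, a 4 GB split threshold.
            HTableDescriptor desc = admin.getTableDescriptor(table);
            desc.setMaxFileSize(4L * 1024 * 1024 * 1024);

            admin.disableTable(table);  // table must be offline to modify
            admin.modifyTable(table, desc);
            admin.enableTable(table);
        }
    }

(The shell equivalent is an alter with the MAX_FILESIZE table attribute.)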

The point is that you need to decide what makes sense for you and what
trade-offs you can live with...

Just my two cents...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Nov 2, 2011, at 9:10 PM, lars hofhansl <lhofha...@yahoo.com> wrote:

> Do we know what would need to change in HBase in order to be able to manage 
> more regions per regionserver?
> With 20 regions per server, one would need 300G regions to just utilize 6T of 
> drive space.
> 
> 
> To utilize a regionserver/datanode with 24T drive space the region size would 
> be an insane 1T.
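> 
> (Spelling out that arithmetic, ignoring HDFS replication overhead as the
> figures above do:
> 
>     region size ~= usable disk / regions per server
>     6 TB  / 20 regions = 300 GB per region
>     24 TB / 20 regions = ~1.2 TB per region
> 
> so at a fixed region count, region size scales linearly with the disk you
> want to fill.)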
> 
> -- Lars
> 
> ________________________________
> From: Nicolas Spiegelberg <nspiegelb...@fb.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Cc: Karthik Ranganathan <kranganat...@fb.com>; Kannan Muthukkaruppan 
> <kan...@fb.com>
> Sent: Tuesday, November 1, 2011 3:57 PM
> Subject: Re: region size/count per regionserver
> 
> Simple answer
> -------------
> 20 regions/server & <2000 regions/cluster is a good rule of thumb if you
> can't profile your workload yet.  You really want to keep two things in
> mind (see the sketch after this list):
> 
> 1) You need to limit the regions/cluster so the master can have a
> reasonable startup time & can handle all the region state transitions via
> ZK.  Most bigger companies are running 2,000 in production and achieve
> reasonable startup times (< 2 minutes for region assignment on cold
> start).  If you want to test the scalability of that algorithm beyond what
> other companies need, admin beware.
> 2) The more regions/server you have, the faster recovery can happen
> after RS death, because recovery is currently parallelized at region
> granularity.  Too many regions/server, though, and #1 starts to be a problem.
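> 
> Putting the simple answer into numbers, here is a minimal, illustrative
> Java sketch of deriving a split threshold from the rule of thumb; the
> 6 TB figure is an assumption to adapt, not a recommendation:
> 
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
> 
>     public class RegionSizing {
>         public static void main(String[] args) {
>             // Assumed inputs: disk you plan to fill per server, and the
>             // ~20 regions/server rule of thumb above.
>             long usableBytesPerServer = 6L * 1024 * 1024 * 1024 * 1024;
>             int targetRegionsPerServer = 20;
> 
>             // Split threshold that lands near the target region count.
>             long maxFileSize = usableBytesPerServer / targetRegionsPerServer;
> 
>             // hbase.hregion.max.filesize is the real split-threshold
>             // property; it normally lives in hbase-site.xml, so this
>             // client-side set is only for illustration.
>             Configuration conf = HBaseConfiguration.create();
>             conf.setLong("hbase.hregion.max.filesize", maxFileSize);
>             System.out.println("hbase.hregion.max.filesize = " + maxFileSize);
>         }
>     }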
> 
> 
> 
> Complicated answer
> ------------------
> More information is needed to optimize this formula.  Additional considerations:
> 
> 1) Are you IO-bound or CPU-bound?
> 2) What is your grid topology like?
> 3) What is your network hardware like?
> 4) How many disks (not just total size)?
> 5) What is the data locality between RegionServer & DataNode?
> 
> In the Facebook case, we have 5 racks with 20 nodes each.  Servers in the
> rack are connected by 1G Eth to a switch with a 10G uplink.  We are
> network bound.  Our saturation point is most commonly on the top-of-rack
> switch.  With 20 regions/server, we can roughly parallelize our
> distributed log splitting within a single rack on RS death (although 2
> regions do split off-rack).  This minimizes top-of-rack traffic and
> optimizes our recovery time.  Even if you are CPU-bound, log splitting
> (hence recovery time) is an IO-bound operation.  A lot of our work on
> region assignment is about maximizing data locality, even on RS death, so
> we avoid top-of-rack saturation.
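> 
> (To put numbers on that saturation point, using the figures above:
> 
>     in-rack supply: 20 nodes x 1 Gb/s = 20 Gb/s
>     rack uplink   : 10 Gb/s, i.e. 2:1 oversubscribed
> 
> so any recovery traffic that crosses the uplink is contending for half of
> what the rack can generate, which is why keeping log splitting in-rack
> matters.)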
> 
> 
> On 11/1/11 10:54 AM, "Sujee Maniyam" <su...@sujee.net> wrote:
> 
>> HI all,
>> My HBase cluster is 10 nodes, each node has 12core ,   48G RAM, 24TB disk,
>> 10GEthernet.
>> My region size is 1GB.
>> 
>> Any guidelines on how many regions a RS can handle comfortably?
>> I vaguely remember reading somewhere to have no more than 1000 regions /
>> server; that comes to 1 TB / server.  Seems pretty low for the current
>> hardware config.
>> 
>> Any rules of thumb?  Experiences?
>> 
>> thanks
>> Sujee
>> 
>> http://sujee.net
> 
