Hi Jan,

Any day now!

Really, there are just a few little road bumps, but nothing major, and
once they are resolved it will be released. Rushing it out just for the
sake of releasing it won't make anyone happy (if we then find issues
right afterwards). Please bear with us!

Lars

On Tue, Dec 14, 2010 at 10:20 AM, Jan Lukavský
<jan.lukav...@firma.seznam.cz> wrote:
> Hi Daniel,
>
> I thought that version 0.90.0 would have major rewrites in this area. Could
> you give a rough estimate of when the new version will be out?
>
> Thanks,
>  Jan
>
> On 13.12.2010 20:43, Jean-Daniel Cryans wrote:
>>
>> Hi Jan,
>>
>> That area of HBase was reworked a lot in the upcoming 0.90.0 and
>> region opening and closing can now be done in parallel for multiple
>> regions.
>>
>> Also, the balancer works differently: a new region server (or a dead
>> one that was restarted) may not be assigned a single region until the
>> balancer runs (which is now every 5 minutes).
>>
>> Those behaviors are completely new, so they will probably need further
>> tuning, and there's still a lot to do regarding region balancing in
>> general, but it's probably worth trying it out.
>>
>> Regarding limiting the number of regions, you probably want to use LZO
>> (99% of the time it's faster for your tables) and set MAX_FILESIZE to
>> something like 1GB since the default is pretty low.
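>>
>> If it helps, that table change would look roughly like this with the
>> Java client (an untested sketch against the 0.90-era API, so method
>> names may differ slightly in 0.20; the table and family names below
>> are placeholders, and LZO needs the native libraries installed on
>> every node):
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.hbase.HBaseConfiguration;
>> import org.apache.hadoop.hbase.HColumnDescriptor;
>> import org.apache.hadoop.hbase.HTableDescriptor;
>> import org.apache.hadoop.hbase.client.HBaseAdmin;
>> import org.apache.hadoop.hbase.io.hfile.Compression;
>> import org.apache.hadoop.hbase.util.Bytes;
>>
>> public class TuneTable {
>>   public static void main(String[] args) throws Exception {
>>     Configuration conf = HBaseConfiguration.create();
>>     HBaseAdmin admin = new HBaseAdmin(conf);
>>
>>     byte[] tableName = Bytes.toBytes("mytable");  // placeholder name
>>     admin.disableTable(tableName);
>>
>>     HTableDescriptor desc = admin.getTableDescriptor(tableName);
>>     // Raise the split threshold to about 1GB (default is much lower).
>>     desc.setMaxFileSize(1024L * 1024L * 1024L);
>>     // Switch the column family to LZO compression.
>>     HColumnDescriptor family = desc.getFamily(Bytes.toBytes("cf"));
>>     family.setCompressionType(Compression.Algorithm.LZO);
>>
>>     admin.modifyTable(tableName, desc);
>>     admin.enableTable(tableName);
>>   }
>> }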
>>
>> Maybe your new config would be useful in the new master too; I'll have
>> to give it more thought.
>>
>> J-D
>>
>> On Mon, Dec 13, 2010 at 8:36 AM, Jan Lukavský
>> <jan.lukav...@firma.seznam.cz>  wrote:
>>>
>>> Hi all,
>>>
>>> we are using HBase 0.20.6 on a cluster of about 25 nodes with about 30k
>>> regions and are experiencing an issue which causes running M/R jobs to
>>> fail. When we restart a single RegionServer, the following happens:
>>>  1) all regions of that RS get reassigned to the remaining (say 24) nodes
>>>  2) when the restarted RegionServer comes up, the HMaster closes about 60
>>> regions on each of the 24 nodes and assigns them back to the restarted node
>>>
>>> Now, step 1) is usually very quick (if we can assign 10 regions per
>>> heartbeat, we get 240 regions assigned per heartbeat across the whole
>>> cluster). Step 2) seems problematic, because first about 1200 regions
>>> get unassigned, and then they are slowly assigned to the single RS
>>> (again at 10 regions per heartbeat). During this time, the clients in
>>> the map tasks connected to those regions throw
>>> RetriesExhaustedException.
>>>
>>> I'm aware that we can limit the number of regions closed per
>>> RegionServer heartbeat with hbase.regions.close.max, but this config
>>> option seems a bit unsatisfactory, because as we increase the size of
>>> the cluster we will get more and more regions unassigned in a single
>>> cluster heartbeat (say we limit this to 1, then we get 24 unassigned
>>> regions, but only 10 assigned per heartbeat). This led us to a
>>> solution which seems quite simple: we have introduced a new config
>>> option which limits the number of regions in transition. When
>>> regionsInTransition.size() crosses that boundary, we temporarily stop
>>> the load balancer. This seems to resolve our issue, because no region
>>> stays unassigned for a long time and the clients manage to recover
>>> within their number of retries.
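>>>
>>> To illustrate, the logic of the patch is roughly the following. This
>>> is a simplified, standalone sketch -- the class and method names here
>>> are made up, the real change is a small patch inside the 0.20.6
>>> HMaster:
>>>
>>> /**
>>>  * Simplified sketch of the "max regions in transition" guard. Names
>>>  * are illustrative only; the actual change lives in the HMaster.
>>>  */
>>> public class RegionsInTransitionGuard {
>>>
>>>   private final int maxRegionsInTransition;  // our new config option
>>>
>>>   public RegionsInTransitionGuard(int maxRegionsInTransition) {
>>>     this.maxRegionsInTransition = maxRegionsInTransition;
>>>   }
>>>
>>>   /**
>>>    * Called before each balancing round with the current value of
>>>    * regionsInTransition.size(). While too many regions are already
>>>    * moving we skip balancing, so we never unassign more regions
>>>    * than we can reassign per heartbeat.
>>>    */
>>>   public boolean shouldBalance(int regionsInTransition) {
>>>     return regionsInTransition < maxRegionsInTransition;
>>>   }
>>> }
>>>
>>> In the patch the master consults this check before it unassigns any
>>> regions for rebalancing.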
>>>
>>> My question is: is this a general issue and should a new config
>>> option be proposed, or am I missing something and could we have
>>> resolved the issue by tuning some other config option?
>>>
>>> Thanks.
>>>  Jan
>>>
>>>
>
>
> --
>
> Jan Lukavský
> programátor
> Seznam.cz, a.s.
> Radlická 608/2
> 15000, Praha 5
>
> jan.lukav...@firma.seznam.cz
> http://www.seznam.cz
>
>
