Re: Need help trying to balance HBase RegionServer load

Edward Capriolo Thu, 17 Jun 2010 09:05:57 -0700

On Thu, Jun 17, 2010 at 11:58 AM, Daniel Einspanjer <[email protected]> wrote:


>  Here is an example of a region split with both daughters being assigned to
> the same region server.  Is this expected?
>
> 2010-06-17 08:34:53,060 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_SPLIT_INCLUDES_DAUGHTERS:
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276776160508:
> Daughters;
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647,
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 from
> cm-hadoop14.mozilla.org,60020,1276560962019; 1 of 1
> 2010-06-17 08:34:54,316 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 to
> cm-hadoop15.mozilla.org,60020,1276778868841
> 2010-06-17 08:34:54,316 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 to
> cm-hadoop15.mozilla.org,60020,1276778868841
> 2010-06-17 08:34:55,432 INFO
> org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN:
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 from
> cm-hadoop15.mozilla.org,60020,1276778868841;
> 1 of 1
> 2010-06-17 08:34:55,432 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 open
> on 10.2.72.74:60020
> 2010-06-17 08:34:55,436 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 in
> region .META.,,1 with startcode=1276778868841, server=10.2.72.74:60020
> 2010-06-17 08:34:56,044 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_OPEN:
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 from
> cm-hadoop15.mozilla.org,60020,1276778868841;
> 1 of 1
> 2010-06-17 08:34:56,044 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 open
> on 10.2.72.74:60020
> 2010-06-17 08:34:56,048 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 in
> region .META.,,1 with startcode=1276778868841, server=10.2.72.74:60020
>
>
>
> On 6/17/10 11:42 AM, Daniel Einspanjer wrote:
>
>>  Currently, in our production cluster, almost all of the traffic for a day
>> ends up assigned to a single RS and that causes the load on that machine to
>> be too high.
>>
>> With our last release, we salted our rowkeys so that rather than starting
>> with the date:
>> 100617<guid>
>>  they now start with the first letter of the guid followed by the date:
>>  e100617<guid_that_starts_with_e>
>>
>> When I look at the region assignments though, I see a single server
>> assigned the following regions:
>>  0100617...
>>  1100617...
>>  2100617...
>>  3100617...
>>  4100617...
>>  ...
>>  d100617...
>>  e100617...
>>  f100617...
>>
>> Is there anything we can do to try to get the cluster to shuffle this up
>> some more?
>> We are getting compaction times in the minutes (one I saw was over 12
>> minutes) and this causes our clients to time out and shut down which causes
>> production outages.
>>
>> -Daniel
>>
>
Here comes a stone-age, stopgap suggestion. If you shut down the region
server, its regions will move to other servers, but there is a period of
time where those regions are inaccessible, so that is never good.
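
For anyone following along, here is a rough sketch of the key layout Daniel
describes, checked against one of the region start keys in the log he pasted.
The function names are made up for illustration and nothing here touches
HBase itself; it only builds strings. It assumes the GUIDs are random hex
strings, which is what the keys in the log look like.

```python
import uuid
from collections import Counter


def unsalted_rowkey(date_yymmdd: str, guid: str) -> str:
    """Older layout from the thread: date followed by the GUID."""
    return date_yymmdd + guid


def salted_rowkey(date_yymmdd: str, guid: str) -> str:
    """Newer layout from the thread: first GUID character, then date, then GUID."""
    return guid[0] + date_yymmdd + guid


# GUID recovered from a region start key in the quoted master log.
example = salted_rowkey("100617", "2700f355-1d02-485a-90d9-0e8182100617")

# Compare the leading characters that one day's worth of keys would produce.
guids = [str(uuid.uuid4()) for _ in range(10_000)]
unsalted_prefixes = Counter(unsalted_rowkey("100617", g)[0] for g in guids)
salted_prefixes = Counter(salted_rowkey("100617", g)[0] for g in guids)

print(example)
print("unsalted leading characters:", sorted(unsalted_prefixes))
print("salted leading characters:  ", sorted(salted_prefixes))
```

The salt spreads one day's writes over up to sixteen key ranges instead of
one, but it says nothing about which server hosts each range. That is decided
when regions are assigned, which would explain why all sixteen prefixes can
still end up on a single region server.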
