On Thu, Jun 17, 2010 at 11:58 AM, Daniel Einspanjer <[email protected] > wrote:
> Here is an example of a region split with both daughters being assigned to
> the same region. Is this expected?
>
> 2010-06-17 08:34:53,060 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_SPLIT_INCLUDES_DAUGHTERS:
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276776160508:
> Daughters;
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647,
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 from
> cm-hadoop14.mozilla.org,60020,1276560962019; 1 of 1
> 2010-06-17 08:34:54,316 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 to
> cm-hadoop15.mozilla.org,60020,1276778868841
> 2010-06-17 08:34:54,316 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 to
> cm-hadoop15.mozilla.org,60020,1276778868841
> 2010-06-17 08:34:55,432 INFO
> org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN:
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 from
> cm-hadoop15.mozilla.org,60020,1276778868841; 1 of 1
> 2010-06-17 08:34:55,432 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 open
> on 10.2.72.74:60020
> 2010-06-17 08:34:55,436 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 in
> region .META.,,1 with startcode=1276778868841, server=10.2.72.74:60020
> 2010-06-17 08:34:56,044 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_OPEN:
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 from
> cm-hadoop15.mozilla.org,60020,1276778868841; 1 of 1
> 2010-06-17 08:34:56,044 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 open
> on 10.2.72.74:60020
> 2010-06-17 08:34:56,048 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 in
> region .META.,,1 with startcode=1276778868841, server=10.2.72.74:60020
>
>
> On 6/17/10 11:42 AM, Daniel Einspanjer wrote:
>
>> Currently, in our production cluster, almost all of the traffic for a day
>> ends up assigned to a single RS and that causes the load on that machine to
>> be too high.
>>
>> With our last release, we salted our rowkeys so that rather than starting
>> with the date:
>> 100617<guid>
>> they now start with the first letter of the guid followed by the date:
>> e100617<guid_that_starts_with_e>
>>
>> When I look at the region assignments though, I see a single server
>> assigned the following regions:
>> 0100617...
>> 1100617...
>> 2100617...
>> 3100617...
>> 4100617...
>> ...
>> d100617...
>> e100617...
>> f100617...
>>
>> Is there anything we can do to try to get the cluster to shuffle this up
>> some more?
>> We are getting compaction times in the minutes (one I saw was over 12
>> minutes) and this causes our clients to time out and shut down which causes
>> production outages.
>>
>> -Daniel
>>
>

Here comes a stone-age, stop-gap suggestion. If you shut down the region
server, its regions would move to other servers, but there is a period of
time during which those regions are inaccessible, so that is never good.
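
For anyone following along, here is a minimal sketch of the rowkey salting
described in the quoted message, written in Python purely for illustration.
The function names are hypothetical and not part of HBase or of the crash
report code; the key layout (first character of the GUID, then the yymmdd
date, then the GUID) is inferred from the quoted description and from the
region names in the log above.

```python
from datetime import date


def unsalted_rowkey(day: date, guid: str) -> str:
    """Old layout: yymmdd date followed by the GUID (e.g. 100617<guid>)."""
    return day.strftime("%y%m%d") + guid


def salted_rowkey(day: date, guid: str) -> str:
    """New layout: first character of the GUID, then the date, then the GUID.

    With hex GUIDs this spreads one day's keys over 16 prefixes (0-9, a-f)
    instead of a single contiguous date-prefixed range.
    """
    return guid[0] + day.strftime("%y%m%d") + guid


if __name__ == "__main__":
    day = date(2010, 6, 17)
    # GUID taken from the region start key shown in the quoted log.
    guid = "2700f355-1d02-485a-90d9-0e8182100617"
    print(unsalted_rowkey(day, guid))
    print(salted_rowkey(day, guid))
```

Note that salting only changes which key ranges the writes land in. Which
region server hosts each of those ranges is still up to the master's
assignment, which is the part being asked about here.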
