Can you post the log from the regionserver that was so slow to open the region (machine2037, from 12:57 to 13:14)? Please start it a couple of minutes before 12:57.
Most likely this is not a bug so much as a current limitation: open/close messages are handled sequentially. It's possible that a long-running close (flush) held up processing of the open. The logs will say more.

This should be much improved with the major release of HBase.

JG

> -----Original Message-----
> From: Jinsong Hu [mailto:[email protected]]
> Sent: Monday, June 14, 2010 11:24 AM
> To: [email protected]
> Subject: bug report: opening hbase region takes too long , making the region not available for more than 10 minutes.
>
> Hi, There:
>
> I have found an hbase bug: opening a region takes too long. The
> client reported a "no server address" error. For the region
> MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773,
> here is the sequence:
>
> Around 12:57, all 8 region servers closed this region.
> On machine2037, at 12:57:45,812, it received a request to open this
> region. Usually, a worker thread will immediately honor the request and
> open the region within seconds, but in this case the region wasn't
> opened until 13:14:43,341.
> Around 13:16, all other regionservers received requests to open this
> region, and the worker thread immediately opened it.
>
> So during the time gap from 12:57 to 13:14, the region is not
> available, and the client logs errors while trying to insert records.
>
> I have read the hbase code. The way hbase handles this problem is by
> retrying 10 times, waiting 10 seconds in between. Essentially it tries
> for 100 seconds.
>
> In this case, even 100 seconds of retries won't work at 12:10, because
> the region was opened well beyond the 100-second interval.
>
> This is clearly an hbase bug.
>
> Jimmy
>
> Here is the client-side log:
>
> 13:10:03,441 INFO [ClientCnxn] Attempting connection to server zookeeper2.cloud.mydomain.net/10.110.8 52:2181: No server address listed in .META. for region MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 13:10:03,451 INFO [ClientCnxn] Server connection successful
>
> org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> Here are the regionserver-side logs related to this issue.
>
> machine2035:
>
> 2010-06-14 12:57:23,452 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:16:37,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:16:37,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> machine2036:
>
> 2010-06-14 12:57:29,312 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:16:05,107 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:16:05,107 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> machine2037:
>
> 2010-06-14 12:57:09,986 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 12:57:45,812 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:14:43,341 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> machine2038:
>
> 2010-06-14 12:57:25,562 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:15:53,356 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:15:53,356 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> machine2040:
>
> 2010-06-14 12:57:14,214 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:15:01,266 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:15:01,266 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> machine2041:
>
> 2010-06-14 12:57:44,877 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:15:48,955 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:15:48,955 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> machine2042:
>
> 2010-06-14 12:57:12,500 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:14:58,719 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 2010-06-14 13:14:58,719 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
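
To put numbers on the thread above, here is a rough sketch in Python (not HBase code) of the two quantities being compared: the gap on machine2037 between receiving MSG_REGION_OPEN and the worker acting on it, and the client retry budget of 10 tries with 10 seconds in between as described in the report. The helper names (`seconds_between`, `finish_times`) and the 1000-second close duration are made up for illustration; only the two timestamps and the 10 x 10 retry figures come from the messages above. The second helper is a toy model of one worker handling messages strictly in order, which is the suspected cause, not a confirmed one.

```python
from datetime import datetime

# Timestamp layout used in the regionserver log lines quoted above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# Client retry budget as described in the report: 10 tries, 10 seconds apart.
RETRY_BUDGET_S = 10 * 10


def seconds_between(start, end):
    """Elapsed seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


def finish_times(work_seconds):
    """Toy model: one worker handles messages strictly in order.

    Every message is assumed to be queued at time 0, so each one
    finishes only after all the work queued ahead of it.
    """
    clock = 0.0
    finished = []
    for work in work_seconds:
        clock += work
        finished.append(clock)
    return finished


# machine2037: MSG_REGION_OPEN received vs. the worker acting on it.
gap = seconds_between("2010-06-14 12:57:45,812", "2010-06-14 13:14:43,341")
print(f"open delayed by {gap:.3f} s; client retries cover {RETRY_BUDGET_S} s")

# Hypothetical queue: a 1000 s close (flush) ahead of a 2 s open.
print(finish_times([1000.0, 2.0]))
```

Under this model a quick open queued behind a slow close cannot finish before the close does, so any client whose retries start during that window runs out of attempts long before the region is assigned.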
