Hi there,

I have found an HBase bug where opening a region takes too long, and the client reports a "no server address" error. For the region MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773, here is the sequence:



Around 12:57, all 8 region servers closed this region.
On machine2037, at 12:57:45,812, a request to open this region was received. Usually a worker thread honors the request immediately and opens the region within seconds, but in this case the region wasn't opened until 13:14:43,341. Around 13:16, all the other regionservers received requests to open this region, and their worker threads handled them immediately.


So during the gap from 12:57 to 13:14, the region was not available, and the client logged errors while trying to insert records.



I have read the HBase code. HBase handles this situation by retrying 10 times, waiting 10 seconds between attempts, so it effectively keeps trying for about 100 seconds.

In this case, even the 100-second retry window didn't help at 13:10, because the region was opened far beyond the 100-second interval.
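
To put numbers on this, here is a small sketch that compares the retry window with the delay seen on machine2037. The two timestamps are copied from the regionserver log further down; the retry count and pause are the figures I quoted above, not values read from any configuration, and the variable names are only for illustration.

```python
from datetime import datetime

# Timestamps copied from the machine2037 regionserver log.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
open_requested = datetime.strptime("2010-06-14 12:57:45,812", TS_FORMAT)
open_handled = datetime.strptime("2010-06-14 13:14:43,341", TS_FORMAT)

# Retry figures as described in this message (assumed: 10 attempts, 10 s apart).
RETRIES = 10
PAUSE_SECONDS = 10

retry_window = RETRIES * PAUSE_SECONDS
gap_seconds = (open_handled - open_requested).total_seconds()

print(f"retry window:      {retry_window} s")
print(f"region open delay: {gap_seconds:.3f} s")
print(f"delay exceeds the retry window by {gap_seconds - retry_window:.3f} s")
```

The delay comes out at 1017.529 seconds, roughly ten times the 100 seconds the client is willing to wait.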



This is clearly an HBase bug.


Jimmy




Here is the client side log:

13:10:03,441 INFO [ClientCnxn] Attempting connection to server zookeeper2.cloud.mydomain.net/10.110.8 52:2181: No server address listed in .META. for region MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773



13:10:03,451 INFO  [ClientCnxn] Server connection successful

org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773





Here are the regionserver-side logs related to this issue.


machine2035:

2010-06-14 12:57:23,452 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:16:37,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:16:37,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773





machine2036:

2010-06-14 12:57:29,312 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:16:05,107 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:16:05,107 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773







machine2037:

2010-06-14 12:57:09,986 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 12:57:45,812 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:14:43,341 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773







machine2038:

2010-06-14 12:57:25,562 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:53,356 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:53,356 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773





machine2040:

2010-06-14 12:57:14,214 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:01,266 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:01,266 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773







machine2041:

2010-06-14 12:57:44,877 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:48,955 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:48,955 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773



machine2042:

2010-06-14 12:57:12,500 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:14:58,719 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:14:58,719 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
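
For anyone who wants to check the other regionserver logs the same way, below is a rough sketch of how the delay between the MSG_REGION_OPEN line and the matching "Worker: MSG_REGION_OPEN" line can be pulled out of a log. It assumes the log format shown above, with each entry on a single line, and it treats the "Worker:" line as the moment the worker thread picked up the request, which is my reading of the logs rather than something confirmed from the code. The names `open_delays` and `OPEN_RE` are made up for this example.

```python
import re
from datetime import datetime

# Matches both the "MSG_REGION_OPEN" receipt line and the later
# "Worker: MSG_REGION_OPEN" line; group 2 is set only for the worker line.
OPEN_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO \S+: "
    r"(Worker: )?MSG_REGION_OPEN: (.+)$"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def open_delays(log_lines):
    """Return {region: seconds between request receipt and worker pick-up}."""
    received = {}
    delays = {}
    for line in log_lines:
        match = OPEN_RE.match(line.strip())
        if match is None:
            continue  # not an open message, e.g. the "Closed ..." lines
        stamp = datetime.strptime(match.group(1), TS_FORMAT)
        region = match.group(3)
        if match.group(2) is None:
            received[region] = stamp  # request received by the regionserver
        elif region in received:
            delays[region] = (stamp - received.pop(region)).total_seconds()
    return delays


REGION = r"MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773"
HRS = "org.apache.hadoop.hbase.regionserver.HRegionServer"

# Lines copied from the machine2037 and machine2042 logs above.
machine2037 = [
    f"2010-06-14 12:57:45,812 INFO {HRS}: MSG_REGION_OPEN: {REGION}",
    f"2010-06-14 13:14:43,341 INFO {HRS}: Worker: MSG_REGION_OPEN: {REGION}",
]
machine2042 = [
    f"2010-06-14 13:14:58,719 INFO {HRS}: MSG_REGION_OPEN: {REGION}",
    f"2010-06-14 13:14:58,719 INFO {HRS}: Worker: MSG_REGION_OPEN: {REGION}",
]

for name, lines in (("machine2037", machine2037), ("machine2042", machine2042)):
    for seconds in open_delays(lines).values():
        print(f"{name}: {seconds:.3f} s")
```

With the lines above this prints 1017.529 s for machine2037 and 0.000 s for machine2042, which matches what I described: only machine2037 sat on the request.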





