Hi there,
I have found an HBase bug where opening a region takes too long. The
client reported a "no server address" error. For the region
MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773,
here is the sequence:
Around 12:57, all 8 region servers closed this region.
On machine2037, at 12:57:45,812, it received a request to open this
region. Usually a worker thread honors the request immediately and
opens the region within seconds, but in this case the region wasn't opened
until 13:14:43,341.
Around 13:16, all the other regionservers received requests to open this
region, and their worker threads opened it immediately.
So during the gap from 12:57 to 13:14, the region was not available, and
the client logged errors while trying to insert records.
I have read the HBase code. HBase handles this problem by retrying 10
times, waiting 10 seconds between attempts. Essentially it tries for
100 seconds.
In this case, even that 100-second retry window would not have helped at
13:10, because the region was opened well beyond the 100-second interval.
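To put numbers on this, here is a minimal Python sketch (not HBase code) that compares the retry window described above with the delay seen on machine2037. The function names are my own, for illustration only; the retry count and pause are the values quoted above, not read from any HBase configuration.

```python
from datetime import datetime

# Values taken from the description above: 10 retries, 10 seconds apart.
RETRIES = 10
PAUSE_SECONDS = 10


def retry_window_seconds(retries: int, pause_seconds: float) -> float:
    """Approximate total time a fixed-pause retry loop keeps trying."""
    return retries * pause_seconds


def gap_seconds(start: str, end: str) -> float:
    """Seconds between two same-day log timestamps in 'HH:MM:SS,mmm' form."""
    fmt = "%H:%M:%S,%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# machine2037: open request received vs. worker thread handling it.
window = retry_window_seconds(RETRIES, PAUSE_SECONDS)
gap = gap_seconds("12:57:45,812", "13:14:43,341")

print(f"retry window: {window:.0f} s, region open delay: {gap:.3f} s")
print("retries cover the delay:", window >= gap)
```

With these inputs the retry window is 100 seconds, while the region took about 1017.5 seconds (roughly 17 minutes) to open, so the client retries run out long before the region comes back.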
This is clearly an HBase bug.
Jimmy
Here is the client-side log:
13:10:03,441 INFO [ClientCnxn] Attempting connection to server zookeeper2.cloud.mydomain.net/10.110.8 52:2181: No server address listed in .META. for region MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
13:10:03,451 INFO [ClientCnxn] Server connection successful
org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
Here are the regionserver-side logs related to this issue:
machine2035:
2010-06-14 12:57:23,452 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:16:37,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:16:37,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
machine2036:
2010-06-14 12:57:29,312 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:16:05,107 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:16:05,107 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
machine2037:
2010-06-14 12:57:09,986 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 12:57:45,812 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:14:43,341 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
machine2038:
2010-06-14 12:57:25,562 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:53,356 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:53,356 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
machine2040:
2010-06-14 12:57:14,214 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:01,266 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:01,266 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
machine2041:
2010-06-14 12:57:44,877 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:48,955 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:15:48,955 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
machine2042:
2010-06-14 12:57:12,500 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:14:58,719 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
2010-06-14 13:14:58,719 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
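
The per-regionserver delay can be pulled out of logs like these with a small script. This is only a sketch under some assumptions: each log entry sits on a single line, the plain MSG_REGION_OPEN entry marks receipt of the request, and the "Worker:" entry marks when the worker thread picked it up. The helper name open_delay_seconds is hypothetical, and the sample lines are the machine2037 and machine2042 entries from this mail.

```python
import re
from datetime import datetime

# Region name as it appears in the logs (raw string keeps the literal "\x09").
REGION = r"MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773"
PREFIX = " INFO org.apache.hadoop.hbase.regionserver.HRegionServer: "

# Sample entries copied from the logs in this mail, one entry per line.
SAMPLE = {
    "machine2037": [
        "2010-06-14 12:57:45,812" + PREFIX + "MSG_REGION_OPEN: " + REGION,
        "2010-06-14 13:14:43,341" + PREFIX + "Worker: MSG_REGION_OPEN: " + REGION,
    ],
    "machine2042": [
        "2010-06-14 13:14:58,719" + PREFIX + "MSG_REGION_OPEN: " + REGION,
        "2010-06-14 13:14:58,719" + PREFIX + "Worker: MSG_REGION_OPEN: " + REGION,
    ],
}

# Timestamp, logger name, optional "Worker: " marker, then the message type.
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO \S+: (Worker: )?MSG_REGION_OPEN: "
)


def open_delay_seconds(lines):
    """Seconds between receiving MSG_REGION_OPEN and the worker handling it."""
    received = handled = None
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if match.group(2):
            handled = stamp
        else:
            received = stamp
    if received is None or handled is None:
        return None
    return (handled - received).total_seconds()


delays = {host: open_delay_seconds(lines) for host, lines in SAMPLE.items()}
for host, delay in delays.items():
    print(host, delay)
```

On the sample entries, machine2037 shows a delay of about 1017.5 seconds between the request and the worker, while machine2042 shows none.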