Looks like I'll be fixing it in the context of HBASE-3984. J-D
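P.S. For anyone following along, here's a rough, self-contained sketch of the difference Matt is hitting. Every name in it is made up for this mail (it is not the real AssignmentManager code, and not whatever the HBASE-3984 patch ends up being); it just contrasts "assign once and treat failure as fatal" with "assign, and on failure try another server":

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Toy model only: every name here (ToyAssigner, RegionOpener, ...) is made up
// for illustration. It is NOT the real AssignmentManager code and NOT the
// HBASE-3984 patch, just the control-flow difference in a few lines.
public class ToyAssigner {

  // Stand-in for the openRegions RPC that timed out in Matt's log.
  interface RegionOpener {
    void sendRegionOpen(String server, String region) throws IOException;
  }

  // Roughly the 0.90.1 behaviour: one attempt, and the caller treats any
  // exception as fatal. Fine during master startup, fatal for "create table".
  static void assignOnceOrDie(RegionOpener rpc, String server, String region) {
    try {
      rpc.sendRegionOpen(server, region);
    } catch (IOException e) {
      // In the real path this ends up aborting the master -> the FATAL line below.
      throw new RuntimeException("Failed assignment of " + region + " to " + server, e);
    }
  }

  // What "it should just be assigned somewhere else" means: on failure,
  // move on to the next server; only give up if nobody can take the region.
  static void assignWithFailover(RegionOpener rpc, List<String> servers, String region)
      throws IOException {
    IOException last = new IOException("no servers to assign " + region + " to");
    for (String server : servers) {
      try {
        rpc.sendRegionOpen(server, region);
        return; // assigned somewhere, done
      } catch (IOException e) {
        last = e; // remember the failure and try the next server
      }
    }
    throw last;
  }

  public static void main(String[] args) throws IOException {
    // Fake RPC: H3S3 "hangs" (throws), any other server accepts the region.
    RegionOpener rpc = (server, region) -> {
      if (server.equals("H3S3")) throw new IOException("60000 millis timeout");
      System.out.println("opened " + region + " on " + server);
    };
    assignWithFailover(rpc, Arrays.asList("H3S3", "H3S4"), "ContentGuidance,,...");
    // assignOnceOrDie(rpc, "H3S3", "ContentGuidance,,..."); // would blow up like the master did
  }
}

The real fix of course has to go through ServerManager.sendRegionOpen and deal with regions-in-transition, ZK state, etc., so take the above as the intent ("just assign it somewhere else") rather than the implementation.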
On Mon, Jun 27, 2011 at 10:21 AM, Jean-Daniel Cryans <[email protected]> wrote:
> Yeah, the BulkAssigner uses an AssignmentManager.assign method that
> won't retry if it gets an exception, because originally it was only
> used during startup (e.g. if you can't assign the regions properly at
> startup then kill yourself).
>
> In this case it happened to you when creating a table (it also uses
> the BulkAssigner now). To me it's a bug; it should just be assigned
> somewhere else. Mind filing a jira?
>
> As to how you got a 60 secs timeout, you should take a look at the RS
> log (and next time it happens also do a jstack).
>
> Thx,
>
> J-D
>
> On Mon, Jun 27, 2011 at 9:54 AM, Matt Davies <[email protected]> wrote:
>> Hey all,
>>
>> Interesting problem I'm running into. It appears that our master has died
>> after a failed region assignment. Any ideas as to why?
>>
>> We are running version 0.90.1 in this cluster.
>> H3M1 = master
>> H3S* = regionservers
>>
>> Out on H3S3 the regionserver process is running (but it didn't respond, per the
>> master log below):
>>
>> 2011-06-27 10:18:22,517 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to disable table ContentGuidance
>> 2011-06-27 10:18:22,522 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1 regions.
>> 2011-06-27 10:18:23,527 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table is done=true
>> 2011-06-27 10:18:24,536 INFO org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table operation C_M_DELETE_TABLE on table ContentGuidance
>> 2011-06-27 10:18:24,584 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region ContentGuidance,,1298676132327.9d63b6173c7307c27aa9728f738f0ef3. from META
>> 2011-06-27 10:19:17,296 WARN org.apache.hadoop.hbase.zookeeper.ZKTable: Moving table ContentGuidance state to enabled but was already enabled
>> 2011-06-27 10:19:17,313 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: HLog configuration: blocksize=64 MB, rollsize=60.8 MB, enabled=true, flushlogentries=1, optionallogflushinternal=1000ms
>> 2011-06-27 10:19:17,349 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
>> 2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: New hlog /hbase/ContentGuidance/f38bbc9bc672b5ea415b6dcc7e1da43a/.logs/hlog.1309191557313
>> 2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Using getNumCurrentReplicas--HDFS-826
>> 2011-06-27 10:19:17,433 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a.; next sequenceid=1
>> 2011-06-27 10:19:17,438 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added region ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. to META
>> 2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a.
>> 2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 87 on 60000.logSyncer exiting
>> 2011-06-27 10:19:17,470 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 1 region(s) round-robin across 4 server(s)
>> 2011-06-27 10:19:17,472 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
>> 2011-06-27 10:19:17,480 INFO org.apache.hadoop.hbase.master.AssignmentManager: H3S4,60020,1308946657703 unassigned znodes=1 of total=1
>> 2011-06-27 10:19:21,530 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to disable table ContentGuidance
>> 2011-06-27 10:19:21,536 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1 regions.
>> 2011-06-27 10:19:22,543 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table is done=true
>> 2011-06-27 10:19:23,544 INFO org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table operation C_M_DELETE_TABLE on table ContentGuidance
>> 2011-06-27 10:19:23,551 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. from META
>> 2011-06-27 10:20:17,479 FATAL org.apache.hadoop.hbase.master.HMaster: Failed assignment of regions to serverName=H3S3,60020,1308946657608, load=(requests=0, regions=7, usedHeap=976, maxHeap=1011)
>> java.net.SocketTimeoutException: Call to H3S3/10.x.x.x3:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.x.x.x:34904 remote=H3S3/10.x.x.x:60020]
>>   at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802)
>>   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
>>   at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>>   at $Proxy7.openRegions(Unknown Source)
>>   at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:566)
>>   at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:776)
>>   at org.apache.hadoop.hbase.master.AssignmentManager$SingleServerBulkAssigner.run(AssignmentManager.java:1310)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>   at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.x.x.x:34904 remote=H3S3/10.x.x.x:60020]
>>   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>   at java.io.FilterInputStream.read(FilterInputStream.java:116)
>>   at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
>>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>>   at java.io.DataInputStream.readInt(DataInputStream.java:370)
>>   at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
>>   at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
>> 2011-06-27 10:20:17,482 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
>> 2011-06-27 10:20:18,390 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
>> 2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.CatalogJanitor: H3M1:60000-CatalogJanitor exiting
>> 2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.HMaster$1: H3M1:60000-BalancerChore exiting
>> 2011-06-27 10:20:18,392 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
