Hey all, Interesting problem I'm running into. It appears that our master has died after a failed region assignment. Any ideas as to why?
We are running version 0.90.1 in this cluster. H3M1 = master H3S* = regionservers out on H3S3 the regionserver process is running (but didn't respond per the log) 2011-06-27 10:18:22,517 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to disable table ContentGuidance 2011-06-27 10:18:22,522 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1 regions. 2011-06-27 10:18:23,527 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table is done=true 2011-06-27 10:18:24,536 INFO org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table operation C_M_DELETE_TABLE on table ContentGuidance 2011-06-27 10:18:24,584 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region ContentGuidance,,1298676132327.9d63b6173c7307c27aa9728f738f0ef3. from META 2011-06-27 10:19:17,296 WARN org.apache.hadoop.hbase.zookeeper.ZKTable: Moving table ContentGuidance state to enabled but was already enabled 2011-06-27 10:19:17,313 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: HLog configuration: blocksize=64 MB, rollsize=60.8 MB, enabled=true, flushlogentries=1, optionallogflushinternal=1000ms 2011-06-27 10:19:17,349 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200 2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: New hlog /hbase/ContentGuidance/f38bbc9bc672b5ea415b6dcc7e1da43a/.logs/hlog.1309191557313 2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Using getNumCurrentReplicas--HDFS-826 2011-06-27 10:19:17,433 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a.; next sequenceid=1 2011-06-27 10:19:17,438 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added region ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. to META 2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. 2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 87 on 60000.logSyncer exiting 2011-06-27 10:19:17,470 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 1 region(s) round-robin across 4 server(s) 2011-06-27 10:19:17,472 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done 2011-06-27 10:19:17,480 INFO org.apache.hadoop.hbase.master.AssignmentManager: H3S4,60020,1308946657703 unassigned znodes=1 of total=1 2011-06-27 10:19:21,530 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to disable table ContentGuidance 2011-06-27 10:19:21,536 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1 regions. 2011-06-27 10:19:22,543 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table is done=true 2011-06-27 10:19:23,544 INFO org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table operation C_M_DELETE_TABLE on table ContentGuidance 2011-06-27 10:19:23,551 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. from META 2011-06-27 10:20:17,479 FATAL org.apache.hadoop.hbase.master.HMaster: Failed assignment of regions to serverName=H3S3,60020,1308946657608, load=(requests=0, regions=7, usedHeap=976, maxHeap=1011) java.net.SocketTimeoutException: Call to H3S3/10..x.x.x3:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10..x.x.x:34904 remote=H3S3/10.x.x.x:60020] at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257) at $Proxy7.openRegions(Unknown Source) at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:566) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:776) at org.apache.hadoop.hbase.master.AssignmentManager$SingleServerBulkAssigner.run(AssignmentManager.java:1310) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10..x.x.x:34904 remote=H3S3/10..x.x.x:60020] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.FilterInputStream.read(FilterInputStream.java:116) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readInt(DataInputStream.java:370) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477) 2011-06-27 10:20:17,482 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2011-06-27 10:20:18,390 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000 2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.CatalogJanitor: H3M1:60000-CatalogJanitor exiting 2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.HMaster$1: H3M1:60000-BalancerChore exiting 2011-06-27 10:20:18,392 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
