Hi, Sir:
I have been working with Hadoop and HBase for some time, and in my experience Hadoop is more stable than HBase. I have used Hadoop 0.20.1 and 0.20.9, both from the Yahoo distribution, and they run very stably on my 32-machine cluster, where each machine has 4 to 8 GB of RAM.

However, I am really struggling with HBase. I have tried 0.20.2, 0.20.3, and 0.20.4, all from the Apache distribution, and I constantly run into NotServingRegionException. I have to restart ZooKeeper (3.2.2) and HBase to restore HBase to a good state, and after several hours of heavy writing it gets into this state again. On my test cluster of 6 machines, with a much lighter load, I don't run into this situation.

Last week I did a deeper investigation and noticed that some of the blocks can't be read by HBase. Todd looked at the logs and said this is because the fix for the HDFS-445 bug is not in my build. So I went ahead and patched HDFS-445 in. After running for several hours another problem appeared, and this time I saw lots of other errors.

I wonder if anybody can recommend a combination of Hadoop and HBase distributions that runs stably in a production environment with heavy writing and light reading. If there are configuration changes that can help, those would be appreciated too.

Jimmy.




2010-05-26 23:03:06,025 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call put([...@4767374b, [Lorg.apache.hadoop.hbase.client.Put;@495f418c) from 10.110.8.75:46421: output error
2010-05-26 23:03:06,025 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943)



2010-05-26 23:03:19,120 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region HEARTBEAT_MASTERPATCH,,1274304444539/1273809222 because: regionserver/10.110.8.92:60020.cacheFlusher
2010-05-26 23:03:20,122 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region HEARTBEAT_MASTERPATCH,,1274304444539 has too many store files, putting it back at the end of the flush queue.


