Hi, Sir:
I have been working with Hadoop and HBase for some time, and in my
experience, Hadoop is more stable than HBase.
I have used Hadoop 0.20.1 and 0.20.9, both from the Yahoo distribution, and
they run very stably on my 32-machine cluster, whose machines have 4 to 8 GB
of RAM each. However, I am really struggling with HBase. I have tried 0.20.2,
0.20.3, and 0.20.4, all from the Apache distribution, and I constantly run
into NotServingRegionException. I have to restart ZooKeeper (3.2.2) and
HBase to restore HBase to a good state, and after several hours of heavy
writing it gets into this state again.
On my test cluster of 6 machines, which has a much lighter load, I don't
run into this situation. Last week I investigated more deeply and noticed
that some of the blocks can't be read by HBase. Todd looked into the log
and said this is caused by the HDFS-445 bug, whose fix is not in my build.
So I went ahead and patched HDFS-445 in, but after several hours of running
another problem appeared; this time I saw lots of other errors.
I wonder if anybody can recommend a combination of Hadoop/HBase
distributions that runs stably in a production environment with heavy
writes and light reads. Suggestions for configuration changes that could
help are also appreciated.
Jimmy.
2010-05-26 23:03:06,025 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call put([...@4767374b, [Lorg.apache.hadoop.hbase.client.Put;@495f418c) from 10.110.8.75:46421: output error
2010-05-26 23:03:06,025 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943)
2010-05-26 23:03:19,120 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region HEARTBEAT_MASTERPATCH,,1274304444539/1273809222 because: regionserver/10.110.8.92:60020.cacheFlusher
2010-05-26 23:03:20,122 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region HEARTBEAT_MASTERPATCH,,1274304444539 has too many store files, putting it back at the end of the flush queue.