> In my test cluster of 6 machines and much lighter load, I actually
> don't run into this situation.
Are you overloading hbase on your larger cluster? Is it swapping, or is mapreduce stealing i/o from datanodes such that hbase is struggling (or stealing from zookeeper)? Are you monitoring your cluster nodes? Do the graphs tell anything interesting around failure times?

> Last week, I did a deeper investigation and noticed that some of the
> blocks can't be read by hbase, and Todd looked into the log, saying it
> is because of the HDFS-445 bug not being in my build. So I went ahead
> and patched HDFS-445 in; after running several hours, another problem
> happened, and this time I saw lots of other errors.

What was it?

> I wonder if anybody can recommend a combination of hadoop/hbase
> distributions that can run stably in a production environment, with
> heavy writing and light reading. If there are some configuration
> changes that can help, that would be appreciated too.

Use latest hadoop and hbase 0.20.3 (hbase 0.20.4 has a known issue -- 0.20.5 should be out soon). Tell us more about your loading pattern.

Thanks,
St.Ack
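(On the swapping question above: one quick way to check a node, as a generic Linux sketch rather than anything hbase-specific, is to watch the kernel's swap-in/swap-out counters in /proc/vmstat; if they keep climbing under load, the node is paging and hbase or zookeeper on it will stall.)

```shell
#!/bin/sh
# Sketch: read the cumulative swap-in/swap-out page counters twice,
# a few seconds apart. A nonzero delta under load means the node is
# actively swapping. The 5-second interval is arbitrary.
before=$(grep -E '^pswp(in|out) ' /proc/vmstat | awk '{s += $2} END {print s}')
sleep 5
after=$(grep -E '^pswp(in|out) ' /proc/vmstat | awk '{s += $2} END {print s}')
echo "swap activity over interval: $((after - before)) pages"
```

Tools like vmstat or iostat give the same picture continuously; the point is just to correlate swap/io activity on the datanode and zookeeper hosts with the failure times in the logs.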
