Hi,

We have region servers sporadically stopping under load, apparently due to errors writing to HDFS. Things like:
2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..

It happens with a different region server and datanode every time, so it's not a problem with one specific server, and there doesn't seem to be anything actually wrong with either of them. I've already increased the file descriptor limit, the datanode xceivers and the datanode handler count.

Any idea what could be causing these errors? A more complete log is here: http://pastebin.com/wC90xU2x

Thanks.

-eran