We are running HBase 0.94.2 on the Hadoop 0.20 append version in production
(yes, we have plans to upgrade Hadoop). It's a 5-node cluster, plus a 6th node
running just the NameNode and HMaster.
I am seeing frequent RS YouAreDeadExceptions. The logs are here:
http://pastebin.com/44aFyYZV
The RS log shows a DFSOutputStream ResponseProcessor exception for block
blk_-6695300470410774365_837638 (java.io.EOFException) at 13:41:00, followed
by a YouAreDeadException at the same time.
I grepped for this block in the DataNode log (see
http://pastebin.com/2jfwCfcK). At 13:41:00 I see an exception in
receiveBlock for block blk_-6695300470410774365_837638:
java.nio.channels.ClosedByInterruptException.
I have also attached the namenode logs around the block here
http://pastebin.com/9NE9J8s1

Across several RS failure instances I see the following pattern: the
region server YouAreDeadException is always preceded by the EOFException
and the DataNode ClosedByInterruptException.
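
For anyone who wants to check the same pattern on their own logs, here is a
rough sketch of the kind of check this involves. It assumes log4j-style
lines that start with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp, and it attributes
stack-trace lines without a timestamp to the preceding entry. The function
names and the 60-second window are my own choices for illustration, not
anything from HBase or Hadoop.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp prefix, e.g. "2013-01-15 13:41:00,123" (log4j-style).
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")


def events(lines, needle):
    """Return timestamps of log entries that mention `needle`.

    Lines without a timestamp (stack-trace continuations) are attributed
    to the most recent timestamped line.
    """
    out, current = [], None
    for line in lines:
        m = TS.match(line)
        if m:
            current = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if needle in line and current is not None:
            # Skip repeats of the same timestamp for the same needle.
            if not out or out[-1] != current:
                out.append(current)
    return out


def preceded_by(lines, later="YouAreDeadException",
                earlier="EOFException", window_s=60):
    """For each `later` event, report whether an `earlier` event occurred
    at the same time or up to `window_s` seconds before it."""
    lines = list(lines)  # allow iterating twice over a file object
    earlier_ts = events(lines, earlier)
    window = timedelta(seconds=window_s)
    return [
        (t, any(timedelta(0) <= t - e <= window for e in earlier_ts))
        for t in events(lines, later)
    ]
```

Pass it the lines of a region server log, e.g. preceded_by(open(path)), where
path points at your own RS log file. Each result pairs a YouAreDeadException
timestamp with True or False depending on whether an EOFException was seen
shortly before it.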

Is the error in the movement of the block causing the region server to
report a YouAreDeadException? And of course, how do I solve this?

- R
