On Tue, Jan 26, 2010 at 03:32:23PM +0100, Manuel Bouyer wrote:
> Can you give more details on the corruption ?
> Was it only directory entries that were corrupted, or did you notice
> corruptions in the data block too ?

I was seeing corruption in data blocks too. That's what I meant when I
mentioned corrupt CVS/Root files. Fsck complained about directories that
were corrupted right at the start of the data block. I don't think I
saved the error messages, but "." and ".." were corrupt or missing.

I have a netbsd-3/Xen 2 based server that runs on the same hardware, and
we have seen FS corruption in a particular domU on that system that seems
to be related to the file system running out of space. That's what the
co-admin running that domU tells me, anyway. I haven't seen the damage or
the error messages in the domU personally.

> > raid1: IO failed after 5 retries.
> > cgd1: error 5
> > xbd IO domain 1: error 5
>
> It seems raidframe doesn't do anything special for memory failure.

Greg tells me that raidframe does retry several times, and the above
error indicates that it retried 5 times. Note that I got the above
message exactly once, but the pool stats indicated several hundred
allocation failures.

I am contemplating collecting stack traces when getiobuf can't get a buf
from the pool, and maybe checking that it always gets a buf when it is
called with waitok==true.

I wonder if the b_iodone issues you are investigating have an impact on
this.

--chris