Yes, indeed. Checking dmesg should have been the first thing, of course. And I
do see errors there:

hammer2_bulkfree: Scanning BACKUP
chain 0000004186d3400a.01 (Inode) meth=32 CHECK FAIL (flags=00144002, bref/data ac9f8ef29097a55b/05e63f87e50fb2e2)
   Resides at/in inode 49686
   In pfs UNKNOWN on device serno/WCJ35GE0.s1d

No CRC errors though, but I will check the media anyway.
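
For the media check, here is a minimal sketch of what I intend to run, built
only from the two commands suggested in the quoted message below (tar to read
everything, then grep dmesg). The function names and the mount point passed to
scan_media are my own placeholders, not anything hammer2 provides, and the
count covers whatever is still in the kernel message buffer, including older
lines such as the bulkfree one above.

```shell
#!/bin/sh
# Count log lines that report a hammer2 check failure.
# Reads log text on stdin, so it works on dmesg output or a saved log file.
count_check_fail() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status clean in that case.
    grep -cF 'CHECK FAIL' || true
}

# Read every file under the given path (a placeholder mount point),
# then print how many CHECK FAIL lines the kernel message buffer holds.
scan_media() {
    tar cf /dev/null "$1" && dmesg | count_check_fail
}
```

Usage would be something like 'scan_media /path/to/filesystem'. A non-zero
count only says that such lines exist; the lines themselves still need to be
read to see which chain and inode they refer to.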

Thanks for the hint on how to repair the directory.

While here, an unrelated question. On my server I have a hardware RAID (LSI
MegaSAS Gen2) without a battery. By default the write cache was enabled. Do I
understand correctly that it's not safe to have the write cache enabled unless
there is a battery? What kind of cache is that anyway? I guess if it's NVM then
maybe it's OK? Another thing: the drive write cache is set to "default", which
might mean that it's on too? The drives' cache is probably volatile memory,
so it definitely should be off, right?

--
Aleksej Lebedev

On Mon, Oct 12, 2020, at 20:19, Matthew Dillon wrote:
> Generally speaking this error occurs if a directory entry is present but the 
> related inode cannot be found.  You can use a hammer2 directive to destroy 
> the directory entry to clean it up.  But before you do so you want to check 
> the media for CHECK FAIL errors.
> 
> The easiest way to do this is to just read off the entire directory structure 
> with tar, e.g. 'tar cf /dev/null filesystem' and then check the dmesg output 
> for errors.  'dmesg | fgrep CHECK'.  Something like that.
> 
> If the filesystem appears clean other than the disconnected directory entry, 
> then you can use 'hammer2 destroy filename' to destroy the directory entry.  
> Be very careful when doing that.
> 
> If the filesystem has other problems, such as CRC errors, other CHECK errors, 
> etc., then it is best to make a full backup and reformat.
> 
> Also make sure that bulkfree runs don't have errors.  'hammer2 bulkfree ...' 
> and then check dmesg output as well.
> 
> --
> 
> As for how a disconnected inode can happen: it has become rarer, but it 
> might still be possible if a power failure or panic occurs during heavy 
> filesystem activity.  It shouldn't be possible for CRC errors to occur unless 
> the media itself corrupted the data.
> 
> -Matt
