Re: [zfs-discuss] ZFS offline ZIL corruption not detected

Darren J Moffat Thu, 26 Aug 2010 07:34:22 -0700

On 26/08/2010 15:08, Saso Kiselkov wrote:

If I might add my $0.02: it appears that the ZIL is implemented as a
kind of circular log buffer. As I understand it, when a corrupt checksum

It is NOT circular, since that would imply a limited number of entries that get overwritten.

is detected, it is taken to be the end of the log, but this kind of
defeats the checksum's original purpose, which is to detect device
failure. Thus we would first need to change this behavior to only be
used for failure detection. This leaves the question of how to detect
the end of the log, which I think could be done by using a monotonically
incrementing counter on the ZIL entries. Once we find an entry where the
counter != n+1, then we know that the block is the last one in the sequence.

See the comment part way down zil_read_log_block about how we do something pretty much like that for checking the chain of log blocks:


http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zil.c#zil_read_log_block

This is the checksum in the BP checksum field.
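To make the distinction in this thread concrete, here is a minimal sketch of walking a chain of log blocks where a bad checksum and an unexpected sequence number are reported as different outcomes. This is NOT the code in zil.c: the toy_log_block_t layout, the cksum_ok flag and toy_walk_chain() are all hypothetical names invented for illustration, and the real on-disk format is different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified log block; not the real ZFS on-disk layout. */
typedef struct toy_log_block {
    uint64_t seq;      /* sequence number recorded in the block */
    int      cksum_ok; /* stand-in for "the block's checksum verified" */
} toy_log_block_t;

typedef enum {
    TOY_CHAIN_END,     /* sequence number was not the expected one */
    TOY_CHAIN_CORRUPT  /* a block failed its checksum */
} toy_stop_t;

/*
 * Walk blocks[0..n) expecting sequence numbers first_seq, first_seq+1, ...
 * Stores the number of blocks accepted in *accepted and returns why the
 * walk stopped. Running off the end of the array counts as end of chain.
 */
toy_stop_t
toy_walk_chain(const toy_log_block_t *blocks, size_t n, uint64_t first_seq,
    size_t *accepted)
{
    uint64_t expected = first_seq;
    size_t i;

    assert(accepted != NULL);
    assert(blocks != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (!blocks[i].cksum_ok) {
            /* Corruption is reported as an error, not as end of log. */
            *accepted = i;
            return (TOY_CHAIN_CORRUPT);
        }
        if (blocks[i].seq != expected) {
            /* Stale or unrelated block: the chain ends here. */
            *accepted = i;
            return (TOY_CHAIN_END);
        }
        expected++;
    }
    *accepted = n;
    return (TOY_CHAIN_END);
}
```

A caller would replay only the first *accepted blocks, and would treat TOY_CHAIN_CORRUPT as a device problem rather than as a normal end of log.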

But before we even got there we checked the ZILOG2 checksum as part of doing the zio (in the zio_checksum_verify() stage):


http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zio_checksum.c#zio_checksum_error

A ZILOG2 checksum is a version of fletcher4 that is embedded in the block (at the start; in the original ZILOG it was at the end). If that failed, i.e. the block was corrupt, we would have returned an error back through the dsl_read() of the log block.
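For readers unfamiliar with embedded checksums, the following is a rough sketch of the idea: a fletcher4-style sum stored at the start of the block it covers, with verification returning an error on mismatch. Everything named toy_* is hypothetical, and the block size and the trick of zeroing the checksum field while summing are simplifying assumptions; the real zio_checksum code handles the embedded field differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_WORDS 64 /* hypothetical block size in 32-bit words */

typedef struct toy_cksum {
    uint64_t word[4];
} toy_cksum_t;

/* Fletcher-4 style running sums: four 64-bit accumulators over 32-bit words. */
static toy_cksum_t
toy_fletcher4(const uint32_t *w, size_t nwords)
{
    uint64_t a = 0, b = 0, c = 0, d = 0;
    toy_cksum_t ck;
    size_t i;

    for (i = 0; i < nwords; i++) {
        a += w[i];
        b += a;
        c += b;
        d += c;
    }
    ck.word[0] = a;
    ck.word[1] = b;
    ck.word[2] = c;
    ck.word[3] = d;
    return (ck);
}

/* Compute the checksum with the embedded field zeroed, then store it at the start. */
void
toy_block_seal(uint32_t *block)
{
    toy_cksum_t ck;

    assert(sizeof (ck) < TOY_BLOCK_WORDS * sizeof (uint32_t));
    memset(block, 0, sizeof (ck));
    ck = toy_fletcher4(block, TOY_BLOCK_WORDS);
    memcpy(block, &ck, sizeof (ck));
}

/* Returns 0 if the embedded checksum matches the contents, -1 if not. */
int
toy_block_verify(const uint32_t *block)
{
    uint32_t copy[TOY_BLOCK_WORDS];
    toy_cksum_t stored, actual;

    memcpy(copy, block, sizeof (copy));
    memcpy(&stored, copy, sizeof (stored));
    memset(copy, 0, sizeof (stored)); /* same state as when it was sealed */
    actual = toy_fletcher4(copy, TOY_BLOCK_WORDS);
    return (memcmp(&stored, &actual, sizeof (stored)) == 0 ? 0 : -1);
}
```

In this sketch a reader would call toy_block_verify() before looking at anything else in the block, and propagate the -1 to its caller as a read error.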


--
Darren J Moffat
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
