On Oct 4, 2009, at 11:51 AM, Miles Nordin wrote:

"re" == Richard Elling <richard.ell...@gmail.com> writes:

   re> The probability of the garbage having both a valid fletcher2
   re> checksum at the proper offset and having the proper sequence
   re> number and having the right log chain link and having the
   re> right block size is considerably lower than the weakness of
   re> fletcher2.

I'm having trouble parsing this.  I think you're confusing a few
different failure modes:

* ZIL entry is written, but corrupted by the storage, so that, for
  example, an entry should be read from the mirrored ZIL instead.

This is attempted if you have a mirrored slog.

  + broken fletcher2 detects the storage corruption
    CASE A: Good!

  + broken fletcher2 misses the error, so that corrupted data is
    replayed from ZIL into the proper pool, possibly adding a
    stronger checksum to the corrupt data while writing it.
    CASE B: Bad!

  + broken fletcher2 misinterprets storage corruption as signalling
    the end of the ZIL, and any data in the ZIL after the corrupt
    entry is truncated without even attempting to read the mirror.
    (does this happen?)
    CASE C: Bad!

* ZIL entry is intentional garbage, either a partially-written entry
  or an old entry, and should be treated as the end of the ZIL

  + broken fletcher2 identifies the partially written entry by a
    checksum mismatch, or the sequence number identifies it as old
    CASE D: Good!

If the checksum doesn't match, you can't go any further, because
the pointer to the next ZIL log entry cannot be trusted. So the
roll-forward stops.  This is how such logs work -- there is no
end-of-log record.

  + broken fletcher2 misidentifies a partially-written entry as
    complete because of a hash collision
    CASE E: Bad!

  + (hypothetical, only applies to non-existent fixed system) working
    fletcher2 or broken-good-enough fletcher4 misidentifies a
    partially-written entry as complete because of a hash collision
    CASE F: Bad!

As I said before, if the checksum matches, the data is then
checked further: the sequence number must equal the previous
one plus 1, blk_birth must be 0, and the size must be correct.
Since this data lives inside the block, it is unlikely that a
checksum collision would also yield a valid block. In other
words, ZFS doesn't just trust the checksum for slog entries.
 -- richard
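The validation steps described above read, in pseudocode terms, roughly
like this (a minimal sketch with invented names and a toy checksum; the
real code is in C, and the real fletcher2 works on 64-bit words, so treat
every identifier here as an assumption, not actual ZFS code):

```python
# Illustrative sketch of ZIL-style log replay (NOT actual ZFS code).
from dataclasses import dataclass

def toy_fletcher(data: bytes) -> int:
    """Toy stand-in for fletcher2 (the real one differs)."""
    a = b = 0
    for x in data:
        a += x
        b += a          # running sums, fletcher-style
    return (b << 32) | a

@dataclass
class LogBlock:
    payload: bytes
    checksum: int       # embedded checksum of the payload
    seq: int            # embedded sequence number
    blk_birth: int      # must be 0 for a live log block
    size: int           # recorded block size
    next_off: int       # pointer to the next log block

def replay(chain: dict, head: int, start_seq: int) -> list:
    """Roll the log forward; stop at the first block failing ANY check.
    There is no end-of-log record: a failed check IS the end of the log."""
    replayed, off, seq = [], head, start_seq
    while off in chain:
        blk = chain[off]
        if toy_fletcher(blk.payload) != blk.checksum:
            break   # torn write or corruption: next pointer can't be trusted
        if blk.seq != seq + 1:
            break   # stale block left over from an earlier log incarnation
        if blk.blk_birth != 0 or blk.size != len(blk.payload):
            break   # checksum collided, but the embedded fields give it away
        replayed.append(blk.payload)
        seq, off = blk.seq, blk.next_off
    return replayed
```

The point of the sketch is that a garbage block must pass all four checks,
not just the checksum, before it is replayed or the chain is followed.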

If I read your sentence carefully and try to match it with this chart,
you seem to be saying P(CASE F) << P(CASE E), which reads as an
argument for fixing the checksum.  While you don't say so, I presume
from your other posts that you're trying to make a case for doing
nothing, so I'm confused.

I was mostly thinking about CASE B though.  It seems like the special
way the ZIL works has nothing to do with CASE B: if you send data
through the ZIL to a sha256 pool, it can be written to ZIL under
broken-fletcher2, corrupted by the storage, and then read in and
played back corrupt but covered with a sha256 checksum to the pool
proper.  AFAICT your relative-probability sentence has nothing to do
with CASE B.
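To make CASE B concrete, here is a toy demonstration (illustrative
checksum only -- not the real fletcher2, and not ZFS code): a payload
corrupted in the slog can collide under a weak running-sum checksum,
and once replayed it gets written to the pool under a fresh strong
checksum, which then faithfully protects the *corrupt* data forever.

```python
# Toy illustration of CASE B (assumed/simplified; not real fletcher2).
import hashlib

def weak_sum(data: bytes) -> int:
    """Toy fletcher-style running-sum checksum."""
    a = b = 0
    for x in data:
        a += x
        b += a
    return (b << 32) | a

original = b"important-sync-write"
corrupt = bytearray(original)
# A 3-byte perturbation (+1, -2, +1 at equally spaced offsets) preserves
# both running sums, so the weak checksum cannot see the damage:
corrupt[0] += 1
corrupt[1] -= 2
corrupt[2] += 1
corrupt = bytes(corrupt)

assert corrupt != original
assert weak_sum(corrupt) == weak_sum(original)   # replay accepts it

# Replay then writes the corrupt payload into the pool under a strong
# checksum, which now "blesses" the corruption:
pool_checksum = hashlib.sha256(corrupt).hexdigest()
```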

   re> Unfortunately, the ZIL is also latency sensitive, so the
   re> performance case gets stronger

The performance case advocating what?  Not fixing the broken checksum?

   re> while the additional error checking already boosts the
   re> dependability case.

What additional error checking?

Isn't the whole specialness of the ZIL that the checksum is needed in
normal operation, absent storage subsystem corruption, as I originally
said?  It seems like the checksum's strength is more important here,
not less.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
