On Sat, Feb 06, 2010 at 09:22:57AM -0800, Richard Elling wrote:
> I'm interested in anecdotal evidence which suggests there is a
> problem as it is currently designed. 

I like to look at it differently:  I'm not sure if there is a
problem. I'd like to have a simple way to discover a problem, using
the work zfs is already doing for me. 

So, I'd like two things from the "system" as a whole:

 - confidence that a send|recv which completes "successfully" has
   really delivered an exact copy.
 - verification that two datasets are the same, from a simple, quick,
   ideally cheap test.

I can get some way to the former by understanding the mechanisms
used, analysing their protective coverage, and reasoning about the
possible failure modes. Having the latter gets me the rest of the way
there, and even most of the way there by itself.

 confidence < verification < assurance.

So, for example, in early tests with send|recv, I'm sure many of us
have run "rsync -nc .." comparison runs over the results.  That's
easy, relatively quick, but not entirely as cheap as could be.

"It would be very nice" if there was a simple dataset fingerprint that
depended, merkle-style, on the entire contents of the dataset
(snapshot) below, and that could be easily compared on sender and
receiver. This (together with scrub) would provide the desired
assurance that the two are indeed the same. 
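
To make "merkle-style" concrete, here is a minimal userland sketch
in Python. It is not anything zfs provides; tree_fingerprint is a
hypothetical helper that walks a mounted snapshot directory, and it
assumes only file names and contents matter (permissions, ownership,
timestamps and extended attributes are ignored). It re-reads all the
data, so it is no cheaper than the rsync run; it only illustrates the
shape of a fingerprint in which each directory's hash depends on the
hashes of everything below it.

```python
import hashlib
import os


def _node_digest(path):
    """Return the raw SHA-256 digest for one file, symlink or directory."""
    h = hashlib.sha256()
    if os.path.islink(path):
        # Hash the link target itself, never what it points at.
        h.update(b"L" + os.fsencode(os.readlink(path)))
    elif os.path.isdir(path):
        h.update(b"D")
        # Sorted order makes the result independent of listing order.
        for name in sorted(os.listdir(path)):
            encoded = os.fsencode(name)
            # Length-prefix the name so adjacent entries cannot be confused.
            h.update(len(encoded).to_bytes(4, "big"))
            h.update(encoded)
            # The child's digest feeds the parent: the Merkle step.
            h.update(_node_digest(os.path.join(path, name)))
    elif os.path.isfile(path):
        h.update(b"F")
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    else:
        # Devices, fifos, sockets: record only that something is there.
        h.update(b"O")
    return h.digest()


def tree_fingerprint(root):
    """Hex fingerprint depending on every name and byte below root."""
    return _node_digest(root).hex()
```

Run against the same snapshot on sender and receiver (for example
under each dataset's .zfs/snapshot directory), the two hex strings
should match if the file-level contents are the same.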

Back to analysis and reasoning for a moment; I would have more
confidence in send|recv if I knew the end-to-end protections extended
to cover the on-disk checksums (since the on-disk copies are the
important endpoints for this operation).  I suspect this was a large 
part of the intent behind the OP's question.

As it stands from the current description, there are windows where
errors might be introduced and not detected - in particular, if I have
a protection gap via non-ECC RAM at either send or recv.  I can
cover many of the other gaps with pipeline tools, as discussed. This
is a hard gap to cover, even for detection, without help from the
actual zfs endpoints.
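
As an illustration of the "pipeline tools" part, the sketch below
hashes a byte stream while passing it through unchanged, so the same
filter can sit on both sides of the transport and the two digests can
be compared afterwards. copy_with_digest is a hypothetical helper, and
SHA-256 is just my choice of hash. Note that this only covers the
stream between the two filters; it does nothing for corruption that
happens in memory inside send or recv, which is exactly the gap
described above.

```python
import hashlib


def copy_with_digest(src, dst, chunk_size=1 << 20):
    """Copy binary stream src to dst, returning the SHA-256 hex digest.

    src and dst are assumed to be binary file-like objects, e.g.
    sys.stdin.buffer and sys.stdout.buffer when used as a pipe filter.
    """
    h = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)   # hash exactly the bytes we forward
        dst.write(chunk)
    dst.flush()
    return h.hexdigest()
```

Wrapped in a small script that calls it on stdin and stdout and
prints the digest to stderr, this would sit between "zfs send" and
the transport on one side, and between the transport and "zfs recv"
on the other.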

Of course there are conflicting requirements, since we also want
send|recv to facilitate recompression, reblocking, changing checksum
method, etc.

So let's turn the question around: what is the best way to verify that
send|recv really has produced an identical copy?

--
Dan.
