> I could use a little clarification on how these unrecoverable disk
> errors behave -- or maybe a lot, depending on one's point of view.
>
> So, when one of these "once in around ten (or 100) terabytes read"
> events occurs, my understanding is that a read error is returned by
> the drive, and the corresponding data is lost as far as the drive is
> concerned.
Yes -- the data being one or more disk blocks. (You can't lose a
smaller amount of data, from the drive's point of view, since the
error-correction code covers the whole block.)

> If my assumptions are correct about how these unrecoverable disk
> errors are manifested, then a "dumb" scrubber will find such errors
> by simply trying to read everything on disk -- no additional checksum
> is required. Without some form of parity or replication, the data is
> lost, but at least somebody will know about it.

Right. Generally, if you have replication and scrubbing, you'll also
re-write any data which was found to be unreadable, thus fixing the
problem (and protecting yourself against future loss of the second
copy).

> Now it seems to me that without parity/replication, there's not much
> point in doing the scrubbing, because you could just wait for the
> error to be detected when someone tries to read the data for real.
> It's only if you can repair such an error (before the data is needed)
> that such scrubbing is useful.

Pretty much, though if you're keeping backups, you could recover the
data from backup at that point. Of course, backups could be considered
a form of replication, but most of us in file systems don't think of
them that way.

Anton

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
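The "dumb" scrubber discussed in the thread can be sketched in a few
lines of Python. This is an illustrative sketch only, not anything from
ZFS: the block size and the read-a-file interface are assumptions, and a
real scrubber would read the raw device and, given a replica, re-write
the blocks it found unreadable. The key point it demonstrates is that no
extra checksum is needed -- the drive's own ECC turns an unrecoverable
error into a failed read(), which the scrubber simply catches.

```python
import os

def scrub(path, block_size=512):
    """Read every block of path sequentially; return the byte offsets
    of blocks the drive reported as unreadable.  block_size=512 is an
    assumed sector size; many modern drives use 4096-byte sectors."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            try:
                data = os.read(fd, block_size)
            except OSError:
                # The drive returned an unrecoverable read error for
                # this block: record it, skip past it, keep scanning.
                bad_offsets.append(offset)
                offset += block_size
                os.lseek(fd, offset, os.SEEK_SET)
                continue
            if not data:
                break            # end of file/device
            offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets
```

Without parity or replication there is nothing to do with the returned
offsets except report them (or restore those ranges from backup), which
is exactly the "at least somebody will know about it" case above.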