Hi Richard,

Thanks for the detailed reply, and the work behind the scenes filing the CRs.
I've bookmarked both, and will keep a keen eye on them for status changes.

As Miles put it, I'll have to put these dumps into storage for possible future 
use.
I do dearly hope that I'll be able to recover most of that data in the future, 
but for the most important bits (documents/spreadsheets) I'll now have to 
rebuild them by way of some rather intensive data entry from hard copies.

Not fun.

I do have a working backup (a zfs send dump!) from October, so it's not a 
total loss of my livelihood, but it'll be a life lesson alright.

With CR 6736794, I wonder if some extra notes could be added around the 
checksumming side of the code?
The wording used doesn't quite match my scenario, but I certainly agree with 
the functionality that has been requested there.

I have a 50GB zfs send dump, and zfs receive is failing (and rolling back) 
around the 20GB mark.
While the exact cause and nature of my issue remain unknown, I very much 
expect that the vast majority of my zfs send dump is in fact intact, including 
data beyond that 20GB checksum error point. I.e. there is a problem around the 
20GB mark, but I expect that the remaining 30GB contains "good" data, or at 
the very least *mostly* good data.
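
For reference, the rough shape of what I'm doing (pool and dataset names 
invented for illustration):

    # the original backup: a full snapshot stream saved to a file
    zfs send tank/home@oct08 > /mnt/usb/home.zsend

    # the restore attempt: hits a checksum error around the 20GB mark,
    # and the partially received dataset gets rolled back
    zfs receive -v tank/home_restore < /mnt/usb/home.zsend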

The CR appears to request only that zfs receive stop at the 20GB mark, with 
the {new feature} allowing the failed restore attempt to be mountable, in an 
unknown/known-bad state.

I'd much prefer that zfs receive continue on error too, thus giving it the full 
50GB to process and attempt to repair, rather than only the data up to the 
point where it encountered its first problem.

Without knowing much about the actual on-disk format, metadata, and structures 
I can't be sure, but the FS is going to have a much better chance at recovering 
when there is more data available across the entire length of the FS, right? I 
know from my Linux days that the ext2/3 superblocks were distributed across the 
full disk, so the more of the disk it can attempt to read, the better the 
chance that it'll find more correct metadata to use in an attempt to repair 
the FS.
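
To make the ext2/3 analogy concrete, something like this (device name invented 
for illustration):

    # list where the backup superblocks live, without writing anything
    mke2fs -n /dev/sdb1

    # repair using one of the backup superblocks when the primary is damaged
    e2fsck -b 32768 /dev/sdb1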

And of course the second benefit of reading more of the data stream past an 
error is that more user data will at least have a chance of being recovered. If 
it stops halfway, it has _no_ chance of recovering that data, so I favor my 
odds of letting it go on to at least try :)
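
It's the same philosophy as imaging a dying disk with dd (device and path 
invented for illustration):

    # keep reading past errors, padding unreadable blocks with zeros,
    # rather than aborting at the first bad sector
    dd if=/dev/sdb of=/mnt/usb/sdb.img bs=64k conv=noerror,sync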

Or is that an entirely new CR itself?

Jonathan