Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-04 Thread Bob Friesenhahn
On Tue, 4 Mar 2008, Richard Elling wrote: > > Also note: the checksums don't have enough information to > recreate the data for very many bit changes. Hashes might, > but I don't know anyone using sha256. It is indeed important to recognize that the checksums are a way to detect that the data is

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-04 Thread Richard Elling
[slightly different angle below...] Nathan Kroenert wrote: > Hey, Bob, > > Though I have already got the answer I was looking for here, I thought > I'd at least take the time to provide my point of view as to my *why*... > > First: I don't think any of us have forgotten the goodness that ZFS's >

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-03 Thread Nathan Kroenert
Hey, Bob My perspective on Big reasons for it *to* be integrated would be: - It's tested - By the folks charged with making ZFS good - It's kept in sync with the differing Zpool versions - It's documented - When the system *is* patched, any changes the patch brings are synced with the rec

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-03 Thread Boyd Adamson
Nathan Kroenert <[EMAIL PROTECTED]> writes: > Bob Friesenhahn wrote: >> On Tue, 4 Mar 2008, Nathan Kroenert wrote: >>> >>> It does seem that some of us are getting a little caught up in disks >>> and their magnificence in what they write to the platter and read >>> back, and overlooking the poten

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-03 Thread Bob Friesenhahn
On Tue, 4 Mar 2008, Nathan Kroenert wrote: >> The circus trick can be handled via a user-contributed utility. In fact, >> people can compete with their various repair utilities. There are only >> 1048576 1-bit permutations to try, and then the various two-bit permutations >> can be tried. > > T
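For scale, the counts mentioned in the message above can be checked with a few lines of Python. This is only an arithmetic sketch, assuming a 128K (131072-byte) block as used elsewhere in the thread; the variable names are illustrative and not taken from any ZFS code.

```python
import math

# Assumption: a 128K block, the block size quoted elsewhere in this thread.
block_bytes = 128 * 1024
block_bits = block_bytes * 8

# Number of candidate blocks that differ from the damaged one by one bit.
one_bit_candidates = block_bits

# Number of candidate blocks that differ by exactly two bits
# (unordered pairs of distinct bit positions).
two_bit_candidates = math.comb(block_bits, 2)

print(one_bit_candidates)
print(two_bit_candidates)
```

The two-bit search space is roughly half a million times larger than the one-bit one, which is why the one-bit pass would be tried first.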

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-03 Thread Nathan Kroenert
Bob Friesenhahn wrote: > On Tue, 4 Mar 2008, Nathan Kroenert wrote: >> >> It does seem that some of us are getting a little caught up in disks >> and their magnificence in what they write to the platter and read >> back, and overlooking the potential value of a simple (though >> potentially comp

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-03 Thread Bob Friesenhahn
On Tue, 4 Mar 2008, Nathan Kroenert wrote: > > It does seem that some of us are getting a little caught up in disks and > their magnificence in what they write to the platter and read back, and > overlooking the potential value of a simple (though potentially > computationally expensive) circus

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-03 Thread Nathan Kroenert
Hey, Bob, Though I have already got the answer I was looking for here, I thought I'd at least take the time to provide my point of view as to my *why*... First: I don't think any of us have forgotten the goodness that ZFS's checksum *can* bring. I'm also keenly aware that we have some customer

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-03 Thread Richard Elling
Bob Friesenhahn wrote: > On Mon, 3 Mar 2008, Darren J Moffat wrote: > > >>> I'm not convinced that single bit flips are the common >>> failure mode for disks. Most enterprise class disks already >>> have enough ECC to correct at least 8 bytes per block. >>> >> and for consumer rather tha

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-03 Thread Bob Friesenhahn
On Mon, 3 Mar 2008, Darren J Moffat wrote: >> I'm not convinced that single bit flips are the common >> failure mode for disks. Most enterprise class disks already >> have enough ECC to correct at least 8 bytes per block. > > and for consumer rather than enterprise class disks ? You are assumin

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-03 Thread Richard Elling
Darren J Moffat wrote: > Jeff Bonwick wrote: > >> All that said, I'm still occasionally tempted to bring it back. >> It may become more relevant with flash memory as a storage medium. >> > > Would it be worth considering bringing it back as part of zdb rather than > part of the core zio layer

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-03 Thread Darren J Moffat
Richard Elling wrote: > Darren J Moffat wrote: >> Jeff Bonwick wrote: >> >>> All that said, I'm still occasionally tempted to bring it back. >>> It may become more relevant with flash memory as a storage medium. >>> >> Would it be worth considering bringing it back as part of zdb rather than

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-03 Thread Darren J Moffat
Jeff Bonwick wrote: > All that said, I'm still occasionally tempted to bring it back. > It may become more relevant with flash memory as a storage medium. Would it be worth considering bringing it back as part of zdb rather than part of the core zio layer ? -- Darren J Moffat

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-02 Thread Jeff Bonwick
Nathan: yes. Flipping each bit and recomputing the checksum is not only possible, we actually did it in early versions of the code. The problem is that it's really expensive. For a 128K block, that's a million bits, so you have to re-run the checksum a million times, on 128K of data. That's 128G
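The brute-force approach described above (flip one bit, recompute the checksum, compare against the stored value) can be sketched as follows. This is a minimal illustration, not the code that was in ZFS: `checksum` and `repair_single_bit_flip` are hypothetical names, SHA-256 from Python's `hashlib` stands in for whichever checksum the pool actually uses, and the demo uses a 512-byte block so it finishes quickly.

```python
import hashlib
from typing import Optional


def checksum(data: bytes) -> bytes:
    """Stand-in checksum (assumption: SHA-256; ZFS may use a different one)."""
    return hashlib.sha256(data).digest()


def repair_single_bit_flip(block: bytes, expected: bytes) -> Optional[bytes]:
    """Try every single-bit flip until the checksum matches `expected`.

    Returns the repaired block, the block itself if it was already good,
    or None if no single-bit flip reproduces the expected checksum.
    """
    if checksum(block) == expected:
        return block
    candidate = bytearray(block)
    for byte_index in range(len(candidate)):
        for bit in range(8):
            candidate[byte_index] ^= 1 << bit   # flip one bit
            if checksum(candidate) == expected:
                return bytes(candidate)
            candidate[byte_index] ^= 1 << bit   # flip it back
    return None


# Small demo on a 512-byte block with one bit deliberately flipped.
original = bytes(range(256)) * 2
expected = checksum(original)
damaged = bytearray(original)
damaged[300] ^= 1 << 3
repaired = repair_single_bit_flip(bytes(damaged), expected)
print(repaired == original)

# Cost of the same search on a 128K block, as in the message above:
# one checksum pass over the whole block per candidate bit.
block_bytes = 128 * 1024
bits_per_block = block_bytes * 8
total_bytes_hashed = bits_per_block * block_bytes
print(bits_per_block, total_bytes_hashed // 1024**3, "GiB")
```

The cost figures at the end reproduce the arithmetic in the message: about a million candidate flips, each needing a full checksum over 128K of data.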

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-02 Thread Bob Friesenhahn
On Mon, 3 Mar 2008, Nathan Kroenert wrote: > Speaking of expensive, but interesting things we could do - > > From the little I know of ZFS's checksum, it's NOT like the ECC > checksum we use in memory in that it's not something we can use to > determine which bit flipped in the event that there was

[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-02 Thread Nathan Kroenert
Say, Jeff - Speaking of expensive, but interesting things we could do - From the little I know of ZFS's checksum, it's NOT like the ECC checksum we use in memory in that it's not something we can use to determine which bit flipped in the event that there was a single bit flip in the data. (I