Re: [zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10
Hello Anton,

Saturday, January 6, 2007, 6:29:29 AM, you wrote:

>> It's not about the checksum but about how a fs block is stored in raid-z[12] case - it's spread out to all non-parity disks so in order to read one fs block you have to read from all disks except parity disks.

ABR> However, if we didn't need to verify the checksum, we wouldn't
ABR> have to read the whole file system block to satisfy small reads.

But we'd lose the end-to-end integrity feature. And still, with 9 or more disks, for most workloads we would end up reading them all anyway, as each disk would hold such a small portion of the fs block.

--
Best regards,
Robert
mailto:[EMAIL PROTECTED]
http://milek.blogspot.com

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
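As a rough illustration of the last point in the message above (each disk holding only a small portion of a fs block), here is a minimal sketch. It assumes a 128 KiB fs block and 512-byte sectors, and it assumes the block is divided evenly across the data disks as described in the thread; it ignores RAID-Z padding and real allocation details. `per_disk_portion` is a name made up for this example, not a ZFS function.

```python
import math

def per_disk_portion(block_size, total_disks, parity_disks, sector_size=512):
    """Bytes of one fs block that land on each data disk, rounded up to whole sectors.

    Simplified model: the block is split evenly over the non-parity disks.
    """
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    sectors = math.ceil(block_size / sector_size)
    sectors_per_disk = math.ceil(sectors / data_disks)
    return sectors_per_disk * sector_size

# Assumed example: 128 KiB block on raidz2 groups of different widths.
for disks in (5, 9, 12):
    portion = per_disk_portion(128 * 1024, disks, parity_disks=2)
    print(f"{disks} disks (raidz2): {portion} bytes per data disk")
```

Under these assumptions, the wider the group, the smaller each disk's share of the block, so a read of even a modest part of the block is likely to span most of the data disks.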
Re: [zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10
>> Ah, that's a major misconception on my part then. I'd thought I'd read that unlike any other RAID implementation, ZFS checked and verified parity on normal data access.

> That would be useless, and not provide anything extra.

I think it's useless if a (disk) block of data holding RAIDZ parity never has silent corruption, or if scrubbing was a lightweight operation that could be run often.

> ZFS will do a block checksum check (that is, for each block read, read the checksum for that block, and compare to see if it is OK). If the block checksums show OK, then reading the parity for the corresponding data yields no additional useful information.

It would yield useful information about the status of the parity information on disk. The read would be done because you're already paying the penalty for reading all the data blocks, so you can verify the stability of the parity information on disk by reading an additional amount.

> I'm assuming that in a RAIDZ, RAIDZ2, or mirror configuration, should a block checksum show the corresponding block is corrupted, then ZFS will read the parity (or corresponding mirror) block, and attempt to re-construct the bad block, give the corrected info to the calling process, then re-writing the corrected data to a new block section on the disk(s). Right?

I was assuming that *all* the data for a FS block was read and, if redundant, the redundancy was verified correct (same data on mirrors, valid parity for RAIDZ) or the redundancy would be repaired.

At least with a mirror I have a chance of reading all copies over time. With RAIDZ, I'll never read the parity until a problem or a scrub occurs. Nothing wrong with that. I had simply managed to convince myself that it did more.

--
Darren Dunham [EMAIL PROTECTED]
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
This line left intentionally blank to confuse you.
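The read path discussed in the message above can be sketched as a toy model. This is not ZFS code: it assumes single XOR parity, uses SHA-256 as a stand-in for the block checksum, and all function names are invented for the example. It shows the two points under discussion: parity is consulted only when the data checksum fails, and a corrupted parity column goes unnoticed on a normal read.

```python
import hashlib
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def xor_parity(columns):
    """XOR of equally sized columns (toy single parity)."""
    return reduce(xor_bytes, columns)

def checksum(columns):
    """Checksum over the data columns only, never over the parity."""
    return hashlib.sha256(b"".join(columns)).digest()

def read_block(data_cols, parity, expected):
    """Return (data, repaired_index); parity is used only if the checksum fails."""
    if checksum(data_cols) == expected:
        return b"".join(data_cols), None  # parity never looked at
    # Checksum mismatch: assume each column in turn is the bad one,
    # rebuild it from parity plus the other columns, and re-check.
    for i in range(len(data_cols)):
        others = data_cols[:i] + data_cols[i + 1:]
        candidate = xor_parity(others + [parity])
        trial = data_cols[:i] + [candidate] + data_cols[i + 1:]
        if checksum(trial) == expected:
            data_cols[i] = candidate  # repair the bad copy in place
            return b"".join(trial), i
    raise IOError("unrecoverable: checksum still wrong after reconstruction")

cols = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(cols)
good = checksum(cols)
cols[1] = b"BxBB"  # simulate silent corruption on one data disk
data, fixed = read_block(cols, parity, good)
print(data, fixed)
```

If instead the parity column is the corrupted one, `read_block` returns the data without complaint, which is the case that only a scrub would catch in this model.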
Re: [zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10
>> ... If the block checksums show OK, then reading the parity for the corresponding data yields no additional useful information.

> It would yield useful information about the status of the parity information on disk. The read would be done because you're already paying the penalty for reading all the data blocks, so you can verify the stability of the parity information on disk by reading an additional amount.

Sounds like this additional checking (I see your point) could be optional?

--Toby
Re: [zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10
>>> ... If the block checksums show OK, then reading the parity for the corresponding data yields no additional useful information.

>> It would yield useful information about the status of the parity information on disk. The read would be done because you're already paying the penalty for reading all the data blocks, so you can verify the stability of the parity information on disk by reading an additional amount.

> Sounds like this additional checking (I see your point) could be optional?

Well, I'm not offering to implement it or anything. :-)

Somehow, from some of the early discussions of ZFS, I managed to learn that this was one of the features. What I read was wrong, or I misinterpreted it. (Either way, I'm afraid I've managed to repeat it to others since.)

I would expect such behavior to have some redundancy benefits and some performance and code complexity impacts. I think it's a neat idea, and I'm sorry to learn that I've been misunderstanding this as a feature, but I can't guess what the cost of implementing it would be. I suppose having it as a per-pool option could make sense.

--
Darren Dunham [EMAIL PROTECTED]
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
This line left intentionally blank to confuse you.
[zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10
> It's not about the checksum but about how a fs block is stored in raid-z[12] case - it's spread out to all non-parity disks so in order to read one fs block you have to read from all disks except parity disks.

However, if we didn't need to verify the checksum, we wouldn't have to read the whole file system block to satisfy small reads.

Anton

This message posted from opensolaris.org
Re: [zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10
Darren Dunham wrote:

>> That would be useless, and not provide anything extra.

> I think it's useless if a (disk) block of data holding RAIDZ parity never has silent corruption, or if scrubbing was a lightweight operation that could be run often.

The problem is that you will still need to perform a periodic scrub, because you can't be sure that all data will be read during normal operation. So it doesn't make sense to me to (further) penalize every read when doing so does not remove the need for scrub.

-- richard
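To put a rough number on the read penalty mentioned above: a back-of-the-envelope sketch, assuming every fs block spans all data disks of the group (as described earlier in the thread) and that verifying parity on read would mean one extra device read per parity disk. The function name is invented for the example.

```python
from fractions import Fraction

def extra_read_fraction(data_disks, parity_disks):
    """Extra device reads per full-block read if parity were read as well."""
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return Fraction(parity_disks, data_disks)

# Assumed example layouts: data disks + parity disks.
for data, parity in [(4, 1), (7, 2), (10, 2)]:
    extra = extra_read_fraction(data, parity)
    print(f"{data}+{parity}: {float(extra):.0%} more device reads per block")
```

Whatever the exact figure, blocks that are never read in normal operation would still have unverified parity, so the scrub remains necessary.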
[zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10
> What happens when a sub-block is missing (single disk failure)? Surely it doesn't have to discard the entire checksum and simply trust the remaining blocks?

The checksum is over the data, not the data+parity. So when a disk fails, the data is first reconstructed, and then the block checksum is computed.

> Also, even if it could read the data from a subset of the disks, isn't it a feature that every read is also verifying the parity for correctness/silent corruption?

It doesn't -- we only read the data, not the parity. (See line 708 of vdev_raidz.c.) The parity is checked only when scrubbing.
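The order of operations described above (reconstruct first, then checksum the data; check parity only during a scrub) can be sketched as follows. This is a toy model, not the code in vdev_raidz.c: it assumes single XOR parity and SHA-256 as a stand-in checksum, and the function names are hypothetical.

```python
import hashlib
from functools import reduce

def xor_all(chunks):
    """XOR a list of equally sized byte strings together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

def read_degraded(data_cols, parity, expected_checksum):
    """Read a block where a failed disk's column is given as None.

    The missing column is rebuilt from parity first; only then is the
    checksum, which covers the data alone, computed and compared.
    """
    missing = [i for i, c in enumerate(data_cols) if c is None]
    if len(missing) > 1:
        raise IOError("single parity cannot rebuild more than one column")
    cols = list(data_cols)
    if missing:
        survivors = [c for c in cols if c is not None]
        cols[missing[0]] = xor_all(survivors + [parity])
    data = b"".join(cols)
    if hashlib.sha256(data).digest() != expected_checksum:
        raise IOError("checksum mismatch after reconstruction")
    return data

def scrub_parity_ok(data_cols, parity):
    """Scrub-style check: recompute parity from the data and compare."""
    return xor_all(data_cols) == parity

cols = [b"ab", b"cd", b"ef"]
parity = xor_all(cols)
expected = hashlib.sha256(b"".join(cols)).digest()
print(read_degraded([b"ab", None, b"ef"], parity, expected))
print(scrub_parity_ok(cols, parity))
```

In this model the parity is trusted during a degraded read only because the data checksum confirms the result afterwards; stale parity on a healthy group is detected by `scrub_parity_ok`, not by ordinary reads.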