Re: [zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10

2007-01-09 Thread Robert Milkowski
Hello Anton,

Saturday, January 6, 2007, 6:29:29 AM, you wrote:

 It's not about the checksum but about how a fs block is stored in
 raid-z[12] case - it's spread out to all non-parity disks so in order
 to read one fs block you have to read from all disks except parity
 disks.

ABR However, if we didn't need to verify the checksum, we wouldn't
ABR have to read the whole file system block to satisfy small reads.

But we'll lose the end-to-end integrity feature.
And with 9 or more disks, for most workloads we would still end up
reading them all anyway, as each disk would hold such a small portion
of the fs block.
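
A rough sketch of the arithmetic behind this point. It assumes a
simplified model in which one fs block is split into contiguous,
sector-rounded chunks across the non-parity disks. The function names,
the 512-byte sector size and the even split are illustrative
assumptions, not the actual RAID-Z allocator.

```python
import math

SECTOR = 512  # assumed sector size in bytes


def per_disk_bytes(block_size, total_disks, parity_disks):
    """Bytes of one fs block held by each data disk (simplified even split)."""
    data_disks = total_disks - parity_disks
    sectors = math.ceil(block_size / SECTOR)
    # Round up to whole sectors; in practice the last disks may hold less.
    return math.ceil(sectors / data_disks) * SECTOR


def disks_touched(read_offset, read_len, block_size, total_disks, parity_disks):
    """Data disks a read inside the block would hit if no checksum were verified."""
    chunk = per_disk_bytes(block_size, total_disks, parity_disks)
    first = read_offset // chunk
    last = (read_offset + read_len - 1) // chunk
    return last - first + 1


if __name__ == "__main__":
    block = 128 * 1024
    print(per_disk_bytes(block, 9, 1))           # per-disk share, 9-disk raidz1
    print(disks_touched(0, 8 * 1024, block, 9, 1))   # small read
    print(disks_touched(0, 64 * 1024, block, 9, 1))  # larger read
```

Under these assumptions a 128 KiB block on a 9-disk raidz1 leaves only
16 KiB per data disk, so any read much larger than that spans several
disks even without checksum verification.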

-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10

2007-01-05 Thread Darren Dunham
  Ah, that's a major misconception on my part then.  I'd thought I'd read
  that unlike any other RAID implementation, ZFS checked and verified
  parity on normal data access.  

 That would be useless, and not provide anything extra.

I think it's useless if a (disk) block of data holding RAIDZ parity
never has silent corruption, or if scrubbing was a lightweight operation
that could be run often.

 ZFS will do a 
 block checksum check (that is, for each block read, read the checksum 
 for that block, and compare to see if it is OK).  If the block checksums 
 show OK, then reading the parity for the corresponding data yields no 
 additional useful information.

It would yield useful information about the status of the parity
information on disk.

The read would be done because you're already paying the penalty for
reading all the data blocks, so you can verify the stability of the
parity information on disk by reading an additional amount.
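
To make the idea concrete, here is a minimal sketch of what such a
check could look like for single parity, assuming plain XOR parity over
equal-length columns. The helper names are hypothetical and this is not
how vdev_raidz.c is structured; it only shows that the data columns
already in memory are enough to validate the parity column.

```python
from functools import reduce


def xor_columns(columns):
    """XOR equal-length byte columns together (single-parity style)."""
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*columns))


def parity_is_consistent(data_columns, parity_on_disk):
    """Recompute parity from data already read and compare with stored parity."""
    return xor_columns(data_columns) == parity_on_disk


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
    good_parity = xor_columns(data)
    # Flip bits in the first parity byte to mimic silent corruption.
    bad_parity = bytes([good_parity[0] ^ 0xFF]) + good_parity[1:]
    print(parity_is_consistent(data, good_parity))
    print(parity_is_consistent(data, bad_parity))
```

The data checksum alone would pass in both cases, since it covers only
the data columns; only the extra parity read exposes the second one.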

 I'm assuming that in a RAIDZ, RAIDZ2, or mirror configuration, should a 
 block checksum show the corresponding block is corrupted, then ZFS will 
 read the parity (or corresponding mirror) block, and attempt to 
 re-construct the bad block, give the corrected info to the calling 
 process, then re-writing the corrected data to a new block section on 
 the disk(s).
 
 Right?

I was assuming that *all* the data for a FS block was read and if
redundant, the redundancy was verified correct (same data on mirrors,
valid parity for RAIDZ) or the redundancy would be repaired.  At least
with a mirror I have a chance of reading all copies over time.  With
RAIDZ, I'll never read the parity until a problem or a scrub occurs.

Nothing wrong with that.  I had simply managed to convince myself that
it did more.  

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOS    http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


Re: [zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10

2007-01-05 Thread Toby Thain

... If the block checksums
show OK, then reading the parity for the corresponding data yields no
additional useful information.


It would yield useful information about the status of the parity
information on disk.

The read would be done because you're already paying the penalty for
reading all the data blocks, so you can verify the stability of the
parity information on disk by reading an additional amount.


Sounds like this additional checking (I see your point) could be  
optional?


--Toby



Re: [zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10

2007-01-05 Thread Darren Dunham
  ... If the block checksums
  show OK, then reading the parity for the corresponding data yields no
  additional useful information.
 
  It would yield useful information about the status of the parity
  information on disk.
 
  The read would be done because you're already paying the penalty for
  reading all the data blocks, so you can verify the stability of the
  parity information on disk by reading an additional amount.
 
 Sounds like this additional checking (I see your point) could be  
 optional?

Well, I'm not offering to implement it or anything.  :-) Somehow from
some of the early discussions of ZFS, I managed to learn that this was
one of the features.  What I read was wrong, or I misinterpreted it.
(Either way, I'm afraid I've managed to repeat it to others since).

I would expect such behavior to have some redundancy benefits and some
performance and code complexity impacts.  I think it's a neat idea and
I'm sorry to learn that I've been misunderstanding this as a feature,
but I can't guess what the cost of implementing it would be.

I suppose having it as a per-pool option could make sense.



-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOS    http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


[zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10

2007-01-05 Thread Anton B. Rang
 It's not about the checksum but about how a fs block is stored in
 raid-z[12] case - it's spread out to all non-parity disks so in order
 to read one fs block you have to read from all disks except parity
 disks.

However, if we didn't need to verify the checksum, we wouldn't
have to read the whole file system block to satisfy small reads.

Anton
 
 
This message posted from opensolaris.org


Re: [zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10

2007-01-05 Thread Richard Elling

Darren Dunham wrote:

That would be useless, and not provide anything extra.



I think it's useless if a (disk) block of data holding RAIDZ parity
never has silent corruption, or if scrubbing was a lightweight operation
that could be run often.

  

The problem is that you will still need to perform a periodic scrub
because you can't be sure that all data will be read during normal
operation.  So it doesn't make sense to me to (further) penalize
every read, when doing so does not remove the need for scrub.
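
A toy illustration of the coverage argument, assuming a hypothetical
pool of ten blocks and a workload that keeps hitting three hot ones.
The names and numbers are made up for the example.

```python
def never_verified(all_blocks, blocks_read):
    """Blocks that normal reads never touched, so only a scrub would check them."""
    return sorted(set(all_blocks) - set(blocks_read))


# Hypothetical pool and workload.
pool_blocks = range(10)
workload_reads = [0, 1, 2, 1, 0, 2, 2]
cold = never_verified(pool_blocks, workload_reads)
print(cold)
```

However much extra checking is added to each read, the cold blocks are
still only reached by a scrub.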
-- richard



[zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Anton B. Rang
 What happens when a sub-block is missing (single disk failure)?  Surely
 it doesn't have to discard the entire checksum and simply trust the
 remaining blocks?

The checksum is over the data, not the data+parity.  So when a disk fails,
the data is first reconstructed, and then the block checksum is computed.
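
The order of operations can be sketched as follows, assuming single
XOR parity and using SHA-256 purely as a stand-in checksum. The function
names are hypothetical and the layout is simplified; this is an
illustration of "reconstruct first, then checksum the data", not the
actual ZFS read path.

```python
import hashlib
from functools import reduce


def xor_columns(columns):
    """XOR equal-length byte columns together."""
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*columns))


def read_block(data_columns, parity, expected_checksum):
    """data_columns: list of bytes, with None for a column on a failed disk."""
    missing = [i for i, col in enumerate(data_columns) if col is None]
    if len(missing) > 1:
        raise IOError("single parity cannot rebuild more than one column")
    columns = list(data_columns)
    if missing:
        # Step 1: rebuild the lost data column from the survivors plus parity.
        survivors = [c for c in columns if c is not None]
        columns[missing[0]] = xor_columns(survivors + [parity])
    # Step 2: the checksum covers the data only, never the parity.
    block = b"".join(columns)
    if hashlib.sha256(block).digest() != expected_checksum:
        raise IOError("checksum mismatch")
    return block


if __name__ == "__main__":
    cols = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_columns(cols)
    cksum = hashlib.sha256(b"".join(cols)).digest()
    print(read_block([b"abcd", None, b"ijkl"], parity, cksum))
```

Because the checksum is verified after reconstruction, a bad parity
column would show up as a checksum mismatch on the rebuilt block rather
than being trusted silently.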

 Also, even if it could read the data from a subset of the disks, isn't
 it a feature that every read is also verifying the parity for
 correctness/silent corruption?

It doesn't -- we only read the data, not the parity.  (See line 708 of
vdev_raidz.c.)  The parity is checked only when scrubbing.
 
 