On Thu, Nov 08, 2007 at 07:28:47PM -0800, can you guess? wrote:
> > How so? In my opinion, it seems like a cure for the brain damage of RAID-5.
> 
> Nope.
> 
> A decent RAID-5 hardware implementation has no 'write hole' to worry about, 
> and one can make a software implementation similarly robust with some effort 
> (e.g., by using a transaction log to protect the data-plus-parity 
> double-update or by using COW mechanisms like ZFS's in a more intelligent 
> manner).

Can you point to a software RAID implementation that solves the write hole
and still performs well? My understanding (based on what I've been told by
people more knowledgeable in this domain than I am) is that software RAID
has historically been unable to deliver both correctness and acceptable
performance.

> The part of RAID-Z that's brain-damaged is its 
> concurrent-small-to-medium-sized-access performance (at least up to request 
> sizes equal to the largest block size that ZFS supports, and arguably 
> somewhat beyond that):  while conventional RAID-5 can satisfy N+1 
> small-to-medium read accesses or (N+1)/2 small-to-medium write accesses in 
> parallel (though the latter also take an extra rev to complete), RAID-Z can 
> satisfy only one small-to-medium access request at a time (well, plus a 
> smidge for read accesses if it doesn't verify the parity) - effectively 
> providing RAID-3-style performance.
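
To put rough numbers on that claim, here's a back-of-the-envelope
model -- the disk figures are assumptions of mine, not measurements:

  #include <stdio.h>

  int main(void)
  {
      const int    ndisks = 8;      /* N+1: 7 data + 1 parity (assumed) */
      const double iops   = 150.0;  /* small random IOPS/disk (assumed) */

      /* RAID-5 reads touch one disk; small writes touch two (data +
       * parity), halving the parallelism; RAID-Z spreads every block
       * across the group, so each request occupies all the spindles. */
      printf("RAID-5 small reads:  %4.0f IOPS\n", ndisks * iops);
      printf("RAID-5 small writes: %4.0f IOPS\n", ndisks / 2.0 * iops);
      printf("RAID-Z small I/Os:   %4.0f IOPS\n", iops);
      return 0;
  }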

Brain damage seems a bit of an alarmist label. While you're certainly right
that reading a given block requires touching every disk in its stripe, it
seems like a rather quaint argument: aren't most environments that matter
trying to avoid waiting for the disk at all? Intelligent prefetch and large
caches -- I'd argue -- matter far more for performance these days.
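
A crude illustration of why: once most reads are absorbed by prefetch
and cache, per-spindle concurrency stops dominating. The latencies
below are assumed round numbers, not measurements:

  #include <stdio.h>

  int main(void)
  {
      const double disk_ms  = 8.0;   /* random disk access (assumed) */
      const double cache_ms = 0.05;  /* cache hit (assumed)          */
      const double hits[]   = { 0.0, 0.5, 0.9, 0.99 };

      for (int i = 0; i < 4; i++) {
          double avg = hits[i] * cache_ms + (1.0 - hits[i]) * disk_ms;
          printf("hit rate %.2f -> avg read latency %.2f ms\n",
              hits[i], avg);
      }
      return 0;
  }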

> The easiest way to fix ZFS's deficiency in this area would probably be to map 
> each group of N blocks in a file as a stripe with its own parity - which 
> would have the added benefit of removing any need to handle parity groups at 
> the disk level (this would, incidentally, not be a bad idea to use for 
> mirroring as well, if my impression is correct that there's a remnant of 
> LVM-style internal management there).  While this wouldn't allow use of 
> parity RAID for very small files, in most installations they really don't 
> occupy much space compared to that used by large files so this should not 
> constitute a significant drawback.
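
For concreteness, my reading of that proposal -- and this is my
interpretation, not anything from the ZFS source -- is plain
arithmetic: every N consecutive file blocks form a stripe with one
parity block, so no parity bookkeeping is needed at the disk level:

  #include <stdio.h>

  #define N 4   /* data blocks per parity group (assumed) */

  int main(void)
  {
      for (long fb = 0; fb < 10; fb++) {
          long group  = fb / N;            /* which parity group    */
          long member = fb % N;            /* position within group */
          /* each group occupies N+1 slots: N data + 1 parity */
          long data_slot   = group * (N + 1) + member;
          long parity_slot = group * (N + 1) + N;
          printf("file block %2ld -> group %ld, slot %2ld "
              "(parity at slot %2ld)\n", fb, group, data_slot,
              parity_slot);
      }
      return 0;
  }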

I don't really think this would be feasible given how ZFS is layered
today, but go ahead and prove me wrong: here are the instructions for
bringing over a copy of the source code:

  http://www.opensolaris.org/os/community/tools/scm

- ahl

-- 
Adam Leventhal, FishWorks                        http://blogs.sun.com/ahl