which gap?

'RAID-Z should mind the gap on writes' ?

Message was edited by: thometal

I believe this is in reference to the RAID-5 write hole, described here:
http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5_performance

It's not.

So I'm not sure what the 'RAID-Z should mind the gap on writes' comment is getting at either.

Clarification?


I'm planning to write a blog post describing this, but the basic problem is that RAID-Z, by virtue of supporting variable stripe writes (the insight that allows us to avoid the RAID-5 write hole), must round the number of sectors in each allocation up to a multiple of nparity+1. This means that we may have sectors that are effectively skipped. ZFS generally lays down data in large contiguous streams, but these skipped sectors can stymie both ZFS's write aggregation and the hard drive's ability to group I/Os and write them quickly.
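
To make the rounding concrete, here is a minimal Python sketch of the arithmetic. It is an illustration only, not the actual ZFS code: the function name and the assumption that each stripe row carries nparity parity sectors are mine.

```python
def raidz_skipped_sectors(data_sectors, ndisks, nparity):
    """Return (used, skipped) sector counts for one RAID-Z allocation.

    Hypothetical helper for illustration; assumes every row of up to
    (ndisks - nparity) data sectors gets nparity parity sectors.
    """
    data_cols = ndisks - nparity
    rows = -(-data_sectors // data_cols)       # ceiling division
    used = data_sectors + nparity * rows       # data plus parity
    multiple = nparity + 1
    total = -(-used // multiple) * multiple    # round up to nparity+1
    return used, total - used


# 5-disk single-parity group, 2 data sectors: 3 used, 1 skipped.
print(raidz_skipped_sectors(2, 5, 1))
# 6-disk double-parity group, 2 data sectors: 4 used, 2 skipped.
print(raidz_skipped_sectors(2, 6, 2))
```

The skipped count is always less than nparity+1, so each gap is small, but it lands between otherwise contiguous writes.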

Jeff Bonwick added some code to mind these gaps on reads. The key insight there is that if we're going to read 64K, say, with a 512-byte hole in the middle, we might as well do one big read rather than two smaller reads and just throw out the data that we don't care about.
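
The read-side idea can be sketched in a few lines of Python. This is a simplified model, not the real aggregation code: the function name and the gap_limit parameter are assumptions, and reads are assumed not to overlap.

```python
def coalesce_reads(reads, gap_limit):
    """Merge (offset, size) reads whose separating hole is <= gap_limit.

    Hypothetical illustration: the bytes in a bridged hole are read
    and then discarded by the caller.
    """
    merged = []
    for offset, size in sorted(reads):
        if merged:
            last_off, last_size = merged[-1]
            gap = offset - (last_off + last_size)
            if 0 <= gap <= gap_limit:
                # Extend the previous read across the hole.
                merged[-1] = (last_off, offset + size - last_off)
                continue
        merged.append((offset, size))
    return merged


# Two reads separated by a 512-byte hole become one 64K read.
print(coalesce_reads([(0, 32256), (32768, 32768)], gap_limit=512))
```

With gap_limit=0 the same two reads stay separate, which is the behavior before the gaps were minded.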

Of course, doing this for writes is a bit trickier, since we can't just blithely write over gaps: those might contain live data on the disk. To solve this, we push the knowledge of those skipped sectors down to the I/O aggregation layer in the form of 'optional' I/Os that exist purely for the purpose of coalescing writes into larger chunks.
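
Here is a rough Python sketch of how 'optional' I/Os might be used by an aggregation layer. The tuple layout and the trimming rule are assumptions made for illustration, not a description of the real implementation; the sketch assumes an optional I/O covers a sector known to be skipped padding, so writing over it is safe.

```python
def aggregate_writes(ios):
    """Coalesce (offset, size, optional) writes into (offset, size) runs.

    Hypothetical model: optional I/Os are issued only when they bridge
    two mandatory writes; otherwise they are dropped.
    """
    runs, current = [], []
    for io in sorted(ios):
        # Start a new run whenever this I/O is not contiguous.
        if current and io[0] != current[-1][0] + current[-1][1]:
            runs.append(current)
            current = []
        current.append(io)
    if current:
        runs.append(current)

    result = []
    for run in runs:
        # Optional I/Os at either end bridge nothing, so trim them.
        while run and run[-1][2]:
            run.pop()
        while run and run[0][2]:
            run.pop(0)
        if run:
            start = run[0][0]
            end = run[-1][0] + run[-1][1]
            result.append((start, end - start))
    return result


ios = [(0, 1024, False), (1024, 512, True), (1536, 2048, False),
       (8192, 512, True)]
print(aggregate_writes(ios))
```

In this example the optional I/O at offset 1024 lets the two mandatory writes go out as one, while the isolated optional I/O at offset 8192 is never issued.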

I hope that's clear; if it's not, stay tuned for the aforementioned blog post.

Adam

--
Adam Leventhal, Fishworks                        http://blogs.sun.com/ahl

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
