On 2013-01-19 23:39, Richard Elling wrote:
> This is not quite true for raidz. If there is a 4k write to a raidz comprised of 4k sector disks, then there will be one data and one parity block. There will not be 4 data + 1 parity with 75% space wastage. Rather, the space allocation more closely resembles a variant of mirroring, like some vendors call "RAID-1E".
I agree with this reply. However, as I posted late last year when reporting on my "digging in the bowels of ZFS" and my problematic pool, on a 6-disk raidz2 set I only saw allocations (including the two parity sectors) whose size was divisible by 3 sectors, even when the amount of (compressed) userdata was not so rounded. That is, I had either miniature files or tails of files fitting into one sector plus two parities (a 3-sector allocation overall), or tails of 2-4 sectors occupying 6 sectors with parity, although 2 or 3 data sectors could use just 4 or 5 sectors with parities, respectively.

I am not sure what the number 3 means here: it could stand for "one userdata sector plus both parities" or for "half of a 6-disk stripe". Both explanations fit my case.

But yes, with the current raidz allocation there are many ways to waste space, and those small (or not so small) percentages do add up. Fixing this case, i.e. allocating only as much as is actually used, does not seem to require an incompatible on-disk format change, and should be doable within the write-queue logic. It might involve tradeoffs in efficiency; however, ZFS already explicitly "rotates" the starting disks of allocations every few megabytes in order to even out the load among spindles (normally parity disks don't have to be accessed, unless mismatches occur on data disks). Disabling such padding would only help achieve this goal and save space at the same time...

My 2c,
//Jim
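
For illustration, the sector counts discussed in this message can be reproduced with a small Python sketch. It rests on an assumption rather than on anything established in the thread: that raidz adds `nparity` parity sectors per row of data and then rounds the total up to a multiple of `nparity + 1` sectors. The function name `raidz_alloc_sectors` and its parameters are hypothetical, chosen only for this example.

```python
def raidz_alloc_sectors(data_sectors, ndisks=6, nparity=2, pad=True):
    """Estimate sectors allocated for one block on a raidz vdev.

    Assumption (not confirmed in the thread): parity is added per row of
    data sectors, and the total is padded up to a multiple of nparity + 1.
    """
    if data_sectors < 1:
        raise ValueError("data_sectors must be at least 1")
    if ndisks <= nparity:
        raise ValueError("ndisks must exceed nparity")

    data_cols = ndisks - nparity
    # Number of rows needed to hold the data (ceiling division).
    rows = -(-data_sectors // data_cols)
    total = data_sectors + nparity * rows
    if pad:
        unit = nparity + 1
        # Round up to the next multiple of (nparity + 1).
        total = -(-total // unit) * unit
    return total


if __name__ == "__main__":
    # 6-disk raidz2: data sectors -> (padded, unpadded) allocation.
    for n in range(1, 6):
        padded = raidz_alloc_sectors(n)
        exact = raidz_alloc_sectors(n, pad=False)
        print(n, padded, exact)
    # Single-sector write on a 5-disk raidz1: one data + one parity sector.
    print(raidz_alloc_sectors(1, ndisks=5, nparity=1))
```

Under that assumption, 2 to 4 data sectors on a 6-disk raidz2 all come out as 6 allocated sectors, while the unpadded figures are 4, 5 and 6. A single-sector write on raidz1 comes out as 2 sectors, i.e. one data plus one parity.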