Hey James,

> Using Bill's notation, if there isn't a roundup (writing 4k fs blocks
> to a 4 device RAID-Z) wouldn't you get something like this:
> 
> Disk     0   1   2   3
> ----------------------
> LBA  0   A.  A   A   A
>      1   A.  A   A   A
>      2   A.  A   A   B.
>      3   B   B   B   B.
>      4   B   B   B   B.
>      5   B   B   C.  C
>      etc.

You're right: that strategy would work fine, but it does have an unfortunate
side effect. Let's say you then delete the file associated with B and write
a bunch of 1-sector files (each needing one parity sector plus one data
sector); you'll end up with something like this:

Disk     0   1   2   3
----------------------
LBA  0   A.  A   A   A
     1   A.  A   A   A
     2   A.  A   A   D.
     3   D   E.  E   F.
     4   F   G.  G   H.
     5   H   X   C.  C
     etc.

In this case, the sector marked 'X' is a) unused, b) unallocatable (since a
single sector on a single disk is insufficient to meet the replication
requirements), and c) unassociated with any file (the space is unaccounted
for). In the degenerate case, you could fragment your RAID-Z stripe in such
a way that all "free" space was unallocatable and unassociated with any file
(i.e. there would be no way to determine how that free space could be
reclaimed).

By rounding space allocations up to a multiple of the minimum allocatable
size, we ensure that any wasted space is associated with a specific data
object.
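
To make the arithmetic concrete, here is a rough sketch in Python. It is not
the actual ZFS code: it assumes 512-byte sectors (so a 4k block is 8 data
sectors), single parity, one parity sector per row of data, and that the
minimum allocatable size is nparity + 1 sectors. The function names are made
up for illustration.

```python
def ceil_div(n, d):
    """Integer division that rounds up."""
    return -(-n // d)


def raw_sectors(data_sectors, ndisks, nparity=1):
    """Sectors needed without any roundup: the data plus nparity
    parity sectors for every (possibly partial) row of data."""
    data_per_row = ndisks - nparity
    rows = ceil_div(data_sectors, data_per_row)
    return data_sectors + nparity * rows


def rounded_sectors(data_sectors, ndisks, nparity=1):
    """Same as raw_sectors, rounded up to a multiple of the assumed
    minimum allocatable size (nparity + 1 sectors)."""
    min_alloc = nparity + 1
    return ceil_div(raw_sectors(data_sectors, ndisks, nparity), min_alloc) * min_alloc


# A 4k block (8 data sectors) on a 4-device, single-parity RAID-Z.
no_roundup = raw_sectors(8, 4)        # 8 data + 3 parity = 11
with_roundup = rounded_sectors(8, 4)  # 11 rounded up to 12

# A 1-sector file needs one parity sector and one data sector.
small_file = raw_sectors(1, 4)        # 2

# Space left over after refilling B's hole with 1-sector files.
stranded_without_roundup = no_roundup % small_file  # 1 (the 'X' sector)
stranded_with_roundup = with_roundup % small_file   # 0
```

Without the roundup, B's 11 freed sectors hold five 1-sector files and
strand one sector. With the roundup, B is charged for 12 sectors, so the
twelfth sector is accounted to B while it lives and the hole it leaves
divides evenly afterwards.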

Adam

-- 
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
