Hey James,

> Using Bill's notation, if there isn't a roundup (writing 4k fs blocks
> to a 4 device RAID-Z) wouldn't you get something like this:
>
> Disk    0    1    2    3
> ----------------------
> LBA 0   A.   A    A    A
>     1   A.   A    A    A
>     2   A.   A    A    B.
>     3   B    B    B    B.
>     4   B    B    B    B.
>     5   B    B    C.   C
> etc.

You're right: that strategy would work fine, but it does have an
unfortunate side effect. Let's say you then delete the file associated
with B and write a bunch of 1 sector files; you'll end up with something
like this:

Disk    0    1    2    3
----------------------
LBA 0   A.   A    A    A
    1   A.   A    A    A
    2   A.   A    A    D.
    3   D    E.   E    F.
    4   F    G.   G    H.
    5   H    X    C.   C
etc.

In this case, the sector marked 'X' is a) unused, b) unallocatable (since
a single sector on a single disk is insufficient to meet the replication
requirements), and c) unassociated with any file (the space is
unaccounted for).

In the degenerate case, you could fragment your RAID-Z stripe in such a
way that all "free" space was unallocatable and unassociated with any
file (i.e. there would be no way to determine how that free space could
be reclaimed).

By rounding space allocations up to a multiple of the minimum allocatable
size, we ensure that any wasted space is associated with a specific data
object.

Adam

-- 
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
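
P.S. To put numbers on the diagrams above, here is a rough sketch of the
arithmetic. It is not the actual ZFS code: the function names are made up
for illustration, and it assumes 512-byte sectors (so a 4k block is 8 data
sectors), single parity, one parity sector per stripe row, and a minimum
allocatable size of one data sector plus one parity sector.

```python
def raidz_sectors(data_sectors, ndisks, nparity=1):
    """Sectors used with no rounding: data plus nparity sectors per stripe row.

    Hypothetical helper; assumes each row holds (ndisks - nparity) data sectors.
    """
    data_cols = ndisks - nparity
    rows = -(-data_sectors // data_cols)  # ceiling division
    return data_sectors + nparity * rows


def raidz_alloc_sectors(data_sectors, ndisks, nparity=1):
    """Same count, rounded up to a multiple of the minimum allocatable size.

    Assumption: the minimum allocation is one data sector plus its parity,
    i.e. (nparity + 1) sectors.
    """
    min_alloc = nparity + 1
    raw = raidz_sectors(data_sectors, ndisks, nparity)
    return -(-raw // min_alloc) * min_alloc


def stranded_sectors(freed, small_alloc):
    """Sectors left over after refilling a freed extent with small allocations."""
    return freed % small_alloc


block = raidz_sectors(8, 4)           # 4k block, no roundup
rounded = raidz_alloc_sectors(8, 4)   # 4k block, with roundup
small = raidz_alloc_sectors(1, 4)     # 1 sector file

print(block, rounded, small)
print(stranded_sectors(block, small), stranded_sectors(rounded, small))
```

Without the roundup, B frees 11 sectors and the 2-sector files leave one
sector stranded (the 'X' above). With the roundup, B would have been
charged 12 sectors, so the extent refills exactly and the one sector of
padding is accounted for as part of B while B exists.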