On 5/7/07, Tony Galway <[EMAIL PROTECTED]> wrote:
Greetings, learned ZFS geeks & gurus,

Yet another question comes from my continued ZFS performance testing. This has
to do with zpool iostat and some strange behavior I am seeing.
I've created an eight-disk raidz pool from a Sun 3510 fibre array, giving me
a 465G volume.
# zpool create tp raidz c4t600 ... 8 disks worth of zpool
# zfs create tp/pool
# zfs set recordsize=8k tp/pool
# zfs set mountpoint=/pool tp/pool

This is a known problem: an interaction between the alignment
requirements imposed by RAID-Z and the small recordsize you have
chosen.  You can avoid it in most situations by choosing a RAID-Z
stripe width of 2^n+1 devices (2^n disks' worth of data plus one of
parity).  For a fixed record size, this will work perfectly well.
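
To make the alignment effect concrete, here is a rough Python sketch of how many sectors a single-parity RAID-Z might allocate for one block. It assumes the sizing rule as we understand it (one parity sector per row of data sectors, with the total rounded up to a multiple of nparity + 1). The function name and parameters are hypothetical, and 512-byte sectors (ashift = 9) are assumed.

```python
def raidz_alloc_sectors(psize, ashift, cols, nparity=1):
    """Return (data, parity, pad) sector counts for one block.

    Hypothetical sketch, not ZFS code: assumes nparity parity sectors per
    row of (cols - nparity) data sectors, with the total then rounded up
    to a multiple of (nparity + 1).
    """
    data = ((psize - 1) >> ashift) + 1          # ceil(psize / sector size)
    data_cols = cols - nparity                   # data sectors per full row
    rows = (data + data_cols - 1) // data_cols   # rows touched by the block
    parity = nparity * rows
    total = data + parity
    unit = nparity + 1
    padded = (total + unit - 1) // unit * unit   # alignment rounding
    return data, parity, padded - total


# An 8K record on 512-byte sectors, for an 8-disk pool (as in the question)
# and for the 2^n+1 widths of 5 and 9 disks.
for cols in (5, 8, 9):
    data, parity, pad = raidz_alloc_sectors(8192, ashift=9, cols=cols)
    print(f"{cols} disks: data={data} parity={parity} pad={pad}")
```

Under these assumptions the 8-disk pool pads every 8K record with one extra sector (16 data + 3 parity + 1 pad), while the 5- and 9-disk layouts divide evenly and need no padding.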

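Even so, there will still be cases where small files cause
problems for RAID-Z.  While this does not affect many people right now,
I think it will become a more serious issue when disks move to 4k
sectors, because a small block then spans only a few sectors and any
padding or parity sector is a larger fraction of it.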
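
A rough sketch of the 4k-sector concern, again assuming single parity and the same hypothesized sizing rule (parity per row of data sectors, total rounded up to a multiple of nparity + 1). The function name is hypothetical; it compares bytes allocated per block with 512-byte versus 4K sectors on a 9-disk layout.

```python
def raidz_asize(psize, ashift, cols, nparity=1):
    """Bytes allocated for one block under the assumed RAID-Z sizing rule.

    Hypothetical sketch, not ZFS code.
    """
    data = ((psize - 1) >> ashift) + 1                 # ceil(psize / sector size)
    data_cols = cols - nparity
    total = data + nparity * ((data + data_cols - 1) // data_cols)
    unit = nparity + 1
    total = (total + unit - 1) // unit * unit          # alignment rounding
    return total << ashift


# Compare 512-byte (ashift=9) and 4K (ashift=12) sectors on a 9-disk,
# single-parity layout, i.e. one of the 2^n+1 widths.
for psize in (4096, 8192, 16384, 131072):
    small = raidz_asize(psize, ashift=9, cols=9)
    large = raidz_asize(psize, ashift=12, cols=9)
    print(f"{psize:>6} B block: {small:>6} B on 512B sectors, "
          f"{large:>6} B on 4K sectors")
```

Under these assumptions an 8K block that costs 9K on 512-byte sectors costs 16K on 4K sectors, while a 128K block costs 144K either way: the overhead lands almost entirely on small blocks.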

I think the reason for the alignment constraint was to ensure that the
stranded space was accounted for; otherwise it would cause problems as
the pool fills up.  (Consider a 3-device RAID-Z where only one data
sector and one parity sector are written; the third sector in that
stripe is essentially dead space.)
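
The accounting argument can be illustrated with a toy model in which free space is a linear run of sectors and no block can be smaller than nparity + 1 sectors (one data sector plus its parity). The helper below is hypothetical and not ZFS code; it places one block at the start of a free run, with or without rounding its size up to a multiple of nparity + 1.

```python
def place_block(gap, block_sectors, nparity=1, align=True):
    """Place one block at the start of a free run of `gap` sectors.

    Hypothetical toy model, not ZFS code. Returns
    (allocated, leftover, stranded) sector counts.
    """
    unit = nparity + 1
    if align:
        # Round the block up to a multiple of nparity + 1 (the alignment rule).
        size = (block_sectors + unit - 1) // unit * unit
    else:
        size = block_sectors
    if size > gap:
        raise ValueError("block does not fit in this free run")
    leftover = gap - size
    # A leftover run shorter than `unit` can never hold another block.
    stranded = leftover if 0 < leftover < unit else 0
    return size, leftover, stranded


# A 3-sector block (2 data + 1 parity on a 3-device RAID-Z) in a 4-sector gap.
print(place_block(4, 3, align=False))  # (3, 1, 1): one dead sector, unaccounted
print(place_block(4, 3, align=True))   # (4, 0, 0): the dead sector is charged to the block
```

With rounding, the sector that could never be used is counted as part of the block, so free-space accounting stays honest as the pool fills.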

Would it be possible (or worthwhile) to make the allocator aware of
this dead space, rather than imposing the alignment requirements?
I am thinking of something like a concept of tentatively allocated space
in the allocator, managed according to the requirements of the vdev.
With such a mechanism, the allocator could coalesce that space into
usable allocations where possible.  Of course, it would also have to
convert the misaligned bits back into tentatively allocated space when
blocks are freed.
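
A minimal sketch of what "tentatively allocated" space might look like, assuming a flat array of sectors and first-fit placement at exact block sizes (no alignment rounding). The class and method names are hypothetical and this is not ZFS code: free runs too short to hold any block are reported as tentative rather than usable, and they coalesce back into usable space when a neighboring block is freed.

```python
class ToyTentativeAllocator:
    """Hypothetical toy model of the 'tentatively allocated space' idea."""

    def __init__(self, nsectors, nparity=1):
        self.used = [False] * nsectors
        self.min_block = nparity + 1   # one data sector plus its parity

    def _free_runs(self):
        """Return (start, length) for every maximal run of free sectors."""
        runs, start = [], None
        for i, in_use in enumerate(self.used + [True]):  # sentinel closes the last run
            if not in_use and start is None:
                start = i
            elif in_use and start is not None:
                runs.append((start, i - start))
                start = None
        return runs

    def usable(self):
        """Free sectors in runs large enough to hold a block."""
        return sum(n for _, n in self._free_runs() if n >= self.min_block)

    def tentative(self):
        """Free sectors stranded in runs too short to hold any block."""
        return sum(n for _, n in self._free_runs() if n < self.min_block)

    def alloc(self, n):
        """Allocate exactly n sectors first-fit; return the start sector."""
        if n < self.min_block:
            raise ValueError("a block needs at least nparity + 1 sectors")
        for start, length in self._free_runs():
            if length >= n:
                self.used[start:start + n] = [True] * n
                return start
        raise MemoryError("no free run is large enough")

    def free(self, start, n):
        self.used[start:start + n] = [False] * n


pool = ToyTentativeAllocator(nsectors=9)
a = pool.alloc(2)
b = pool.alloc(3)
c = pool.alloc(3)
print(pool.usable(), pool.tentative())   # 0 1: sector 8 is stranded
pool.free(b, 3)
print(pool.usable(), pool.tentative())   # 3 1
pool.free(c, 3)
print(pool.usable(), pool.tentative())   # 7 0: the tentative sector coalesced
```

Because tentative space is derived from the current free runs rather than stored separately, freeing a block automatically converts any newly short runs into tentative space and merges longer ones back into usable space.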

Although I expect this may require changes that would not easily be
backward compatible, the alignment on RAID-Z has always felt a bit
wrong.  The more severe effects can be addressed by also writing
out the dead space, but that will not address the uneven placement of
data and parity across the stripes.
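
The placement concern can be sketched with a simplified single-parity model, under stated assumptions: sector b of the vdev lives on disk b % cols, a block's parity goes in the column where the block starts, and blocks of one fixed size are laid out back to back. Any rebalancing a real implementation might do is ignored, and the function name is hypothetical. Under the rounding rule assumed earlier in this thread, an 8K record on 512-byte sectors takes 20 sectors on 8 disks (16 data + 3 parity + 1 pad) and 18 on 9 disks (16 data + 2 parity).

```python
from collections import Counter


def parity_columns(cols, block_sectors, nblocks):
    """Count which disk column holds parity for back-to-back blocks.

    Simplified single-parity model (an assumption, not ZFS code).
    """
    counts = Counter()
    offset = 0
    for _ in range(nblocks):
        counts[offset % cols] += 1   # parity column = starting column
        offset += block_sectors
    return dict(sorted(counts.items()))


print(parity_columns(8, 20, 100))   # {0: 50, 4: 50}
print(parity_columns(9, 18, 100))   # {0: 100}
```

In this model, padding every block to a fixed size makes parity land on only a few disks, which is the kind of uneven placement that writing out the dead space alone would not fix.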

Any thoughts?

Chris
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
