> The thing is- as far as I know the OS doesn't ask the disk to find a place
> to fit the data. Instead the OS tracks what space on the disk is free and
> then tells the disk where to write the data.

Yes and no; I didn't formulate my idea clearly enough, sorry for the confusion ;)

Yes - the disks don't care about free blocks at all. To them, everything is just 
LBA sector numbers.

No - the OS does track which sectors correspond to the logical blocks it deems 
suitable for a write, and asks the disk to position its mechanical head over a 
specific track and access a specific sector. This is a slow operation which can 
only be done about 180-250 times per second for truly random I/O (perhaps more 
with HDD/controller caching, command queuing and faster spindles).
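
As a back-of-envelope sanity check (assuming typical datasheet figures, not 
measurements from any particular drive): a 7200 RPM disk takes ~8.3 ms per 
revolution, so average rotational latency is ~4.2 ms; add an average seek of 
~8 ms and each fully random access costs ~12 ms, i.e. roughly 80 IOPS for a 
single uncached drive. Shorter seeks, NCQ/TCQ reordering and faster spindles 
are what push that figure toward the 180-250 range.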

I'm afraid that seeking to widely dispersed metadata blocks, such as traversing 
the block-pointer tree during a scrub on a fragmented drive, may well qualify 
as very random I/O.

This reminds me of the long-standing "BP rewrite" project, which would allow 
live re-arranging of ZFS data and thus, in particular, some degree of 
defragmentation. More useful applications would be changing RAIDZ levels and 
the number of disks, though, and maybe even removing top-level VDEVs from a 
sufficiently empty pool... Hopefully the Illumos team or some other developers 
will push this idea into reality ;)

There was a good tip from Jim Litchfield regarding VDEV queue sizing, though. 
The current default for zfs_vdev_max_pending seems to be 10, which is okay (or 
maybe even too much) for individual drives, but not very much for arrays of 
many disks hidden behind a smart controller with its own caching and queuing, 
be it a SAN box controller or a PCI one which intercepts and reinterprets your 
ZFS's calls.
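
If someone wants to experiment, this is roughly how that tunable is usually 
adjusted on Solaris/illumos (the value 32 below is just an illustration, not a 
recommendation for any particular setup):

  * persistently, in /etc/system:

      set zfs:zfs_vdev_max_pending = 32

  * or live on a running kernel, via mdb (0t32 means decimal 32):

      echo zfs_vdev_max_pending/W0t32 | mdb -kw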

So maybe this is indeed a bottleneck - which you would see in "iostat -xn 1" as 
"actv" field values sitting near the configured queue depth.

//Jim