Richard Elling wrote:
On Jun 19, 2011, at 6:04 AM, Andrew Gabriel wrote:
Richard Elling wrote:
Actually, all of the data I've gathered recently shows that the number of IOPS 
does not significantly increase for HDDs running random workloads. However the 
response time does :-( My data is leading me to want to restrict the queue 
depth to 1 or 2 for HDDs.
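(As a rough illustration, with invented numbers rather than the measured data: by Little's law, a saturated disk sustaining ~200 random IOPS with 32 requests outstanding averages about 32/200 ≈ 160 ms per request, versus roughly 10 ms with 2 outstanding, so the latency cost of a deep queue can easily swamp a modest IOPS gain from reordering.)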
Thinking out loud here, but if you can queue up enough random I/Os, the 
embedded disk controller can probably do a good job of reordering them into a 
less random elevator-sweep pattern, increasing IOPS by reducing the total seek 
time. That may be why IOPS doesn't drop as much as you'd expect if you imagine 
the heads doing random seeks (they aren't random any more). However, this 
requires a reasonable queue of I/Os for the controller to optimise, and working 
through that queue will necessarily increase the average response time. If you 
run with a queue depth of 1 or 2, the controller can't do this.
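
To make that concrete, here is a minimal sketch (nothing from the real driver or firmware; the queue depth, LBA range, and block-distance seek model are all invented for illustration). It serves the same batch of random requests once in arrival order and once sorted into a single elevator sweep, and simply totals the head movement:

/*
 * Illustrative only: compare total head movement for a batch of random
 * requests served in arrival (FIFO) order versus sorted into one sweep.
 * QDEPTH and MAXLBA are made-up values; "distance in blocks" stands in
 * for seek time.
 */
#include <stdio.h>
#include <stdlib.h>

#define QDEPTH  32              /* assumed number of queued random I/Os */
#define MAXLBA  1000000L        /* assumed addressable block range */

static int
cmp_lba(const void *a, const void *b)
{
        long x = *(const long *)a, y = *(const long *)b;
        return ((x > y) - (x < y));
}

/* total head movement to serve reqs[] in the given order, starting at 0 */
static long
travel(const long *reqs, int n)
{
        long pos = 0, dist = 0;
        for (int i = 0; i < n; i++) {
                dist += labs(reqs[i] - pos);
                pos = reqs[i];
        }
        return (dist);
}

int
main(void)
{
        long fifo[QDEPTH], swept[QDEPTH];

        srand(1);
        for (int i = 0; i < QDEPTH; i++)
                fifo[i] = swept[i] = rand() % MAXLBA;

        /* same requests, reordered into one ascending sweep */
        qsort(swept, QDEPTH, sizeof (long), cmp_lba);

        printf("FIFO order:  %ld blocks of head travel\n", travel(fifo, QDEPTH));
        printf("Swept order: %ld blocks of head travel\n", travel(swept, QDEPTH));
        return (0);
}

The swept order covers far fewer blocks, which is where the extra IOPS come from; but a request that happens to sort to the far end now waits for nearly the whole batch, which is exactly the response-time cost, and none of this is possible with only 1 or 2 requests outstanding.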

I agree. And disksort is in the mix, too.

Oh, I'd never looked at that.

This is something I played with ~30 years ago, when the OS disk driver was 
responsible for queuing and reordering disk transfers to reduce total seek 
time, and disk controllers were dumb.

...and disksort still survives... maybe we should kill it?

It looks like it's possibly slightly worse than the pathological worst-case response time I describe below...

There are lots of options and compromises, generally weighing a reduction in 
total seek time against the longest response time. The best reduction in total 
seek time comes from planning out your elevator sweep and inserting newly 
queued requests into the right position in the sweep ahead. That also gives the 
potentially worst response time: you may have one transfer queued for the far 
end of the disk whilst you keep getting new transfers queued for the track just 
in front of you, and you might end up reading or writing the whole disk before 
you get to the transfer queued for the far end.

If you can get a big enough queue, you can modify the insertion algorithm to 
never insert into the current sweep, so you are effectively planning two sweeps 
ahead. Then the worst response time becomes the time to process one queue full, 
rather than the time to read or write the whole disk. There are lots of other 
tricks too (e.g. insertion into sweeps taking into account priority, such as 
whether the I/O is synchronous or asynchronous, and the age of existing queue 
entries). I had much fun playing with this at the time.
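
A minimal sketch of that "never insert into the current sweep" idea (purely illustrative: no sweep direction, head position or priority handling, and nothing to do with how disksort or any real driver is structured). New arrivals always go into the next sweep, so nothing already planned can be starved by requests that keep landing just ahead of the head:

/*
 * Illustrative two-sweeps-ahead queue: the current sweep is serviced in
 * order and never grows; arrivals are sorted into the next sweep, which
 * becomes current only when the current one is exhausted.
 */
#include <stdio.h>
#include <stdlib.h>

struct req {
        long            lba;
        struct req      *next;
};

static struct req *current_sweep;       /* being serviced, never grows */
static struct req *next_sweep;          /* where new arrivals are inserted */

/* insert in ascending LBA order, into the next sweep only */
static void
enqueue(long lba)
{
        struct req *r = malloc(sizeof (*r));
        struct req **pp = &next_sweep;

        r->lba = lba;
        while (*pp != NULL && (*pp)->lba < lba)
                pp = &(*pp)->next;
        r->next = *pp;
        *pp = r;
}

/* take the next request; swap sweeps when the current one is empty */
static struct req *
dequeue(void)
{
        struct req *r;

        if (current_sweep == NULL) {
                current_sweep = next_sweep;
                next_sweep = NULL;
        }
        if ((r = current_sweep) != NULL)
                current_sweep = r->next;
        return (r);
}

int
main(void)
{
        struct req *r;

        enqueue(900);
        enqueue(10);
        enqueue(500);
        while ((r = dequeue()) != NULL) {
                printf("service LBA %ld\n", r->lba);
                if (r->lba == 10)
                        enqueue(20);    /* arrives while the head is at 10 */
                free(r);
        }
        return (0);
}

The request for LBA 20 arrives just as the head passes 10, but it is serviced only after the current sweep completes (10, 500, 900, then 20), so its worst-case wait is one queueful rather than a whole-disk pass.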

The other wrinkle for ZFS is that the priority scheduler can't re-order I/Os 
sent to the disk.

Does that also go through disksort? Disksort doesn't seem to have any concept of priorities (but I haven't looked in detail at where it plugs into the whole framework).

So it might make better sense for ZFS to keep the disk queue depth small for 
HDDs.
 -- richard

--
Andrew Gabriel
