Richard Elling wrote:
Actually, all of the data I've gathered recently shows that the number of IOPS does not significantly increase for HDDs running random workloads. However, the response time does :-( My data is leading me to want to restrict the queue depth to 1 or 2 for HDDs.

Thinking out loud here, but if you can queue up enough random I/Os, the embedded disk controller can probably do a good job of reordering them into a less random elevator-sweep pattern, increasing IOPS by reducing the total seek time. That may be why IOPS don't drop as much as you might expect if you picture the heads doing purely random seeks (they aren't random any more). However, this requires a reasonable queue of I/Os for the controller to optimise, and processing that queue will necessarily increase the average response time. If you run with a queue depth of 1 or 2, the controller can't do this.
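
To make that concrete, here's a minimal sketch in Python (purely illustrative; the track count, queue depth, and starting head position are made-up numbers, not from any real drive) comparing total head movement for a queue serviced in arrival order against the same queue sorted into a single elevator sweep:

# A minimal sketch, assuming a hypothetical drive with 10,000 tracks and a
# queue depth of 32: compare total seek distance for FIFO (arrival) order
# against the same requests sorted into one elevator sweep.
import random

random.seed(1)
NUM_TRACKS = 10_000
queue = [random.randrange(NUM_TRACKS) for _ in range(32)]
head = NUM_TRACKS // 2   # assume the head starts mid-disk

def total_seek(start, requests):
    """Sum of head movements when requests are serviced in the given order."""
    pos, dist = start, 0
    for track in requests:
        dist += abs(track - pos)
        pos = track
    return dist

print("FIFO seek distance: ", total_seek(head, queue))
print("Sweep seek distance:", total_seek(head, sorted(queue)))

With a queue depth of 1 or 2 the sort is effectively a no-op, so there is nothing for the controller to optimise.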

This is something I played with ~30 years ago, when disk controllers were dumb and the OS disk driver was responsible for queuing and reordering disk transfers to reduce total seek time. There are lots of options and compromises, generally weighing reduction in total seek time against the longest response time. The best reduction in total seek time comes from planning out your elevator sweep and inserting newly queued requests into the right position in the sweep ahead. That also gives the potentially worst response time: you may have one transfer queued for the far end of the disk whilst you keep getting new transfers queued for the track just in front of you, and you might end up reading or writing the whole disk before you get to that transfer queued for the far end.

If you can get a big enough queue, you can modify the insertion algorithm to never insert into the current sweep, so you are effectively planning two sweeps ahead (see the sketch below). Then the worst-case response time becomes the time to process one queue full, rather than the time to read or write the whole disk. There are lots of other tricks too (e.g. insertion into sweeps taking into account priority, such as whether the I/O is synchronous or asynchronous, and the age of existing queue entries). I had much fun playing with this at the time.
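
For the "never insert into the current sweep" idea, a hedged sketch (again Python; the class and method names are mine, not from any real driver) might look like this:

# A sketch of planning two sweeps ahead: new requests are never inserted
# into the sweep currently being serviced, only into the next one, so the
# worst-case wait is roughly one queue full rather than a full-disk pass.
# For simplicity this services each sweep in one direction (ascending
# tracks), like a circular scan, rather than reversing direction.
import bisect

class TwoSweepScheduler:
    def __init__(self):
        self.current = []   # sweep being serviced now, in ascending order
        self.pending = []   # requests deferred to the next sweep

    def submit(self, track):
        # Never disturb the sweep in progress: queue for the next sweep,
        # kept sorted so it is already in elevator order when promoted.
        bisect.insort(self.pending, track)

    def next_request(self):
        # When the current sweep is exhausted, promote the pending sweep.
        if not self.current:
            self.current, self.pending = self.pending, []
        return self.current.pop(0) if self.current else None

New requests accumulate, already sorted, in the pending sweep, so no request can be starved by arrivals landing just ahead of the head: the worst-case wait is bounded by the length of one queue. Priority and aging tricks would slot into submit(), e.g. promoting synchronous or long-waiting requests.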

--
Andrew Gabriel