I present a lot of information here; hopefully it is not too much. Perhaps this should go into a bugzilla instead. I'm not familiar with CFQ or the other I/O schedulers, so I have been doing more generic debugging.

PROBLEM:

I am observing some undesirable behavior in the CFQ I/O scheduler: a small subset of the I/Os experience much longer delays than the rest. The behavior is not observed when using the "deadline" scheduler.

BACKGROUND:

I am running on Ubuntu Zesty using kernel 4.10.0-30, with the commit mentioned next. Ubuntu sets the default I/O scheduler to "cfq".
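For reference, here is how I check (and, for comparison runs, change) the active scheduler via sysfs; the device name in the comment is just a placeholder:

```shell
# Print the active I/O scheduler for every block device.
# The active scheduler is shown in brackets, e.g. "noop deadline [cfq]".
for f in /sys/block/*/queue/scheduler; do
    if [ -e "$f" ]; then
        printf '%s: %s\n' "${f%/queue/scheduler}" "$(cat "$f")"
    fi
done

# To switch one device to deadline for a comparison run
# ("sdb" is a placeholder device name):
#   echo deadline > /sys/block/sdb/queue/scheduler
```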

I am running the POWER architecture test suite "HTX" (https://github.com/open-power/HTX), and only certain phases of that test seem to perturb CFQ. During these phases, I see that a small percentage of the total I/Os are getting delayed. Without commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5be6b75610cefd1e21b98a218211922c2feb6e08 I see, as expected, very long delays (sometimes over an hour). With that commit, the delays seem to max out around 110 seconds, which still seems too long to me. HTX monitors each I/O's progress and starts complaining if an I/O takes longer than 10 minutes to complete. With this commit, no I/Os seem to reach that 10-minute threshold, so HTX no longer complains.

In HTX, I am running the I/O-only variation (mdt.io) and in addition I disable all devices except for a few disks (otherwise unused). Early debugging showed that, in the case of the delayed I/Os, the delay is due to the I/O not getting submitted to the SCSI layer (scsi_dispatch_cmd). When the I/O eventually does get submitted to SCSI, it completes promptly.

The test phase was shown to be using only __blkdev_direct_IO_simple, so I put the debugging in that path. I replaced the call to io_schedule() with a debug routine that calls "io_schedule_timeout(HZ * 60)". If the timeout trips, then some debugging info is printed and io_schedule() is called to wait for eventual completion. Upon completion of a timed-out I/O, more debug is printed - including the total time elapsed. If no timeout occurs, no debugging is done.
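For clarity, the instrumentation looks roughly like this (a sketch, not the actual patch: debug_io_schedule() is a name I'm using only in this mail, and the real change was made in-line in the completion-wait loop of __blkdev_direct_IO_simple in fs/block_dev.c):

```c
/* Sketch of the debug replacement for io_schedule() described above.
 * In the real wait loop, wakeups can be spurious and completion is
 * re-checked by the caller; this just shows where the prints go. */
static void debug_io_schedule(void)
{
	unsigned long start = jiffies;

	/* Nonzero return means we were woken before the 60s timeout. */
	if (io_schedule_timeout(HZ * 60))
		return;				/* fast path: no debug output */

	pr_warn("direct I/O: 60s timeout waiting for bio completion\n");
	set_current_state(TASK_UNINTERRUPTIBLE);
	io_schedule();				/* wait for eventual completion */
	pr_warn("direct I/O: woken after %u ms total\n",
		jiffies_to_msecs(jiffies - start));
}
```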

OBSERVATIONS:

The period where I observe delayed I/Os spans about 2 hours. During that time, between 400 and 500 I/Os trip the 60-second timeout (4 disks under test). I estimate that about 2 million I/Os (per disk) occur during this interval, so the timed-out I/Os are on the order of 0.005% of the total. Earlier testing showed that a 30-second timeout traps many more I/Os, so the delays experienced by I/Os likely vary across the full range.

Both reads and writes can experience these delays, although most of the I/Os going on are reads.

What I am observing is that when HTX first starts (once it ramps up enough), I/Os start tripping the timeout. In this first phase, the timed-out I/Os all complete in less than 70 seconds. The next phase is not reached for another 10 hours, but by then the maximum I/O delay is over 90 seconds. Each successive phase shows maximum delay values that, in general, increase. In the 6th cycle I observed delays of over 110 seconds, after which the maximum declined; the 10th cycle (the last one collected) showed a maximum of just under 100 seconds.

Test runs with the scheduler set to "deadline" never tripped the timeout.

QUESTIONS:

Is this behavior expected/correct? Are there other commits that might address this? Is there more information I can provide to help diagnose?

My concern is that certain processes/threads are getting unfairly delayed due to this behavior.


Thanks,

Doug
