Public bug reported:

Ubuntu 4.15.0-46.49-generic 4.15.18


The hang happens because the process is waiting for the completion of
its I/O request, and that completion never arrives.

The reason for the hung request is a race condition inside the block
layer, which shows up with long-running requests.

Each request has a timer. When the timer fires, the timeout handler sets
REQ_ATOM_COMPLETE on the request and clears it again once the handler
has finished.

The request completion path checks REQ_ATOM_COMPLETE. If the flag is
already set, the completion returns without doing anything and is never
executed again, on the assumption that the request is already completed
and needs no further attention.
 
Thus, if the request completion starts executing while the timer handler
is in progress, it sees that the complete flag is set and just returns.
The timer handler then clears the complete flag, and the request stays
in the system forever: the timer handler runs again and again, each time
only rearming itself.
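The interleaving described above can be sketched as a small
single-threaded simulation. This is only an illustrative model, not
kernel code: the Request class and all function names are hypothetical,
and the flag check is modelled as a test-and-set, which is an assumption
about how REQ_ATOM_COMPLETE is claimed.

```python
class Request:
    """Toy stand-in for a block layer request (hypothetical model)."""

    def __init__(self):
        self.atom_complete = False  # stands in for REQ_ATOM_COMPLETE
        self.completed = False      # True once completion work has run
        self.timer_runs = 0         # how many times the timeout handler ran


def mark_complete(rq):
    """Set the flag and report whether it was already set (test-and-set)."""
    was_set = rq.atom_complete
    rq.atom_complete = True
    return was_set


def timeout_handler_begin(rq):
    """Timer fired: claim the request by setting the flag."""
    if mark_complete(rq):
        return False  # someone else already owns the request
    rq.timer_runs += 1
    return True


def timeout_handler_end(rq):
    """Handler rearms the timer instead of aborting, then clears the flag."""
    rq.atom_complete = False


def complete_request(rq):
    """Completion path: does nothing if the flag is already set."""
    if mark_complete(rq):
        return False  # completion is silently dropped
    rq.completed = True
    return True


def racy_interleaving():
    """Completion arrives while the timeout handler is still running."""
    rq = Request()
    timeout_handler_begin(rq)
    delivered = complete_request(rq)  # sees the flag set and returns
    timeout_handler_end(rq)           # flag cleared, completion already lost
    return rq, delivered


def benign_interleaving():
    """Completion arrives after the timeout handler has finished."""
    rq = Request()
    timeout_handler_begin(rq)
    timeout_handler_end(rq)
    delivered = complete_request(rq)
    return rq, delivered
```

In the racy ordering the request ends up with the flag cleared and the
completion never delivered, so in this model nothing is left to finish
it except further timer runs. In the benign ordering the completion goes
through normally.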

This happens only with long-running requests. By default the request
timeout is 30 seconds, so a request has to take longer than 30 seconds
to execute for the race to be possible.
This is rare for local hardware storage but may occur more often when
the storage is accessed over a network.
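To check which timeout applies on a given system, the value can be read
from sysfs. The sketch below assumes a SCSI disk that exposes its
timeout in seconds in a file such as /sys/block/sda/device/timeout; the
device name "sda" and the helper names are assumptions for illustration.

```python
from pathlib import Path

DEFAULT_TIMEOUT_SECONDS = 30  # default request timeout named in this report


def read_request_timeout(path):
    """Return the timeout in seconds stored in a sysfs-style text file."""
    return int(Path(path).read_text().strip())


def can_hit_timeout(request_seconds, timeout_seconds=DEFAULT_TIMEOUT_SECONDS):
    """A request can only trigger the timer if it outlives the timeout."""
    return request_seconds > timeout_seconds
```

For example, read_request_timeout("/sys/block/sda/device/timeout") would
return the configured value on a system where that file exists, and
can_hit_timeout(45) is true with the default of 30 seconds.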

The behavior described affects mainline 4.13, 4.14 and 4.15 kernels, and
systems based on the rh7-3.10.0-957.5.1.el7 kernel.
 
Before 4.13 the timer did not rearm itself and just aborted the request.
The patch that rearms the timer was introduced in 4.13:
e72c9a2a67a6400c "scsi: virtio_scsi: let host do exception handling"

After 4.15 the block layer switched to the MQ (multi-queue) scheme,
which is not prone to this kind of race. In recent kernels (>= 5.0) only
the MQ scheme is left, and the legacy race-prone block layer code has
been removed.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: block layer virtio-scsi

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1821738

Title:
  A userspace process hangs in d-state forever in a virtual machine
  environment with a virtio-scsi disk

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1821738/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
