Public bug reported:

[Impact]

Without this patch, Xen guest devices that use MSI interrupts may use
the incorrect pirq and fail to re-configure their MSI interrupts.  For
example, Xen guests with multiple NVMe controllers will fail to
correctly configure all their MSI interrupts, due to the way the NVMe
driver works - it enables a single MSI for each controller, then
disables it, then re-enables multiple MSI interrupts.

[Test Case]

Start a Xen hypervisor using Trusty qemu, which does not have the
required patch.  Then in that hypervisor, start a Xen guest running
Xenial Ubuntu that contains the patch "xen: do not re-use pirq number
cached in pci device msi msg data" (i.e. Xenial kernel 4.4.0-61) from
bug 1656381.  The guest must be configured with passthrough devices;
the easiest way to reproduce this bug is with multiple NVMe controllers.
In the guest, some of the controllers' MSI interrupts will fail to be
configured (actually, reconfigured) because they are using the wrong
pirq for the device.

The results for each combination of hypervisor and guest kernel are:

1) qemu not patched (2.0.0 and earlier), guest kernel not patched: CORRECT behavior
hypervisor: Trusty qemu or UCA Icehouse qemu
guest: all without patch from bug 1656381
failure: none

2) qemu not patched (2.0.0 and earlier), guest kernel patched: INCORRECT behavior
hypervisor: Trusty qemu or UCA Icehouse qemu
guest: all with patch from bug 1656381
failure: MSI interrupts will fail to be configured for any device that
disables and then re-enables its MSI.  Only the first MSI enable for a
device will work.  For example, unloading a driver will result in failure
to enable MSI when the driver is reloaded.

3) qemu patched (2.1.0 and later), guest kernel not patched: INCORRECT behavior
hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
guest: all without patch from bug 1656381
failure: MSI interrupts in the guest may not be correctly mapped if device B 
enables its MSI after device A has disabled its MSI; when device A re-enables 
its MSI, some of its interrupts will fail to be configured correctly.  NVMe 
shows this repeatedly with multiple NVMe controllers; usually only 1 NVMe 
controller will finish initialization correctly.

4) qemu patched (2.1.0 and later), guest kernel patched: CORRECT behavior
hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
guest: all with patch from bug 1656381
failure: none

In a guest with multiple NVMe (passthrough) controllers, situations #1
and #4 will not fail, while situation #2 will cause failures for all
NVMe controllers, and #3 will cause all but one (usually, depending on
race conditions around concurrent MSI disable/reenable) to fail.
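The four situations above reduce to a simple rule: behavior is correct
only when qemu and the guest kernel are either both patched or both
unpatched.  A minimal sketch of that lookup follows; the function names
are hypothetical and exist only for illustration.

```python
def situation(qemu_patched: bool, guest_patched: bool) -> int:
    """Return the situation number (1-4) used in the list above."""
    return 1 + 2 * int(qemu_patched) + int(guest_patched)


def expected_behavior(qemu_patched: bool, guest_patched: bool) -> str:
    """Behavior is CORRECT only when both sides agree on the pirq contract."""
    return "CORRECT" if qemu_patched == guest_patched else "INCORRECT"


if __name__ == "__main__":
    for qemu in (False, True):
        for guest in (False, True):
            print(situation(qemu, guest), qemu, guest,
                  expected_behavior(qemu, guest))
```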

Note that the guest can be running any kernel, not just Xenial, but NVMe
failures to configure MSI are hidden in the Trusty kernel due to a
design difference.


[Regression Potential]

This patch to the hypervisor, and the corresponding patch to the guest
kernel, must work together; either the hypervisor must unmap a device's
pirqs (which this patch adds) when disabled and the kernel must *not*
re-use them, or the hypervisor must *not* unmap a device's pirqs when
disabled and the kernel must re-use them.  If the two do not both do the
right thing, then some of the device's MSI interrupts will not work
correctly, under certain conditions.
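
The pairing requirement can be illustrated with a toy model.  This is
only a sketch under simplifying assumptions: the classes ToyHypervisor
and ToyGuest, the lowest-free-number pirq allocation, and the exact
failure conditions are all hypothetical and are not taken from the qemu
or kernel code.  The model replays the sequence "device A enables MSI,
A disables, device B enables, A re-enables" and reports whether A's
re-enable succeeds.

```python
class ToyHypervisor:
    """Toy model (not qemu code): tracks the pirq bound for each device."""

    def __init__(self, unmap_on_disable: bool):
        self.unmap_on_disable = unmap_on_disable  # True models patched qemu
        self.bound = {}  # device name -> pirq currently bound for it

    def free_pirq(self) -> int:
        """Assumption: hand out the lowest pirq number not currently bound."""
        used = set(self.bound.values())
        pirq = 0
        while pirq in used:
            pirq += 1
        return pirq

    def enable(self, dev: str, pirq: int) -> bool:
        """Return True if dev's MSI ends up delivered on the given pirq."""
        if pirq in self.bound.values() and self.bound.get(dev) != pirq:
            return False  # pirq is bound to a different device
        # A stale binding that was never unmapped wins over the new pirq.
        self.bound.setdefault(dev, pirq)
        return self.bound[dev] == pirq

    def disable(self, dev: str) -> None:
        if self.unmap_on_disable:
            self.bound.pop(dev, None)  # patched qemu frees the pirq


class ToyGuest:
    """Toy model (not kernel code): optionally re-uses a cached pirq."""

    def __init__(self, hv: ToyHypervisor, reuse_cached: bool):
        self.hv = hv
        self.reuse_cached = reuse_cached  # True models the unpatched kernel
        self.cached = {}  # device name -> last pirq used

    def enable(self, dev: str) -> bool:
        if self.reuse_cached and dev in self.cached:
            pirq = self.cached[dev]
        else:
            pirq = self.hv.free_pirq()
        self.cached[dev] = pirq
        return self.hv.enable(dev, pirq)

    def disable(self, dev: str) -> None:
        self.hv.disable(dev)


def reenable_works(qemu_patched: bool, guest_patched: bool) -> bool:
    """Replay A enable, A disable, B enable, A re-enable."""
    hv = ToyHypervisor(unmap_on_disable=qemu_patched)
    guest = ToyGuest(hv, reuse_cached=not guest_patched)
    assert guest.enable("A")  # the first enable works in every situation
    guest.disable("A")
    assert guest.enable("B")
    return guest.enable("A")
```

In this model, reenable_works() returns True only when both arguments
are equal, which matches situations #1 and #4 above.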

Current Trusty qemu Xen hypervisors will see only situations #1 and #2
above: unpatched guest kernels will work correctly, patched guest kernels
will not.  Patched Trusty qemu Xen hypervisors will see only situations
#3 and #4 above, which are the same situations seen with Xenial or
later qemu Xen hypervisors, and UCA Mitaka and later Xen hypervisors.

So essentially - a "regression" will happen for Trusty Xen hypervisors
that makes them behave the same as all newer hypervisors, if the guest
kernels aren't patched.

[Other Info]

The qemu commit for this is:
c976437c7dba9c7444fb41df45468968aaa326ad ("qemu-xen: free all the pirqs for 
msi/msix when driver unload")

The upstream discussion for the patch to the guest kernel can be found at:
https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html

Related: bug 1656381 ("Xen MSI setup code incorrectly re-uses cached
pirq")

** Affects: qemu (Ubuntu)
     Importance: Undecided
         Status: Fix Released

** Changed in: qemu (Ubuntu)
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1657489

Title:
  qemu-xen: free all the pirqs for msi/msix when driver unload
