Frank suggested (and was right) that we can mostly take the bug 1907421
SRU content and copy it over. I've done so and think we are good now on
that.

** Description changed:

+ [Impact]
+ 
+ * When a vfio-pci device on s390x is under I/O load, the device may 
+   end up in an error state.
+ 
+ * Kernel commit 492855939bdb added a limit to the number of concurrent 
+   DMA requests for a vfio container. However, lazy unmapping on s390x 
+   can in fact cause quite a large number of outstanding DMA requests 
+   to build up prior to being purged - potentially the entire guest 
+   DMA space.
+ 
+ * This results in unexpected errors seen in qemu such as 'VFIO_MAP_DMA 
+   failed: No space left on device' (see the sketch after this list).
+ 
+ * The solution requires a change to both kernel and qemu.
+ 
+ * The qemu side of things is addressed by this SRU.
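
To make the failure mode concrete, here is a minimal sketch (not the
qemu code itself, only illustrative and untested) of the ioctl path
that runs into the limit; the struct and ioctl names are from the vfio
uapi headers (linux/vfio.h):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Map one range through a vfio container fd.  Once the container's
     * concurrent-DMA limit is exhausted, the kernel fails the ioctl
     * with ENOSPC, which qemu then reports as
     * 'VFIO_MAP_DMA failed: No space left on device'. */
    static int map_one(int container, unsigned long long iova,
                       unsigned long long vaddr, unsigned long long size)
    {
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = vaddr,
            .iova  = iova,
            .size  = size,
        };

        if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map) != 0) {
            fprintf(stderr, "VFIO_MAP_DMA failed: %s\n", strerror(errno));
            return -errno;
        }
        return 0;
    }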
+ 
+ [Fix]
+ 
+ * A patch series that utilizes the recent kernel additions: it reads 
+   the limit reported by the kernel and ensures mappings are refreshed 
+   before the limit is exceeded (a sketch of the limit query follows).
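
A minimal sketch of that limit query, assuming uapi headers new enough
to define VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL (added by the kernel side of
this fix, bug 1907421); the helper name get_dma_avail is mine, not from
the actual patch series:

    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Returns 0 and fills *avail with the number of DMA mappings the
     * container still allows, or -1 if the kernel does not report it. */
    static int get_dma_avail(int container, uint32_t *avail)
    {
        struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };

        /* The first call only tells us the required size, including
         * the capability chain appended after the fixed struct. */
        if (ioctl(container, VFIO_IOMMU_GET_INFO, &info) != 0) {
            return -1;
        }

        struct vfio_iommu_type1_info *buf = calloc(1, info.argsz);
        if (!buf) {
            return -1;
        }
        buf->argsz = info.argsz;
        if (ioctl(container, VFIO_IOMMU_GET_INFO, buf) != 0 ||
            !(buf->flags & VFIO_IOMMU_INFO_CAPS)) {
            free(buf);
            return -1;
        }

        /* Walk the capability chain to find the DMA-available entry. */
        for (uint32_t off = buf->cap_offset; off != 0; ) {
            struct vfio_info_cap_header *hdr =
                (struct vfio_info_cap_header *)((char *)buf + off);
            if (hdr->id == VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL) {
                struct vfio_iommu_type1_info_dma_avail *cap =
                    (struct vfio_iommu_type1_info_dma_avail *)hdr;
                *avail = cap->avail;
                free(buf);
                return 0;
            }
            off = hdr->next;
        }

        free(buf);
        return -1;  /* kernel too old: no limit reported */
    }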
+ 
+ [Test Case]
+ 
+ * IBM Z or LinuxONE hardware with Ubuntu Server 20.10 installed.
+ 
+ * PCIe adapters in place that can be used via vfio, like RoCE Express 2.
+ 
+ * A KVM host needs to be set up, along with a KVM guest (again 20.10) 
+   that uses vfio.
+ 
+ * Generate I/O that flows through the VF and watch out for errors like 
+   'VFIO_MAP_DMA failed: No space left on device' in the log.
+ 
+ * We don't have all of that in place; IBM (which has already done 
+   this on the related kernel bug) will do these tests.
+ 
+ [Regression Potential]
+ 
+ * This is split in two:
+   - generally the reworks - albeit small - for vfio could affect all 
+     platforms, so there I'd expect issues - if any - in vfio use 
+     cases like device pass-through
+   - on s390x more was changed, but the regressions to look out for 
+     would still be in the same "vfio used for pass-through" use-case 
+     area
+ 
+ [Other]
+ 
+ * The kernel portion was accepted via bug 1907421.
+ 
+ 
+ ---
+ 
  Description:   s390x/pci: Honor vfio DMA limiting
  Symptom:       vfio-pci device on s390 enters error state
  Problem:       Kernel commit 492855939bdb added a limit to the number of
-                concurrent DMA requests for a vfio container.  However, lazy
-                unmapping in s390 can in fact cause quite a large number of
-                outstanding DMA requests to build up prior to being purged,
-                potentially the entire guest DMA space.  This results in
-                unexpected errors seen in qemu such as 'VFIO_MAP_DMA failed:
-                No space left on device'
+                concurrent DMA requests for a vfio container.  However, lazy
+                unmapping in s390 can in fact cause quite a large number of
+                outstanding DMA requests to build up prior to being purged,
+                potentially the entire guest DMA space.  This results in
+                unexpected errors seen in qemu such as 'VFIO_MAP_DMA failed:
+                No space left on device'
  Solution:      The solution requires a change to both kernel and qemu - For
-                qemu, add functionality to get the number of allowable DMA
-                DMA requests via the VFIO_IOMMU_GET_INFO ioctl and then ensure
-                that the guest is told to refresh mappings before exceeding
-                the vfio limit.
+                qemu, add functionality to get the number of allowable
+                DMA requests via the VFIO_IOMMU_GET_INFO ioctl and then
+                ensure that the guest is told to refresh mappings before
+                exceeding the vfio limit (sketched below).
  Reproduction:  Put a vfio-pci device on s390 under I/O load
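
The second half of that solution can be illustrated with a small
sketch; the bookkeeping below is mine (hypothetical helper names and
types, not the actual patch), but it shows the idea of counting
mappings against the limit obtained via VFIO_IOMMU_GET_INFO and
signalling the guest to refresh before vfio would refuse a map:

    #include <stdbool.h>

    /* Hypothetical per-container state; dma_avail would be seeded from
     * the VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL capability (see the query
     * sketch earlier in this mail). */
    typedef struct {
        unsigned int dma_avail;
    } DMALimit;

    /* Called before issuing VFIO_IOMMU_MAP_DMA.  Returning false would
     * map to an "insufficient resources" status for the guest's RPCIT
     * instruction, prompting the guest to flush and re-register its
     * translations instead of pushing vfio over the limit. */
    static bool dma_slot_take(DMALimit *l)
    {
        if (l->dma_avail == 0) {
            return false;
        }
        l->dma_avail--;
        return true;
    }

    /* Called after a successful VFIO_IOMMU_UNMAP_DMA. */
    static void dma_slot_release(DMALimit *l)
    {
        l->dma_avail++;
    }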
  
  This QEMU issue is related to the kernel issue in launchpad bug
  #1907421.  Backport patches have been attached for a subset of the
  required patches for this fix.  The need for backports boiled down to
  3 major reasons:
  1) For the header sync, I suspect you only want the minimal set of changes 
needed
  2) There is a missing upstream commit (408b55db8be3) that re-organizes the 
location of 2 s390-pci header files, causing conflicts
- 3) Adjustments had to be made due to the QEMU build system change (meson) 
+ 3) Adjustments had to be made due to the QEMU build system change (meson)
  
  I initially performed the backport against 4.2/focal-devel; the same
  patches and process will also apply cleanly to 5.0/groovy-devel.  There
  should be nothing required for hirsute as everything is already in
  upstream QEMU 5.2.
  
  In summary:
  53ba2eee52bf: Backport as patch 0001.  Rather than doing a full header sync, 
update ONLY the header change needed for the DMA fix.  See attached patch 0001.
  3ab7a0b40d4b: cherry-pick works
  7486a62845b1: cherry-pick works
  cd7498d07fbb: Backport as patch 0004.  This upstream commit added a new part 
using meson, which does not exist in 5.0.
  37fa32de7073: Backport as patch 0005.  This was mainly due to conflicts with 
a missing patch that relocated some include files.
  77280d33bc9c: Backport as patch 0006.  This was due to the different
build system, and CONFIG_DEVICES does not exist there.
  
  As such, I have attached patches 0001, 0004, 0005 and 0006.  Please
  cherry-pick patches 0002 and 0003.
  
  To verify, I applied the patches provided and cherry-picks against both
  focal-devel and groovy-devel.  In each case, for the host system I used
  the groovy kernel Frank provided in launchpad bug #1907421 which
  includes the kernel portion of this fix -- using these together, I
  verified that the DMA limit is being read in and honored appropriately
  by QEMU, and I can no longer trigger an overrun of the DMA space when a
  guest pushes heavy data transfer via PCI (no errors in log, no transfer
  stalls).
  
  Also, as related to the last patch of the set, I further verified that
  no build errors are encountered when configured with --without-default-
  devices.
