** Description changed:

+ [Impact]
+ 
+  * Host -> Guest notifications can get lost, which in turn can stall
+    I/O on the affected devices; see the original bug report below for
+    more details.
+ 
+  * Backport the upstream fix, which ensures that the generated code
+    uses a single, consistently loaded value for the indicator (no
+    stray re-loads) and thereby avoids the issue.
+ 
+ [Test Case]
+ 
+  * Set up iperf on the host and run the server: "iperf -s"
+  * Get a guest that uses driver=qemu for its network interface, like
+    the following (one way to apply this is sketched at the end of
+    this list):
+     <interface type='network'>
+       <source network='default'/>
+       <model type='virtio'/>
+       <driver name='qemu'/>
+     </interface>
+  * In the guest, run iperf in a loop against the server on the host:
+     #!/bin/bash
+     for i in $(seq 1 1000);
+     do
+       echo Try $i
+       iperf -c 192.168.122.1 || break
+     done
+  * Depending on the hardware model, machine load and similar factors,
+    the above test is either fairly reproducible or not reproducible at
+    all. That is unfortunate, but we haven't found a better reproducer;
+    thankfully IBM, who reported this issue (and created the fix), can
+    recreate it on their end and are willing to do so again for the SRU
+    verification.
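+  * A minimal sketch of the host-side setup (the guest name "guest1"
+    and the default libvirt network behind 192.168.122.1 are
+    assumptions, adjust as needed):
+     # on the host: start the iperf server in the background
+     iperf -s &
+     # edit the guest definition to use driver=qemu for its NIC
+     virsh edit guest1
+     # restart the guest so the new interface definition takes effect
+     virsh destroy guest1; virsh start guest1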
+ 
+ [Regression Potential]
+ 
+  * The changed code path is s390x-only and limited to the virtio-ccw
+    handling. Therefore regressions - if any - would be isolated to
+    s390x and would manifest in virtio-ccw based I/O.
+ 
+ [Other Info]
+  
+  * n/a
+ 
+ ----
+ 
+ 
  Problem Description:
  
  When irqfds are not used, setting of the adapter interruption
  host-->guest notifier bit is accomplished by the QEMU function
  virtio_set_ind_atomic().
  
- The atomic_cmpxchg() loop in virtio_set_ind_atomic() is broken because we occasionally end up with old and _old having different values (a legit compiler can generate code that accessed *ind_addr again to pick up a value for _old instead of using the value of old that was already fetched according to the rules of the abstract machine). This means the underlying CS instruction may use a different old (_old) than the one we intended to use if atomic_cmpxchg() performed the xchg part.
- 
- The direct consequence of the problem is that host --> guest notifications can get lost. The indirect consequence is that queues may get stuck and the devices may cease operate normally. We stumbled on debugging a choked virtio-net interface (one that used the qemu driver and not vhost). But it can affect other virtio-ccw devices as well.
+ The atomic_cmpxchg() loop in virtio_set_ind_atomic() is broken because
+ we occasionally end up with old and _old having different values (a
+ legit compiler can generate code that accesses *ind_addr again to pick
+ up a value for _old instead of using the value of old that was already
+ fetched, which is allowed by the rules of the abstract machine). This
+ means the underlying CS instruction may use a different old (_old) than
+ the one we intended to use if atomic_cmpxchg() performed the xchg part.
+ 
+ The direct consequence of the problem is that host --> guest
+ notifications can get lost. The indirect consequence is that queues may
+ get stuck and the devices may cease to operate normally. We stumbled on
+ this while debugging a choked virtio-net interface (one that used the
+ qemu driver and not vhost), but it can affect other virtio-ccw devices
+ as well.
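+ 
+ To illustrate the pattern, here is a minimal, self-contained sketch
+ using the GCC/Clang atomic builtins rather than QEMU's actual helpers;
+ the function names are made up for illustration and this is not the
+ code from the tree:
+ 
+     #include <stdint.h>
+ 
+     /* Racy pattern: *ind is read through a plain pointer, so the
+      * compiler may legally re-load *ind for each use of 'old'.  The
+      * value OR-ed into 'new' and the "expected" value handed to the
+      * compare-and-swap can then come from two different reads; the CS
+      * may succeed while storing a 'new' computed from a stale 'old',
+      * wiping out a bit set concurrently by another writer. */
+     static uint8_t set_ind_racy(uint8_t *ind, uint8_t to_be_set)
+     {
+         uint8_t old, new;
+         do {
+             old = *ind;               /* plain load, may be re-read */
+             new = old | to_be_set;
+         } while (__sync_val_compare_and_swap(ind, old, new) != old);
+         return old;
+     }
+ 
+     /* In the spirit of the fix: read the indicator exactly once via an
+      * atomic access and feed the value returned by the CS back into the
+      * next iteration, so no plain re-read of *ind can sneak in. */
+     static uint8_t set_ind_fixed(uint8_t *ind, uint8_t to_be_set)
+     {
+         uint8_t expected, actual;
+ 
+         actual = __atomic_load_n(ind, __ATOMIC_SEQ_CST);
+         do {
+             expected = actual;
+             actual = __sync_val_compare_and_swap(ind, expected,
+                                                  expected | to_be_set);
+         } while (actual != expected);
+         return actual;
+     }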
  
  If irqfds are used for host->guest notifications, then we are safe
  because notifier bit manipulation is done in the kernel (and it's done
  correctly).
- 
  
  The problem described above is fixed upstream by commit:
  
  1a8242f7c3 ("virtio-ccw: fix virtio_set_ind_atomic")
  
  All upstream versions since v2.0.0 are (potentially) affected.
  
  The same mistake was made in QEMU in another place, and is fixed by:
  
  45175361f1 ("s390x/pci: fix set_ind_atomic")
  
  We can file a separate BZ for it if necessary.

https://bugs.launchpad.net/bugs/1894942

Title:
  [UBUNTU 20.04] Lost virtio host --> guest notifications cause devices
  to cease normal operation
