** Description changed:

  [Impact]
  
   * There are situations where disk I/O from guests to host SCSI
     devices can fail to return the right status. When that happens,
     the guest believes that the I/O was successful when it was not,
     silently leading to later data corruption.
  
   * Upstream fixed this in later versions; backport these fixes.
  
  [Test Plan]
  
   * The IBM test lab can run tests with (virtual) cable pulls and
     all that. This kind of testing revealed the issue initially (we
     don't know what subset exactly). We'd rely on IBM to run those
     tests against the builds in -proposed.
-    IBM already did that on the PPA which was quite helpful (see below)
+    IBM already did that on the PPA which was quite helpful (see below)
  
   * Any kind of SCSI-attached disk would be worth testing.
     That can be (all details here:
     https://libvirt.org/formatdomain.html#usb-pci-scsi-devices):
     - scsi device via hostdevs
     - scsi device using iscsi
     - scsi adapter via scsi-host + vhost
  
     But that can only test if there is no regression in formerly working
     simple setups. For the original case we have to rely on IBM (see above)
  
   * Due to the complexity I'd suggest keeping this in -proposed
     a bit longer than usual
  
  [Where problems could occur]
  
   * Qemu does a lot of things, problems of this change would occur
-    and be limited to the handling of scsi disks.
-    - There is the usual kind of regression potential if our backports
+    and be limited to the handling of scsi disks.   - There is the usual kind of regression potential if our backports
       missed anything or are bad. The code isn't easy, but we've now had
-      three developers having a look and it looks ok.
-    - But then there is also the "intended regression" which is that we
+      three developers having a look and it looks ok.   - But then there is also the "intended regression" which is that we
       now deliver error codes correctly. If there was a setup with bad I/O
       errors and relying on not seeing them this will change. With this
       upload these guests will get the error reported. We can't change
       this as that is the main purpose of this fix. But one would assume
       that people prefer that over silent corruption.
  
  [Other Info]
  
-  * n/a
+  * Per Comment: #22 this should stay in -proposed longer for up to 14
+ days to ensure that it gets extra testing.
  
  --- original report ---
  
  == Comment: #63 - Halil Pasic <pa...@de.ibm.com> - 2022-03-28 17:33:34 ==
  I'm pretty confident I've figured out what is going on.
  
  From the guest side, the decision whether the SCSI command was completed
  successfully or not comes down to looking at the sense data. Prior to
  commit a108557bbf ("scsi: inline sg_io_sense_from_errno() into the
  callers."), we don't build sense data as a response to seeing a host
  status presented by the host SCSI stack (e.g. kernel).
  
  Thus when the kernel tells us that a given SCSI command did not get
  completed via SCSI_HOST_TRANSPORT_DISRUPTED or SCSI_HOST_NO_LUN, we end
  up fooling the guest into believing that the command completed
  successfully.
  
  The guest kernel, and especially virtio and multipath are at no fault
  (AFAIU). Given these facts, it isn't all that surprising, that we end up
  with corruptions.
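  
  [Editor's illustration, not part of the original comment and not QEMU
  code.] The failure mode described above can be sketched as a small
  model: a completion path that drops the host status reports GOOD to
  the guest, while one that converts a non-OK host status into sense
  data reports CHECK CONDITION. All function names, the numeric host
  status values, and the sense triples below are assumptions chosen for
  illustration only.

```python
# Illustrative model only. Function names, host status values and sense
# triples are assumptions, not taken from QEMU.
from typing import Optional, Tuple

Sense = Tuple[int, int, int]  # (sense key, ASC, ASCQ)

# SCSI status bytes
GOOD = 0x00
CHECK_CONDITION = 0x02

# Host status codes named as in the comment above (values assumed)
SCSI_HOST_OK = 0x00
SCSI_HOST_NO_LUN = 0x01
SCSI_HOST_TRANSPORT_DISRUPTED = 0x0E

# Hypothetical mapping from host status to sense data
HOST_STATUS_TO_SENSE = {
    SCSI_HOST_NO_LUN: (0x0B, 0x05, 0x00),
    SCSI_HOST_TRANSPORT_DISRUPTED: (0x0B, 0x29, 0x07),
}
GENERIC_SENSE: Sense = (0x0B, 0x00, 0x00)  # fallback, assumed


def complete_ignoring_host_status(
    device_status: int, host_status: int
) -> Tuple[int, Optional[Sense]]:
    """Model of the buggy path: the host status is never looked at."""
    return device_status, None


def complete_with_host_status(
    device_status: int, host_status: int
) -> Tuple[int, Optional[Sense]]:
    """Model of the fixed path: a non-OK host status becomes sense data."""
    if host_status != SCSI_HOST_OK:
        sense = HOST_STATUS_TO_SENSE.get(host_status, GENERIC_SENSE)
        return CHECK_CONDITION, sense
    return device_status, None


if __name__ == "__main__":
    # The host reports a disrupted transport; the device status byte was
    # never updated and still reads GOOD.
    print(complete_ignoring_host_status(GOOD, SCSI_HOST_TRANSPORT_DISRUPTED))
    print(complete_with_host_status(GOOD, SCSI_HOST_TRANSPORT_DISRUPTED))
```

  In this model the first call hands the guest a GOOD status with no
  sense data, which is the silent success described above; the second
  call surfaces the error so the guest can retry or fail the path.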
  
  All we have to do is do backports for QEMU (when necessary). I didn't
  investigate vhost-scsi -- my guess is, that it ain't affected.
  
  How do we want to handle the back-ports?
  
  == Comment: #66 - Halil Pasic <pa...@de.ibm.com> - 2022-04-04 05:36:33 ==
  This is a proposed backport containing 7 patches in mbox format. I tried
  to pick patches sanely, and all I had to do was basically resolving
  merge conflicts.
  
  I have to admit I have no extensive experience in doing such invasive
  backports, and my knowledge of the QEMU SCSI stack is very limited. I
  would be happy if the Ubuntu folks would have a good look at this, and
  if possible improve on it.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1967814

Title:
  Ubuntu 20.04.3 - ilzlnx3g1 - virtio-scsi devs on KVM guest having
  miscompares on disktests when there is a failed path.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1967814/+subscriptions

