** Tags added: bionic focal

** Description changed:

  The target subsystem (LIO) can hang if multiple threads try to destroy
  iSCSI sessions simultaneously. This is reproducible on systems that have
  multiple targets with initiators regularly connecting/disconnecting.
  
  This may happen when a "targetcli iscsi/iqn.../tpg1 disable" command is
  executed while a logout operation is underway.
  
  The iSCSI target doesn't handle such events correctly: two or
  more threads may end up sleeping while waiting for the driver to close
  the remaining connections on the session. When the connections are
  closed, the driver wakes up only the first thread that will then proceed
  to destroy the session structure. The remaining threads are blocked
  there forever, waiting on a completion synchronization mechanism that
  doesn't exist in memory anymore because it has been freed by the first
  thread.
  
  Note that if the blocked threads are somehow forced to wake up, they
  will try to free the same iSCSI session structure destroyed by the first
  thread, causing double frees, memory corruptions, etc.
  
  The driver has been reorganized so the concurrent threads will set a
  flag in the session structure to notify the driver that the session
  should be destroyed; then, they wait for the driver to close the
  remaining connections. When the connections are all closed, the driver
  will wake up all the threads and will wait for the refcount of the iSCSI
  session structure to reach zero. When the last thread wakes up, the
  refcount is decreased to zero and the driver can proceed to destroy the
  session structure because no one is referencing it anymore.
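
The reworked scheme described above can be pictured with a small user-space analogy. This is only a sketch under stated assumptions, not the kernel code: the `Session` class, its fields, and `run_demo` are hypothetical names, and Python's `threading.Condition` stands in for the kernel's completion and refcount primitives. It shows the waiters setting a flag, the driver waking all of them, and the session being destroyed exactly once after the refcount reaches zero.

```python
import threading


class Session:
    """Hypothetical user-space stand-in for an iSCSI session structure."""

    def __init__(self, connections):
        self.lock = threading.Condition()
        self.connections = connections    # connections still open
        self.teardown_requested = False   # flag set by any thread wanting teardown
        self.refcount = 0                 # threads currently waiting on the session
        self.destroy_count = 0            # how many times the session was destroyed

    def request_teardown(self):
        """Concurrent thread: set the flag, then wait for connections to close."""
        with self.lock:
            self.teardown_requested = True
            self.refcount += 1
            self.lock.notify_all()        # let the driver (and the demo) see the change
            self.lock.wait_for(lambda: self.connections == 0)
            self.refcount -= 1
            self.lock.notify_all()        # let the driver re-check the refcount

    def driver_close_connections(self):
        """Driver: close connections, wake every waiter, destroy once unreferenced."""
        with self.lock:
            self.lock.wait_for(lambda: self.teardown_requested)
            self.connections = 0
            self.lock.notify_all()        # wake all waiters, not only the first
            self.lock.wait_for(lambda: self.refcount == 0)
            self.destroy_count += 1       # single point of destruction


def run_demo(n_threads=3):
    session = Session(connections=2)
    waiters = [threading.Thread(target=session.request_teardown)
               for _ in range(n_threads)]
    for t in waiters:
        t.start()
    # Keep the demo deterministic: every waiter registers before the driver runs.
    with session.lock:
        session.lock.wait_for(lambda: session.refcount == n_threads)
    driver = threading.Thread(target=session.driver_close_connections)
    driver.start()
    for t in waiters + [driver]:
        t.join(timeout=5)
    return session
```

Calling `run_demo(3)` runs three concurrent teardown requests against one session. In the buggy pattern, by contrast, only the first waiter would be woken and would free the structure that the others are still waiting on.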
  
  I've witnessed this happening on hundreds of Ubuntu 16.04.5 systems. It
  is a regression, because this did not occur several years ago.
  Unfortunately, I don't have detailed records from that far back to
  determine exactly which kernel I was running that was not affected by
  this bug (I believe it was either 4.8.x or 4.10.x).
  
  I've attached the requested uname, version_signature, dmesg, and lspci
  from my system. However, I've seen this happen on a wide array of
  hardware: 2 to 24 cores, 8GB to 256GB RAM, both AMD and Intel CPUs,
  onboard storage and PCIe SAS cards, etc.
  
  This has been fixed in the upstream master branch, but it hasn't yet
  been backported to "-stable".
  
  To fix this in the Ubuntu kernel, these three commits should be backported:
- * https://github.com/torvalds/linux/commit/e49a7d994379278d3353d7ffc7994672752fb0ad#diff-b7557d7ed3ba34645f6e9d510f281d3a
- * https://github.com/torvalds/linux/commit/57c46e9f33da530a2485fa01aa27b6d18c28c796#diff-b7557d7ed3ba34645f6e9d510f281d3a
- * https://github.com/torvalds/linux/commit/626bac73371eed79e2afa2966de393da96cf925e#diff-b7557d7ed3ba34645f6e9d510f281d3a
+ * https://github.com/torvalds/linux/commit/e49a7d994379278d3353d7ffc7994672752fb0ad
+ * https://github.com/torvalds/linux/commit/57c46e9f33da530a2485fa01aa27b6d18c28c796
+ * https://github.com/torvalds/linux/commit/626bac73371eed79e2afa2966de393da96cf925e
+ 
+ I'd like these commits to be added to the xenial, bionic, and focal
+ kernels.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1871688

Title:
  LIO hanging in iscsit_free_session and iscsit_stop_session

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1871688/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
