[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2025-03-24 Thread Rodrigo Barbieri
** Attachment added: "lp1823445_jammy_antelope_verification.txt"
   
https://bugs.launchpad.net/cinder/+bug/2083061/+attachment/5866973/+files/lp1823445_jammy_antelope_verification.txt

** Tags removed: verification-antelope-needed verification-bobcat-needed
** Tags added: verification-antelope-done verification-bobcat-done

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2083061

Title:
  [SRU] error deleting cloned volumes and parent at the same time when
  using ceph

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/2083061/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2025-03-24 Thread Rodrigo Barbieri
** Attachment added: "lp2083061_jammy_bobcat_verification.txt"
   
https://bugs.launchpad.net/cinder/+bug/2083061/+attachment/5866972/+files/lp2083061_jammy_bobcat_verification.txt


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2025-03-17 Thread Rodrigo Barbieri
Still pending UCAs for Antelope and Bobcat


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2025-03-17 Thread Launchpad Bug Tracker
This bug was fixed in the package cinder - 2:20.3.1-0ubuntu1.6

---
cinder (2:20.3.1-0ubuntu1.6) jammy; urgency=medium

  * d/p/lp1823445.patch: Fix "signature_verified" volume metadata
    from propagating to images and causing errors later when
    creating volumes from such images (LP: #1823445).
  * d/p/lp2083061.patch: Fix race condition when deleting cloned
    volumes and their parents at the same time. (LP: #2083061)

 -- Rodrigo Barbieri   Mon, 09 Dec 2024
15:37:22 +

** Changed in: cinder (Ubuntu Jammy)
   Status: Fix Committed => Fix Released


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2025-03-11 Thread Rodrigo Barbieri
** Attachment added: "lp2083061_jammy_yoga_verification.txt"
   
https://bugs.launchpad.net/cinder/+bug/2083061/+attachment/5864023/+files/lp2083061_jammy_yoga_verification.txt

** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2025-02-27 Thread Andreas Hasenack
Hello Rodrigo, or anyone else affected,

Accepted cinder into jammy-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/cinder/2:20.3.1-0ubuntu1.6 in a few
hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from verification-needed-
jammy to verification-done-jammy. If it does not fix the bug for you,
please add a comment stating that, and change the tag to verification-
failed-jammy. In either case, without details of your testing we will
not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: cinder (Ubuntu Jammy)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-jammy


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2025-02-20 Thread Rodrigo Barbieri
@Andreas see latest comments in
https://bugs.launchpad.net/ubuntu/noble/+source/cinder/+bug/1823445


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2025-02-20 Thread Heitor Alves de Siqueira
This has already been fixed in Cinder for newer releases; I've adjusted the
bug status accordingly.
It has also already been sponsored for Jammy and is currently in the
Unapproved queue.

** Changed in: cinder (Ubuntu)
   Status: Invalid => Fix Released

** Changed in: cinder
   Status: Invalid => Fix Released

** Changed in: cinder (Ubuntu Jammy)
   Status: New => In Progress


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2025-01-27 Thread James Page
** Changed in: cinder (Ubuntu)
   Status: New => Invalid

** Changed in: cloud-archive
   Status: New => Invalid

** Changed in: cinder
   Status: New => Invalid


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2025-01-24 Thread Andreas Hasenack
The main devel task for cinder is still "New". Is this fixed in plucky?


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2025-01-14 Thread James Page
Proposed updates for all targets uploaded for SRU team review.

I'll process UCA targets (which are also uploaded) once the SRU team
has accepted them into proposed.


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2024-11-25 Thread Rodrigo Barbieri
** Patch added: "lp2083061_jammy_antelope.debdiff"
   
https://bugs.launchpad.net/cloud-archive/+bug/2083061/+attachment/5840364/+files/lp2083061_jammy_antelope.debdiff


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2024-11-25 Thread Rodrigo Barbieri
** Patch added: "lp2083061_jammy_bobcat.debdiff"
   
https://bugs.launchpad.net/cloud-archive/+bug/2083061/+attachment/5840363/+files/lp2083061_jammy_bobcat.debdiff


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2024-11-25 Thread Rodrigo Barbieri
** Patch added: "lp2083061_jammy_yoga.debdiff"
   
https://bugs.launchpad.net/cloud-archive/+bug/2083061/+attachment/5840362/+files/lp2083061_jammy_yoga.debdiff


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2024-11-25 Thread Rodrigo Barbieri
Forgot the DEP3 tags. Deleting the debdiffs and re-creating them.


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2024-11-25 Thread Rodrigo Barbieri
** Patch added: "lp2083061_jammy_bobcat.debdiff"
   
https://bugs.launchpad.net/cloud-archive/+bug/2083061/+attachment/5840361/+files/lp2083061_jammy_bobcat.debdiff


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2024-11-25 Thread Rodrigo Barbieri
** Patch added: "lp2083061_jammy_antelope.debdiff"
   
https://bugs.launchpad.net/cloud-archive/+bug/2083061/+attachment/5840360/+files/lp2083061_jammy_antelope.debdiff


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2024-11-22 Thread Rodrigo Barbieri
** Description changed:

  *** SRU TEMPLATE AT THE BOTTOM **
  
  Affects: bobcat and older
  
  A race condition when deleting cloned volumes at the same time as their
  parent leaves the volumes in error_deleting state. It happens because
  the code that looks for the parent in [3] may find either the original
  volume or the ".deleted" renamed volume if the parent has been marked
  for deletion. When the parent and the child are deleted at the same
  time, the child may see the parent volume before it is marked for
  deletion, and then in [4] it fails to find it again because it is gone
  (renamed to ".deleted").
  
  Steps to reproduce:
  
  1) openstack volume create --size 1 v1
  
  Wait for the volume to be created and available
  
  2) for i in {1..9}; do openstack volume create d$i --source v1 --size
  1;done
  
  Wait for all volumes to be created and available
  
  3) openstack volume delete $(openstack volume list --format value -c ID
  | sort | xargs)
  
  Some volumes may be in error_deleting state.
  
  Workaround: Reset volume state and try to delete again.
  
  Solutions:
  
  a) The issue does not happen in caracal+ because of commit [1] which
  refactors the code. I tried to reproduce in Caracal with 50 volumes,
  including grandparent volumes, and I couldn't. If we could backport this
  fix as far back as Yoga this would address the problem for our users.
  
  b) A single line of code in [2] can address the problem in bobcat and
  older releases by adding a retry:
  
  @utils.retry(rbd.ImageNotFound, 2)
  def delete_volume(self, volume: Volume) -> None:
  
  With the retry, the ImageNotFound exception thrown at [4] causes the
  delete_volume function to be re-run, which will then find the
  ".deleted" at [3], solving the race condition. It is simpler
  than adding something more complex directly at [4] where the error
  happens.
  
  [1]
  
https://github.com/openstack/cinder/commit/1a675c9aa178c6d9c6ed10fd98f086c46d350d3f
  
  [2]
  
https://github.com/openstack/cinder/blob/5b3717f8bfa69c142778ffeabfc4ab91f1f23581/cinder/volume/drivers/rbd.py#L1371
  
  [3]
  
https://github.com/openstack/cinder/blob/5b3717f8bfa69c142778ffeabfc4ab91f1f23581/cinder/volume/drivers/rbd.py#L1401
  
  [4]
  
https://github.com/openstack/cinder/blob/5b3717f8bfa69c142778ffeabfc4ab91f1f23581/cinder/volume/drivers/rbd.py#L1337
  
- 
  ===
  SRU TEMPLATE
  
  
  [Impact]
  
  Due to a race condition, attempting to delete multiple volumes where
  among them there is a parent and a child can result in one or more
  volumes being stuck in error_deleting. This happens because the children
  get updated as the parent is deleted, and if the code had already
  started deleting the child then the reference changes halfway through
  and the deletion fails. Later, the volumes can still be deleted by
  resetting the state and retrying, but the user experience is cumbersome.
  
  Upstream has fixed the issue in Caracal by refactoring the delete method
  with significant behavioural changes (see comment #2), and has
  backported the refactor to Antelope. Although the refactor code applies
  to Yoga, it is preferred to implement a simpler fix to address this
  specific problem in Yoga. The simpler fix is a retry decorator which
  will force the delete method to re-run, picking up the updated reference
  of the parent being deleted and therefore successfully deleting the
  children.
  
  [Test case]
  
  1) Deploy Cinder with Ceph
  2) Create a parent volume
  
  openstack volume create --size 1 v1
  
  3) Create the child volumes
  
  for i in {1..9}; do openstack volume create d$i --source v1 --size
  1;done
  
  4) Wait for all volumes to be created and available
  
  5) Delete all the volumes
  
- openstack volume delete $(openstack volume list --format value -c ID |
- sort | xargs)
+ for item in $(openstack volume list --format value -c ID | sort |
+ xargs); do openstack volume delete $item & done
  
  6) Check for volumes stuck in error_deleting; if there are none, repeat steps 2-5
  
  7) Confirm error message rbd.ImageNotFound in the logs
  
  8) Install fixed package
  
- 9) Repeat steps 2-5, confirm new error message rbd.ImageNotFound in the
- logs but no volumes stuck in error_deleting
+ 9) Repeat steps 2-5, confirm no volumes stuck in error_deleting and the
+ following messages in the log:
  
+ "no longer exists in backend"
+ 
+ "Retrying
+ 
cinder.volume.drivers.rbd.RBDDriver.delete_volume.<locals>._delete_volume_with_retry
+ ... as it raised ImageNotFound: [errno 2] RBD image not found (error
+ opening image ... at snapshot None)"
+ 
+ If the messages are not in the logs after the new delete command, then retry
+ steps 2-5 until the messages appear while still not having any volumes stuck
+ in error_deleting (in other words, all volumes deleted successfully).
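
The log check in step 9 could be scripted along the following lines. This is
a minimal sketch and not part of the fix: the function names are hypothetical,
the marker strings are the ones quoted in the test case, and the log is
assumed to hold one message per line.

```python
def summarize_log(lines):
    """Count log lines that contain each marker quoted in the test case."""
    counts = {"retrying": 0, "no_longer_exists": 0, "image_not_found": 0}
    for line in lines:
        # Retry message emitted for the wrapped delete function.
        if "Retrying" in line and "_delete_volume_with_retry" in line:
            counts["retrying"] += 1
        # Message logged when the image is already gone from the backend.
        if "no longer exists in backend" in line:
            counts["no_longer_exists"] += 1
        # Any mention of the exception, with or without a retry.
        if "ImageNotFound" in line:
            counts["image_not_found"] += 1
    return counts


def race_was_exercised(counts):
    """Assumption: a retry message means the race was hit at least once."""
    return counts["retrying"] > 0
```

The lines would typically come from the cinder-volume log. The exact path
(for example /var/log/cinder/cinder-volume.log) is an assumption and may
differ per deployment.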
  
  [Regression Potential]
  
  For Bobcat and Antelope, there is reasonable

[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2024-11-21 Thread OpenStack Infra
Fix proposed to branch: unmaintained/yoga
Review: https://review.opendev.org/c/openstack/cinder/+/935976

** Changed in: cloud-archive/yoga
   Status: New => In Progress


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2024-11-20 Thread Rodrigo Barbieri
For reference this is the error and stack trace:

parent: 2078a5c0-4272-46f3-b95b-d89d62da67af
child: 12d02019-17b1-45a3-9026-aed36873edaf

2024-11-20 15:43:43.583 38606 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3/dist-packages/cinder/volume/manager.py", line 981, in delete_volume
2024-11-20 15:43:43.583 38606 ERROR oslo_messaging.rpc.server     self.driver.delete_volume(volume)
2024-11-20 15:43:43.583 38606 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3/dist-packages/cinder/volume/drivers/rbd.py", line 1350, in delete_volume
2024-11-20 15:43:43.583 38606 ERROR oslo_messaging.rpc.server     self._delete_clone_parent_refs(client, parent, parent_snap)
2024-11-20 15:43:43.583 38606 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3/dist-packages/cinder/volume/drivers/rbd.py", line 1231, in _delete_clone_parent_refs
2024-11-20 15:43:43.583 38606 ERROR oslo_messaging.rpc.server     parent_rbd = self.rbd.Image(client.ioctx, parent_name)
2024-11-20 15:43:43.583 38606 ERROR oslo_messaging.rpc.server   File "rbd.pyx", line 2896, in rbd.Image.__init__
2024-11-20 15:43:43.583 38606 ERROR oslo_messaging.rpc.server rbd.ImageNotFound: [errno 2] RBD image not found (error opening image b'volume-2078a5c0-4272-46f3-b95b-d89d62da67af' at snapshot None)
2024-11-20 15:43:43.583 38606 ERROR oslo_messaging.rpc.server
2024-11-20 15:43:43.593 38606 DEBUG cinder.volume.drivers.rbd [req-5c4144c1-c310-4f05-ba15-b05dda6d61af 70958fca143047a583e91795ff460152 5c20c2e1c8ed4948923449807a40b3e7 - - -] volume is a clone so cleaning references delete_volume /usr/lib/python3/dist-packages/cinder/volume/drivers/rbd.py:1348
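
The interleaving behind this trace can be reproduced deterministically with an
in-memory stand-in for the Ceph pool. The sketch below is illustrative only:
FakePool, ImageNotFound and the retry decorator are hypothetical stand-ins,
not Cinder or librbd code, and the decorator's (exception, attempts) signature
is made up for this example and is not claimed to match cinder.utils.retry.
It shows the parent being renamed between the lookup and the open, and a
second attempt picking up the renamed parent.

```python
import functools


class ImageNotFound(Exception):
    """Hypothetical stand-in for rbd.ImageNotFound."""


def retry(exc_type, attempts):
    """Hypothetical retry decorator: re-run the function when exc_type is raised."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == attempts:
                        raise  # out of attempts, surface the error
        return wrapper
    return decorator


class FakePool:
    """In-memory pool: maps image name to its parent image name (or None)."""

    def __init__(self):
        self.images = {"volume-parent": None, "volume-child": "volume-parent"}
        # When True, the next open() first simulates the concurrent parent delete.
        self.rename_on_next_open = False

    def parent_of(self, name):
        return self.images[name]

    def mark_deleted(self, name):
        """Rename an image to '<name>.deleted' and update its children."""
        renamed = name + ".deleted"
        self.images[renamed] = self.images.pop(name)
        for image, parent in list(self.images.items()):
            if parent == name:
                self.images[image] = renamed

    def open(self, name):
        if self.rename_on_next_open:
            self.rename_on_next_open = False
            self.mark_deleted("volume-parent")
        if name not in self.images:
            raise ImageNotFound(name)
        return name


def delete_child(pool):
    parent = pool.parent_of("volume-child")  # lookup sees the current parent name
    pool.open(parent)                        # fails if the parent was renamed meanwhile
    del pool.images["volume-child"]
    return parent


# Same function, wrapped so that ImageNotFound triggers one more attempt.
delete_child_with_retry = retry(ImageNotFound, 2)(delete_child)


if __name__ == "__main__":
    pool = FakePool()
    pool.rename_on_next_open = True
    try:
        delete_child(pool)
    except ImageNotFound as exc:
        print("without retry: ImageNotFound for", exc)

    pool = FakePool()
    pool.rename_on_next_open = True
    print("with retry: child deleted, parent seen as", delete_child_with_retry(pool))
```

The rename is injected inside open() only to make the interleaving
repeatable. In a real deployment the timing depends on when the two delete
requests are processed.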


[Bug 2083061] Re: [SRU] error deleting cloned volumes and parent at the same time when using ceph

2024-11-18 Thread Rodrigo Barbieri
** Description changed:

+ *** SRU TEMPLATE AT THE BOTTOM **
+ 
  Affects: bobcat and older
  
  A race condition when deleting cloned volumes at the same time as their
  parent leaves the volumes in error_deleting state. It happens because
  the code that looks for the parent in [3] may find either the original
  volume or the ".deleted" renamed volume if the parent has been marked
  for deletion. When the parent and the child are deleted at the same
  time, the child may see the parent volume before it is marked for
  deletion, and then in [4] it fails to find it again because it is gone
  (renamed to ".deleted").
  
  Steps to reproduce:
  
  1) openstack volume create --size 1 v1
  
  Wait for the volume to be created and available
  
  2) for i in {1..9}; do openstack volume create d$i --source v1 --size
  1;done
  
  Wait for all volumes to be created and available
  
  3) openstack volume delete $(openstack volume list --format value -c ID
  | sort | xargs)
  
  Some volumes may be in error_deleting state.
  
  Workaround: Reset volume state and try to delete again.
  
  Solutions:
  
  a) The issue does not happen in caracal+ because of commit [1] which
  refactors the code. I tried to reproduce in Caracal with 50 volumes,
  including grandparent volumes, and I couldn't. If we could backport this
  fix as far back as Yoga this would address the problem for our users.
  
  b) A single line of code in [2] can address the problem in bobcat and
  older releases by adding a retry:
  
- @utils.retry(rbd.ImageNotFound, 2)
- def delete_volume(self, volume: Volume) -> None:
+ @utils.retry(rbd.ImageNotFound, 2)
+ def delete_volume(self, volume: Volume) -> None:
  
  With the retry, the ImageNotFound exception thrown at [4] causes the
  delete_volume function to be re-run, which will then find the
  ".deleted" at [3], solving the race condition. It is simpler
  than adding something more complex directly at [4] where the error
  happens.
  
- 
- [1] 
https://github.com/openstack/cinder/commit/1a675c9aa178c6d9c6ed10fd98f086c46d350d3f
+ [1]
+ 
https://github.com/openstack/cinder/commit/1a675c9aa178c6d9c6ed10fd98f086c46d350d3f
  
  [2]
  
https://github.com/openstack/cinder/blob/5b3717f8bfa69c142778ffeabfc4ab91f1f23581/cinder/volume/drivers/rbd.py#L1371
  
  [3]
  
https://github.com/openstack/cinder/blob/5b3717f8bfa69c142778ffeabfc4ab91f1f23581/cinder/volume/drivers/rbd.py#L1401
  
  [4]
  
https://github.com/openstack/cinder/blob/5b3717f8bfa69c142778ffeabfc4ab91f1f23581/cinder/volume/drivers/rbd.py#L1337
+ 
+ 
+ ===
+ SRU TEMPLATE
+ 
+ 
+ [Impact]
+ 
+ Due to a race condition, attempting to delete multiple volumes where
+ among them there is a parent and a child can result in one or more
+ volumes being stuck in error_deleting. This happens because the children
+ get updated as the parent is deleted, and if the code had already
+ started deleting the child then the reference changes halfway through
+ and the deletion fails. Later, the volumes can still be deleted by
+ resetting the state and retrying, but the user experience is cumbersome.
+ 
+ Upstream has fixed the issue in Caracal by refactoring the delete method
+ with significant behavioural changes (see comment #2), and has
+ backported the refactor to Antelope. Although the refactor code applies
+ to Yoga, it is preferred to implement a simpler fix to address this
+ specific problem in Yoga. The simpler fix is a retry decorator which
+ will force the delete method to re-run, picking up the updated reference
+ of the parent being deleted and therefore successfully deleting the
+ children.
+ 
+ [Test case]
+ 
+ 1) Deploy Cinder with Ceph
+ 2) Create a parent volume
+ 
+ openstack volume create --size 1 v1
+ 
+ 3) Create the child volumes
+ 
+ for i in {1..9}; do openstack volume create d$i --source v1 --size
+ 1;done
+ 
+ 4) Wait for all volumes to be created and available
+ 
+ 5) Delete all the volumes
+ 
+ openstack volume delete $(openstack volume list --format value -c ID |
+ sort | xargs)
+ 
+ 6) Check for volumes stuck in error_deleting; if there are none, repeat steps 2-5
+ 
+ 7) Confirm error message rbd.ImageNotFound in the logs
+ 
+ 8) Install fixed package
+ 
+ 9) Repeat steps 2-5, confirm new error message rbd.ImageNotFound in the
+ logs but no volumes stuck in error_deleting
+ 
+ 
+ [Regression Potential]
+ 
+ For Bobcat and Antelope, there is reasonable regression potential
+ because of the complexity of refactor [1] (see comment #2), however,
+ discussions on previous upstream meetings and upstream CI runs of
+ Caracal, Bobcat and Antelope backports which test the refactor provide
+ some level of reassurance. For Yoga, we consider no regression potential
+ with the simpler retry decorator fix.
+ 
+ [Other Info]
+ 
+ None.
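
Step 6 of the test case (checking for volumes stuck in error_deleting) could
be scripted as below. This is a sketch under assumptions: stuck_volumes and
list_volumes are hypothetical helpers, and the listing is assumed to come from
the same "openstack volume list --format value" command used in step 5, with
the status column added via "-c Status" so that each line reads "<id> <status>".

```python
import subprocess


def stuck_volumes(cli_output, bad_status="error_deleting"):
    """Return the IDs of volumes whose status equals bad_status.

    cli_output is expected to hold one "<id> <status>" pair per line.
    """
    stuck = []
    for line in cli_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == bad_status:
            stuck.append(parts[0])
    return stuck


def list_volumes():
    """Run the CLI and return its raw output (requires a configured openstack client)."""
    result = subprocess.run(
        ["openstack", "volume", "list", "--format", "value",
         "-c", "ID", "-c", "Status"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

With a working client, stuck_volumes(list_volumes()) would return an empty
list when every delete succeeded. Volumes still in the "deleting" state are
not counted, so the check is only meaningful once the deletes have settled.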
