[Yahoo-eng-team] [Bug 1885528] Re: snapshot delete fails on shutdown VM

2021-06-16 Thread melanie witt
** Also affects: nova/train
   Importance: Undecided
   Status: New

** No longer affects: nova/trunk

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1885528

Title:
  snapshot delete fails on shutdown VM

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  New
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  Description:
  When we try to delete the last snapshot of a VM that is shut down, the
  snapshot delete fails and the snapshot is stuck in state error-deleting.
  After setting state==available and deleting the snapshot again, the volume
  is corrupted and the VM will never start again. Volumes are stored on NFS.
  (For the root cause and fix, see the bottom of this post.)

  To reproduce:
  - storage on NFS
  - create a VM and some snapshots
  - shut down the VM (i.e. the volume is still considered "attached" but the
    VM is no longer "active")
  - delete the last snapshot

  Expected Result:
  snapshot is deleted, vm still works

  Actual result:
  The snapshot is stuck in error-deleting. After setting the snapshot
  state==available and deleting the snapshot again, the volume is corrupted
  and the VM will never start again (the qcow file on disk references a
  non-existent backing_file).

  Environment:
  - OpenStack version: Stein, deployed via kolla-ansible. I suspect this
    downloads from git, but I don't know the exact version.
  - hypervisor: Libvirt + KVM
  - storage: NFS
  - networking: Neutron with Open vSwitch

  Nova debug Logs:
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [req-d38b5ec8-afdb-4dfe-af12-0c47598c6a47 6dd1c995b2ea4ddfbeb0685bc52e5fbf 6bebb564667d4a75b9281fd826b32ecf - default default] [instance: 711651a3-8440-42dd-a210-e7e550a8624e] Error occurred during volume_snapshot_delete, sending error status to Cinder.: DiskNotFound: No disk at volume-86c06b12-699c-4b54-8bca-fb92c99a2bf0.63d1585e-eb76-4e8f-bc96-93960e9c9692
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e] Traceback (most recent call last):
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2726, in volume_snapshot_delete
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e] snapshot_id, delete_info=delete_info)
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2686, in _volume_snapshot_delete
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e] rebase_base)
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2519, in _rebase_with_qemu_img
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e] b_file_fmt = images.qemu_img_info(backing_file).file_format
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 58, in qemu_img_info
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e] raise exception.DiskNotFound(location=path)
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e] DiskNotFound: No disk at volume-86c06b12-699c-4b54-8bca-fb92c99a2bf0.63d1585e-eb76-4e8f-bc96-93960e9c9692
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]
  2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server [req-d38b5ec8-afdb-4dfe-af12-0c47598c6a47 6dd1c995b2ea4ddfbeb0685bc52e5fbf 6bebb564667d4a75b9281fd826b32ecf - default default] Exception during message handling: DiskNotFound: No disk at volume-86c06b12-699c-4b54-8bca-fb92c99a2bf0.63d1585e-eb76-4e8f-bc96-93960e9c9692
  2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
  2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
  2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
  2020-02-06 12:20:10.780 6 ERROR

[Yahoo-eng-team] [Bug 1885528] Re: snapshot delete fails on shutdown VM

2021-06-03 Thread Elod Illes
** Changed in: nova/ussuri
   Status: In Progress => Fix Released

** Changed in: nova/victoria
   Status: Fix Committed => Fix Released

Title:
  snapshot delete fails on shutdown VM

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) trunk series:
  New
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released

[Yahoo-eng-team] [Bug 1885528] Re: snapshot delete fails on shutdown VM

2021-03-17 Thread Lee Yarwood
** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Also affects: nova/victoria
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/trunk
   Importance: Undecided
   Status: New

** Changed in: nova/ussuri
 Assignee: (unassigned) => Lee Yarwood (lyarwood)

** Changed in: nova/victoria
 Assignee: (unassigned) => Lee Yarwood (lyarwood)

** Changed in: nova/rocky
   Status: New => In Progress

** Changed in: nova/trunk
 Assignee: (unassigned) => Lee Yarwood (lyarwood)

** Changed in: nova/ussuri
   Status: New => In Progress

** Changed in: nova/victoria
   Status: New => In Progress

Title:
  snapshot delete fails on shutdown VM

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) trunk series:
  New
Status in OpenStack Compute (nova) ussuri series:
  In Progress
Status in OpenStack Compute (nova) victoria series:
  In Progress

[Yahoo-eng-team] [Bug 1885528] Re: snapshot delete fails on shutdown VM

2020-09-25 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/739246
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b9333125790682f9d60bc74fdbb12a098565e7c2
Submitter: Zuul
Branch:    master

commit b9333125790682f9d60bc74fdbb12a098565e7c2
Author: Balazs Gibizer 
Date:   Thu Jul 2 12:13:29 2020 +0200

Use absolute path during qemu img rebase

During an assisted volume snapshot delete request from Cinder nova
removes the snapshot from the backing file chain. During that nova
checks the existence of such file. However in some cases (see the bug
report) the path is relative and therefore os.path.exists fails.

This patch makes sure that nova uses the volume absolute path to make
the backing file path absolute as well.

Closes-Bug #1885528

Change-Id: I58dca95251b607eaff602783fee2fc38e2421944
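
For illustration only, here is a minimal Python sketch of the idea the commit message describes. It is not the actual nova patch: the helper name absolute_backing_path and the file names are hypothetical. It assumes the backing file lives in the same directory as the active disk, and shows why os.path.exists on a bare relative name depends on the working directory of the process, while a name anchored to the volume's absolute path does not.

```python
import os
import tempfile


def absolute_backing_path(active_disk_path, backing_file):
    """Anchor a backing file name to the directory of the active disk.

    Hypothetical helper for illustration; not taken from nova.
    """
    if os.path.isabs(backing_file):
        return backing_file
    return os.path.join(os.path.dirname(active_disk_path), backing_file)


# Stand-in for the NFS mount point that holds the volume files.
mount_dir = tempfile.mkdtemp()
base_name = "volume-1234"               # hypothetical base image name
overlay_name = "volume-1234.snap-5678"  # hypothetical snapshot overlay name
for name in (base_name, overlay_name):
    open(os.path.join(mount_dir, name), "w").close()

active_disk = os.path.join(mount_dir, overlay_name)

# Run from an unrelated, empty directory, as a service process might.
os.chdir(tempfile.mkdtemp())

# A relative name is resolved against the current working directory,
# so the existence check fails even though the file is on disk.
relative_found = os.path.exists(base_name)

# Anchoring the name to the volume's directory removes that dependency.
resolved = absolute_backing_path(active_disk, base_name)
absolute_found = os.path.exists(resolved)

print(relative_found, absolute_found)
```

In the traceback quoted in the bug description, the DiskNotFound error carries a bare file name with no directory component, which is consistent with this kind of relative-path check.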


** Changed in: nova
   Status: In Progress => Fix Released

Title:
  snapshot delete fails on shutdown VM

Status in OpenStack Compute (nova):
  Fix Released
