[Yahoo-eng-team] [Bug 1905701] Re: Do not recreate libvirt secret when one already exists on the host during a host reboot
** Changed in: nova/wallaby
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1905701

Title:
  Do not recreate libvirt secret when one already exists on the host
  during a host reboot

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: In Progress
Status in OpenStack Compute (nova) rocky series: In Progress
Status in OpenStack Compute (nova) stein series: In Progress
Status in OpenStack Compute (nova) train series: In Progress
Status in OpenStack Compute (nova) ussuri series: In Progress
Status in OpenStack Compute (nova) victoria series: Fix Released
Status in OpenStack Compute (nova) wallaby series: Fix Released
Status in OpenStack Compute (nova) xena series: Fix Released

Bug description:

  Description
  ===========
  When [compute]/resume_guests_state_on_host_boot is enabled, the compute
  manager attempts to restart instances on start up. When using the libvirt
  driver with instances that have LUKSv1 encrypted volumes attached, a call
  is made to _attach_encryptor that currently assumes the volume's libvirt
  secrets don't already exist on the host. As a result, this call leads to
  an attempt to look up encryption metadata that fails, because the compute
  service is using a bare-bones, local-only admin context to drive the
  restart of the instances.

  The libvirt secrets associated with LUKSv1 encrypted volumes actually
  persist across a host reboot, so the calls to fetch the encryption
  metadata, the symmetric key, etc. are not required. Removing these calls
  in this context should allow the compute service to start instances with
  these volumes attached.

  Steps to reproduce
  ==================
  * Enable [compute]/resume_guests_state_on_host_boot
  * Launch instances with encrypted LUKSv1 volumes attached
  * Reboot the underlying host

  Expected result
  ===============
  * The instances are restarted successfully by Nova: no external calls are
    made and the existing libvirt secret for any encrypted LUKSv1 volume is
    reused.

  Actual result
  =============
  * The instances fail to restart, as the initial calls made by the Nova
    service use an empty admin context without a service catalog etc.

  Environment
  ===========
  1. Exact version of OpenStack you are running: master
  2. Which hypervisor did you use? libvirt + QEMU/KVM
  3. Which storage type did you use? N/A
  4. Which networking type did you use? N/A

  Logs & Configs
  ==============
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1641, in _connect_volume
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     self._attach_encryptor(context, connection_info, encryption)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1760, in _attach_encryptor
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     key = keymgr.get(context, encryption['encryption_key_id'])
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 575, in get
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     secret = self._get_secret(context, managed_object_id)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 545, in _get_secret
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     barbican_client = self._get_barbican_client(context)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 142, in _get_barbican_client
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     self._barbican_endpoint)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File
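The optimisation the bug asks for boils down to checking whether libvirt
already holds a secret for the volume before contacting the key manager. A
minimal, self-contained sketch of that check using the libvirt-python
bindings (this is an illustration, not nova's exact code; nova's libvirt
Host object exposes an equivalent find_secret helper around the same
lookup, and the helper name volume_secret_exists here is hypothetical):

    import libvirt

    def volume_secret_exists(conn: libvirt.virConnect, volume_id: str) -> bool:
        """Return True if a libvirt 'volume' secret for volume_id exists.

        Secrets with usage type 'volume' persist across host reboots, so
        an existing one can simply be reused instead of re-fetching the
        symmetric key from the key manager (e.g. Barbican).
        """
        try:
            conn.secretLookupByUsage(
                libvirt.VIR_SECRET_USAGE_TYPE_VOLUME, volume_id)
            return True
        except libvirt.libvirtError:
            return False

    # Example: skip the keymgr.get() call seen in the trace above when the
    # secret already survived the host reboot.
    conn = libvirt.open('qemu:///system')
    if not volume_secret_exists(conn, 'c5b3e7d4-99ea-409c-aba6-d32751f93ccf'):
        pass  # only now fetch encryption metadata and create the secret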
[Yahoo-eng-team] [Bug 1905701] Re: Do not recreate libvirt secret when one already exists on the host during a host reboot
Reviewed: https://review.opendev.org/c/openstack/nova/+/793463
Committed: https://opendev.org/openstack/nova/commit/26d65fc882e42b824409dff87ff026dee1debe20
Submitter: "Zuul (22348)"
Branch: master

commit 26d65fc882e42b824409dff87ff026dee1debe20
Author: Lee Yarwood
Date:   Thu May 27 16:47:26 2021 +0100

    libvirt: Do not destroy volume secrets during _hard_reboot

    Ia2007bc63ef09931ea0197cef29d6a5614ed821a unfortunately missed that
    resume_state_on_host_boot calls down into _hard_reboot, always
    removing volume secrets and rendering that change useless.

    This change addresses this by using the destroy_secrets kwarg
    introduced by I856268b371f7ba712b02189db3c927cd762a4dc3 within the
    _hard_reboot method of the libvirt driver to ensure secrets are not
    removed during a hard reboot.

    This resolves the original issue in bug #1905701 *and* allows admins
    to hard reboot a user's instance when that instance has encrypted
    volumes attached with secrets stored in Barbican. The latter use case
    is something we can easily test within tempest, unlike the compute
    reboot in bug #1905701.

    This change is kept small as it should ideally be backported
    alongside Ia2007bc63ef09931ea0197cef29d6a5614ed821a to stable/queens.
    Follow-up changes on master will improve formatting and doc text and
    introduce functional tests to further validate this new behaviour of
    hard reboot within the libvirt driver.

    Closes-Bug: #1905701
    Change-Id: I3d1b21ba6eb3f5eb728693197c24b4b315eef821

** Changed in: nova
       Status: In Progress => Fix Released
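The shape of the committed fix is small: _hard_reboot tears the guest down
via destroy(), and the destroy_secrets kwarg added by
I856268b371f7ba712b02189db3c927cd762a4dc3 lets it do so without deleting
the volume secrets. A rough, excerpt-style sketch of the idea (simplified
and not runnable on its own; see the commit above for the exact diff):

    def _hard_reboot(self, context, instance, network_info,
                     block_device_info=None, accel_info=None):
        # Destroy the guest but keep both its disks and its libvirt
        # volume secrets. The persisted LUKSv1 secrets are reused when
        # the guest is recreated, avoiding a key manager lookup that
        # cannot succeed under the bare admin context used during
        # resume_guests_state_on_host_boot, and that an admin hard
        # rebooting another user's instance may not be entitled to make.
        self.destroy(context, instance, network_info,
                     destroy_disks=False,
                     block_device_info=block_device_info,
                     destroy_secrets=False)
        # ... recreate and boot the guest as before ...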
[Yahoo-eng-team] [Bug 1905701] Re: Do not recreate libvirt secret when one already exists on the host during a host reboot
** Changed in: nova/victoria
       Status: In Progress => Fix Released
[Yahoo-eng-team] [Bug 1905701] Re: Do not recreate libvirt secret when one already exists on the host during a host reboot
So this isn't enough by itself to avoid the failure case listed in comment
#0: the call to resume_state_on_host_boot in turn calls _hard_reboot, which
always deletes the volume secret, rendering the optimisation landed above
useless.

It's pretty easy to reproduce this using the demo user account in devstack:

$ . openrc admin admin
$ openstack volume type create --encryption-provider luks --encryption-cipher aes-xts-plain64 --encryption-key-size 256 --encryption-control-location front-end LUKS
$ . openrc demo demo
$ openstack volume create --size 1 --type luks test
$ openstack server create --image cirros-0.5.1-x86_64-disk --flavor 1 --network private test
$ openstack server add volume test test
$ . openrc admin admin
$ openstack server reboot --hard test

$ openstack server event list f65c96c6-f63f-42b3-8e00-fff5b24daa35
+------------------------------------------+--------------------------------------+---------------+------------------------+
| Request ID                               | Server ID                            | Action        | Start Time             |
+------------------------------------------+--------------------------------------+---------------+------------------------+
| req-d22d8d5a-a090-4f03-a246-a4c4487319aa | f65c96c6-f63f-42b3-8e00-fff5b24daa35 | reboot        | 2021-05-27T09:42:56.00 |
| req-e8ab2b76-00a4-4c3c-9616-c1437acd17db | f65c96c6-f63f-42b3-8e00-fff5b24daa35 | attach_volume | 2021-05-27T09:41:52.00 |
| req-2314c5c8-1584-4d7e-9044-78bcececb459 | f65c96c6-f63f-42b3-8e00-fff5b24daa35 | create        | 2021-05-27T09:41:43.00 |
+------------------------------------------+--------------------------------------+---------------+------------------------+

$ openstack server event show f65c96c6-f63f-42b3-8e00-fff5b24daa35 req-d22d8d5a-a090-4f03-a246-a4c4487319aa -f json -c events | awk '{gsub("\\\\n","\n")};1'
{
  "events": [
    {
      "event": "compute_reboot_instance",
      "start_time": "2021-05-27T09:42:56.00",
      "finish_time": "2021-05-27T09:42:59.00",
      "result": "Error",
      "traceback": "  File \"/opt/stack/nova/nova/compute/utils.py\", line 1434, in decorated_function
    return function(self, context, *args, **kwargs)
  File \"/opt/stack/nova/nova/compute/manager.py\", line 211, in decorated_function
    compute_utils.add_instance_fault_from_exc(context,
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 227, in __exit__
    self.force_reraise()
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 200, in force_reraise
    raise self.value
  File \"/opt/stack/nova/nova/compute/manager.py\", line 200, in decorated_function
    return function(self, context, *args, **kwargs)
  File \"/opt/stack/nova/nova/compute/manager.py\", line 3709, in reboot_instance
    do_reboot_instance(context, instance, block_device_info, reboot_type)
  File \"/usr/local/lib/python3.8/site-packages/oslo_concurrency/lockutils.py\", line 360, in inner
    return f(*args, **kwargs)
  File \"/opt/stack/nova/nova/compute/manager.py\", line 3707, in do_reboot_instance
    self._reboot_instance(context, instance, block_device_info,
  File \"/opt/stack/nova/nova/compute/manager.py\", line 3801, in _reboot_instance
    self._set_instance_obj_error_state(instance)
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 227, in __exit__
    self.force_reraise()
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 200, in force_reraise
    raise self.value
  File \"/opt/stack/nova/nova/compute/manager.py\", line 3771, in _reboot_instance
    self.driver.reboot(context, instance,
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 3659, in reboot
    return self._hard_reboot(context, instance, network_info,
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 3748, in _hard_reboot
    xml = self._get_guest_xml(context, instance, network_info, disk_info,
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 6990, in _get_guest_xml
    conf = self._get_guest_config(instance, network_info, image_meta,
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 6612, in _get_guest_config
    storage_configs = self._get_guest_storage_config(context,
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 5253, in _get_guest_storage_config
    self._connect_volume(context, connection_info, instance)
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 1800, in _connect_volume
    vol_driver.disconnect_volume(connection_info, instance)
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 227, in __exit__
    self.force_reraise()
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 200, in force_reraise
    raise self.value
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 1794, in _connect_volume
    self._attach_encryptor(context,
[Yahoo-eng-team] [Bug 1905701] Re: Do not recreate libvirt secret when one already exists on the host during a host reboot
https://review.opendev.org/c/openstack/nova/+/765769 proposed to stable/victoria

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Also affects: nova/trunk
   Importance: Undecided
       Status: New

** Also affects: nova/train
   Importance: Undecided
       Status: New

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Also affects: nova/ussuri
   Importance: Undecided
       Status: New

** Also affects: nova/victoria
   Importance: Undecided
       Status: New

** No longer affects: nova/trunk

** Changed in: nova/victoria
       Status: New => In Progress

** Changed in: nova/victoria
     Assignee: (unassigned) => Lee Yarwood (lyarwood)