[Yahoo-eng-team] [Bug 1930750] [NEW] pyroute2 >= 0.6.2 fails in pep8 import analysis
Public bug reported:

Since version 0.6.2, the pyroute2 library dynamically imports the modules it needs when loaded. Static analysis therefore fails when checking the import references.

Example: https://c918cbae52d07f0b694c-87cfb8a8e579ae39cc41214d7e8b69d2.ssl.cf1.rackcdn.com/793735/2/check/openstack-tox-pep8/62e482e/job-output.txt
Snippet: http://paste.openstack.org/show/806340/

** Affects: neutron
   Importance: Critical
   Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
   Status: In Progress

** Changed in: neutron
   Importance: Undecided => Critical

** Changed in: neutron
   Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

https://bugs.launchpad.net/bugs/1930750

Title: pyroute2 >= 0.6.2 fails in pep8 import analysis

Status in neutron: In Progress
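For illustration, a minimal sketch (hypothetical names, not pyroute2's actual loader) of the lazy-import pattern that defeats static checkers: the public name only materializes at runtime through a PEP 562 module-level __getattr__, so a pep8/pylint import analysis finds no definition for it in the source.

    # lazy_pkg/__init__.py -- hypothetical package mimicking the pattern
    import importlib

    _LAZY_ATTRS = {'IPRoute': '.iproute'}  # public name -> submodule

    def __getattr__(name):
        # Resolved only when first accessed at runtime; a static
        # analyzer reading this file sees no 'IPRoute' definition.
        if name in _LAZY_ATTRS:
            mod = importlib.import_module(_LAZY_ATTRS[name], __package__)
            return getattr(mod, name)
        raise AttributeError(name)

At runtime "from lazy_pkg import IPRoute" works, but an import checker walking the module statically reports the name as missing, which matches the pep8 job failure above.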
[Yahoo-eng-team] [Bug 1837995] Re: "Unexpected API Error" when use "openstack usage show" command
** Changed in: nova/victoria
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1837995

Title: "Unexpected API Error" when use "openstack usage show" command

Status in OpenStack Compute (nova): In Progress
Status in OpenStack Compute (nova) train series: New
Status in OpenStack Compute (nova) ussuri series: New
Status in OpenStack Compute (nova) victoria series: Fix Released

Bug description:

Description
===========
For a non-admin project with an instance launched, querying the usage information on the GUI (by clicking Overview) or on the CLI (openstack usage show) fails. The GUI shows "Error: Unable to retrieve usage information." and the CLI prints the following error:

$ openstack usage show
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. (HTTP 500) (Request-ID: req-cbea9542-ecce-42fd-b660-fc5f996ea3c3)

Steps to reproduce
==================
Execute the "openstack usage show" command, or click Project - Compute - Overview in the GUI.

Expected result
===============
No error is reported and the usage information is shown.

Actual result
=============
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. (HTTP 500) (Request-ID: req-cbea9542-ecce-42fd-b660-fc5f996ea3c3)

Environment
===========
1. Exact version of OpenStack you are running: OpenStack Stein on CentOS 7

$ rpm -qa | grep nova
openstack-nova-api-19.0.1-1.el7.noarch
puppet-nova-14.4.0-1.el7.noarch
python2-nova-19.0.1-1.el7.noarch
openstack-nova-conductor-19.0.1-1.el7.noarch
openstack-nova-novncproxy-19.0.1-1.el7.noarch
openstack-nova-migration-19.0.1-1.el7.noarch
openstack-nova-common-19.0.1-1.el7.noarch
openstack-nova-scheduler-19.0.1-1.el7.noarch
openstack-nova-console-19.0.1-1.el7.noarch
python2-novaclient-13.0.1-1.el7.noarch
openstack-nova-placement-api-19.0.1-1.el7.noarch
openstack-nova-compute-19.0.1-1.el7.noarch

2. Which hypervisor did you use?
Libvirt + KVM

$ rpm -qa | grep kvm
qemu-kvm-ev-2.12.0-18.el7_6.5.1.x86_64
libvirt-daemon-kvm-4.5.0-10.el7_6.12.x86_64
qemu-kvm-common-ev-2.12.0-18.el7_6.5.1.x86_64

$ rpm -qa | grep libvirt
libvirt-gconfig-1.0.0-1.el7.x86_64
libvirt-daemon-driver-nwfilter-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-interface-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-config-nwfilter-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-storage-mpath-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-storage-core-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-secret-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-lxc-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-storage-rbd-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-kvm-4.5.0-10.el7_6.12.x86_64
libvirt-bash-completion-4.5.0-10.el7_6.12.x86_64
libvirt-4.5.0-10.el7_6.12.x86_64
libvirt-glib-1.0.0-1.el7.x86_64
libvirt-daemon-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-qemu-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-config-network-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-storage-disk-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-storage-4.5.0-10.el7_6.12.x86_64
libvirt-python-4.5.0-1.el7.x86_64
libvirt-libs-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-storage-scsi-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-network-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-nodedev-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-storage-logical-4.5.0-10.el7_6.12.x86_64
libvirt-daemon-driver-storage-iscsi-4.5.0-10.el7_6.12.x86_64
libvirt-client-4.5.0-10.el7_6.12.x86_64
libvirt-gobject-1.0.0-1.el7.x86_64

Logs & Configs
==============
nova-api.log:

2019-07-26 16:12:53.967 8673 INFO nova.osapi_compute.wsgi.server [req-69d7df76-7dd9-4d42-8eeb-347ef1c9d0a5 f887cc44f21043dca85438d74a47d68d 0d47cfd5b9c94a5790fa4472e576cba6 - default default] c5f::e2 "GET /v2.1/0d47cfd5b9c94a5790fa4472e576cba6/servers/detail?all_tenants=True=2019-07-26T08%3A07%3A55.280119%2B00%3A00 HTTP/1.1" status: 200 len: 413 time: 0.0639658
2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi [req-cbea9542-ecce-42fd-b660-fc5f996ea3c3 1e45ea9a7d5647a6a938c2ac027822f2 85dd8936d21b46a8878ed59678c7ad9a - default default] Unexpected exception in API method: OrphanedObjectError: Cannot call obj_load_attr on orphaned Instance object
2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 671, in wrapped
2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi     return f(*args,
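The OrphanedObjectError at the top of the log means the usage API built an Instance object from a raw DB row without attaching a request context, so the first access to an unpopulated field cannot lazy-load and blows up. A toy sketch of that mechanism (not nova's actual versioned-object code):

    class Instance:
        """Toy stand-in for nova's Instance object, for illustration."""

        def __init__(self, context=None, **fields):
            self._context = context
            self.__dict__.update(fields)

        def __getattr__(self, name):
            # Reached only for fields that were never populated; the real
            # object would lazy-load them from the DB via its context.
            if self._context is None:
                raise RuntimeError(
                    'Cannot call obj_load_attr on orphaned Instance object')
            return 'value loaded from the database'

    inst = Instance(name='vm1')   # built from a bare DB row, no context
    print(inst.name)              # populated field: fine
    try:
        inst.flavor               # unpopulated field: needs a lazy-load
    except RuntimeError as err:
        print(err)                # the error seen in nova-api.log above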
[Yahoo-eng-team] [Bug 1732428] Re: Unshelving a VM breaks instance metadata when using qcow2 backed images
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1732428

Title: Unshelving a VM breaks instance metadata when using qcow2 backed images

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) ocata series: Confirmed
Status in OpenStack Compute (nova) pike series: Confirmed
Status in OpenStack Compute (nova) train series: Fix Committed
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:

If you unshelve instances on compute nodes that use qcow2 backed instances, the instance image_ref will point to the original image the VM was launched from, while the base file for /var/lib/nova/instances/uuid/disk will be the snapshot which was used for shelving. This causes errors with e.g. resizes and migrations.

Steps to reproduce/what happens:

Have at least 2 compute nodes configured with the standard qcow2 backed images.

1) Launch an instance.
2) Shelve the instance. In the background this should in practice create a flattened snapshot of the VM.
3) Unshelve the instance. The instance will boot on one of the compute nodes. /var/lib/nova/instances/uuid/disk should now have the snapshot as its base file. The instance metadata still claims that the image_ref is the original image which the VM was launched from, not the snapshot.
4) Resize/migrate the instance. /var/lib/nova/instances/uuid/disk should be copied to the other compute node. If you resize to an image with the same size disk, go to 5); if you resize to a flavor with a larger disk, it probably causes an error here when it tries to grow the disk.
5a) If the instance was running: when nova tries to start the VM, it will copy the original base image to the new compute node, not the snapshot base image. The instance can't boot, since it doesn't find its actual base file, and it goes to an ERROR state.
5b) If the instance was shut down: you can confirm the resize, but the VM won't start. The snapshot base file may be removed from the source machine, causing data loss.

What should have happened: either the instance image_ref should be updated to the snapshot image, or the snapshot image should be rebased to the original image, or it should force a raw-only image after unshelve, or something else you smart people come up with.

Environment: RDO Neutron with KVM

rpm -qa | grep nova
openstack-nova-common-14.0.6-1.el7.noarch
python2-novaclient-6.0.1-1.el7.noarch
python-nova-14.0.6-1.el7.noarch
openstack-nova-compute-14.0.6-1.el7.noarch

Also a big thank you to Toni Peltonen and Anton Aksola from nebula.fi for discovering and debugging this issue.
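To confirm the mismatch on a compute node, an operator can compare the instance's image_ref with what the disk actually points at; a small sketch (the disk path is illustrative):

    import json
    import subprocess

    def backing_file(disk_path):
        # Ask qemu-img for the disk's metadata; after an unshelve, this
        # shows the shelve snapshot as the backing file even though the
        # instance's image_ref still names the original image.
        out = subprocess.check_output(
            ['qemu-img', 'info', '--output=json', disk_path])
        return json.loads(out).get('backing-filename')

    # e.g. backing_file('/var/lib/nova/instances/<uuid>/disk')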
[Yahoo-eng-team] [Bug 1904446] Re: 'GetPMEMNamespacesFailed' is not a valid exception
** Changed in: nova/ussuri
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1904446

Title: 'GetPMEMNamespacesFailed' is not a valid exception

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) train series: In Progress
Status in OpenStack Compute (nova) ussuri series: Fix Released
Status in OpenStack Compute (nova) victoria series: Fix Released

Bug description:

Attempting to retrieve a non-existent PMEM device results in the following traceback:

./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova During handling of the above exception, another exception occurred:
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova Traceback (most recent call last):
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File "/usr/bin/nova-compute", line 10, in <module>
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova     sys.exit(main())
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/cmd/compute.py", line 57, in main
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova     topic=compute_rpcapi.RPC_TOPIC)
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/service.py", line 271, in create
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova     periodic_interval_max=periodic_interval_max)
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/service.py", line 129, in __init__
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova     self.manager = manager_class(host=self.host, *args, **kwargs)
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 571, in __init__
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova     self.driver = driver.load_compute_driver(self.virtapi, compute_driver)
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/virt/driver.py", line 1911, in load_compute_driver
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova     virtapi)
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File "/usr/lib/python3.6/site-packages/oslo_utils/importutils.py", line 44, in import_object
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova     return import_class(import_str)(*args, **kwargs)
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 446, in __init__
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova     vpmem_conf=CONF.libvirt.pmem_namespaces)
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 477, in _discover_vpmems
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova     vpmems_host = self._get_vpmems_on_host()
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 512, in _get_vpmems_on_host
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova     raise exception.GetPMEMNamespacesFailed(reason=reason)
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova AttributeError: module 'nova.exception' has no attribute 'GetPMEMNamespacesFailed'
./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova

It seems there was a typo introduced when this code was added: the code references 'GetPMEMNamespacesFailed', but the exception (which has since been removed as "unused") was called 'GetPMEMNamespaceFailed'.
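Because Python resolves module attributes only when the raising statement executes, this kind of typo hides until the error path actually runs. A minimal sketch of the failure mode (hypothetical stand-in for the nova.exception module):

    import types

    # The exception module defines the singular name; the caller raises
    # the plural one. Nothing fails at import time.
    exception = types.SimpleNamespace(
        GetPMEMNamespaceFailed=type(
            'GetPMEMNamespaceFailed', (Exception,), {}))

    try:
        raise exception.GetPMEMNamespacesFailed('no namespaces')  # typo
    except AttributeError as err:
        # The intended exception is never raised; we get AttributeError
        # instead, exactly as in the traceback above.
        print(err)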
[Yahoo-eng-team] [Bug 1900006] Re: Asking for different vGPU types is racey
** Changed in: nova/victoria
   Status: Confirmed => Fix Released

https://bugs.launchpad.net/bugs/1900006

Title: Asking for different vGPU types is racey

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) victoria series: Fix Released

Bug description:

When testing virtual GPUs on Victoria, I wanted to have different types:

[devices]
enabled_vgpu_types = nvidia-320,nvidia-321

[vgpu_nvidia-320]
device_addresses = 0000:04:02.1,0000:04:02.2

[vgpu_nvidia-321]
device_addresses = 0000:04:02.3

Unfortunately, I saw that only the first type was used. When restarting the nova-compute service, we got this log:

WARNING nova.virt.libvirt.driver [None req-a23d9cb4-6554-499c-9fcf-d7f9706535ef None None] The vGPU type 'nvidia-320' was listed in '[devices] enabled_vgpu_types' but no corresponding '[vgpu_nvidia-320]' group or '[vgpu_nvidia-320] device_addresses' option was defined. Only the first type 'nvidia-320' will be used.

It's due to the fact that we call _get_supported_vgpu_types() when creating the libvirt implementation [1], while we only register the new CONF options in init_host() [2], which is called afterwards.

[1] https://github.com/openstack/nova/blob/90777d790d7c268f50851ac3e5b4e02617f5ae1c/nova/virt/libvirt/driver.py#L418
[2] https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L1405

A simple fix would just be to make sure the dynamic options are registered within _get_supported_vgpu_types().
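The ordering problem is easy to reproduce with oslo.config directly; a minimal sketch (group and option names follow the bug report, the helper function is hypothetical):

    from oslo_config import cfg

    CONF = cfg.CONF
    CONF.register_opts(
        [cfg.ListOpt('enabled_vgpu_types', default=[])], group='devices')

    def register_dynamic_opts(conf):
        # What init_host() effectively does: one [vgpu_$type] group per
        # configured type. Until this runs, those groups do not exist.
        for vgpu_type in conf.devices.enabled_vgpu_types:
            conf.register_opts(
                [cfg.ListOpt('device_addresses', default=[])],
                group='vgpu_%s' % vgpu_type)

    # Reading CONF['vgpu_nvidia-320'].device_addresses *before*
    # register_dynamic_opts() has run raises an oslo.config error,
    # which is why _get_supported_vgpu_types() saw only the first type.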
[Yahoo-eng-team] [Bug 1896496] Re: Combination of 'hw_video_ram' image metadata prop, 'hw_video:ram_max_mb' extra spec raises error
** Also affects: nova/train
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Fix Released

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Changed in: nova/ussuri
   Status: New => Fix Released

https://bugs.launchpad.net/bugs/1896496

Title: Combination of 'hw_video_ram' image metadata prop, 'hw_video:ram_max_mb' extra spec raises error

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released
Status in OpenStack Compute (nova) victoria series: Fix Released

Bug description:

The 'hw_video_ram' image metadata property is used to configure the amount of memory allocated to VRAM. Using it requires specifying the 'hw_video:ram_max_mb' extra spec, or you'll get the following error:

nova.exception.RequestedVRamTooHigh: The requested amount of video memory 8 is higher than the maximum allowed by flavor 0.

However, specifying these currently results in a libvirt failure:

ERROR nova.compute.manager [None ...] [instance: 11a71ae4-e410-4856-aeab-eea6ca4784c5] Failed to build and run instance: libvirt.libvirtError: XML error: cannot parse video vram '8192.0'
ERROR nova.compute.manager [instance: ...] Traceback (most recent call last):
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/compute/manager.py", line 2333, in _build_and_run_instance
ERROR nova.compute.manager [instance: ...]     accel_info=accel_info)
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3632, in spawn
ERROR nova.compute.manager [instance: ...]     cleanup_instance_disks=created_disks)
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6527, in _create_domain_and_network
ERROR nova.compute.manager [instance: ...]     cleanup_instance_disks=cleanup_instance_disks)
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
ERROR nova.compute.manager [instance: ...]     self.force_reraise()
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
ERROR nova.compute.manager [instance: ...]     six.reraise(self.type_, self.value, self.tb)
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
ERROR nova.compute.manager [instance: ...]     raise value
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6496, in _create_domain_and_network
ERROR nova.compute.manager [instance: ...]     post_xml_callback=post_xml_callback)
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6425, in _create_domain
ERROR nova.compute.manager [instance: ...]     guest = libvirt_guest.Guest.create(xml, self._host)
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 127, in create
ERROR nova.compute.manager [instance: ...]     encodeutils.safe_decode(xml))
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
ERROR nova.compute.manager [instance: ...]     self.force_reraise()
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
ERROR nova.compute.manager [instance: ...]     six.reraise(self.type_, self.value, self.tb)
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
ERROR nova.compute.manager [instance: ...]     raise value
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 123, in create
ERROR nova.compute.manager [instance: ...]     guest = host.write_instance_config(xml)
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/host.py", line 1135, in write_instance_config
ERROR nova.compute.manager [instance: ...]     domain = self.get_connection().defineXML(xml)
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit
ERROR nova.compute.manager [instance: ...]     result = proxy_call(self._autowrap, f, *args, **kwargs)
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in proxy_call
ERROR
[Yahoo-eng-team] [Bug 1899541] Re: archive_deleted_rows archives pci_devices records as residue because of 'instance_uuid'
** Changed in: nova/train
   Status: In Progress => Fix Released

** Changed in: nova/ussuri
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1899541

Title: archive_deleted_rows archives pci_devices records as residue because of 'instance_uuid'

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: In Progress
Status in OpenStack Compute (nova) rocky series: In Progress
Status in OpenStack Compute (nova) stein series: In Progress
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released
Status in OpenStack Compute (nova) victoria series: Fix Released

Bug description:

This is based on a bug reported downstream [1] where, after a random amount of time, update_available_resource began to fail with the following trace on nodes with PCI devices:

"traceback": [
  "Traceback (most recent call last):",
  "  File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 7447, in update_available_resource_for_node",
  "    rt.update_available_resource(context, nodename)",
  "  File \"/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py\", line 706, in update_available_resource",
  "    self._update_available_resource(context, resources)",
  "  File \"/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py\", line 274, in inner",
  "    return f(*args, **kwargs)",
  "  File \"/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py\", line 782, in _update_available_resource",
  "    self._update(context, cn)",
  "  File \"/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py\", line 926, in _update",
  "    self.pci_tracker.save(context)",
  "  File \"/usr/lib/python2.7/site-packages/nova/pci/manager.py\", line 92, in save",
  "    dev.save()",
  "  File \"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\", line 210, in wrapper",
  "    ctxt, self, fn.__name__, args, kwargs)",
  "  File \"/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py\", line 245, in object_action",
  "    objmethod=objmethod, args=args, kwargs=kwargs)",
  "  File \"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py\", line 174, in call",
  "    retry=self.retry)",
  "  File \"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py\", line 131, in _send",
  "    timeout=timeout, retry=retry)",
  "  File \"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py\", line 559, in send",
  "    retry=retry)",
  "  File \"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py\", line 550, in _send",
  "    raise result",
  "RemoteError: Remote error: DBError (pymysql.err.IntegrityError) (1048, u\"Column 'compute_node_id' cannot be null\") [SQL: u'INSERT INTO pci_devices (created_at, updated_at, deleted_at, deleted, uuid, compute_node_id, address, vendor_id, product_id, dev_type, dev_id, label, status, request_id, extra_info, instance_uuid, numa_node, parent_addr) VALUES (%(created_at)s, %(updated_at)s, %(deleted_at)s, %(deleted)s, %(uuid)s, %(compute_node_id)s, %(address)s, %(vendor_id)s, %(product_id)s, %(dev_type)s, %(dev_id)s, %(label)s, %(status)s, %(request_id)s, %(extra_info)s, %(instance_uuid)s, %(numa_node)s, %(parent_addr)s)'] [parameters: {'status': u'available', 'instance_uuid': None, 'dev_type': None, 'uuid': None, 'dev_id': None, 'parent_addr': None, 'numa_node': None, 'created_at': datetime.datetime(2020, 8, 7, 11, 51, 19, 643044), 'vendor_id': None, 'updated_at': None, 'label': None, 'deleted': 0, 'extra_info': '{}', 'compute_node_id': None, 'request_id': None, 'deleted_at': None, 'address': None, 'product_id': None}] (Background on this error at: http://sqlalche.me/e/gkpj)",

Here ^ we see an attempt to insert a nearly empty (NULL fields) record into the pci_devices table. Inspection of the code shows that the way this can occur is if we fail to look up the pci_devices record we want and then try to create a new one [2]:

    @pick_context_manager_writer
    def pci_device_update(context, node_id, address, values):
        query = model_query(context, models.PciDevice, read_deleted="no").\
            filter_by(compute_node_id=node_id).\
            filter_by(address=address)
        if query.update(values) == 0:
            # No row matched (e.g. it was just archived), so fall back to
            # creating a brand-new record from the partial `values` dict,
            # leaving every column that `values` does not carry as NULL.
            device = models.PciDevice()
            device.update(values)
            context.session.add(device)
        return query.one()

It turns out what was happening was: when a request came in to delete an instance that had allocated a PCI device, if the archive_deleted_rows cron job fired at just the right (wrong) moment, it would sweep
[Yahoo-eng-team] [Bug 1885528] Re: snapshot delete fails on shutdown VM
** Changed in: nova/ussuri
   Status: In Progress => Fix Released

** Changed in: nova/victoria
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1885528

Title: snapshot delete fails on shutdown VM

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: New
Status in OpenStack Compute (nova) rocky series: In Progress
Status in OpenStack Compute (nova) stein series: New
Status in OpenStack Compute (nova) train series: New
Status in OpenStack Compute (nova) ussuri series: Fix Released
Status in OpenStack Compute (nova) victoria series: Fix Released

Bug description:

Description: when we try to delete the last snapshot of a VM in shutdown state, the snapshot delete will fail (and the snapshot will be stuck in state error-deleting). After setting state==available and re-deleting the snapshot, the volume will be corrupted and the VM will never start again. Volumes are stored on NFS. (For root cause and fix, see the bottom of this post.)

To reproduce:
- storage on NFS
- create a VM and some snapshots
- shut down the VM (i.e. the volume is still considered "attached" but the VM is no longer "active")
- delete the last snapshot

Expected result: the snapshot is deleted, the VM still works.

Actual result: the snapshot is stuck on error-deleting. After setting the snapshot state==available and deleting the snapshot again, the volume will be corrupted and the VM will never start again (non-existing backing_file in the qcow on disk).

Environment:
- openstack version: Stein, deployed via kolla-ansible. I suspect this downloads from git but I don't know the exact version.
- hypervisor: Libvirt + KVM
- storage: NFS
- networking: Neutron with OpenVSwitch

Nova debug logs:

2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [req-d38b5ec8-afdb-4dfe-af12-0c47598c6a47 6dd1c995b2ea4ddfbeb0685bc52e5fbf 6bebb564667d4a75b9281fd826b32ecf - default default] [instance: 711651a3-8440-42dd-a210-e7e550a8624e] Error occurred during volume_snapshot_delete, sending error status to Cinder.: DiskNotFound: No disk at volume-86c06b12-699c-4b54-8bca-fb92c99a2bf0.63d1585e-eb76-4e8f-bc96-93960e9c9692
2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e] Traceback (most recent call last):
2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2726, in volume_snapshot_delete
2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]     snapshot_id, delete_info=delete_info)
2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2686, in _volume_snapshot_delete
2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]     rebase_base)
2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2519, in _rebase_with_qemu_img
2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]     b_file_fmt = images.qemu_img_info(backing_file).file_format
2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 58, in qemu_img_info
2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]     raise exception.DiskNotFound(location=path)
2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e] DiskNotFound: No disk at volume-86c06b12-699c-4b54-8bca-fb92c99a2bf0.63d1585e-eb76-4e8f-bc96-93960e9c9692
2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]
2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server [req-d38b5ec8-afdb-4dfe-af12-0c47598c6a47 6dd1c995b2ea4ddfbeb0685bc52e5fbf 6bebb564667d4a75b9281fd826b32ecf - default default] Exception during message handling: DiskNotFound: No disk at volume-86c06b12-699c-4b54-8bca-fb92c99a2bf0.63d1585e-eb76-4e8f-bc96-93960e9c9692
2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server     res =
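The visible failure is that _rebase_with_qemu_img probes the backing file by its bare relative name (qcow2 metadata stores backing file names relative to the disk). Under that reading, a sketch of the direction a fix would take (illustrative only, not nova's actual patch):

    import os
    import subprocess

    def rebase_volume(active_disk, new_backing):
        # Resolve the relative backing name against the disk's directory
        # before probing or rebasing; probing the bare relative name is
        # what produced the DiskNotFound above.
        if new_backing and not os.path.isabs(new_backing):
            new_backing = os.path.join(
                os.path.dirname(active_disk), new_backing)
        # '-b ""' would make the disk standalone; otherwise rebase onto
        # the resolved backing file.
        subprocess.check_call(
            ['qemu-img', 'rebase', '-b', new_backing or '', active_disk])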
[Yahoo-eng-team] [Bug 1905701] Re: Do not recreate libvirt secret when one already exists on the host during a host reboot
** Changed in: nova/victoria
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1905701

Title: Do not recreate libvirt secret when one already exists on the host during a host reboot

Status in OpenStack Compute (nova): In Progress
Status in OpenStack Compute (nova) queens series: In Progress
Status in OpenStack Compute (nova) rocky series: In Progress
Status in OpenStack Compute (nova) stein series: In Progress
Status in OpenStack Compute (nova) train series: In Progress
Status in OpenStack Compute (nova) ussuri series: In Progress
Status in OpenStack Compute (nova) victoria series: Fix Released
Status in OpenStack Compute (nova) wallaby series: New
Status in OpenStack Compute (nova) xena series: In Progress

Bug description:

Description
===========
When [compute]/resume_guests_state_on_host_boot is enabled, the compute manager will attempt to restart instances on start up. When using the libvirt driver with instances that have attached LUKSv1 encrypted volumes, a call is made to _attach_encryptor that currently assumes the volumes' libvirt secrets don't already exist on the host. As a result this call will currently lead to an attempt to look up encryption metadata that fails, because the compute service is using a bare-bones, local-only admin context to drive the restart of the instances.

The libvirt secrets associated with LUKSv1 encrypted volumes actually persist across a host reboot, so the calls to fetch encryption metadata, fetch the symmetric key etc. are not required. Removing these calls in this context should allow the compute service to start instances with these volumes attached.

Steps to reproduce
==================
* Enable [compute]/resume_guests_state_on_host_boot
* Launch instances with encrypted LUKSv1 volumes attached
* Reboot the underlying host

Expected result
===============
* The instances are restarted successfully by Nova, as no external calls are made and the existing libvirt secret for any encrypted LUKSv1 volume is reused.

Actual result
=============
* The instances fail to restart, as the initial calls made by the Nova service use an empty admin context without a service catalog etc.

Environment
===========
1. Exact version of OpenStack you are running: master
2. Which hypervisor did you use? libvirt + QEMU/KVM
3. Which storage type did you use? N/A
4. Which networking type did you use? N/A

Logs & Configs
==============
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1641, in _connect_volume
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     self._attach_encryptor(context, connection_info, encryption)
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1760, in _attach_encryptor
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     key = keymgr.get(context, encryption['encryption_key_id'])
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 575, in get
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     secret = self._get_secret(context, managed_object_id)
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 545, in _get_secret
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     barbican_client = self._get_barbican_client(context)
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 142, in _get_barbican_client
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     self._barbican_endpoint)
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File
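A sketch of the idea behind the fix, using the libvirt-python API (illustrative only, not nova's actual patch): look for an existing secret first and only touch the key manager when one is missing.

    import libvirt

    def ensure_volume_secret(conn, usage_id, fetch_key):
        """Reuse a libvirt volume secret if it survived the host reboot;
        only fall back to fetching key material (e.g. from Barbican,
        which fails under the bare startup context) when none exists."""
        try:
            return conn.secretLookupByUsage(
                libvirt.VIR_SECRET_USAGE_TYPE_VOLUME, usage_id)
        except libvirt.libvirtError:
            pass  # no existing secret: fall through and create one
        xml = ("<secret ephemeral='no' private='yes'>"
               "<usage type='volume'><volume>%s</volume></usage>"
               "</secret>" % usage_id)
        secret = conn.secretDefineXML(xml)
        secret.setValue(fetch_key())  # only now hit the key manager
        return secret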
[Yahoo-eng-team] [Bug 1911924] Re: os-resetState not logged as an instance action
** Changed in: nova/victoria
   Status: New => Fix Released

https://bugs.launchpad.net/bugs/1911924

Title: os-resetState not logged as an instance action

Status in OpenStack Compute (nova): Confirmed
Status in OpenStack Compute (nova) train series: New
Status in OpenStack Compute (nova) ussuri series: New
Status in OpenStack Compute (nova) victoria series: Fix Released

Bug description:

Description
===========
When called, os-resetState does not record an instance action.

Steps to reproduce
==================
$ nova reset-state --active test
$ openstack server event list test

Expected result
===============
os-resetState listed as an instance action.

Actual result
=============
os-resetState not listed as an instance action.

Environment
===========
1. Exact version of OpenStack you are running: 7aa7fb94fd3573f6006f7eb8bc92b870b1750721
2. Which hypervisor did you use? libvirt
3. Which storage type did you use? N/A
4. Which networking type did you use? N/A
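For context, a sketch of what recording the action would look like with nova's InstanceAction object API; the 'reset-state' action name is an assumption here (the constant did not exist before the fix):

    from nova import objects

    def _record_reset_state(context, instance):
        # Persist an instance-actions row first, as other server actions
        # do, so "openstack server event list" can show the intervention.
        objects.InstanceAction.action_start(
            context, instance.uuid, 'reset-state',  # assumed action name
            want_result=False)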
[Yahoo-eng-team] [Bug 1882421] Re: inject_password fails with python3
** Changed in: nova/victoria
   Status: Fix Committed => Fix Released

** Also affects: nova/wallaby
   Importance: Undecided
   Status: New

** Changed in: nova/wallaby
   Status: New => Fix Released

https://bugs.launchpad.net/bugs/1882421

Title: inject_password fails with python3

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) train series: New
Status in OpenStack Compute (nova) ussuri series: New
Status in OpenStack Compute (nova) victoria series: Fix Released
Status in OpenStack Compute (nova) wallaby series: Fix Released

Bug description:

Originally reported in #openstack-nova:

14:44 < lvdombrkr> hello guys, trying to inject admin_password (inject_password=true) into image but when creating instance get this error in nova-compute.log
14:45 < lvdombrkr> 2020-06-06 14:53:50.188 6 WARNING nova.virt.disk.api [req-94f485ca-944c-40e9-bf14-c8b8dbe09a7b 052d02306e6746a4a3e7e5449de49f8c 413a4cadf9734fca9ec3e5e6192a446f - default default] Ignoring error injecting admin_password into image (a bytes-like object is required, not 'str')
14:45 < lvdombrkr> Train + Centos8

Can reproduce on master on devstack by installing python3-guestfs and setting

[libvirt]
inject_partition = -1
inject_password = true

in nova-cpu.conf. Backtrace after adding a hard "raise" into inject_data_into_fs():

Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.virt.libvirt.driver [None req-47214a25-b56a-4135-83bb-7c5ff4c86ca6 demo demo] [instance: 5604d60c-61c9-49b5-8786-ff5144817863] Error injecting data into image 4b3e63a6-b3c4-4de5-b515-cc286e7d5c48 (a bytes-like object is required, not 'str')
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [None req-47214a25-b56a-4135-83bb-7c5ff4c86ca6 demo demo] [instance: 5604d60c-61c9-49b5-8786-ff5144817863] Instance failed to spawn: TypeError: a bytes-like object is required, not 'str'
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863] Traceback (most recent call last):
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File "/opt/stack/nova/nova/compute/manager.py", line 2614, in _build_resources
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]     yield resources
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File "/opt/stack/nova/nova/compute/manager.py", line 2374, in _build_and_run_instance
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]     self.driver.spawn(context, instance, image_meta,
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3604, in spawn
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]     created_instance_dir, created_disks = self._create_image(
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3991, in _create_image
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]     created_disks = self._create_and_inject_local_root(
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4119, in _create_and_inject_local_root
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]     self._inject_data(backend, instance, injection_info)
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3894, in _inject_data
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]     LOG.error('Error injecting data into image '
Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File
[Yahoo-eng-team] [Bug 1882608] Re: DELETE fails with HTTP 500, StaleDataError: UPDATE statement on table 'instance_mappings' expected to update 1 row(s); 0 were matched
** Changed in: nova/ussuri
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1882608

Title: DELETE fails with HTTP 500, StaleDataError: UPDATE statement on table 'instance_mappings' expected to update 1 row(s); 0 were matched

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) train series: In Progress
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:

Noticed in a failed nova-grenade-multinode gate job, where a resource cleanup (server delete) during a ServersNegativeTestJSON test results in a 500 error and the job fails with:

Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi [None req-ab8b5ad1-c168-4f7e-9bfc-42b202b9894f tempest-ServersNegativeTestJSON-1435542876 tempest-ServersNegativeTestJSON-1435542876] Unexpected exception in API method: sqlalchemy.orm.exc.StaleDataError: UPDATE statement on table 'instance_mappings' expected to update 1 row(s); 0 were matched.
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi Traceback (most recent call last):
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File "/opt/stack/new/nova/nova/api/openstack/wsgi.py", line 671, in wrapped
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi     return f(*args, **kwargs)
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File "/opt/stack/new/nova/nova/api/openstack/compute/servers.py", line 990, in delete
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi     self._delete(req.environ['nova.context'], req, id)
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File "/opt/stack/new/nova/nova/api/openstack/compute/servers.py", line 798, in _delete
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi     self.compute_api.delete(context, instance)
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File "/opt/stack/new/nova/nova/compute/api.py", line 224, in inner
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi     return function(self, context, instance, *args, **kwargs)
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File "/opt/stack/new/nova/nova/compute/api.py", line 151, in inner
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi     return f(self, context, instance, *args, **kw)
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File "/opt/stack/new/nova/nova/compute/api.py", line 2479, in delete
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi     self._delete_instance(context, instance)
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File "/opt/stack/new/nova/nova/compute/api.py", line 2471, in _delete_instance
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi     task_state=task_states.DELETING)
Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File "/opt/stack/new/nova/nova/compute/api.py", line 2158, in _delete
Jun 01 14:33:57.524852 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi     self._local_delete_cleanup(context, instance)
Jun 01 14:33:57.524852 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File "/opt/stack/new/nova/nova/compute/api.py", line 2117, in _local_delete_cleanup
Jun 01 14:33:57.524852 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi     self._update_queued_for_deletion(context, instance, True)
Jun 01 14:33:57.524852 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File "/opt/stack/new/nova/nova/compute/api.py", line 2434, in _update_queued_for_deletion
Jun 01 14:33:57.524852 ubuntu-bionic-rax-iad-0016890725 devstack@n-api.service[13722]: ERROR
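The StaleDataError mechanism can be reproduced outside nova: when a row is deleted behind an ORM object's back, the next flush issues an UPDATE that matches zero rows. A self-contained sketch with a toy schema (SQLAlchemy 1.4+ assumed; not nova's models):

    from sqlalchemy import Column, Integer, String, create_engine, delete
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class InstanceMapping(Base):
        __tablename__ = 'instance_mappings'
        id = Column(Integer, primary_key=True)
        queued_for_delete = Column(String, default='no')

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add(InstanceMapping(id=1))
        session.commit()

        mapping = session.get(InstanceMapping, 1)  # request A holds this
        # Request B deletes the row underneath (simulated with a bulk
        # delete that bypasses the identity map):
        session.execute(
            delete(InstanceMapping).where(InstanceMapping.id == 1),
            execution_options={'synchronize_session': False})
        mapping.queued_for_delete = 'yes'          # A updates a stale copy
        session.commit()  # StaleDataError: ... 0 were matched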
[Yahoo-eng-team] [Bug 1882521] Re: Failing device detachments on Focal: "Unable to detach the device from the live config"
** Changed in: nova/ussuri
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1882521

Title: Failing device detachments on Focal: "Unable to detach the device from the live config"

Status in Cinder: Invalid
Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released
Status in OpenStack Compute (nova) victoria series: Fix Released

Bug description:

The following tests are failing consistently when deploying devstack on Focal in the CI; see https://review.opendev.org/734029 for detailed logs:

tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume
tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_resize_server_with_multiattached_volume
tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest.test_stable_device_rescue_disk_virtio_with_volume_attached
tearDownClass (tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest)

Sample extract from the nova-compute log:

Jun 08 08:48:24.384559 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: DEBUG oslo.service.loopingcall [-] Exception which is in the suggested list of exceptions occurred while invoking function: nova.virt.libvirt.guest.Guest.detach_device_with_retry.<locals>._do_wait_and_retry_detach. {{(pid=82495) _func /usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:410}}
Jun 08 08:48:24.384862 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: DEBUG oslo.service.loopingcall [-] Cannot retry nova.virt.libvirt.guest.Guest.detach_device_with_retry.<locals>._do_wait_and_retry_detach upon suggested exception since retry count (7) reached max retry count (7). {{(pid=82495) _func /usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:416}}
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall [-] Dynamic interval looping call 'oslo_service.loopingcall.RetryDecorator.__call__.<locals>._func' failed: nova.exception.DeviceDetachFailed: Device detach failed for vdb: Unable to detach the device from the live config.
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall Traceback (most recent call last):
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 150, in _run_loop
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 428, in _func
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     return self._sleep_time
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     self.force_reraise()
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     six.reraise(self.type_, self.value, self.tb)
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     raise value
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 407, in _func
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     result = f(*args, **kwargs)
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 453, in _do_wait_and_retry_detach
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     raise exception.DeviceDetachFailed(
Jun
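The log shows the shape of nova's detach loop: re-send the detach, poll, and give up once the attempt budget (7 here) is spent. A sketch of that pattern (illustrative only, not nova's actual detach_device_with_retry):

    import time

    def detach_with_retry(detach, gone, max_attempts=7, interval=2.0):
        # Retry the live detach until the device disappears from the
        # live config, then fail hard once attempts are exhausted.
        for _ in range(max_attempts):
            detach()
            if gone():
                return
            time.sleep(interval)
        raise RuntimeError(
            'Unable to detach the device from the live config.')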
[Yahoo-eng-team] [Bug 1917619] Re: Attempting to start or hard reboot a users instance as an admin with encrypted volumes leaves the instance unbootable when [workarounds]disable_native_luksv1 is enabled
** Also affects: nova/wallaby
   Importance: Undecided
   Status: New

** Changed in: nova/wallaby
   Status: New => Fix Released

https://bugs.launchpad.net/bugs/1917619

Title: Attempting to start or hard reboot a users instance as an admin with encrypted volumes leaves the instance unbootable when [workarounds]disable_native_luksv1 is enabled

Status in OpenStack Compute (nova): In Progress
Status in OpenStack Compute (nova) wallaby series: Fix Released

Bug description:

Description
===========
$subject: by default, admins do not have access to user-created barbican secrets. As a result admins cannot hard reboot or stop/start instances, as this deletes the local libvirt secrets, refetches the secrets from Barbican and recreates the local secrets. However, this initial attempt by an admin will destroy the local secrets *before* failing to access anything in Barbican. As a result, any request by the owner of the instance to hard reboot or stop/start the instance can fail, as the _detach_encryptor logic finds no local secret and assumes that native LUKSv1 encryption isn't being used. This causes the os-brick encryptors to be loaded, which can fail if the underlying volume type isn't supported, such as rbd.

Steps to reproduce
==================
1. As a non-admin user, create an instance with encrypted rbd volumes attached
2. Attempt to hard reboot or stop/start the instance as an admin
3. Attempt to hard reboot or stop/start the instance as the owner

Expected result
===============
The request by the admin to hard reboot or stop/start the instance fails. The request by the owner to hard reboot or stop/start the instance succeeds.

Actual result
=============
The request by the admin to hard reboot or stop/start the instance fails. The request by the owner to hard reboot or stop/start the instance fails due to os_brick.exception.VolumeEncryptionNotSupported being raised.

Environment
===========
1. Exact version of OpenStack you are running: master
2. Which hypervisor did you use? libvirt
3. Which storage type did you use? N/A
4. Which networking type did you use? N/A

Logs & Configs
==============
https://bugzilla.redhat.com/show_bug.cgi?id=1934513

2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server [req-fe304872-e35f-4cb3-8760-4fd1eed745bc fef8c04ca63ab77e9a37b9d79367fd49747d2016352759f6faa8475fbf6f63c1 4127275f099844f28fde120064aa4753 - 1d485afd913b4c489730f79d83044080 1d485afd913b4c489730f79d83044080] Exception during message handling: os_brick.exception.VolumeEncryptionNotSupported: Volume encryption is not supported for rbd volume d9817c6a-9c84-472a-8fc8-58ad73b389aa.
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary, tb)
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in
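A toy sketch of the branch the description walks through (illustrative only, not nova's code): once the admin's failed restart has deleted the local libvirt secret, the driver concludes native LUKSv1 was never in use and falls back to the os-brick encryptors, which cannot handle rbd volumes.

    def pick_encryptor(native_luks_secret_exists, volume_type):
        # With the secret present, QEMU decrypts natively via libvirt.
        if native_luks_secret_exists:
            return 'native'
        # Secret gone: wrongly assume os-brick handled the encryption.
        if volume_type == 'rbd':
            raise RuntimeError(
                'Volume encryption is not supported for rbd volume')
        return 'os-brick'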
[Yahoo-eng-team] [Bug 1913575] Re: Use auth_username when probing encrypted rbd volumes while extending them
** Changed in: nova/ussuri
   Status: In Progress => Fix Released

** Changed in: nova/victoria
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1913575

Title: Use auth_username when probing encrypted rbd volumes while extending them

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released
Status in OpenStack Compute (nova) victoria series: Fix Released

Bug description:

Description
===========
I0c3f14100a18107f7e416293f3d4fcc641ce5e55 introduced new logic around resizing encrypted LUKSv1 volumes that probed the volume using qemu-img to determine the LUKSv1 header size and take it into account during the resize. The use of qemu-img, however, assumes access to the admin rbd keyring, as a username isn't provided. This isn't available in all environments, so the options `id:$username` need to be appended to the rbd URI provided to qemu-img.

Steps to reproduce
==================
Attempt to resize an encrypted LUKSv1 volume on a compute node without access to the admin keyring.

Expected result
===============
The URI provided to qemu-img includes the username (and thus the local keyring) to use.

Actual result
=============
qemu-img fails as it can't find the default admin keyring.

Environment
===========
1. Exact version of OpenStack you are running: master
2. Which hypervisor did you use? libvirt + KVM
3. Which storage type did you use? c-vol ceph
4. Which networking type did you use? N/A

Logs & Configs
==============
3e004ad2953a4aa7a2f9022be3ffc7cd - default default] [instance: 8d640d15-30dd-4e72-a9ba-d9f7cf11b1ec] Unknown error when attempting to find the payload_offset for LUKSv1 encrypted disk rbd:volumes/volume-d721825d-038a-42f6-8127-aaec171e5c39.: nova.exception.InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on rbd:volumes/volume-d721825d-038a-42f6-8127-aaec171e5c39 : Unexpected error while running command.
Command: /usr/libexec/platform-python -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info rbd:volumes/volume-d721825d-038a-42f6-8127-aaec171e5c39 --output=json --force-share
Exit code: 1
Stdout: ''
Stderr: "qemu-img: Could not open 'rbd:volumes/volume-d721825d-038a-42f6-8127-aaec171e5c39': error connecting: Permission denied\n"
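A sketch of the idea behind the fix, following qemu's rbd URI syntax 'rbd:pool/image[:key=value[:key=value]...]' (the helper and its parameters are hypothetical):

    def rbd_uri(pool, volume, username=None):
        # qemu-img defaults to the admin keyring unless the URI carries
        # an explicit id, so append the configured auth_username.
        uri = 'rbd:%s/%s' % (pool, volume)
        if username:
            uri += ':id=%s' % username
        return uri

    # e.g. qemu-img info rbd:volumes/volume-...:id=cinder --output=json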
[Yahoo-eng-team] [Bug 1919357] Re: "Secure live migration with QEMU-native TLS in nova"-guide misses essential config option
** Changed in: nova/ussuri Status: New => Fix Released ** Changed in: nova/victoria Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1919357 Title: "Secure live migration with QEMU-native TLS in nova"-guide misses essential config option Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) stein series: New Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Status in OpenStack Compute (nova) victoria series: Fix Released Status in OpenStack Security Advisory: Won't Fix Status in OpenStack Security Notes: In Progress Bug description: - [x] This doc is inaccurate in this way: I followed the guide to set up QEMU native TLS for live migration. After checking that libvirt is able to use TLS (using tcpdump to listen on the TLS port), I also wanted to check that it works when I live migrate an instance. Apparently it didn't: it used the port for unencrypted TCP [1]. After digging through documentation and code I found this code part: https://github.com/openstack/nova/blob/stable/victoria/nova/virt/libvirt/driver.py#L1120

    @staticmethod
    def _live_migration_uri(dest):
        uris = {
            'kvm': 'qemu+%(scheme)s://%(dest)s/system',
            'qemu': 'qemu+%(scheme)s://%(dest)s/system',
            'xen': 'xenmigr://%(dest)s/system',
            'parallels': 'parallels+tcp://%(dest)s/system',
        }
        dest = oslo_netutils.escape_ipv6(dest)
        virt_type = CONF.libvirt.virt_type
        # TODO(pkoniszewski): Remove fetching live_migration_uri in Pike
        uri = CONF.libvirt.live_migration_uri
        if uri:
            return uri % dest
        uri = uris.get(virt_type)
        if uri is None:
            raise exception.LiveMigrationURINotAvailable(virt_type=virt_type)
        str_format = {
            'dest': dest,
            'scheme': CONF.libvirt.live_migration_scheme or 'tcp',
        }
        return uri % str_format

The URI is calculated using the config parameter 'live_migration_scheme', falling back to the hard-coded 'tcp'. Coming from the guide for QEMU native TLS, there was no hint that this config option needs to be set. In fact, without setting the 'live_migration_scheme' config option to tls, there is no way to see that the live migration still uses the unencrypted TCP connection - one has to use tcpdump and listen for TCP or TLS to recognize it. Neither the logs nor any debug output give any hint that it is still unencrypted! Thus I conclude there might be OpenStack deployments which are configured as the guide says but where these config changes have no effect! - [x] This is a doc addition request. To fix this, the config parameter 'live_migration_scheme' should be set to tls, and there should be a warning in the documentation that without doing this the traffic is still unencrypted. - [ ] I have a fix to the document that I can paste below, including example input and output.
[1] without setting 'live_migration_scheme' in the nova.conf:

$ tcpdump -i INTERFACE -n -X port 16509 and '(tcp[((tcp[12] & 0xf0) >> 2)] < 0x14 || tcp[((tcp[12] & 0xf0) >> 2)] > 0x17)'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on INTERFACE, link-type EN10MB (Ethernet), capture size 262144 bytes
17:10:56.387407 IP 192.168.70.101.50900 > 192.168.70.100.16509: Flags [P.], seq 304:6488, ack 285, win 502, options [nop,nop,TS val 424149655 ecr 1875309961], length 6184
(hex dump mangled in the plain-text mail; its ASCII column shows the migration payload in cleartext, including "destination_xml", the instance's domain XML and its UUID)

A second capture, filtering for TLS record types on the libvirt TLS port 16514:

$ tcpdump -i INTERFACE -n -X port 16514 and '(tcp[((tcp[12] & 0xf0) >> 2)] > 0x13 && tcp[((tcp[12] & 0xf0) >> 2)] < 0x18)'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on INTERFACE, link-type EN10MB (Ethernet), capture size 262144 bytes
16:55:47.746851 IP 192.168.70.100.35620 > 192.168.70.101.16514: Flags [P.], seq 1849334708:1849334914, ack 3121294199, win 502, options [nop,nop,TS val 1874401351 ecr 423241020], length 206
(hex dump truncated in the original message)
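Concretely, the documentation fix the reporter asks for amounts to one extra line next to the option the TLS guide already documents; a minimal sketch of the relevant nova.conf section:

    [libvirt]
    live_migration_with_native_tls = true
    live_migration_scheme = tls

With live_migration_scheme = tls the computed migration URI becomes qemu+tls://<dest>/system, so the libvirt connection also uses the TLS port (16514) instead of plain TCP (16509).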
[Yahoo-eng-team] [Bug 1919487] Re: virDomainBlockCommit called when deleting an intermediary snapshot via os-assisted-volume-snapshots even when instance is shutoff
** Also affects: nova/wallaby Importance: Undecided Status: New ** Changed in: nova/wallaby Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1919487 Title: virDomainBlockCommit called when deleting an intermediary snapshot via os-assisted-volume-snapshots even when instance is shutoff Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) wallaby series: Fix Released Bug description: Description === Attempting to delete an NFS volume snapshot (via c-api and the os-assisted-volume-snapshots n-api) of a volume attached to a SHUTOFF instance currently results in n-cpu attempting to fire off a virDomainBlockCommit command even though the instance isn't running. Steps to reproduce == 1. Create multiple volume snapshots against a volume. 2. Attach the volume to an ACTIVE instance. 3. Stop the instance and ensure it is SHUTOFF. 4. Attempt to delete an intermediary snapshot. Expected result === qemu-img commit or qemu-img rebase should be used to handle this offline. Actual result = virDomainBlockCommit is called even though the domain isn't running. Environment === 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ master 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? libvirt + KVM 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? NFS c-vol 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A Logs & Configs ==

Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server [req-570281c6-566e-44a3-9953-eeb634513778 req-0fbbe87f-fd1d-4861-9fb3-21b8eb011e55 service nova] Exception during message handling: libvirt.libvirtError: Requested operation is not valid: domain is not running
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 273, in dispatch
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 193, in _do_dispatch
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/server.py", line 241, in inner
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     return func(*args, **kwargs)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/exception_wrapper.py", line 78, in wrapped
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     function_name, call_dict, binary)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     self.force_reraise()
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     raise
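For reference, the offline handling the report asks for can be sketched with qemu-img alone; the backing chain and file names below are hypothetical, not taken from the report:

    # backing chain (hypothetical): base.qcow2 <- snap1.qcow2 <- snap2.qcow2 (active)
    # 1) push snap1's data down into its backing file, base.qcow2
    qemu-img commit snap1.qcow2
    # 2) repoint snap2 past the removed image; -u (unsafe mode) skips
    #    copying data because step 1 already committed it below
    qemu-img rebase -u -b base.qcow2 -F qcow2 snap2.qcow2

This is the qcow2-chain surgery virDomainBlockCommit would do for a running domain, done directly on the files while the instance is SHUTOFF.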
[Yahoo-eng-team] [Bug 1923206] Re: libvirt.libvirtError: internal error: unable to execute QEMU command 'device_del': Device $device is already in the process of unplug
** Also affects: nova/wallaby Importance: Undecided Status: New ** Changed in: nova/wallaby Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1923206 Title: libvirt.libvirtError: internal error: unable to execute QEMU command 'device_del': Device $device is already in the process of unplug Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) wallaby series: Fix Released Bug description: Description === This was initially reported downstream against QEMU in the following bug: Get libvirtError "Device XX is already in the process of unplug" when detach device in OSP env https://bugzilla.redhat.com/show_bug.cgi?id=1878659 I first saw the error crop up while testing q35 in TripleO in the following job: https://c6b36562677324bf8249-804f3f4695b3063292bbb3235f424ae0.ssl.cf1.rackcdn.com/785027/5/check/tripleo-ci-centos-8-standalone/6860050/logs/undercloud/var/log/containers/nova/nova-compute.log

2021-04-09 11:09:53.702 8 DEBUG nova.virt.libvirt.guest [req-4d0b64d5-a2cf-4a6e-a2f7-f6cc7ced4df1 7e2b737ed8f04b3ca819841a41be66c1 d4d933c7b10c462c8141820b0e70822b - default default] Attempting initial detach for device vdb detach_device_with_retry /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:455
[..]
2021-04-09 11:09:58.721 8 DEBUG nova.virt.libvirt.guest [req-4d0b64d5-a2cf-4a6e-a2f7-f6cc7ced4df1 7e2b737ed8f04b3ca819841a41be66c1 d4d933c7b10c462c8141820b0e70822b - default default] Start retrying detach until device vdb is gone. detach_device_with_retry /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:471
[..]
2021-04-09 11:09:58.729 8 ERROR oslo.service.loopingcall libvirt.libvirtError: internal error: unable to execute QEMU command 'device_del': Device virtio-disk1 is already in the process of unplug

Steps to reproduce == Unclear at present; it looks like a genuine QEMU bug that causes it to fail when a repeat request to device_del a device comes in, instead of ignoring the request as would previously happen. I've asked for clarification in the downstream QEMU bug. Expected result === Repeat calls to device_del are ignored, or the failure, while raised, is ignored by Nova. Actual result = Repeat calls to device_del lead to an error being raised to Nova via libvirt that causes the detach to fail, while it still succeeds asynchronously within QEMU. Environment === 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ master 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? libvirt + QEMU/KVM 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A Logs & Configs == See above. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1923206/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
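The expected-result wording suggests the shape of a fix on the Nova side. A rough, illustrative sketch (not the patch that merged) that treats the "already in the process of unplug" error as progress rather than failure:

    import time
    import libvirt  # python libvirt binding assumed

    def detach_disk_tolerating_pending_unplug(dom, device_xml, target_dev,
                                              attempts=8):
        """Retry a live disk detach, treating QEMU's 'already in the
        process of unplug' error as an in-progress unplug, not a failure."""
        for _ in range(attempts):
            try:
                dom.detachDeviceFlags(device_xml, libvirt.VIR_DOMAIN_AFFECT_LIVE)
            except libvirt.libvirtError as exc:
                if 'already in the process of unplug' not in str(exc):
                    raise
                # the unplug is still pending inside QEMU; not an error
            time.sleep(2)
            if "<target dev='%s'" % target_dev not in dom.XMLDesc(0):
                return  # device gone; the detach completed asynchronously
        raise RuntimeError('%s still attached after retries' % target_dev)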
[Yahoo-eng-team] [Bug 1799298] Re: Metadata API cross joining instance_metadata and instance_system_metadata
** Changed in: nova/train Status: In Progress => Fix Released ** Changed in: nova/stein Status: In Progress => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1799298 Title: Metadata API cross joining instance_metadata and instance_system_metadata Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ocata series: Triaged Status in OpenStack Compute (nova) pike series: Triaged Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Committed Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Status in OpenStack Compute (nova) victoria series: Fix Released Status in OpenStack Security Advisory: Won't Fix Bug description: Description === While troubleshooting a production issue we identified that the Nova metadata API is fetching a lot more raw data from the database than seems necessary. The problem appears to be caused by the SQL query used to fetch instance data, which joins the "instance" table with, among others, two metadata tables: "instance_metadata" and "instance_system_metadata". Below is a simplified version of this query which was captured by adding extra logging (the full query is listed at the end of this bug report):

    SELECT ... FROM
      (SELECT ... FROM `instances`
        WHERE `instances`.`deleted` = ? AND `instances`.`uuid` = ? LIMIT ?) AS `anon_1`
    LEFT OUTER JOIN `instance_system_metadata` AS `instance_system_metadata_1`
      ON `anon_1`.`instances_uuid` = `instance_system_metadata_1`.`instance_uuid`
    LEFT OUTER JOIN
      (`security_group_instance_association` AS `security_group_instance_association_1`
       INNER JOIN `security_groups` AS `security_groups_1`
         ON `security_groups_1`.`id` = `security_group_instance_association_1`.`security_group_id`
        AND `security_group_instance_association_1`.`deleted` = ?
        AND `security_groups_1`.`deleted` = ?)
      ON `security_group_instance_association_1`.`instance_uuid` = `anon_1`.`instances_uuid`
     AND `anon_1`.`instances_deleted` = ?
    LEFT OUTER JOIN `security_group_rules` AS `security_group_rules_1`
      ON `security_group_rules_1`.`parent_group_id` = `security_groups_1`.`id`
     AND `security_group_rules_1`.`deleted` = ?
    LEFT OUTER JOIN `instance_info_caches` AS `instance_info_caches_1`
      ON `instance_info_caches_1`.`instance_uuid` = `anon_1`.`instances_uuid`
    LEFT OUTER JOIN `instance_extra` AS `instance_extra_1`
      ON `instance_extra_1`.`instance_uuid` = `anon_1`.`instances_uuid`
    LEFT OUTER JOIN `instance_metadata` AS `instance_metadata_1`
      ON `instance_metadata_1`.`instance_uuid` = `anon_1`.`instances_uuid`
     AND `instance_metadata_1`.`deleted` = ?

The instance table has a 1-to-many relationship to both the "instance_metadata" and "instance_system_metadata" tables, so the query is effectively producing a cross join of both metadata tables.
Steps to reproduce == To illustrate the impact of this query, add 2 properties to a running instance and verify that it has 2 records in "instance_metadata", as well as other records in "instance_system_metadata" such as base image properties:

> select instance_uuid,`key`,value from instance_metadata where instance_uuid = 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
+--------------------------------------+-----------+--------+
| instance_uuid                        | key       | value  |
+--------------------------------------+-----------+--------+
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value  |
+--------------------------------------+-----------+--------+
2 rows in set (0.61 sec)

> select `key`,value from instance_system_metadata where instance_uuid = 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
+------------------------+--------------------------------------+
| key                    | value                                |
+------------------------+--------------------------------------+
| image_disk_format      | qcow2                                |
| image_min_ram          | 0                                    |
| image_min_disk         | 20                                   |
| image_base_image_ref   | 39cd564f-6a29-43e2-815b-62097968486a |
| image_container_format | bare                                 |
+------------------------+--------------------------------------+
5 rows in set (0.00 sec)

For this particular instance,
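Because each metadata table joins the instance row independently, the result set is the cross product of the two: the instance above comes back 2 x 5 = 10 times, and an instance with 50 metadata rows and 200 system-metadata rows would come back 10,000 times, each row carrying the full instance payload. A sketch of one way to avoid the cross product (illustrative; not necessarily the change that merged) is to load each relationship with its own indexed point query:

    SELECT `key`, value FROM instance_metadata
     WHERE instance_uuid = ? AND deleted = 0;
    SELECT `key`, value FROM instance_system_metadata
     WHERE instance_uuid = ? AND deleted = 0;

Two queries return 2 + 5 rows instead of a 10-row join, and the gap widens multiplicatively as metadata grows.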
[Yahoo-eng-team] [Bug 1841932] Re: hide_hypervisor_id extra_specs in nova flavor cannot pass AggregateInstanceExtraSpecsFilter
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1841932 Title: hide_hypervisor_id extra_specs in nova flavor cannot pass AggregateInstanceExtraSpecsFilter Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Description === when we enable the nova AggregateInstanceExtraSpecsFilter and then need to pass through an NVIDIA GPU, we have to set hide_hypervisor_id in the nova flavor extra specs. hide_hypervisor_id cannot pass the AggregateInstanceExtraSpecsFilter because of the "either not scope format, or aggregate_instance_extra_specs scope" handling. See the code below:

    # Either not scope format, or aggregate_instance_extra_specs scope
    scope = key.split(':', 1)
    if len(scope) > 1:
        if scope[0] != _SCOPE:
            continue
        else:
            del scope[0]
    key = scope[0]

Steps to reproduce == in nova.conf: [filter_scheduler] enabled_filters = ...,AggregateInstanceExtraSpecsFilter,... create a flavor like "g3.8xlarge" and set the "hide_hypervisor_id" extra spec: nova flavor-key g3.8xlarge set hide_hypervisor_id=true then create an instance with flavor g3.8xlarge; it will report "Filter AggregateInstanceExtraSpecsFilter returned 0 hosts" in the nova-scheduler log. Environment ===

(nova-scheduler)[nova@control1 /]$ rpm -qa | grep nova
openstack-nova-common-18.2.1-0.1.el7.noarch
openstack-nova-scheduler-18.2.1-0.1.el7.noarch
python-nova-18.2.1-0.1.el7.noarch
python2-novaclient-11.0.0-1.el7.noarch

I think this is a BUG in AggregateInstanceExtraSpecsFilter. May I suggest removing the "not scope format" support in AggregateInstanceExtraSpecsFilter, or adding an explicit scope for "hide_hypervisor_id"? Otherwise I cannot use AggregateInstanceExtraSpecsFilter and hide_hypervisor_id at the same time. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1841932/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
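A condensed, self-contained rendering of that loop shows why the unscoped key reaches the aggregate check while keys in a foreign scope such as 'hw:' are skipped (illustrative only):

    _SCOPE = 'aggregate_instance_extra_specs'

    def keys_checked_against_aggregates(extra_specs):
        # mirrors the filter loop above: keys in a foreign scope are
        # skipped, everything else is matched against aggregate metadata
        checked = []
        for key in extra_specs:
            scope = key.split(':', 1)
            if len(scope) > 1:
                if scope[0] != _SCOPE:
                    continue
                del scope[0]
            checked.append(scope[0])
        return checked

    print(keys_checked_against_aggregates({'hw:cpu_policy': 'dedicated'}))
    # [] - skipped, the 'hw' scope is not the filter's scope
    print(keys_checked_against_aggregates({'hide_hypervisor_id': 'true'}))
    # ['hide_hypervisor_id'] - compared against aggregate metadata, so any
    # aggregate without that key filters the host out

Since no host aggregate carries a hide_hypervisor_id key, the filter rejects every host, which is exactly the "returned 0 hosts" symptom above.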
[Yahoo-eng-team] [Bug 1893618] Re: periodic-tripleo-ci-centos-8-standalone-full-tempest-api-compute-master tempest test_shelve_unshelve_server failing in component-pipeline
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1893618 Title: periodic-tripleo-ci-centos-8-standalone-full-tempest-api-compute-master tempest test_shelve_unshelve_server failing in component-pipeline Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Status in OpenStack Compute (nova) ussuri series: Fix Released Status in tripleo: Fix Released Bug description: https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-api-compute-master/b346467/logs/undercloud/var/log/tempest/stestr_results.html.gz

traceback-1: {{{
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/api/compute/servers/test_server_actions.py", line 66, in tearDown
    self.server_check_teardown()
  File "/usr/lib/python3.6/site-packages/tempest/api/compute/base.py", line 220, in server_check_teardown
    cls.server_id, 'ACTIVE')
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 96, in wait_for_server_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (ServerActionsTestJSON:tearDown) Server 41f15309-34bb-430d-8dad-7b9c8362a851 failed to reach ACTIVE status and task state "None" within the required time (300 s). Current status: SHELVED_OFFLOADED. Current task state: None.
}}}

traceback-2: {{{
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/api/compute/servers/test_server_actions.py", line 649, in _unshelve_server
    server_info = self.client.show_server(self.server_id)['server']
  File "/usr/lib/python3.6/site-packages/tempest/lib/services/compute/servers_client.py", line 145, in show_server
    resp, body = self.get("servers/%s" % server_id)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 313, in get
    return self.request('GET', url, extra_headers, headers)
  File "/usr/lib/python3.6/site-packages/tempest/lib/services/compute/base_compute_client.py", line 48, in request
    method, url, extra_headers, headers, body, chunked)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 702, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 808, in _error_checker
    raise exceptions.NotFound(resp_body, resp=resp)
tempest.lib.exceptions.NotFound: Object not found
Details: {'code': 404, 'message': 'Instance None could not be found.'}
}}}

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/api/compute/servers/test_server_actions.py", line 666, in test_shelve_unshelve_server
    waiters.wait_for_server_status(self.client, self.server_id, 'ACTIVE')
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 96, in wait_for_server_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (ServerActionsTestJSON:test_shelve_unshelve_server) Server 41f15309-34bb-430d-8dad-7b9c8362a851 failed to reach ACTIVE status and task state "None" within the required time (300 s). Current status: SHELVED_OFFLOADED.
Current task state: None.

Traceback in nova-compute logs https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-api-compute-master/b346467/logs/undercloud/var/log/containers/nova/nova-compute.log.1.gz :

2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server [req-9280bac1-da23-4f45-b01c-b6012198d97e 10fe2caa6924408485c181adfc7377e8 df52aad2e4da4f07b4b7b4ff6644e121 - default default] Exception during message handling: AttributeError: 'NoneType' object has no attribute 'encode'
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 273, in dispatch
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
[Yahoo-eng-team] [Bug 1613770] Re: Improve error log when instance snapshot fails
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1613770 Title: Improve error log when instance snapshot fails Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: New Status in OpenStack Compute (nova) stein series: New Status in OpenStack Compute (nova) train series: Fix Released Bug description: If the glance backend store is set to use filesystem storage and this storage runs out of space while glance is trying to create an instance snapshot, then the nova-compute log displays the following message:

2016-08-08 22:24:31.644 TRACE oslo_messaging.rpc.server HTTPOverLimit: 413 Request Entity Too Large
2016-08-08 22:24:31.644 TRACE oslo_messaging.rpc.server Image storage media is full: There is not enough disk space on the image storage media.
2016-08-08 22:24:31.644 TRACE oslo_messaging.rpc.server (HTTP 413)

It's a little bit annoying that we're logging the HTTP error from glance and that we don't specify the image uuid. Steps to reproduce: * set glance's filesystem_store_datadir config to a small filesystem * start a nova instance * keep invoking "nova image-create" to create instance image snapshots; eventually the backend filesystem storage runs out of space * in the nova-compute log, see the HTTP error message above. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1613770/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
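A sketch of the kind of message the report asks for, in oslo.log style; the variable names are illustrative, not nova's actual code:

    # log the failure in nova's own words, naming the image and instance,
    # instead of surfacing glance's bare HTTP 413
    LOG.error('Snapshot upload for image %(image_id)s of instance '
              '%(instance_uuid)s failed: the Glance store is out of '
              'space: %(err)s',
              {'image_id': image_id,
               'instance_uuid': instance.uuid,
               'err': err})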
[Yahoo-eng-team] [Bug 1882521] Re: Failing device detachments on Focal: "Unable to detach the device from the live config"
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1882521 Title: Failing device detachments on Focal: "Unable to detach the device from the live config" Status in Cinder: Invalid Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: In Progress Status in OpenStack Compute (nova) victoria series: Fix Released Bug description: The following tests are failing consistently when deploying devstack on Focal in the CI, see https://review.opendev.org/734029 for detailed logs:

tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume
tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_resize_server_with_multiattached_volume
tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest.test_stable_device_rescue_disk_virtio_with_volume_attached
tearDownClass (tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest)

Sample extract from nova-compute log:

Jun 08 08:48:24.384559 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: DEBUG oslo.service.loopingcall [-] Exception which is in the suggested list of exceptions occurred while invoking function: nova.virt.libvirt.guest.Guest.detach_device_with_retry.<locals>._do_wait_and_retry_detach. {{(pid=82495) _func /usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:410}}
Jun 08 08:48:24.384862 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: DEBUG oslo.service.loopingcall [-] Cannot retry nova.virt.libvirt.guest.Guest.detach_device_with_retry.<locals>._do_wait_and_retry_detach upon suggested exception since retry count (7) reached max retry count (7). {{(pid=82495) _func /usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:416}}
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall [-] Dynamic interval looping call 'oslo_service.loopingcall.RetryDecorator.__call__.<locals>._func' failed: nova.exception.DeviceDetachFailed: Device detach failed for vdb: Unable to detach the device from the live config.
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall Traceback (most recent call last):
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 150, in _run_loop
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 428, in _func
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     return self._sleep_time
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     self.force_reraise()
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     six.reraise(self.type_, self.value, self.tb)
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     raise value
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 407, in _func
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     result = f(*args, **kwargs)
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 453, in _do_wait_and_retry_detach
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     raise exception.DeviceDetachFailed(
[Yahoo-eng-team] [Bug 1896621] Re: instance corrupted after volume retype
** Changed in: nova/train Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1896621 Title: instance corrupted after volume retype Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: In Progress Status in OpenStack Compute (nova) rocky series: In Progress Status in OpenStack Compute (nova) stein series: In Progress Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Status in OpenStack Compute (nova) victoria series: Fix Released Bug description: Description === Following a cinder volume retype on a volume attached to a running instance, the instance became corrupt and cannot boot into the guest operating system any more. Upon further investigation it seems the retype operation failed. The nova-compute logs registered the following error: Exception during message handling: libvirtError: block copy still active: domain has active block job. See log extract: http://paste.openstack.org/show/798201/ Steps to reproduce == I'm not sure how easy it would be to replicate the exact problem. As an admin user within the project, in Horizon go to Project | Volume | Volume, then from the context menu of the required volume select "change volume type". Select the new type and migration policy 'on-demand'. Following this it was reported that the instance was non-responsive; when checking in the console, the instance was unable to boot from the volume. Environment === DISTRIB_ID="OSA" DISTRIB_RELEASE="18.1.5" DISTRIB_CODENAME="Rocky" DISTRIB_DESCRIPTION="OpenStack-Ansible" # nova-manage --version 18.1.1 # virsh version Compiled against library: libvirt 4.0.0 Using library: libvirt 4.0.0 Using API: QEMU 4.0.0 Running hypervisor: QEMU 2.11.1 Cinder v13.0.3 backed volumes using Zadara VPSA driver To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1896621/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
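For context, a minimal sketch of the pivot step a volume migration performs via libvirt; the error above suggests the block copy job was still active when the operation tried to move on. This assumes the python libvirt binding and a connected domain handle, and is not nova's actual code:

    import time
    import libvirt

    def pivot_when_ready(dom, disk='vda'):
        # wait for the block copy to fully mirror, then pivot onto the new
        # volume; aborting or tearing down while the job is still active is
        # what produces "block copy still active: domain has active block job"
        while True:
            info = dom.blockJobInfo(disk, 0)  # {} when no job is running
            if info and info['end'] > 0 and info['cur'] == info['end']:
                dom.blockJobAbort(disk, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
                return
            time.sleep(1)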
[Yahoo-eng-team] [Bug 1919357] Re: "Secure live migration with QEMU-native TLS in nova"-guide misses essential config option
** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1919357 Title: "Secure live migration with QEMU-native TLS in nova"-guide misses essential config option Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) stein series: New Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: New Status in OpenStack Compute (nova) victoria series: New Status in OpenStack Security Advisory: Won't Fix Status in OpenStack Security Notes: In Progress Bug description: (identical to the Bug 1919357 notification earlier in this digest)
[Yahoo-eng-team] [Bug 1930734] [NEW] Volumes and vNICs are being hot plugged into SEV based instances without iommu='on' causing failures to attach and later detach within the guest OS
Public bug reported: Description === After successfully attaching a disk to a SEV enabled instance, the request to detach the disk never completes, with the following trace eventually logged regarding the initial attach:

[7.773877] pcieport 0000:00:02.5: Slot(0-5): Attention button pressed
[7.774743] pcieport 0000:00:02.5: Slot(0-5) Powering on due to button press
[7.775714] pcieport 0000:00:02.5: Slot(0-5): Card present
[7.776403] pcieport 0000:00:02.5: Slot(0-5): Link Up
[7.903183] pci 0000:06:00.0: [1af4:1042] type 00 class 0x01
[7.904095] pci 0000:06:00.0: reg 0x14: [mem 0x-0x0fff]
[7.905024] pci 0000:06:00.0: reg 0x20: [mem 0x-0x3fff 64bit pref]
[7.906977] pcieport 0000:00:02.5: bridge window [io 0x1000-0x0fff] to [bus 06] add_size 1000
[7.908069] pcieport 0000:00:02.5: BAR 13: no space for [io size 0x1000]
[7.908917] pcieport 0000:00:02.5: BAR 13: failed to assign [io size 0x1000]
[7.909832] pcieport 0000:00:02.5: BAR 13: no space for [io size 0x1000]
[7.910667] pcieport 0000:00:02.5: BAR 13: failed to assign [io size 0x1000]
[7.911586] pci 0000:06:00.0: BAR 4: assigned [mem 0x80060-0x800603fff 64bit pref]
[7.912616] pci 0000:06:00.0: BAR 1: assigned [mem 0x8040-0x80400fff]
[7.913472] pcieport 0000:00:02.5: PCI bridge to [bus 06]
[7.915762] pcieport 0000:00:02.5: bridge window [mem 0x8040-0x805f]
[7.917525] pcieport 0000:00:02.5: bridge window [mem 0x80060-0x8007f 64bit pref]
[7.920252] virtio-pci 0000:06:00.0: enabling device (0000 -> 0002)
[7.924487] virtio_blk virtio4: [vdb] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
[7.926616] vdb: detected capacity change from 0 to 1073741824
[ .. ]
[ 246.751028] INFO: task irq/29-pciehp:173 blocked for more than 120 seconds.
[ 246.752801] Not tainted 4.18.0-305.el8.x86_64 #1
[ 246.753902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.755457] irq/29-pciehp D 0 173 2 0x80004000
[ 246.756616] Call Trace:
[ 246.757328]  __schedule+0x2c4/0x700
[ 246.758185]  schedule+0x38/0xa0
[ 246.758966]  io_schedule+0x12/0x40
[ 246.759801]  do_read_cache_page+0x513/0x770
[ 246.760761]  ? blkdev_writepages+0x10/0x10
[ 246.761692]  ? file_fdatawait_range+0x20/0x20
[ 246.762659]  read_part_sector+0x38/0xda
[ 246.763554]  read_lba+0x10f/0x220
[ 246.764367]  efi_partition+0x1e4/0x6de
[ 246.765245]  ? snprintf+0x49/0x60
[ 246.766046]  ? is_gpt_valid.part.5+0x430/0x430
[ 246.766991]  blk_add_partitions+0x164/0x3f0
[ 246.767915]  ? blk_drop_partitions+0x91/0xc0
[ 246.768863]  bdev_disk_changed+0x65/0xd0
[ 246.769748]  __blkdev_get+0x3c4/0x510
[ 246.770595]  blkdev_get+0xaf/0x180
[ 246.771394]  __device_add_disk+0x3de/0x4b0
[ 246.772302]  virtblk_probe+0x4ba/0x8a0 [virtio_blk]
[ 246.773313]  virtio_dev_probe+0x158/0x1f0
[ 246.774208]  really_probe+0x255/0x4a0
[ 246.775046]  ? __driver_attach_async_helper+0x90/0x90
[ 246.776091]  driver_probe_device+0x49/0xc0
[ 246.776965]  bus_for_each_drv+0x79/0xc0
[ 246.777813]  __device_attach+0xdc/0x160
[ 246.778669]  bus_probe_device+0x9d/0xb0
[ 246.779523]  device_add+0x418/0x780
[ 246.780321]  register_virtio_device+0x9e/0xe0
[ 246.781254]  virtio_pci_probe+0xb3/0x140
[ 246.782124]  local_pci_probe+0x41/0x90
[ 246.782937]  pci_device_probe+0x105/0x1c0
[ 246.783807]  really_probe+0x255/0x4a0
[ 246.784623]  ? __driver_attach_async_helper+0x90/0x90
[ 246.785647]  driver_probe_device+0x49/0xc0
[ 246.786526]  bus_for_each_drv+0x79/0xc0
[ 246.787364]  __device_attach+0xdc/0x160
[ 246.788205]  pci_bus_add_device+0x4a/0x90
[ 246.789063]  pci_bus_add_devices+0x2c/0x70
[ 246.789916]  pciehp_configure_device+0x91/0x130
[ 246.790855]  pciehp_handle_presence_or_link_change+0x334/0x460
[ 246.791985]  pciehp_ist+0x1a2/0x1b0
[ 246.792768]  ? irq_finalize_oneshot.part.47+0xf0/0xf0
[ 246.793768]  irq_thread_fn+0x1f/0x50
[ 246.794550]  irq_thread+0xe7/0x170
[ 246.795299]  ? irq_forced_thread_fn+0x70/0x70
[ 246.796190]  ? irq_thread_check_affinity+0xe0/0xe0
[ 246.797147]  kthread+0x116/0x130
[ 246.797841]  ? kthread_flush_work_fn+0x10/0x10
[ 246.798735]  ret_from_fork+0x22/0x40
[ 246.799523] INFO: task sfdisk:1129 blocked for more than 120 seconds.
[ 246.800717] Not tainted 4.18.0-305.el8.x86_64 #1
[ 246.801733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.803155] sfdisk D 0 1129 1107 0x4080
[ 246.804225] Call Trace:
[ 246.804827]  __schedule+0x2c4/0x700
[ 246.805590]  ? submit_bio+0x3c/0x160
[ 246.806373]  schedule+0x38/0xa0
[ 246.807089]  schedule_preempt_disabled+0xa/0x10
[ 246.807990]  __mutex_lock.isra.6+0x2d0/0x4a0
[ 246.808876]  ? wake_up_q+0x80/0x80
[ 246.809636]  ? fdatawait_one_bdev+0x20/0x20
[ 246.810508]  iterate_bdevs+0x98/0x142
[ 246.811304]  ksys_sync+0x6e/0xb0
[ 246.812041]  __ia32_sys_sync+0xa/0x10
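Per the bug title, the missing piece is the iommu driver attribute on the hotplugged virtio device, which SEV guests need so that DMA goes through bounce buffers the guest can share with the host. A sketch of the attribute on a hotplugged disk's libvirt XML; the source and target values are made up for the example:

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' iommu='on'/>
      <source dev='/dev/mapper/example-volume'/>
      <target dev='vdb' bus='virtio'/>
    </disk>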
[Yahoo-eng-team] [Bug 1907686] Re: ovn: instance unable to retrieve metadata
This bug was fixed in the package openvswitch - 2.13.3-0ubuntu0.20.04.1~cloud0

---
openvswitch (2.13.3-0ubuntu0.20.04.1~cloud0) bionic-ussuri; urgency=medium

  * New update for the Ubuntu Cloud Archive.

openvswitch (2.13.3-0ubuntu0.20.04.1) focal; urgency=medium

  [ James Page ]
  * New upstream point release (LP: #1920141, LP: #1907686).
  * Dropped security patches, included in release:
    - CVE-2015-8011.patch
    - CVE-2020-27827.patch
    - CVE-2020-35498.patch
  * Add BD on libdbus-1-dev to resolve linking issues for DPDK builds due to changes in DPDK.
  * d/control: Set minimum version of libdpdk-dev to avoid build failures with 19.11.6-0ubuntu0.20.04.1.

  [ Frode Nordahl ]
  * Fix recording of FQDN/hostname on startup (LP: #1915829):
    - d/p/ovs-dev-ovs-ctl-Allow-recording-hostname-separately.patch: Cherry pick of committed upstream fix to support skip of hostname configuration on ovs-vswitchd/ovsdb-server startup.
    - d/openvswitch-switch.ovs-record-hostname.service: Record hostname in Open vSwitch after network-online.target using new systemd unit.
    - d/openvswitch-switch.ovs-vswitchd.service: Pass `--no-record-hostname` option to `ovs-ctl` to delegate recording of hostname to the separate service.
    - d/openvswitch-switch.ovsdb-server.service: Pass `--no-record-hostname` option to `ovs-ctl` to delegate recording of hostname to the separate service.
    - d/openvswitch-switch.service: Add `Also` reference to ovs-record-hostname.service so that the service is enabled on install.
    - d/rules: Add `ovs-record-hostname.service` to package build.

** Changed in: cloud-archive/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1907686 Title: ovn: instance unable to retrieve metadata Status in charm-ovn-chassis: Invalid Status in Ubuntu Cloud Archive: Fix Released Status in Ubuntu Cloud Archive ussuri series: Fix Released Status in Ubuntu Cloud Archive victoria series: Won't Fix Status in Ubuntu Cloud Archive wallaby series: Fix Released Status in neutron: Invalid Status in openvswitch package in Ubuntu: Fix Released Status in openvswitch source package in Focal: Fix Released Status in openvswitch source package in Groovy: Fix Released Status in openvswitch source package in Hirsute: Fix Released Bug description:

[Impact]
Cloud instances are unable to retrieve metadata on startup.

[Test Case]
Deploy OpenStack with OVN/OVS
Restart OVN central controllers
Create a new instance
Instance will fail to retrieve metadata with the message from the original bug report displayed in the metadata agent log on the local hypervisor

[Regression Potential]
The fix for this issue is included in the upstream 2.13.3 release of OVS. The fix ensures that SSL related connection issues are correctly handled in python3-ovs, avoiding an issue where the connection to the OVN SB IDL is reset and never recreated. The OVN drivers use python3-ovsdbapp, which in turn bases off code provided by python3-ovs.
[Original Bug Report] Ubuntu: focal OpenStack: ussuri Instance port: hardware offloaded instance created, attempts to access metadata - metadata agent can't resolve the port/network combination:

2020-12-10 15:00:18.258 4732 INFO neutron.agent.ovn.metadata.agent [-] Port d65418a6-d0e9-47e6-84ba-3d02fe75131a in datapath 37706e4d-ce2a-4d81-8c61-3fd12437a0a7 bound to our chassis
2020-12-10 15:00:31.672 8062 ERROR neutron.agent.ovn.metadata.server [-] No port found in network 37706e4d-ce2a-4d81-8c61-3fd12437a0a7 with IP address 10.5.1.155
2020-12-10 15:00:31.673 8062 INFO eventlet.wsgi.server [-] 10.5.1.155, "GET /openstack HTTP/1.1" status: 404 len: 297 time: 0.0043790
2020-12-10 15:00:34.639 8062 ERROR neutron.agent.ovn.metadata.server [-] No port found in network 37706e4d-ce2a-4d81-8c61-3fd12437a0a7 with IP address 10.5.1.155
2020-12-10 15:00:34.639 8062 INFO eventlet.wsgi.server [-] 10.5.1.155, "GET /openstack HTTP/1.1" status: 404 len: 297 time: 0.0040138

To manage notifications about this bug go to: https://bugs.launchpad.net/charm-ovn-chassis/+bug/1907686/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1832021] Please test proposed package
Hello David, or anyone else affected, Accepted neutron into rocky-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository. Please help us by testing this new package. To enable the -proposed repository: sudo add-apt-repository cloud-archive:rocky-proposed sudo apt-get update Your feedback will aid us getting this update out to other Ubuntu users. If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-rocky-needed to verification-rocky-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-rocky-failed. In either case, details of your testing will help us make a better decision. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance! ** Changed in: cloud-archive/rocky Status: New => Fix Committed ** Tags added: verification-rocky-needed ** Changed in: cloud-archive/queens Status: New => Fix Committed ** Changed in: cloud-archive Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1832021 Title: Checksum drop of metadata traffic on isolated networks with DPDK Status in OpenStack neutron-openvswitch charm: Fix Released Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive queens series: Fix Committed Status in Ubuntu Cloud Archive rocky series: Fix Committed Status in Ubuntu Cloud Archive stein series: Fix Released Status in neutron: Fix Released Status in neutron package in Ubuntu: Fix Released Status in neutron source package in Bionic: Fix Committed Status in neutron source package in Focal: Fix Released Bug description: [Impact] When an isolated network using provider networks for tenants (meaning without virtual routers: DVR or network node), metadata access occurs in the qdhcp ip netns rather than the qrouter netns. The following options are set in the dhcp_agent.ini file: force_metadata = True enable_isolated_metadata = True VMs on the provider tenant network are unable to access metadata as packets are dropped due to checksum. [Test Plan] 1. Create an OpenStack deployment with DPDK options enabled and 'enable-local-dhcp-and-metadata: true' in neutron-openvswitch. A sample, simple 3 node bundle can be found here[1]. 2. Create an external flat network and subnet: openstack network show dpdk_net || \ openstack network create --provider-network-type flat \ --provider-physical-network physnet1 dpdk_net \ --external openstack subnet show dpdk_net || \ openstack subnet create --allocation-pool start=10.230.58.100,end=10.230.58.200 \ --subnet-range 10.230.56.0/21 --dhcp --gateway 10.230.56.1 \ --dns-nameserver 10.230.56.2 \ --ip-version 4 --network dpdk_net dpdk_subnet 3. Create an instance attached to that network. The instance must have a flavor that uses huge pages. openstack flavor create --ram 8192 --disk 50 --vcpus 4 m1.dpdk openstack flavor set m1.dpdk --property hw:mem_page_size=large openstack server create --wait --image xenial --flavor m1.dpdk --key- name testkey --network dpdk_net i1 4. Log into the instance host and check the instance console. 
The instance will hang during boot and show the following message: 2020-11-20 09:43:26,790 - openstack.py[DEBUG]: Failed reading optional path http://169.254.169.254/openstack/2015-10-15/user_data due to: HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out. (read timeout=10.0) 5. Apply the fix in all computes, restart the DHCP agents in all computes and create the instance again. 6. No errors should be shown and the instance quickly boots. [Where problems could occur] * This change only takes effect when datapath_type and ovs_use_veth are set. Those settings are mostly used for DPDK environments. The core of the fix is to toggle off checksum offload done by the DHCP namespace interfaces. This will have the drawback of adding some overhead to packet processing for DHCP traffic, but given that DHCP does not demand much data, this should be a minor problem. * Future changes to the syntax of the ethtool command could cause regressions [Other Info] * None [1] https://gist.github.com/sombrafam/e0741138773e444960eb4aeace6e3e79 To manage notifications about this bug go to: https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1832021/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net
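The mitigation described under [Where problems could occur] can be reproduced by hand for a single DHCP port; the namespace and tap device names below are placeholders, not values from this report:

    # disable TX checksum offload on the DHCP tap device inside its namespace
    sudo ip netns exec qdhcp-<network-uuid> ethtool -K <tap-device> tx off
    # verify the toggle took effect
    sudo ip netns exec qdhcp-<network-uuid> ethtool -k <tap-device> | grep tx-checksumming

With TX checksumming off, the kernel computes the checksum in software before the packet leaves the namespace, so the DPDK datapath no longer drops the metadata/DHCP replies for a bad checksum.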
[Yahoo-eng-team] [Bug 1930706] [NEW] nova allows suboptimal emulator thread pinning for realtime guests
Public bug reported: today whenever you use a realtime guest you are required to enable cpu pinning and other features such as specifying a realtime core mask via hw:cpu_realtime_mask or hw_cpu_realtime_mask. in the victoria release this requirement was relaxed somewhat with the introduction of the mixed cpu policy, where guests are assigned both pinned and floating cores. https://github.com/openstack/nova/commit/9fc63c764429c10f9041e6b53659e0cbd595bf6b It is now possible to allocate all cores in an instance to realtime and omit the ``hw:cpu_realtime_mask`` extra spec. This requires specifying the ``hw:emulator_threads_policy`` extra spec. https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/releasenotes/notes/bug-1884231-16acf297d88b122e.yaml however, while that works well, it is also possible to set hw:cpu_realtime_mask but not specify hw:emulator_threads_policy, which leads to suboptimal XML generation in the libvirt driver. this is reported downstream as https://bugzilla.redhat.com/show_bug.cgi?id=1700390 for older releases that predate the changes referenced above, though on re-evaluation a possible improvement can be made, as detailed in https://bugzilla.redhat.com/show_bug.cgi?id=1700390#c11 today if we have a 2 core vm where guest cpu 0 is non-realtime and guest cpu 1 is realtime, e.g. hw:cpu_policy=dedicated hw:cpu_realtime=True hw:cpu_realtime_mask=^0 the generated XML lets the emulator thread float over all the vm cores, because that is the default behavior when no emulator_threads_policy is specified. but a slight modification to the XML could provide a more optimal default in this case: using the cpu_realtime_mask we can instead restrict the emulator thread to float over the non-realtime cores with realtime priority. this will ensure that if qemu needs to process a request, for example a device attach, the emulator thread has higher priority than the guest vcpus that deal with guest housekeeping tasks, but will not interrupt the realtime cores. this would give many of the benefits of emulator_threads_policy=share or emulator_threads_policy=isolate without increasing resource usage or requiring any config, flavor or image changes. this should also be a backportable solution to this problem. this is especially important given realtime hosts are often deployed with the kernel isolcpus parameter, which means the kernel will not load balance the emulator thread across the range and will instead leave it on the core it initially spawned on. today you could get lucky and it could be spawned on core 0, in which case the new behavior would be the same, or it could get spawned on core 1. when the emulator thread is spawned on core 1, since it has lower priority than the vcpu thread, it will only run if the guest vcpu idles, resulting in the inability of qemu to process device attach and other qemu monitor commands from libvirt or the user. ** Affects: nova Importance: Wishlist Status: Triaged ** Tags: libvirt numa -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1930706 Title: nova allows suboptimal emulator thread pinning for realtime guests Status in OpenStack Compute (nova): Triaged Bug description: today whenever you use a realtime guest you are required to enable cpu pinning and other features such as specifying a realtime core mask via hw:cpu_realtime_mask or hw_cpu_realtime_mask.
in the victoria release this requirement was relaxed somewhat with the introduction of the mixed cpu policy, where guests are assigned both pinned and floating cores. https://github.com/openstack/nova/commit/9fc63c764429c10f9041e6b53659e0cbd595bf6b It is now possible to allocate all cores in an instance to realtime and omit the ``hw:cpu_realtime_mask`` extra spec. This requires specifying the ``hw:emulator_threads_policy`` extra spec. https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/releasenotes/notes/bug-1884231-16acf297d88b122e.yaml however, while that works well, it is also possible to set hw:cpu_realtime_mask but not specify hw:emulator_threads_policy, which leads to suboptimal XML generation in the libvirt driver. this is reported downstream as https://bugzilla.redhat.com/show_bug.cgi?id=1700390 for older releases that predate the changes referenced above, though on re-evaluation a possible improvement can be made, as detailed in https://bugzilla.redhat.com/show_bug.cgi?id=1700390#c11 today if we have a 2 core vm where guest cpu 0 is non-realtime and guest cpu 1 is realtime, e.g. hw:cpu_policy=dedicated hw:cpu_realtime=True hw:cpu_realtime_mask=^0, the default behavior when no emulator_threads_policy is specified is for the emulator thread to float
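To make the proposal concrete, here is a sketch of the <cputune> it implies for the 2-core example above; the host CPU numbers are invented for illustration, and this is not the XML nova generates today:

    <cputune>
      <!-- guest cpu 0: non-realtime housekeeping core -->
      <vcpupin vcpu='0' cpuset='4'/>
      <!-- guest cpu 1: realtime core -->
      <vcpupin vcpu='1' cpuset='5'/>
      <!-- emulator thread floats only over the non-realtime host core(s),
           so monitor commands such as device attach never preempt the
           realtime vcpu -->
      <emulatorpin cpuset='4'/>
      <!-- realtime scheduling for the realtime vcpu only -->
      <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
    </cputune>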