[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]
** Changed in: nova
   Status: Incomplete => Expired

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1846027

Title:
  [Error Code 42] Domain not found when hard reboot is used

Status in OpenStack Compute (nova):
  Expired

Bug description:

Not entirely sure if this is a bug, but at the very least the underlying logic seems to mishandle this case.

I have 7 compute nodes in an OpenStack cluster. The issue happens on compute nodes 1 and 5, for two VMs.

When it happens: at hard reboot. Say I have a VM that is blocked for some reason (out of memory, whatever), so I issue a hard reboot. When I do that, the underlying nova code closes the iSCSI connection to the Cinder storage (I verified this), then tries to restart the domain and fails with:

    2019-09-30 11:54:00.366 4484 WARNING nova.virt.libvirt.driver [req-1c2a5462-50d1-4cfb-b743-a4ea2195acb0 - - - - -] Error from libvirt while getting description of instance-000002b1: [Error Code 42] Domain not found: no domain with matching uuid '39a02162-7e99-45b8-837c-4db0f20025af' (instance-000002b1): libvirt.libvirtError: Domain not found: no domain with matching uuid '39a02162-7e99-45b8-837c-4db0f20025af' (instance-000002b1)

Let me stop here for a moment: if at this point I go to the compute node and run `virsh list --all`, the instance is not there at all.
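The failure mode described above can be sketched as a toy simulation (hypothetical names, not the actual nova/libvirt code): the domain is destroyed and the iSCSI session torn down first, so if reconnecting the volume then fails, the guest is never redefined — which matches both the "Domain not found" error and the empty `virsh list --all`.

```python
# Toy simulation of the destroy-then-recreate order used during a hard
# reboot. All class and method names here are hypothetical, chosen only
# to illustrate the sequencing problem described in this report.

class VolumeDeviceNotFound(Exception):
    pass

class FakeHypervisor:
    def __init__(self):
        self.domains = {"instance-000002b1"}
        self.volume_connected = True

    def destroy(self, name):
        self.domains.discard(name)           # domain gone from `virsh list --all`

    def disconnect_volume(self):
        self.volume_connected = False        # iSCSI session to Cinder closed

    def connect_volume(self, healthy=True):
        if not healthy:
            raise VolumeDeviceNotFound("Volume device not found at .")
        self.volume_connected = True

    def define_and_start(self, name):
        self.domains.add(name)

def hard_reboot(hv, name, volume_healthy=True):
    hv.destroy(name)                         # 1. tear down the old domain
    hv.disconnect_volume()                   # 2. close the iSCSI connection
    hv.connect_volume(volume_healthy)        # 3. raises -> step 4 never runs
    hv.define_and_start(name)                # 4. redefine and boot the guest

hv = FakeHypervisor()
try:
    hard_reboot(hv, "instance-000002b1", volume_healthy=False)
except VolumeDeviceNotFound:
    pass

print("instance-000002b1" in hv.domains)     # False: domain never redefined
print(hv.volume_connected)                   # False: volume left detached
```

The point of the sketch is that the teardown is not transactional: once step 3 fails, there is no rollback that redefines the domain or restores the iSCSI session.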
I also get this error payload (reflowed here for readability; the u'details' field is the traceback shown below it):

    {u'message': u'Volume device not found at .', u'code': 500, u'created': u'2019-09-29T23:44:32Z'}

    File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 202, in decorated_function
      return function(self, context, *args, **kwargs)
    File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 3512, in reboot_instance
      self._set_instance_obj_error_state(context, instance)
    File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()
    File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)
    File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
      raise value
    File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 3486, in reboot_instance
      bad_volumes_callback=bad_volumes_callback)
    File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 2739, in reboot
      block_device_info)
    File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 2833, in _hard_reboot
      mdevs=mdevs)
    File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 5490, in _get_guest_xml
      context, mdevs)
    File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 5283, in _get_guest_config
      flavor, guest.os_type)
    File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4093, in _get_guest_storage_config
      self._connect_volume(context, connection_info, instance)
    File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 1276, in _connect_volume
      vol_driver.connect_volume(connection_info, instance)
    File "/usr/lib/python3/dist-packages/nova/virt/libvirt/volume/iscsi.py", line 64, in connect_volume
      device_info = self.connector.connect_volume(connection_info['data'])
    File "/usr/lib/python3/dist-packages/os_brick/utils.py", line 137, in trace_logging_wrapper
      return f(*args, **kwargs)
    File "/usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner
      return f(*args, **kwargs)
    File "/usr/lib/python3/dist-packages/os_brick/initiator/connectors/iscsi.py", line 518, in connect_volume
      self._cleanup_connection(connection_properties, force=True)
    File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()
    File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)
    File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
      raise value
    File "/usr/lib/python3/dist-packages/os_brick/initiator/connectors/iscsi.py", line 512, in connect_volume
      return self._connect_single_volume(connection_properties)
    File "/usr/lib/python3/dist-packages/os_brick/utils.py", line 61, in _wrapper
      return r.call(f, *args, **kwargs)
    File "/usr/lib/python3/dist-packages/retrying.py", line 212, in call
      raise attempt.get()
    File "/usr/lib/python3/dist-packages/retrying.py", line 247, in get
      six.reraise(self.value[0], self.value[1], self.value[2])
    File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
      raise value
    File "/usr/lib/python3/dist-packages/retrying.py", line 200, in call
      attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
    File "/usr/lib/python3/dist-packages/os_brick/initiator/connectors/iscsi.py", line 587, in _connect_single_volume
      raise exception.VolumeDeviceNotFound(device='')

And in the nova-compute logs I see:

    2019-09-30 14:15:21.388 4484 WARNING nova.compute.manager [req-1c2a5462-50d1-4cfb-b743-a4ea2195acb0 - - - - -] While synchronizing instance power states, found 33 instances in the database and 34 instances on the hypervisor.

Something is not well synchronized, and I believe this is the reason everything else is failing.

My workaround: when this happens, OpenStack sets the vm-state to ERROR.
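The 33-vs-34 power-state warning above boils down to a set comparison between the instances nova's database knows about and the domains the hypervisor reports. A toy illustration of that discrepancy check (hypothetical UUIDs; nova's real `_sync_power_states` logic is more involved):

```python
# Illustrative only: compare the instance UUIDs the database knows
# against the domain UUIDs the hypervisor reports, and surface the
# mismatch behind the "found 33 ... and 34 ..." warning.

db_instances = {f"uuid-{i}" for i in range(33)}       # what the DB knows
hypervisor_domains = db_instances | {"orphan-uuid"}   # one stale extra domain

orphans = hypervisor_domains - db_instances   # on the hypervisor, not in the DB
missing = db_instances - hypervisor_domains   # in the DB, not on the hypervisor

print(len(db_instances), len(hypervisor_domains))  # 33 34
print(sorted(orphans))                             # ['orphan-uuid']
print(sorted(missing))                             # []
```

A stale domain left over from a failed reboot would show up as exactly this kind of orphan, which fits the "something is not well synchronized" observation.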
I change the state to active, then stop the instance. Then I detach the volume (Cinder, iSCSI-based), start the VM, shut it down again, reattach the volume, and start the VM. This fixes it. But if my user does a hard reset again, it happens again.

Let me know if you need more information; I would be eager to provide it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1846027/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp