I'm still really confused by this, but here are some thoughts on the
nova os.chmod() call mentioned earlier, including a change that would
fix this.

If I chmod the tmp dir that gets created by nova (e.g.
/var/lib/nova/instances/snapshots/tmpkajuir8o) to 755 just before the
snapshot (after the nova chmod), the snapshot is successful.

As mentioned in
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1896617/comments/18,
the upstream nova code sets permissions for the tmp dir with:

    os.chmod(tmpdir, 0o701)

That code has been that way since 2015, so it's not new in ussuri, see
git blame:

    824c3706a3e nova/virt/libvirt/driver.py (Nicolas Simonds 2015-07-23 12:47:24 -0500 2388) # NOTE(xqueralt): libvirt needs o+x in the tempdir
    824c3706a3e nova/virt/libvirt/driver.py (Nicolas Simonds 2015-07-23 12:47:24 -0500 2389) os.chmod(tmpdir, 0o701)

However, this seems like a heavy-handed chmod if the goal, as the
comment above it mentions, is to give libvirt o+x in the tempdir. I say
this because it replaces all of the directory's permission bits,
overriding any default permissions that were set previously by the
operating system. It seems that this should really be a lighter touch
such as the following (equivalent to chmod o+x tmpdir):

    st = os.stat(tmpdir)
    os.chmod(tmpdir, st.st_mode | stat.S_IXOTH)

That would fix this bug for us, but it still doesn't explain what
changed in Ubuntu to cause this to fail. We did make some permissions
changes in the nova package in focal, but comparing file/directory
permissions against ussuri-proposed (see comment #21 above), I'm seeing
no differences.

** Changed in: nova
       Status: Invalid => New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
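For reference, the difference between the two chmod approaches discussed in the comment above can be sketched in a standalone script. This is only an illustration, not nova code: the helper names (force_mode_701, add_other_execute, compare) and the 0o750 starting mode are hypothetical, chosen to make the difference visible, and it assumes a POSIX system. Nothing in this report states what mode the nova tmp dir had before the chmod.

```python
import os
import stat
import tempfile


def force_mode_701(path):
    """Mirror the existing call: replace every permission bit with 0o701."""
    os.chmod(path, 0o701)
    return stat.S_IMODE(os.stat(path).st_mode)


def add_other_execute(path):
    """Mirror the proposed call: add o+x and keep the other bits as they were."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IXOTH)
    return stat.S_IMODE(os.stat(path).st_mode)


def compare(start_mode=0o750):
    """Apply each approach to a fresh temp dir that starts at start_mode.

    start_mode is a hypothetical value used only for illustration.
    """
    results = {}
    approaches = (("absolute", force_mode_701), ("additive", add_other_execute))
    for name, apply_chmod in approaches:
        tmpdir = tempfile.mkdtemp()
        try:
            os.chmod(tmpdir, start_mode)
            results[name] = apply_chmod(tmpdir)
        finally:
            os.rmdir(tmpdir)
    return results


if __name__ == "__main__":
    for name, mode in compare().items():
        print(name, oct(mode))
```

Starting from 0o750, the absolute call drops the group bits while the additive call keeps them. If the directory starts at 0o700, both approaches end at the same mode, so the additive form only matters when something has set other bits on the directory beforehand.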
https://bugs.launchpad.net/bugs/1896617

Title:
  Creation of image (or live snapshot) from the existing VM fails if
  libvirt-image-backend is configured to qcow2 starting from Ussuri

Status in OpenStack nova-compute charm:
  Invalid
Status in OpenStack Compute (nova):
  New
Status in nova package in Ubuntu:
  Triaged

Bug description:
  tl;dr

  1) Creating the image from the existing VM fails if the qcow2 image
  backend is used, but everything is fine if using the rbd image backend
  in nova-compute.

  2) openstack server image create --name <name of the new image>
  <instance name or uuid> fails with a seemingly unrelated error:

  $ openstack server image create --wait 842fa12c-19ee-44cb-bb31-36d27ec9d8fc
  HTTP 404 Not Found: No image found with ID f4693860-cd8d-4088-91b9-56b2f173ffc7

  == Details ==

  Two Tempest tests ([1] and [2]) from the 2018.02 Refstack test lists
  [0] are failing with the following exception:

  49701867-bedc-4d7d-aa71-7383d877d90c
  Traceback (most recent call last):
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 369, in create_image_from_server
      waiters.wait_for_image_status(client, image_id, wait_until)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/common/waiters.py", line 161, in wait_for_image_status
      image = show_image(image_id)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/images_client.py", line 74, in show_image
      resp, body = self.get("images/%s" % image_id)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 298, in get
      return self.request('GET', url, extra_headers, headers)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/base_compute_client.py", line 48, in request
      method, url, extra_headers, headers, body, chunked)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 687, in request
      self._error_checker(resp, resp_body)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 793, in _error_checker
      raise exceptions.NotFound(resp_body, resp=resp)
  tempest.lib.exceptions.NotFound: Object not found
  Details: {'code': 404, 'message': 'Image not found.'}

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/images/test_images_oneserver.py", line 69, in test_create_delete_image
      wait_until='ACTIVE')
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 384, in create_image_from_server
      image_id=image_id)
  tempest.exceptions.SnapshotNotFoundException: Server snapshot image d82e95b0-9c62-492d-a08c-5bb118d3bf56 not found.

  So far I was able to identify the following:

  1) https://github.com/openstack/tempest/blob/master/tempest/api/compute/images/test_images_oneserver.py#L69
  invokes a "create image from server"

  2) It fails with the following error message in the nova-compute logs:
  https://pastebin.canonical.com/p/h6ZXdqjRRm/

  The same occurs if "openstack server image create --wait" is run;
  however, according to
  https://docs.openstack.org/nova/ussuri/admin/migrate-instance-with-snapshot.html
  the VM has to be shut down before the image creation:

  "Shut down the source VM before you take the snapshot to ensure that
  all data is flushed to disk. If necessary, list the instances to view
  the instance name. Use the openstack server stop command to shut down
  the instance:"

  This step is definitely being skipped by the test (i.e., it is trying
  to take the snapshot of a live VM).

  FWIW, I'm using libvirt-image-backend: qcow2 in my nova-compute
  application params, and I was able to confirm that if that parameter
  is changed to "libvirt-image-backend: rbd", the tests pass
  successfully.

  Also, there is a similar issue I was able to find:
  https://bugs.launchpad.net/nova/+bug/1885418 but it doesn't have any
  useful information other than confirmation that OpenStack Ussuri +
  libvirt backend has some problem with live snapshotting.

  [0] https://refstack.openstack.org/api/v1/guidelines/2018.02/tests?target=platform&type=required&alias=true&flag=false
  [1] tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestJSON.test_create_delete_image[id-3731d080-d4c5-4872-b41a-64d0d0021314]
  [2] tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestJSON.test_create_image_specify_multibyte_character_image_name[id-3b7c6fe4-dfe7-477c-9243-b06359db51e6]

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nova-compute/+bug/1896617/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp