Public bug reported:
From https://issues.redhat.com/browse/OSPRH-13142:
Description of problem:
For boot-from-volume instances, 'openstack server rescue <vm> --image
<image>' fails with the following issues:
1. It attempts to attach two disks, <instance_uuid>_disk and
<instance_uuid>_disk.rescue. Only <instance_uuid>_disk.rescue is actually
created, so the rescue fails with the following error:
2024-01-23 16:32:14.338 2 ERROR oslo_messaging.rpc.server nova.exception.InstanceNotRescuable: Instance dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2 cannot be rescued: Driver Error: internal error: process exited while connecting to monitor: 2024-01-23T16:32:13.017966Z qemu-kvm: -blockdev {"driver":"rbd","pool":"vms","image":"dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk","server":[{"host":"172.16.1.100","port":"6789"}],"user":"openstack","auth-client-required":["cephx","none"],"key-secret":"libvirt-1-storage-auth-secret0","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: error reading header from dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk: No such file or directory
If you look in ceph, only the .rescue image exists.
# rbd --id openstack -p vms ls -l
NAME                                              SIZE    PARENT  FMT  PROT  LOCK
dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue  10 GiB          2          excl
However, the instance is configured with both disks:
# virsh domblklist instance-00000003
Target Source
----------------------------------------------------------------
vda vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue
vdb vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk
If I manually copy UUID_disk.rescue to UUID_disk, the instance will
boot into RESCUE mode (see the rbd cp command below). It seems the UUID_disk
volume is not needed and should not be configured in this RESCUE situation.
2. The RESCUED instance does not attach the Cinder root volume. The
Cinder root volume also does not re-attach after "unrescuing" the instance.
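For reference, the manual copy mentioned in item 1 is just an RBD-level copy
of the rescue disk onto the missing image name. It is the same command used
later in the reproducer, shown here with the first instance's UUID substituted:
# rbd --id openstack cp vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk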
Reproducer:
$ openstack volume create --size 10 --image rhel8 rootvol1
$ openstack volume list
+--------------------------------------+----------+-----------+------+-------------+
| ID                                   | Name     | Status    | Size | Attached to |
+--------------------------------------+----------+-----------+------+-------------+
| f855dfe6-ad5a-4497-87ff-16ac5856f596 | rootvol1 | available |   10 |             |
+--------------------------------------+----------+-----------+------+-------------+
$ openstack server create --key-name default --flavor rhel --volume rootvol1 --network external test1
$ openstack server show test1 -c status -c image -c volumes_attached
+------------------+--------------------------------------------------------------------------+
| Field            | Value                                                                    |
+------------------+--------------------------------------------------------------------------+
| image            | N/A (booted from volume)                                                 |
| status           | ACTIVE                                                                   |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
+------------------+--------------------------------------------------------------------------+
$ openstack server rescue test1 --image rhel8
$ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| Field            | Value                                                                                                                                     |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| fault            | {'code': 400, 'created': '2024-01-23T20:12:17Z', 'message': 'Instance ac3d46c0-c8d5-45df-bd17-d467baaa5a98 cannot be rescued: Driver      |
|                  | Error: internal error: process exited while connecting to monitor: 2024-01-23T20:12:17.612453Z qemu-kvm: -blockdev                       |
|                  | {"driver":"rbd","pool":"vms","image":"ac3d46c0-c8d5-45df-bd17-d467ba'}                                                                    |
| image            | N/A (booted from volume)                                                                                                                  |
| status           | ERROR                                                                                                                                     |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596'                                                                  |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
# virsh domblklist instance-00000004
Target Source
----------------------------------------------------------------
vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk
# rbd --id openstack -p vms ls -l
NAME                                              SIZE    PARENT  FMT  PROT  LOCK
ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue  10 GiB          2
NOTE: if I manually create the _disk image here, the instance will boot
into rescue mode; however, the Cinder volume is not attached.
# rbd --id openstack cp vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk
Image copy: 100% complete...done.
RESCUE now completes and the instance is accessible (without the Cinder
root volume attached).
$ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
+------------------+--------------------------------------------------------------------------+
| Field            | Value                                                                    |
+------------------+--------------------------------------------------------------------------+
| image            | N/A (booted from volume)                                                 |
| status           | RESCUE                                                                   |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
+------------------+--------------------------------------------------------------------------+
The volume still shows as in-use:
$ openstack volume list
+--------------------------------------+----------+--------+------+--------------------------------+
| ID                                   | Name     | Status | Size | Attached to                    |
+--------------------------------------+----------+--------+------+--------------------------------+
| f855dfe6-ad5a-4497-87ff-16ac5856f596 | rootvol1 | in-use |   10 | Attached to test1 on /dev/vda  |
+--------------------------------------+----------+--------+------+--------------------------------+
But it is not actually attached to the domain:
# virsh domblklist instance-00000004
Target Source
----------------------------------------------------------------
vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk
The other ugly thing: unrescue does not revert the domain back to the original
disk config.
$ openstack server unrescue test1
$ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
+------------------+--------------------------------------------------------------------------+
| Field            | Value                                                                    |
+------------------+--------------------------------------------------------------------------+
| image            | N/A (booted from volume)                                                 |
| status           | ACTIVE                                                                   |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
+------------------+--------------------------------------------------------------------------+
The above looks good, but the instance is still booted on rescue disks.
# virsh domblklist instance-00000004
Target Source
----------------------------------------------------------------
vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk
A hard reboot will fix it:
$ openstack server reboot --hard test1
Now the instance is back to booting from the volume:
# virsh domblklist instance-00000004
Target Source
---------------------------------------------------------------
vda volumes/volume-f855dfe6-ad5a-4497-87ff-16ac5856f596
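A note on the bug title: Nova's stable device rescue of a volume-backed
instance requires the rescue image to carry the hw_rescue_device and/or
hw_rescue_bus image properties and compute API microversion 2.87 or later;
presumably the broken behaviour above comes from the non-stable rescue path
being taken when those properties are missing. A rough sketch of setting them
on the rhel8 image used in this reproducer (the virtio/disk values are an
assumed, typical choice, not something verified here):
$ openstack image set rhel8 --property hw_rescue_bus=virtio --property hw_rescue_device=disk
$ openstack --os-compute-api-version 2.87 server rescue --image rhel8 test1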
Version-Release number of selected component (if applicable):
Wallaby
How reproducible:
100%
Steps to Reproduce:
1. See above
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2110738
Title:
Stable rescue fails when necessary image properties not set
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2110738/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp