Public bug reported:

This is semi-related to bug 1497253 but I found it while triaging that
bug to see if it was still an issue since Pike (I don't think it is).

If you run devstack with default superconductor mode configuration, and
configure nova-cpu.conf with:

    [cinder]
    cross_az_attach=False

Then try to boot from volume where nova-compute creates the volume, it
fails with CantStartEngineError because the cell conductor
(n-cond-cell1.service) is not configured to reach the API DB to get host
aggregate information.

Here is a nova boot command to recreate:

    $ nova boot --flavor cirros256 --block-device id=e642acfd-4283-458a-b7ea-6c316da3b2ce,source=image,dest=volume,shutdown=remove,size=1,bootindex=0 --poll test-bfv

Where the block device id is the uuid of the cirros image in the
devstack env.

This is the failure in the nova-compute logs:

http://paste.openstack.org/show/725723/

    972-4b14-93ad-e7b86edc3a26 service nova] [instance: 910509b9-e23a-4b40-bb42-0df7b65bb36e] Getting AZ for instance; instance.host: rocky; instance.availabilty_zone: nova
    3-c972-4b14-93ad-e7b86edc3a26 service nova] [instance: 910509b9-e23a-4b40-bb42-0df7b65bb36e] Instance failed block device setup: RemoteError: Remote error: CantStartEngineEr
    File "/opt/stack/nova/nova/conductor/manager.py", line 124, in _object_dispatch\n return getattr(target, method)(*args, **kwargs)\n', u' File "/usr/local/lib/python2.7
    b9-e23a-4b40-bb42-0df7b65bb36e] Traceback (most recent call last):
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/opt/stack/nova/nova/compute/manager.py", line 1564, in _prep_block_device
    b9-e23a-4b40-bb42-0df7b65bb36e]     wait_func=self._await_block_device_map_created)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/opt/stack/nova/nova/virt/block_device.py", line 854, in attach_block_devices
    b9-e23a-4b40-bb42-0df7b65bb36e]     _log_and_attach(device)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/opt/stack/nova/nova/virt/block_device.py", line 851, in _log_and_attach
    b9-e23a-4b40-bb42-0df7b65bb36e]     bdm.attach(*attach_args, **attach_kwargs)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/opt/stack/nova/nova/virt/block_device.py", line 747, in attach
    b9-e23a-4b40-bb42-0df7b65bb36e]     context, instance, volume_api, virt_driver)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/opt/stack/nova/nova/virt/block_device.py", line 46, in wrapped
    b9-e23a-4b40-bb42-0df7b65bb36e]     ret_val = method(obj, context, *args, **kwargs)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/opt/stack/nova/nova/virt/block_device.py", line 623, in attach
    b9-e23a-4b40-bb42-0df7b65bb36e]     instance=instance)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/opt/stack/nova/nova/volume/cinder.py", line 504, in check_availability_zone
    b9-e23a-4b40-bb42-0df7b65bb36e]     instance_az = az.get_instance_availability_zone(context, instance)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/opt/stack/nova/nova/availability_zones.py", line 194, in get_instance_availability_zone
    b9-e23a-4b40-bb42-0df7b65bb36e]     az = get_host_availability_zone(elevated, host)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/opt/stack/nova/nova/availability_zones.py", line 95, in get_host_availability_zone
    b9-e23a-4b40-bb42-0df7b65bb36e]     key='availability_zone')
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 177, in wrapper
    b9-e23a-4b40-bb42-0df7b65bb36e]     args, kwargs)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/opt/stack/nova/nova/conductor/rpcapi.py", line 241, in object_class_action_versions
    b9-e23a-4b40-bb42-0df7b65bb36e]     args=args, kwargs=kwargs)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 179, in call
    b9-e23a-4b40-bb42-0df7b65bb36e]     retry=self.retry)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 133, in _send
    b9-e23a-4b40-bb42-0df7b65bb36e]     retry=retry)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 584, in send
    b9-e23a-4b40-bb42-0df7b65bb36e]     call_monitor_timeout, retry=retry)
    b9-e23a-4b40-bb42-0df7b65bb36e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 575, in _send
    b9-e23a-4b40-bb42-0df7b65bb36e]     raise result
    b9-e23a-4b40-bb42-0df7b65bb36e] RemoteError: Remote error: CantStartEngineError No sql_connection parameter is established
    b9-e23a-4b40-bb42-0df7b65bb36e] [u'Traceback (most recent call last):\n', u' File "/opt/stack/nova/nova/conductor/manager.py", line 124, in _object_dispatch\n return get
    b9-e23a-4b40-bb42-0df7b65bb36e]

The logging at the start is my own for debug:

    972-4b14-93ad-e7b86edc3a26 service nova] [instance: 910509b9-e23a-4b40-bb42-0df7b65bb36e] Getting AZ for instance; instance.host: rocky; instance.availabilty_zone: nova

But it shows that the instance.host and instance.availability_zone are
set. The instance.host gets set by the instance_claim in the resource
tracker and the instance.availability_zone gets set by conductor at the
top in the schedule_and_build_instances method due to this change in
Pike:

https://review.openstack.org/#/c/446053/

So all I have to do to avoid the up-call is this:

diff --git a/nova/availability_zones.py b/nova/availability_zones.py
index 7c8d948..f128d8e 100644
--- a/nova/availability_zones.py
+++ b/nova/availability_zones.py
@@ -165,7 +165,7 @@ def get_availability_zones(context, get_only_available=False,
 def get_instance_availability_zone(context, instance):
     """Return availability zone of specified instance."""
     host = instance.host if 'host' in instance else None
-    if not host:
+    if not host or (host and instance.availability_zone):
         # Likely hasn't reached a viable compute node yet so give back the
         # desired availability_zone in the instance record if the boot request
         # specified one.
This would also fix #5 in our up-call list:

https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#operations-requiring-upcalls

** Affects: nova
     Importance: Medium
         Status: Triaged

** Affects: nova/pike
     Importance: Medium
         Status: Triaged

** Affects: nova/queens
     Importance: Medium
         Status: Triaged

** Tags: cells cinder compute upcall

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Changed in: nova/pike
       Status: New => Triaged

** Changed in: nova/queens
       Status: New => Triaged

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova/pike
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1781421

Title:
  CantStartEngineError due to host aggregate up-call when boot from
  volume and [cinder]/cross_az_attach=False

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1781421/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp