I think we can consider the "things will break" aspect of this bug resolved now that the patch landed to disable the behavior that makes shared storage providers affect allocations. The work to finish proper support for shared storage providers will be tracked on its blueprint.
** Changed in: nova
   Status: Triaged => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784020

Title:
  Shared storage providers are not supported and will break things if used

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  https://review.openstack.org/#/c/560459/ in Rocky changed the libvirt driver such that if the compute node provider is in a shared storage provider aggregate relationship (i.e. in the same aggregate as a resource provider that has DISK_GB inventory and the MISC_SHARES_VIA_AGGREGATE trait), the compute node provider won't report DISK_GB inventory. There are at least two major issues with this:

  1. On upgrade from Queens, any existing allocations against the compute node provider's DISK_GB inventory will not allow removal of the DISK_GB inventory from the compute node provider during the update_available_resource periodic task. In other words, we have no data migration routine in place to move DISK_GB allocations from the compute node provider to the shared storage provider in Rocky.

  2. During a move operation, we move the instance's allocations from the source compute node provider to the migration record, then go through the scheduler to pick a dest host for the instance and allocate resources against the dest host (and optionally the shared storage provider). So:

  a) The DISK_GB allocation from the instance to the shared storage provider is deleted for a short window of time during scheduling until we pick a dest host.
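To make the trigger condition concrete, here is a minimal sketch (illustrative only; the names and data shapes are assumptions, not nova's actual code) of the Rocky-era check described above: if any provider sharing an aggregate with the compute node offers DISK_GB inventory and carries the MISC_SHARES_VIA_AGGREGATE trait, the compute node stops reporting local DISK_GB.

```python
# Illustrative sketch of the shared-storage-provider detection logic.
# This is NOT nova's real implementation; dict shapes are hypothetical.

MISC_SHARES_VIA_AGGREGATE = "MISC_SHARES_VIA_AGGREGATE"

def compute_node_reports_disk_gb(aggregate_providers):
    """Return True if the compute node should report its own DISK_GB.

    aggregate_providers: list of dicts describing resource providers
    that share an aggregate with the compute node, each with
    'inventory' (dict of resource class -> total) and 'traits' (set).
    """
    for provider in aggregate_providers:
        if ("DISK_GB" in provider.get("inventory", {})
                and MISC_SHARES_VIA_AGGREGATE in provider.get("traits", set())):
            # A sharing provider owns the disk; suppress local inventory.
            return False
    return True

# Example: an NFS-backed sharing provider in the same aggregate.
sharing = {"inventory": {"DISK_GB": 1000},
           "traits": {MISC_SHARES_VIA_AGGREGATE}}
print(compute_node_reports_disk_gb([sharing]))  # False
print(compute_node_reports_disk_gb([]))         # True
```

Note that the trait alone is not enough: a provider with the trait but no DISK_GB inventory (or vice versa) does not suppress the compute node's local disk reporting.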
  https://github.com/openstack/nova/blob/6be7f7248fb1c2bbb890a0a48a424e205e173c9c/nova/conductor/tasks/migrate.py#L57

  b) If cold migrate fails or is reverted, we delete the allocations (created by the scheduler) and move the allocations from the migration record (against the source node provider) back to the instance. But because we failed to move the instance's DISK_GB allocation against the sharing provider to the migration record, we've lost that DISK_GB allocation when copying it back to the instance on revert/failure:

  https://github.com/openstack/nova/blob/6be7f7248fb1c2bbb890a0a48a424e205e173c9c/nova/compute/manager.py#L4155

  --

  We could also have issues with forced live migrate:

  https://github.com/openstack/nova/blob/6be7f7248fb1c2bbb890a0a48a424e205e173c9c/nova/conductor/tasks/live_migrate.py#L109

  and evacuate:

  https://github.com/openstack/nova/blob/6be7f7248fb1c2bbb890a0a48a424e205e173c9c/nova/conductor/manager.py#L868

  Both bypass the scheduler altogether, so we're potentially not handling shared provider allocations there either.

  Also, we don't have *any* shared storage provider CI jobs set up. A start to that is here:

  https://review.openstack.org/#/c/586363/

  But that's just a single-node job at the moment, and we'd need a multi-node shared storage CI job to really say we support shared storage providers as a feature in nova.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784020/+subscriptions
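The allocation loss in 2.b) can be illustrated with a minimal sketch (hypothetical function and data; nova's real allocation-move code lives in the conductor and compute manager linked above): if the move to the migration record copies only the source compute node's resources, the sharing provider's DISK_GB allocation is dropped and cannot be restored on revert.

```python
# Hypothetical sketch of the bug in 2.b); not nova's actual code.
# With a shared storage provider, an instance's allocations span two
# resource providers:
instance_allocations = {
    "compute-node-uuid": {"VCPU": 2, "MEMORY_MB": 4096},
    "shared-storage-uuid": {"DISK_GB": 20},  # held by the sharing provider
}

def move_to_migration_record(allocations, source_node_uuid):
    """Buggy move: copies only the source compute node's resources,
    silently dropping any allocations against other providers."""
    return {source_node_uuid: allocations[source_node_uuid]}

migration_allocations = move_to_migration_record(
    instance_allocations, "compute-node-uuid")

# On revert/failure, the instance gets back only what the migration
# record held, so the DISK_GB allocation is gone:
print("shared-storage-uuid" in migration_allocations)  # False
```

A correct move would copy the allocations of every provider in the instance's allocation set to the migration record, not just the compute node's.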