Public bug reported: I'm running into an issue with kilo-3 that I think is present in current trunk.
I think there is a race between the claimed CPUs of an instance being persisted to the DB, and the resource audit scanning the DB for instances and subtracting pinned CPUs from the list of available CPUs.

The problem only shows up when the following sequence happens:

1) instance A (with dedicated CPUs) boots on a compute node
2) the resource audit runs on that compute node
3) instance B (with dedicated CPUs) boots on the same compute node

So you need to be booting many instances, limiting the valid compute nodes (host aggregates or server groups), or running a small cluster in order to hit this.

The nitty-gritty view looks like this:

When booting an instance we hold the COMPUTE_RESOURCE_SEMAPHORE in compute.resource_tracker.ResourceTracker.instance_claim(), and that covers updating the resource usage on the compute node. But we don't persist the instance NUMA topology to the database until after instance_claim() returns, in compute.manager.ComputeManager._build_instance(). Note that this is done *after* we've given up the semaphore, so there is no longer any sort of ordering guarantee.

compute.resource_tracker.ResourceTracker.update_available_resource() then acquires COMPUTE_RESOURCE_SEMAPHORE, queries the database for a list of instances, and uses that to calculate a new view of what resources are available. If the NUMA topology of the most recent instance hasn't been persisted yet, the new view of resources won't include any pCPUs pinned by that instance.

compute.manager.ComputeManager._build_instance() then runs for the next instance and, based on the new view of available resources, allocates the same pCPU(s) used by the earlier instance. Boom: overlapping pinned pCPUs.

** Affects: nova
   Importance: Undecided
   Status: New

** Tags: compute

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
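To make the interleaving concrete, here is a minimal, hypothetical sketch of the ordering described above. This is not nova's actual code: the class and method names mirror the report, but the data structures (db_instances, available_pcpus) and the four-pCPU host are invented purely for illustration, and the race window is replayed deterministically rather than with real concurrency.

```python
import threading

COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()

db_instances = []               # stands in for the instances table
available_pcpus = {0, 1, 2, 3}  # in-memory view of free pCPUs


class ResourceTracker:
    def instance_claim(self, needed):
        # The semaphore covers only the in-memory usage update.
        with COMPUTE_RESOURCE_SEMAPHORE:
            claimed = set(sorted(available_pcpus)[:needed])
            available_pcpus.difference_update(claimed)
            return claimed

    def update_available_resource(self):
        # Periodic audit: rebuild the free-CPU view from the DB.
        with COMPUTE_RESOURCE_SEMAPHORE:
            pinned = set()
            for inst in db_instances:
                pinned |= inst["pinned"]
            available_pcpus.clear()
            available_pcpus.update({0, 1, 2, 3} - pinned)


def build_instance(rt, name, needed):
    claimed = rt.instance_claim(needed)
    # ... semaphore released here; the DB write happens later,
    # outside the lock, so ordering is no longer guaranteed ...
    return {"name": name, "pinned": claimed}


rt = ResourceTracker()
a = build_instance(rt, "A", 2)   # claims {0, 1} in memory
# Race window: the audit runs BEFORE A's topology reaches the DB.
rt.update_available_resource()   # DB is empty -> frees {0, 1} again
db_instances.append(a)           # A's pinning is persisted too late
b = build_instance(rt, "B", 2)   # also claims {0, 1}
db_instances.append(b)

print(sorted(a["pinned"] & b["pinned"]))  # -> [0, 1]: overlapping pins
```

If the audit ran after the append (or the persist happened while the semaphore was still held), B would instead claim {2, 3} and the overlap would be empty, which is what makes this an ordering bug rather than a logic bug.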
https://bugs.launchpad.net/bugs/1454451
Title: simultaneous boot of multiple instances leads to cpu pinning overlap
Status in OpenStack Compute (Nova): New