Public bug reported: I reproduced this on Mitaka, but it seems master has the same issue.
The following flavor was used:

$ openstack flavor show medium-dedicated
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| disk                       | 5                                    |
| id                         | 745d4bbb-78b8-4b86-83bf-f009745cd9b8 |
| name                       | medium-dedicated                     |
| os-flavor-access:is_public | True                                 |
| properties                 | hw:cpu_policy='dedicated'            |
| ram                        | 512                                  |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 4                                    |
+----------------------------+--------------------------------------+

The instance image does not have any custom properties.

The following traceback can be seen in nova-compute during boot of an instance with this flavor:

2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [req-786c093f-c0cf-4146-b55e-6ba2527af8de b7d47d36ea5144df9635ec1c834efde7 336db1eb014b4a2399c70cfe29360493 - - -] [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] Instance failed to spawn
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] Traceback (most recent call last):
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2221, in _build_resources
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     yield resources
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2067, in _build_and_run_instance
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     block_device_info=block_device_info)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2811, in spawn
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     write_to_disk=True)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4829, in _get_guest_xml
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     context)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4635, in _get_guest_config
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     instance.numa_topology, flavor, pci_devs, allowed_cpus, image_meta)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4121, in _get_guest_numa_config
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     pcpu = object_numa_cell.cpu_pinning[cpu]
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] KeyError: 2
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]

Here is the topology (from virsh capabilities) of the host that causes the trouble (set up this way to reproduce the issue):

<topology>
  <cells num='2'>
    <cell id='0'>
      <memory unit='KiB'>10239384</memory>
      <pages unit='KiB' size='4'>2559846</pages>
      <pages unit='KiB' size='2048'>0</pages>
      <pages unit='KiB' size='1048576'>0</pages>
      <distances>
        <sibling id='0' value='10'/>
        <sibling id='1' value='20'/>
      </distances>
      <cpus num='6'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
        <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
        <cpu id='2' socket_id='1' core_id='0' siblings='2-3'/>
        <cpu id='3' socket_id='1' core_id='0' siblings='2-3'/>
        <cpu id='4' socket_id='2' core_id='0' siblings='4-5'/>
        <cpu id='5' socket_id='2' core_id='0' siblings='4-5'/>
      </cpus>
    </cell>
    <cell id='1'>
      <memory unit='KiB'>10321056</memory>
      <pages unit='KiB' size='4'>2580264</pages>
      <pages unit='KiB' size='2048'>0</pages>
      <pages unit='KiB' size='1048576'>0</pages>
      <distances>
        <sibling id='0' value='20'/>
        <sibling id='1' value='10'/>
      </distances>
      <cpus num='2'>
        <cpu id='6' socket_id='3' core_id='0' siblings='6-7'/>
        <cpu id='7' socket_id='3' core_id='0' siblings='6-7'/>
      </cpus>
    </cell>
  </cells>
</topology>

nova.conf contains:

vcpu_pin_set = 1,3,4,5,6,7

In the nova database, the host topology looks like this (only the relevant fields shown):

cell0 -- "cpuset": [1, 3, 4, 5], "pinned_cpus": [], "siblings": [[4, 5]]
cell1 -- "cpuset": [6, 7], "pinned_cpus": [], "siblings": [[6, 7]]

The root cause: when fitting the instance to a host cell we consider avail_cpus but not the free siblings. When 4 vCPUs are requested we land on cell0, since 4 CPUs are available there; but the compute builds the vCPU-to-pCPU mapping only for the two CPUs that form a free sibling pair, so looking up the third vCPU raises the KeyError.

We should probably also add more information to the docs about siblings and what to include in vcpu_pin_set, so that people don't misconfigure things.
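On the documentation point: a configuration note could show that vcpu_pin_set should contain whole sibling pairs. For the topology above (siblings 0-1, 2-3, 4-5, 6-7), the original set 1,3,4,5,6,7 strands CPUs 1 and 3 because their siblings 0 and 2 are excluded. One illustrative alternative (not the only valid choice):

```ini
[DEFAULT]
# Keep sibling pairs complete: 2-3, 4-5 and 6-7 are all fully included,
# so every CPU in the set can take part in dedicated pinning.
vcpu_pin_set = 2,3,4,5,6,7
```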
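To make the mechanism concrete, here is a minimal, hypothetical Python sketch of the mismatch described above (the field names mirror the database dump, but this is illustrative, not the actual nova code): the fit check counts available CPUs, while the pinning step only maps CPUs belonging to complete sibling pairs.

```python
def pin_instance(requested_vcpus, cpuset, siblings):
    """Sketch of a fit check based on CPU count followed by a pinning
    step that only maps CPUs from complete sibling pairs."""
    # Fit check: cell0 has 4 CPUs in its cpuset, so 4 vCPUs "fit".
    assert requested_vcpus <= len(cpuset)
    # Pinning step: only CPUs from free sibling pairs get a vCPU->pCPU entry.
    pinnable = [cpu for pair in siblings for cpu in pair]
    cpu_pinning = {vcpu: pcpu for vcpu, pcpu in enumerate(pinnable)}
    # Guest XML generation then indexes the mapping for every vCPU,
    # which blows up for vCPU 2 because only vCPUs 0 and 1 were mapped.
    return [cpu_pinning[vcpu] for vcpu in range(requested_vcpus)]

# Values taken from the cell0 entry in the database dump above.
cell0 = {"cpuset": [1, 3, 4, 5], "siblings": [[4, 5]]}
try:
    pin_instance(4, cell0["cpuset"], cell0["siblings"])
except KeyError as exc:
    print("KeyError:", exc)  # prints: KeyError: 2
```

A sibling-aware fit check would instead compare requested_vcpus against len(pinnable) (2 here) and reject cell0.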
** Affects: nova
     Importance: Undecided
     Assignee: Vladyslav Drok (vdrok)
         Status: New

** Changed in: nova
       Assignee: (unassigned) => Vladyslav Drok (vdrok)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1776244

Title:
  KeyError during instance boot if vcpu_pin_set contains not all of the
  core siblings

Status in OpenStack Compute (nova):
  New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1776244/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp