Public bug reported: The num_pcie_ports libvirt option defines the total number of PCIe root ports available to an instance for hotplugging devices when using the q35 hardware machine type.
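Both options live in nova.conf (num_pcie_ports under [libvirt], max_disk_devices_to_attach under [compute], per the nova configuration reference); a minimal fragment with illustrative values:

```ini
[libvirt]
# Number of PCIe root ports given to a q35 guest for hotplug (capped at 28)
num_pcie_ports = 19

[compute]
# Per-instance cap on attached disk devices; -1 means no limit
max_disk_devices_to_attach = 15
```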
https://docs.openstack.org/nova/latest/configuration/config.html#:~:text=ignored%20by%20nova.-,num_pcie_ports,-%C2%B6

Since both volume attachments and virtual NICs (Neutron ports) consume PCIe slots, or more precisely pcie-root-ports, the "max_disk_devices_to_attach" configuration option is suboptimal because it doesn't account for the NICs/ports already attached to the VM.

https://docs.openstack.org/nova/latest/configuration/config.html#:~:text=means%20no%20limit.-,max_disk_devices_to_attach,-%C2%B6

This can lead to a resource allocation issue and a config setting that can never be applied correctly. For example, consider the following configuration:

num_pcie_ports = 19
max_disk_devices_to_attach = 15

A user could create a VM with 5 ports and then attach 14 volumes, consuming all 19 available PCIe slots. If they then try to attach another volume, libvirt will deny the request and raise a "No more available PCI slots" error. Crucially, OpenStack doesn't inform the user via an HTTP 500 or 403 that the volume attachment is failing due to a lack of available PCIe slots, which causes confusion. In this scenario, the "max_disk_devices_to_attach" limit can never even be reached if the VM is configured with more than 5 ports, as the instance runs out of PCIe slots first.

This silent failure only applies to volume attachments. Attempting to add another port, for example, returns a "500 Failed to attach network adapter device" error. However, this message also obscures the root cause of the failure, as it doesn't expose the underlying libvirt exception.

We created a patch that checks for available PCIe ports during both volume and network interface attachments. This check respects the max_disk_devices_to_attach configuration option.

Ideally, the num_pcie_ports configuration should define the actual limit for attachable PCIe devices. However, in our QEMU + libvirt environment, this setting is unreliable.
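The pre-attach check our patch performs can be sketched roughly as follows. The function name and inputs are hypothetical, not the actual nova code, but the arithmetic mirrors the scenario above (19 root ports, 5 NICs + 14 volumes):

```python
# Sketch of a pre-attach PCIe slot check (hypothetical helper): on a q35
# guest, every NIC and every hotplugged volume occupies one pcie-root-port.

def can_attach(kind, num_pcie_ports, nics, disks,
               max_disk_devices_to_attach=-1):
    """Return True if one more device of `kind` ('disk' or 'nic') fits."""
    used = nics + disks
    if used >= num_pcie_ports:
        # libvirt would otherwise fail with "No more available PCI slots"
        return False
    if (kind == "disk" and max_disk_devices_to_attach != -1
            and disks >= max_disk_devices_to_attach):
        return False  # per-instance disk limit reached
    return True

# The scenario from the bug: 19 root ports, 5 NICs, 14 volumes attached.
print(can_attach("disk", 19, nics=5, disks=14,
                 max_disk_devices_to_attach=15))  # → False
```

With this kind of check in place, the API layer can reject the attachment up front instead of surfacing an opaque libvirt failure.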
For example, when num_pcie_ports is set to the maximum of 28, the instance only has 25 available PCIe ports; for reasons we haven't identified, three ports are always missing. This discrepancy causes the instance to run out of PCIe slots before the attachment limit is ever reached, reintroducing the original problem.

** Affects: nova
   Importance: Undecided
   Status: New

** Tags: cinder config libvirt neutron volumes
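One way to verify how many root ports a guest actually received is to count the pcie-root-port controllers in the libvirt domain XML (e.g. the output of `virsh dumpxml <instance>`). A small illustrative parser, with a trimmed sample document standing in for real dumpxml output:

```python
import xml.etree.ElementTree as ET

def count_pcie_root_ports(domain_xml):
    """Count <controller type='pci' model='pcie-root-port'/> elements."""
    root = ET.fromstring(domain_xml)
    return sum(
        1
        for c in root.iter("controller")
        if c.get("type") == "pci" and c.get("model") == "pcie-root-port"
    )

# Trimmed, hypothetical example of a q35 guest's domain XML:
sample = """
<domain type='kvm'>
  <devices>
    <controller type='pci' model='pcie-root'/>
    <controller type='pci' model='pcie-root-port'/>
    <controller type='pci' model='pcie-root-port'/>
    <controller type='usb' model='qemu-xhci'/>
  </devices>
</domain>
"""
print(count_pcie_root_ports(sample))  # → 2
```

Comparing this count against the configured num_pcie_ports is how we observed the 28-vs-25 discrepancy described above.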
--
https://bugs.launchpad.net/bugs/2117481

Title: disk and interfaces not handling the pcie device limit for q35
Status in OpenStack Compute (nova): New

