Public bug reported:

If you want to provide a flavor with "resources:VGPU=2" (or more) and have compute nodes using NVIDIA cards (i.e. PCI devices that have a 16-bit vendor ID of "10de"), then QEMU fails to start the guest, because the NVIDIA driver does not support more than one IOMMU group per guest.
libvirtError: internal error: qemu unexpectedly closed the monitor:
2018-03-22T13:14:39.272301Z qemu-kvm: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/c949168d-d04d-4e74-925a-c38f3be11df5,bus=pci.0,addr=0x5: vfio warning: c949168d-d04d-4e74-925a-c38f3be11df5: Could not enable error recovery for the device
2018-03-22T13:14:39.273759Z qemu-kvm: -device vfio-pci,id=hostdev1,sysfsdev=/sys/bus/mdev/devices/f508c6d0-f859-4fa2-8976-94940e917709,bus=pci.0,addr=0x6: vfio error: f508c6d0-f859-4fa2-8976-94940e917709: error getting device from group 1: Operation not permitted
Verify all devices in group 1 are bound to vfio-<bus> or pci-stub and not already in use

Because of that limitation, Nova should cap the maximum unit of resources per allocation depending on the PCI device vendor ID.

** Affects: nova
     Importance: Low
     Assignee: Sylvain Bauza (sylvain-bauza)
         Status: Triaged

** Tags: placement vgpu

https://bugs.launchpad.net/bugs/1758086

Title:
  nvidia driver limits to one single GPU per guest

Status in OpenStack Compute (nova):
  Triaged
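
The proposal in this report (limit the per-allocation VGPU amount according to the PCI vendor ID) could look roughly like the sketch below. It is only an illustration, not Nova's actual code: the function name, the MAX_VGPUS_PER_GUEST table and the way the vendor ID is passed in are all assumptions. The only thing taken from Placement itself is the set of inventory fields (total, reserved, min_unit, max_unit, step_size, allocation_ratio).

```python
# Hypothetical sketch, not Nova's real implementation.
NVIDIA_VENDOR_ID = "10de"

# Assumed table: PCI vendor ID -> maximum number of vGPUs one guest may get.
# Only the nvidia limit of 1 comes from the bug report.
MAX_VGPUS_PER_GUEST = {NVIDIA_VENDOR_ID: 1}


def vgpu_inventory(total, vendor_id):
    """Build a Placement-style inventory dict for the VGPU resource class.

    `total` is the number of vGPUs the host can expose; `vendor_id` is the
    16-bit PCI vendor ID as a hex string, with or without a "0x" prefix.
    """
    vendor_id = vendor_id.lower()
    if vendor_id.startswith("0x"):
        vendor_id = vendor_id[2:]

    # Vendors without a known limit may hand out every vGPU to one guest.
    cap = MAX_VGPUS_PER_GUEST.get(vendor_id, total)

    return {
        "total": total,
        "reserved": 0,
        "min_unit": 1,
        "max_unit": min(cap, total),
        "step_size": 1,
        "allocation_ratio": 1.0,
    }


if __name__ == "__main__":
    print(vgpu_inventory(4, "10de"))
    print(vgpu_inventory(4, "8086"))
```

With max_unit set to 1 for vendor "10de", a flavor asking for "resources:VGPU=2" should find no matching resource provider at scheduling time, instead of failing later when QEMU starts the guest.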