Hi,

This blog at https://lab.piszki.pl/cloudstack-kvm-and-running-vm-with-vgpu/ helped us a lot. Thanks to Piotr Pisz, who is also active in the CloudStack community. Below are the steps we followed to make this work. Note that the passed-through GPU does not show up in the GPU count of the zone or anywhere else in CloudStack. We have Intel processors, by the way; if yours are AMD, please make the appropriate changes (e.g. amd_iommu instead of intel_iommu in the GRUB parameters below).
1. Enable VT-d in the BIOS.

2. The latest kernel that ships with Ubuntu already provides the vfio drivers as built-in kernel modules, so there is no need to load them additionally.

3. We have two GeForce RTX 2080 Ti cards in the server, and they appear as follows in lspci:

    37:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
            Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
    37:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
            Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
    37:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
            Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
    37:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
            Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]
    86:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
            Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
    86:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
            Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
    86:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
            Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
    86:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
            Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]

4. Add the GRUB parameters in /etc/default/grub:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt modprobe.blacklist=xhci_hcd vfio-pci.ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7"

   You may also do this by placing the relevant options under /etc/modprobe.d/vfio.conf or similar (a sketch follows step 7 below).

5. Blacklist any additional drivers on the host that would take over the GPU, in /etc/modprobe.d/blacklist-nvidia.conf:

    $ cat /etc/modprobe.d/blacklist-nvidia.conf
    blacklist nouveau
    blacklist nvidia
    blacklist xhci_hcd

6. Run the following and reboot the server:

    # update-initramfs -u -k all
    # update-grub2

7. Verify that IOMMU is enabled and working (a small helper to also check the IOMMU groups of the GPU functions is sketched below):

    # dmesg -T | grep -iE "iommu|dmar"
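As mentioned in step 4, the vfio-pci IDs can also be supplied through a modprobe configuration file instead of the kernel command line. A minimal sketch of such a file, assuming the same device IDs as above (we went with the GRUB parameters, so treat this only as an illustration; the softdep lines make vfio-pci claim the devices before nouveau/snd_hda_intel):

    # /etc/modprobe.d/vfio.conf (sketch only)
    options vfio-pci ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7
    # load vfio-pci before the drivers that would otherwise grab the devices
    softdep nouveau pre: vfio-pci
    softdep snd_hda_intel pre: vfio-pci

If you go this route, run update-initramfs -u -k all (as in step 6) so the options also end up in the initramfs.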
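In addition to the dmesg check, it is worth confirming which IOMMU group each GPU function sits in and which driver is currently bound to it; this is exactly what step 8 below is about. A small sketch using the PCI addresses from our lspci output (adjust them for your environment):

    #!/bin/sh
    # Print the IOMMU group and the currently bound driver for each GPU function.
    for dev in 0000:37:00.0 0000:37:00.1 0000:37:00.2 0000:37:00.3 \
               0000:86:00.0 0000:86:00.1 0000:86:00.2 0000:86:00.3
    do
        group=$(basename "$(readlink "/sys/bus/pci/devices/$dev/iommu_group")")
        driver=$(basename "$(readlink "/sys/bus/pci/devices/$dev/driver")")
        echo "IOMMU group ${group:-?}  $dev  driver: ${driver:-none}"
    done

All devices in the IOMMU group of a passed-through function have to be bound to the vfio driver, which is the error message step 8 mentions.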
8. One important thing is to make sure all the devices are bound to the vfio-pci driver. In our case, the xhci_pci driver was taking over the "USB 3.1 Host Controller". When that happens, QEMU fails to spawn the VM, stating "make sure all devices in the IOMMU group are bound to the vfio driver". To tackle this, we wrote a script that unbinds the device from xhci_hcd and binds it to vfio-pci at boot:

    $ cat /etc/initramfs-tools/scripts/init-top/vfio.sh
    #!/bin/sh
    PREREQ=""
    prereqs()
    {
        echo "$PREREQ"
    }
    case $1 in
    prereqs)
        prereqs
        exit 0
        ;;
    esac

    for dev in 0000:37:00.2 0000:86:00.2
    do
        echo -n "$dev" > /sys/bus/pci/drivers/xhci_hcd/unbind
        echo -n "$dev" > /sys/bus/pci/drivers/vfio-pci/bind
    done

    exit 0

   Then run:

    # update-initramfs -u -k all

9. Once everything is good, go ahead and reboot the machine. Your devices should then look something like the output below, with vfio-pci shown as the "Kernel driver in use":

    37:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
            Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
            Kernel driver in use: vfio-pci
            Kernel modules: nouveau
    37:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
            Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
            Kernel driver in use: vfio-pci
            Kernel modules: snd_hda_intel
    37:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
            Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
            Kernel driver in use: vfio-pci
            Kernel modules: xhci_pci
    37:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
            Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]
            Kernel driver in use: vfio-pci
    86:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
            Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
            Kernel driver in use: vfio-pci
            Kernel modules: nouveau
    86:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
            Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
            Kernel driver in use: vfio-pci
            Kernel modules: snd_hda_intel
    86:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
            Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
            Kernel driver in use: vfio-pci
            Kernel modules: xhci_pci
    86:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
            Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]
            Kernel driver in use: vfio-pci

10. In CloudStack, create a VM as per the normal workflow, then stop it and insert the additional settings below. This can be done from the web UI as well as the API.

    extraconfig-1

    <devices>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio' />
        <source>
          <address domain='0x0000' bus='0x37' slot='0x00' function='0x0' />
        </source>
      </hostdev>
    </devices>

    extraconfig-2

    <devices>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio' />
        <source>
          <address domain='0x0000' bus='0x86' slot='0x00' function='0x0' />
        </source>
      </hostdev>
    </devices>

   Replace the device addresses to match your environment. We used two extraconfig settings because we have two GPUs, although extraconfig-1 alone can accommodate both devices as well.

11. Once done, schedule the VM on the host. CloudStack will compose the domain XML with the additional configuration (from extraconfig-n), and the qemu process for that VM will have additional arguments such as:

    -device vfio-pci,host=0000:37:00.0,id=hostdev0,bus=pci.0,addr=0x6
    -device vfio-pci,host=0000:86:00.0,id=hostdev1,bus=pci.0,addr=0x7

   (A quick way to inspect the composed XML on the host is sketched after step 12.)

12. Once the VM is up, check with lspci or similar inside the guest, then install the NVIDIA driver and confirm with nvidia-smi that everything is working fine (a short in-guest example is sketched below as well).
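To double-check what CloudStack composed, you can also look at the libvirt domain XML directly on the KVM host. A quick sketch (i-2-123-VM is just a placeholder instance name; use virsh list to find the real one):

    # on the KVM host, after the VM has been started by CloudStack
    virsh list --all
    virsh dumpxml i-2-123-VM | grep -A 6 "<hostdev"

You should see the two <hostdev> entries carrying the 0x37 and 0x86 bus addresses from the extraconfig settings.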
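And inside the guest, a short example of the final check from step 12 (the driver package name is only an example; pick whatever fits your distro and GPU):

    # inside the guest VM
    lspci -nn | grep -i nvidia          # the passed-through functions should be visible
    sudo apt install nvidia-driver-525  # example NVIDIA driver package on Ubuntu
    nvidia-smi                          # should list the RTX 2080 Ti(s)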
Thanks

On Sun, Nov 13, 2022 at 3:25 PM Alex Mattioli <alex.matti...@shapeblue.com> wrote:

> Hi Jay,
> I'd love to hear more about how you implemented the GPU pass-through, and
> I think it could be quite useful for the community as well.
>
> Cheers
> Alex
>
> -----Original Message-----
> From: Jayanth Reddy <jayanthreddy5...@gmail.com>
> Sent: 13 November 2022 10:43
> To: users@cloudstack.apache.org
> Cc: Emil Karlsson <emi...@kth.se>
> Subject: Re: vGPU support in CloudStack on Ubuntu KVM
>
> Hi,
> AFAIK, vGPU and GPU are only supported on the Xen hypervisor as per
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/GPU+and+vGPU+support+for+CloudStack+Guest+VMs
> Not sure about vGPU, but we managed to do a full GPU passthrough to a
> VM running on an Ubuntu KVM host. Interested to discuss further?
>
> Thanks
>
> On Wed, Oct 12, 2022 at 6:40 PM Pierre Le Fevre <pierr...@kth.se> wrote:
>
> > Hi all,
> > I am currently trying to get vGPU to work in some of our VMs in
> > CloudStack to enable GPU acceleration in Jupyter Notebooks.
> > Our current setup is using CloudStack 4.17.1.0 on Ubuntu 20.04 with
> > KVM as a hypervisor.
> > It seems like there is some support for vGPU in CloudStack but I can't
> > find proper documentation about compatibility with newer GPUs and
> > other hypervisors.
> >
> > I've tried installing all the proper drivers on the host machine with
> > an NVIDIA A6000 and it shows up properly in nvidia-smi. From the docs
> > available it seems like it should show up in the UI under the host
> > after adding it to our cluster, but no GPU appears. The dashboard also
> > reports 0 GPUs in the zone.
> >
> > Is this a limitation of KVM, or have some of you gotten this setup to work?
> >
> > All the best
> > Pierre
> >