Hi,

That seems to confirm our suspicion. We'll try to get it working with XenServer, but it seems like 8.2 is required for the RTX A6000, so it will probably take a while with sales teams etc.
Our current config is quite similar: a couple of beefy VMs have Quadro RTX 5000 cards passed through for acceleration, but sharing is quite limited. It's based on the blog from Piotr Pisz as well as this one: https://mathiashueber.com/pci-passthrough-ubuntu-2004-virtual-machine/
I've attached our internal guide for how this is done in case someone is interested!
I'll update you guys if we can get it to work; it seems like there aren't a lot of docs on this other than the Confluence page. Unfortunate that it's locked behind vGPU and Citrix licenses!

Have a good CCC :)
Pierre

On Sun, Nov 13, 2022 at 11:41 AM Jayanth Reddy <jayanthreddy5...@gmail.com> wrote:

> Hi,
>
> This blog at https://lab.piszki.pl/cloudstack-kvm-and-running-vm-with-vgpu/
> helped us a lot. Thanks to Piotr Pisz, he's active in the CloudStack
> community as well.
> Below are the steps we followed to make this work. Note that the GPU doesn't
> show up in the GPU count for the zone or anywhere else in CloudStack.
> We've got Intel processors, by the way; if yours are AMD, please make the
> appropriate changes.
>
> 1. Enable VT-d in the BIOS.
>
> 2. Since the latest kernel that ships with Ubuntu includes the vfio drivers
> as built-in kernel modules, there is no need to load them separately.
>
> 3. We have two RTX 2080 Ti cards in the server, and they show up as follows:
>
> 37:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
> 37:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
> 37:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
> 37:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]
> 86:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
> 86:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
> 86:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
> 86:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]
>
> 4. Add the kernel parameters in /etc/default/grub:
>
> GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt modprobe.blacklist=xhci_hcd vfio-pci.ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7"
>
> You may also do this by placing the relevant options under /etc/modprobe.d/vfio.conf or similar.
>
> 5. Blacklist any additional drivers on the host that take over the GPU, in /etc/modprobe.d/blacklist-nvidia.conf:
>
> $ cat /etc/modprobe.d/blacklist-nvidia.conf
> blacklist nouveau
> blacklist nvidia
> blacklist xhci_hcd
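>
> (For reference, the /etc/modprobe.d alternative mentioned in step 4 could
> look roughly like the sketch below. It reuses the device IDs from step 3 and
> assumes vfio-pci is available as a loadable module; the softdep lines are
> one common way of making sure vfio-pci claims the devices before the regular
> drivers do. Run update-initramfs afterwards, as in step 6.)
>
> $ cat /etc/modprobe.d/vfio.conf
> # bind these PCI IDs (GPU, audio, USB and UCSI functions) to vfio-pci
> options vfio-pci ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7
> # load vfio-pci before the drivers that would otherwise claim the devices
> softdep nouveau pre: vfio-pci
> softdep snd_hda_intel pre: vfio-pci
> softdep xhci_pci pre: vfio-pci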
> 6. Run # update-initramfs -u -k all and # update-grub2, then reboot the
> server.
>
> 7. Verify that IOMMU is enabled and working: # dmesg -T | grep -iE "iommu|dmar"
>
> 8. One important thing is to make sure all devices are bound to the vfio-pci
> driver. In our case, the xhci_pci driver was taking over the "USB 3.1 Host
> Controller". When that happens, QEMU fails to spawn, stating "make sure all
> devices in the IOMMU group are bound to the vfio driver". To tackle this, we
> wrote a script that unbinds the USB controller functions from the xHCI
> driver and binds them to vfio-pci at boot (make sure the script is
> executable):
>
> $ cat /etc/initramfs-tools/scripts/init-top/vfio.sh
> #!/bin/sh
> PREREQ=""
>
> prereqs()
> {
>     echo "$PREREQ"
> }
>
> case $1 in
> prereqs)
>     prereqs
>     exit 0
>     ;;
> esac
>
> for dev in 0000:37:00.2 0000:86:00.2
> do
>     echo -n "$dev" > /sys/bus/pci/drivers/xhci_hcd/unbind
>     echo -n "$dev" > /sys/bus/pci/drivers/vfio-pci/bind
> done
>
> exit 0
>
> Then run # update-initramfs -u -k all again.
>
> 9. Once everything is good, go ahead and reboot the machine. Your devices
> should then look something like the below; the vfio-pci kernel module should
> be the "Kernel driver in use":
>
> 37:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: nouveau
> 37:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: snd_hda_intel
> 37:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: xhci_pci
> 37:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
> 86:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: nouveau
> 86:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: snd_hda_intel
> 86:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: xhci_pci
> 86:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
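>
> (To double-check steps 7 to 9, a small helper along these lines can be used
> to print every PCI device together with its IOMMU group and the driver
> currently bound to it; all four functions of each GPU should report
> vfio-pci. This is only a rough sketch, adapt as needed.)
>
> #!/bin/sh
> # list each PCI device with its IOMMU group and the driver bound to it
> for dev in /sys/kernel/iommu_groups/*/devices/*; do
>     group=${dev#/sys/kernel/iommu_groups/}
>     group=${group%%/*}
>     addr=$(basename "$dev")
>     if [ -e "$dev/driver" ]; then
>         drv=$(basename "$(readlink "$dev/driver")")
>     else
>         drv="(none)"
>     fi
>     printf 'group %-4s %s  driver=%s\n' "$group" "$addr" "$drv"
> done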
> 10. Go to CloudStack, create a VM as per the normal workflow, then stop it
> and insert the additional settings below. This can be done from the web UI
> as well as the API.
>
> extraconfig-1
> <devices>
>   <hostdev mode='subsystem' type='pci' managed='yes'>
>     <driver name='vfio' />
>     <source>
>       <address domain='0x0000' bus='0x37' slot='0x00' function='0x0' />
>     </source>
>   </hostdev>
> </devices>
>
> extraconfig-2
> <devices>
>   <hostdev mode='subsystem' type='pci' managed='yes'>
>     <driver name='vfio' />
>     <source>
>       <address domain='0x0000' bus='0x86' slot='0x00' function='0x0' />
>     </source>
>   </hostdev>
> </devices>
>
> Replace the device addresses as per your environment. We had to add two
> extraconfigs because we have two GPUs, although extraconfig-1 alone can
> accommodate both devices as well.
>
> 11. Once done, schedule the VM on the host. CloudStack will compose the
> domain XML with the additional configuration specified in extraconfig-n, and
> the qemu process for that VM will have additional arguments such as:
>
> -device vfio-pci,host=0000:37:00.0,id=hostdev0,bus=pci.0,addr=0x6 -device vfio-pci,host=0000:86:00.0,id=hostdev1,bus=pci.0,addr=0x7
>
> 12. Once the VM is up, check with lspci or similar, then install the NVIDIA
> driver and confirm everything is working with nvidia-smi.
>
> Thanks
>
> On Sun, Nov 13, 2022 at 3:25 PM Alex Mattioli <alex.matti...@shapeblue.com> wrote:
>
> > Hi Jay,
> > I'd love to hear more about how you implemented the GPU pass-through, and
> > I think it could be quite useful for the community as well.
> >
> > Cheers
> > Alex
> >
> > -----Original Message-----
> > From: Jayanth Reddy <jayanthreddy5...@gmail.com>
> > Sent: 13 November 2022 10:43
> > To: users@cloudstack.apache.org
> > Cc: Emil Karlsson <emi...@kth.se>
> > Subject: Re: vGPU support in CloudStack on Ubuntu KVM
> >
> > Hi,
> > AFAIK, vGPU and GPU are only supported on the Xen hypervisor as per
> > https://cwiki.apache.org/confluence/display/CLOUDSTACK/GPU+and+vGPU+support+for+CloudStack+Guest+VMs
> > Not sure about vGPU, but we managed to do a full GPU passthrough to a VM
> > running on an Ubuntu KVM host. Interested to discuss further?
> >
> > Thanks
> >
> > On Wed, Oct 12, 2022 at 6:40 PM Pierre Le Fevre <pierr...@kth.se> wrote:
> >
> > > Hi all,
> > > I am currently trying to get vGPU working in some of our VMs in
> > > CloudStack to enable GPU acceleration in Jupyter Notebooks.
> > > Our current setup is CloudStack 4.17.1.0 on Ubuntu 20.04 with KVM as
> > > the hypervisor.
> > > It seems like there is some support for vGPU in CloudStack, but I can't
> > > find proper documentation about compatibility with newer GPUs and
> > > other hypervisors.
> > >
> > > I've tried installing all the proper drivers on the host machine with
> > > an NVIDIA A6000, and it shows up properly in nvidia-smi. From the docs
> > > available it seems like it should show up in the UI under the host
> > > after adding it to our cluster, but no GPU appears. The dashboard also
> > > reports 0 GPUs in the zone.
> > >
> > > Is this a limitation of KVM, or have some of you gotten this setup to
> > > work?
> > >
> > > All the best
> > > Pierre
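>
> (A rough way to sanity-check steps 11 and 12 once the VM has been scheduled;
> "i-2-xx-VM" is just a placeholder for whatever instance name libvirt shows
> for the VM on the KVM host.)
>
> # on the KVM host: confirm the hostdev entries made it into the live domain XML
> virsh dumpxml i-2-xx-VM | grep -A6 '<hostdev'
> # inside the guest: the passed-through GPU functions should be visible
> lspci -nn | grep -i nvidia
> # after installing the NVIDIA driver inside the guest
> nvidia-smi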