Hi,
That seems to confirm our suspicion. We'll try to get it working with
XenServer, but it looks like 8.2 is required for the RTX A6000, so it will
probably take a while with sales teams etc.

Our current config is quite similar: a couple of beefy VMs have Quadro RTX
5000 cards passed through for acceleration, but sharing is quite limited.
It's based on the blog from Pisz as well as this one:
https://mathiashueber.com/pci-passthrough-ubuntu-2004-virtual-machine/

I've attached our internal guide for how this is done in case someone is
interested!

I'll update you all if we can get it to work; there doesn't seem to be much
documentation on this other than the Confluence page. Unfortunate that it's
locked behind vGPU and Citrix licenses!

Have a good CCC :)

Pierre

On Sun, Nov 13, 2022 at 11:41 AM Jayanth Reddy <jayanthreddy5...@gmail.com>
wrote:

> Hi,
>
> This blog at
> https://lab.piszki.pl/cloudstack-kvm-and-running-vm-with-vgpu/
> helped us a lot. Thanks to Piotr Pisz; he's active in the CloudStack
> community as well.
> Below are the steps we followed to make this work. Note that the GPU does
> not show up in the zone's GPU count or anywhere else in CloudStack.
> We've got Intel processors, btw; if yours are AMD, please make the
> appropriate changes.
>
> 1. Enable VT-d at the BIOS.
> 2. Since the kernel that ships with recent Ubuntu releases already includes
> the vfio drivers, there is no need to load them separately.
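> (If you want to double-check that on your kernel, something like
> "grep -i vfio /boot/config-$(uname -r)" should show the CONFIG_VFIO*
> options set to y or m; just a quick sanity check.)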
> 3. We have two RTX 2080 Ti cards installed in the server; they show up as
> follows:
>
> 37:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
> 37:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
> 37:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
> 37:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]
> 86:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
> 86:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
> 86:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
> 86:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]
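>
> (For reference, that listing is what you get from something like
> "lspci -nn | grep -i nvidia" on the host; the [vendor:device] IDs in
> square brackets are the values that go into vfio-pci.ids below.)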
>
>
> 4. Add GRUB parameters in /etc/default/grub as
>
> GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt modprobe.blacklist=xhci_hcd vfio-pci.ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7"
>
>
> You may also do this by placing the relevant options in
> /etc/modprobe.d/vfio.conf or similar.
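>
> (A rough sketch of that modprobe.d alternative, in case it helps; the
> exact file name doesn't matter, and treat it as untested in this exact
> form:
>
> # /etc/modprobe.d/vfio.conf
> options vfio-pci ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7
> # make sure vfio-pci claims the devices before the regular drivers
> softdep nouveau pre: vfio-pci
> softdep snd_hda_intel pre: vfio-pci
>
> followed by another update-initramfs -u so it lands in the initrd.)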
>
>
> 5. Blacklist any additional drivers on the host that take over the GPU, in
> /etc/modprobe.d/blacklist-nvidia.conf
>
>
> $ cat /etc/modprobe.d/blacklist-nvidia.conf
> blacklist nouveau
> blacklist nvidia
> blacklist xhci_hcd
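>
> (After the reboot in step 6, "lsmod | grep -E 'nouveau|nvidia'" should
> come back empty if the blacklisting worked.)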
>
>
> 6. Run # update-initramfs -u -k all and # update-grub2, then reboot the
> server.
>
>
> 7. Verify IOMMU is enabled and working: # dmesg -T | grep -iE "iommu|dmar"
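>
> (It can also be worth dumping the IOMMU groups at this point, e.g.
>
> for g in /sys/kernel/iommu_groups/*; do
>   echo "Group ${g##*/}:"; ls "$g/devices"
> done
>
> so you can see which devices share a group with the GPU; everything in
> that group has to end up on vfio-pci, which is exactly what bites us in
> step 8.)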
>
>
> 8. One important thing is to make sure all the devices are bound to the
> vfio-pci driver. In our case, the xhci_pci driver was taking over the "USB
> 3.1 Host Controller". When that happens, QEMU fails to spawn, stating
> "make sure all devices in the IOMMU group are bound to the vfio driver".
> To tackle this, we wrote a script that unbinds those devices from the xHCI
> driver and binds them to vfio-pci at boot:
>
>
> cat /etc/initramfs-tools/scripts/init-top/vfio.sh
>
> #!/bin/sh
> PREREQ=""
>
> prereqs()
> {
>    echo "$PREREQ"
> }
>
> case $1 in
> prereqs)
>    prereqs
>    exit 0
>    ;;
> esac
>
> for dev in 0000:37:00.2 0000:86:00.2
> do
>   echo -n "$dev" > /sys/bus/pci/drivers/xhci_hcd/unbind
>   echo -n "$dev" > /sys/bus/pci/drivers/vfio-pci/bind
> done
>
> exit 0
>
>
> Then # update-initramfs -u -k all
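>
> Two side notes: if I remember right, the script needs to be executable
> (chmod +x /etc/initramfs-tools/scripts/init-top/vfio.sh) or it will be
> skipped at boot. And if you want to test a rebind once without
> rebooting, something along these lines should work (a sketch only,
> adjust the addresses to yours):
>
> echo vfio-pci > /sys/bus/pci/devices/0000:37:00.2/driver_override
> echo 0000:37:00.2 > /sys/bus/pci/devices/0000:37:00.2/driver/unbind
> echo 0000:37:00.2 > /sys/bus/pci/drivers_probe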
>
>
> 9. Once everything is good, go ahead and reboot the machine, and your
> devices should look something like the below, with the vfio-pci kernel
> module shown as the "Kernel driver in use":
>
>
> 37:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: nouveau
> 37:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: snd_hda_intel
> 37:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: xhci_pci
> 37:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
>
> 86:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: nouveau
> 86:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: snd_hda_intel
> 86:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
>         Kernel modules: xhci_pci
> 86:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
>         Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:12fa]
>         Kernel driver in use: vfio-pci
>
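> (That output is essentially "lspci -nnk -d 10de:" if you only want to
> look at the NVIDIA devices.)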
>
> 10. Go to CloudStack, create a VM as per the normal workflow, then stop it
> and insert the additional settings as below. This can be done from either
> the web UI or the API.
>
>
> extraconfig-1
> <devices> <hostdev mode='subsystem' type='pci' managed='yes'> <driver
> name='vfio' /> <source> <address domain='0x0000' bus='0x37' slot='0x00'
> function='0x0' /> </source> </hostdev> </devices>
>
> extraconfig-2
> <devices> <hostdev mode='subsystem' type='pci' managed='yes'> <driver
> name='vfio' /> <source> <address domain='0x0000' bus='0x86' slot='0x00'
> function='0x0' /> </source> </hostdev> </devices>
>
>
> Replace the device addresses to match your environment. We used two
> extraconfig entries because we have two GPUs, though extraconfig-1 alone
> can accommodate both devices as well (see the sketch below).
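>
> (For the single-extraconfig variant, that would simply be both hostdev
> elements inside one <devices> block, along the lines of:
>
> <devices>
>   <hostdev mode='subsystem' type='pci' managed='yes'>
>     <driver name='vfio'/>
>     <source> <address domain='0x0000' bus='0x37' slot='0x00' function='0x0'/> </source>
>   </hostdev>
>   <hostdev mode='subsystem' type='pci' managed='yes'>
>     <driver name='vfio'/>
>     <source> <address domain='0x0000' bus='0x86' slot='0x00' function='0x0'/> </source>
>   </hostdev>
> </devices>
>
> with the addresses again adjusted to your hardware.)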
>
>
> 11. Once done, start the VM so it gets scheduled on the host. CloudStack
> will compose the domain XML with the additional configuration (from
> extraconfig-n) included, and the qemu process for that VM will have
> additional arguments such as:
>
>
> -device vfio-pci,host=0000:37:00.0,id=hostdev0,bus=pci.0,addr=0x6 -device
> vfio-pci,host=0000:86:00.0,id=hostdev1,bus=pci.0,addr=0x7
>
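> (You can double-check from the libvirt side too, e.g.
> "virsh dumpxml <instance internal name> | grep -A5 hostdev" on the host
> should show the passed-through addresses.)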
>
>
> 12. Once the VM is up, check with lspci or similar inside the guest, then
> install the NVIDIA driver and confirm with nvidia-smi that everything is
> working fine.
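>
> (Inside the guest that is basically:
>
> lspci -nn | grep -i nvidia
> nvidia-smi
>
> once the guest driver is installed.)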
>
>
>
> Thanks
>
> On Sun, Nov 13, 2022 at 3:25 PM Alex Mattioli <alex.matti...@shapeblue.com>
> wrote:
>
> > Hi Jay,
> > I'd love to hear more about how you implemented the GPU pass-through, and
> > I think it could be quite useful for the community as well.
> >
> > Cheers
> > Alex
> >
> >
> >
> >
> > -----Original Message-----
> > From: Jayanth Reddy <jayanthreddy5...@gmail.com>
> > Sent: 13 November 2022 10:43
> > To: users@cloudstack.apache.org
> > Cc: Emil Karlsson <emi...@kth.se>
> > Subject: Re: vGPU support in CloudStack on Ubuntu KVM
> >
> > Hi,
> >     AFAIK, vGPU and GPU are only supported on Xen Hypervisor as per
> > https://cwiki.apache.org/confluence/display/CLOUDSTACK/GPU+and+vGPU+support+for+CloudStack+Guest+VMs
> > Not sure about vGPU, but we managed to do a full GPU passthrough to a
> > VM running on an Ubuntu KVM host. Interested in discussing further?
> >
> > Thanks
> >
> >
> > On Wed, Oct 12, 2022 at 6:40 PM Pierre Le Fevre <pierr...@kth.se> wrote:
> >
> > > Hi all,
> > > I am currently trying to get vGPU to work in some of our VMs in
> > > CloudStack to enable GPU acceleration in Jupyter Notebooks.
> > > Our current setup is using CloudStack 4.17.1.0 on Ubuntu 20.04 with
> > > KVM as a hypervisor.
> > > It seems like there is some support for vGPU in CloudStack but I can't
> > > find proper documentation about compatibility with newer GPUs and
> > > other hypervisors.
> > >
> > > I've tried installing all the proper drivers on the host machine with
> > > an NVIDIA A6000, and it shows up properly in nvidia-smi. From the docs
> > > available it seems like it should show up in the UI under the host
> > > after adding it to our cluster, but no GPU appears. The dashboard also
> > > reports 0 GPUs in the zone.
> > >
> > > Is this a limitation of KVM, or have some of you gotten this setup to
> > > work?
> > >
> > > All the best
> > > Pierre
> > >
> >
>
