On Thu, Jul 6, 2017 at 12:54 AM, Alex Williamson <[email protected]> wrote:

> On Wed, Jul 5, 2017 at 9:10 PM, Thiago Ramon <[email protected]> wrote:
>
>> I'm having quite a unique problem, and have exhausted all the possibilities I
>> have found so far after a couple of weeks of attempts, so I've decided to
>> bother you guys with this.
>>
>> My setup: Ryzen 7 1800X, Asus Crosshair VI Hero, NVidia GTX 980 Ti and
>> NVidia GTX 1060 6G
>> OS: Arch Linux, latest updates, mainline kernel (4.11.7-1-ARCH),
>> QEMU 2.9.0, libvirt 3.4.0
>>
>> The problem, as far as I can tell, is that the GPU is getting corrupted
>> somehow at or before reaching the BIOS/UEFI, getting reset and stuck in
>> D3, no matter which GPU I pass through or which boot options,
>> SeaBIOS/OVMF, chipset or PCI/PCIe bus connection I use.
>>
>> Both GPUs are healthy and work perfectly under Linux, using the
>> proprietary NVidia drivers.
>>
>> Things tried: disabled D3 mode in vfio_pci, used pci_stub instead,
>> disabled the NVidia driver and ran the VM from the console, multiple boot
>> options involving the IOMMU and KVM (but hey, any new ideas help).
>>
>> I know the motherboard (in general, not this one specifically) can work
>> with GPU passthrough, as I've already been in contact with someone passing
>> through a GTX 1070 with it (though his other GPU is AMD).
>>
>> Unless there's something I've overlooked, I probably need to gather more
>> in-depth information on what's going on with the GPU in the first moments
>> of the boot process, so if anyone knows of a good set of debug options for
>> QEMU, or if kernel tracing is better, please let me know.
>>
>> Thanks for any help, and let me know if there's any extra info that could
>> help solve this puzzle.
>>
>> Relevant logs and more details:
>> https://www.reddit.com/r/VFIO/comments/6khu5i/need_help_with_gpu_passthrough_on_ryzen_c6h_gtx/
>
> Wow, formatting in reddit is nearly impossible to decipher... pastebin?
> I can spot one issue:
>
> pci 0000:29:00.0: vgaarb: setting as boot VGA device
>
> Generally you want to assign the non-boot device. And probably related:
>
> Failed to mmap 0000:29:00.0 BAR 3. Performance may be slow
>
> This is really suggesting something much more wrong than "performance may
> be slow". Check /proc/iomem, find what driver is claiming resources on the
> device, and disable it. This probably means that some other driver besides
> vesafb or efifb is blocking the device. The kernel will try pretty hard to
> attach a driver to the primary graphics, which is one of the complications
> of trying to assign primary graphics. Thanks,
>
> Alex

Here, I dropped the raw message in pastebin: https://pastebin.com/hfJ6ryJg

That particular run was trying to pass through the 980 Ti, which is the boot device and which probably had something else prodding at it (I'll give it another try and check what else was attaching to it).

I've mostly focused on passing through the 1060, though, which doesn't get touched by anything but vfio-pci and also doesn't show any mmap issues. Here's the last QEMU run with SeaBIOS: https://pastebin.com/DEPpewCH

And the last one with OVMF: https://pastebin.com/L7gkrm36

In the kernel log, I only get the vfio_bar_restore messages. One interesting and consistent pattern is that SeaBIOS always generates 2 pairs of warnings (one for the GPU, one for the audio device), while OVMF generates quite a few more (a dozen or more; I don't have a log handy). Probably not relevant, as the failure apparently happens before the first message anyway.

Another detail that may be relevant: whenever I try a passthrough (and it fails), the kernel fails to soft restart. It gets to the last stage, where it would do a soft reset, but the console just sits there. Could this just be vfio_pci trying to do something with the unresponsive card, or is it something else that may be a clue to what's going on?

Thanks for the help
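
Alex's suggestion to check /proc/iomem for whatever is claiming the boot GPU's resources can be scripted roughly as below. This is a minimal sketch, assuming the usual /proc/iomem layout of `start-end : name` lines nested by two-space indentation, where a driver holding part of a BAR appears as a child of the entry named after the device's PCI address. The helper names `claimants` and `bound_driver` are hypothetical, not part of any existing tool. Addresses may read as zeros without root, but the sketch only relies on names and indentation. The second helper reads the `driver` symlink under /sys/bus/pci/devices to show which driver is bound to the device as a whole.

```python
import os
from typing import List, Optional


def claimants(iomem_text: str, bdf: str) -> List[str]:
    """Return the names of resources nested under any /proc/iomem entry
    whose name is the PCI address `bdf` (e.g. "0000:29:00.0")."""
    found = []
    owner_indent = None  # indentation of the enclosing `bdf` entry, if any
    for line in iomem_text.splitlines():
        if " : " not in line:
            continue  # skip anything that is not "start-end : name"
        indent = len(line) - len(line.lstrip(" "))
        _span, name = line.strip().split(" : ", 1)
        if owner_indent is not None and indent <= owner_indent:
            owner_indent = None  # we have left the device's subtree
        if name == bdf:
            owner_indent = indent  # one BAR of the device; children follow
        elif owner_indent is not None:
            found.append(name)  # something (usually a driver) holds this range
    return found


def bound_driver(bdf: str, sysfs: str = "/sys/bus/pci/devices") -> Optional[str]:
    """Return the name of the driver bound to `bdf`, or None if unbound."""
    link = os.path.join(sysfs, bdf, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))
```

On the host, something like `claimants(open("/proc/iomem").read(), "0000:29:00.0")` run as root would list efifb, vesafb, or whatever else sits inside the 980 Ti's BARs; anything other than vfio-pci there is a candidate for the "Failed to mmap ... BAR 3" message.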
_______________________________________________
vfio-users mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/vfio-users
