Hi again Tim,
this is an even more interesting case, as it depends on kernel module loading,
which will differ between systems. It will differ in:
1. Need: this only matters if GPU passthrough or GL render nodes are
configured.
2. Availability: systems might have several GPUs, so which one do we wait on?
Or they have none, so this path is never populated.

So this is an interesting task for a sysadmin: having decided "I configured
this and have the need (#1), and I know the device is available (#2)", how
do I tune my system to cope with that?

But at the same time it is tricky for a generic fix not to negatively
affect those who do not need it or whose systems never have it.

I have no solution yet, but some thoughts and questions.


## 1 Ordering

Fortunately, AFAIK what you describe is the uncommon case: libvirt starts
and then starts the guest before the kernel has initialized dri/drm.

On the systems I could quickly check, that was not the case. I have
always seen things like:

$ journalctl -b 0 | grep -i -e "\[drm\] Initialized" -e "Starting libvirtd.service"
Mai 21 08:11:03 Keschdeichel kernel: [drm] Initialized simpledrm 1.0.0 20200625 for simple-framebuffer.0 on minor 0
Mai 21 08:11:05 Keschdeichel kernel: [drm] Initialized i915 1.6.0 20230929 for 0000:00:02.0 on minor 1
Mai 21 08:11:05 Keschdeichel kernel: [drm] Initialized evdi 1.14.4 20240410 for evdi.0 on minor 0
Mai 21 08:11:05 Keschdeichel kernel: [drm] Initialized evdi 1.14.4 20240410 for evdi.1 on minor 2
Mai 21 08:11:05 Keschdeichel kernel: [drm] Initialized evdi 1.14.4 20240410 for evdi.2 on minor 3
Mai 21 08:11:05 Keschdeichel kernel: [drm] Initialized evdi 1.14.4 20240410 for evdi.3 on minor 4
Mai 21 08:11:10 Keschdeichel systemd[1]: Starting libvirtd.service - libvirt legacy monolithic daemon...


I'm curious how this compares to your failing case.
How does this look for you?
I assume that without your change you see libvirt starting before (all) drm
devices are initialized - is that what you see?
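
If it helps to compare, here is a minimal sketch of how the ordering could be
checked mechanically. The function name check_drm_order is made up for this
mail, and it assumes the monolithic libvirtd.service and the same message
texts as in my journal above (unlike the grep -i above it matches
case-sensitively); a system using the split daemons would need a different
pattern.

```shell
#!/bin/sh
# check_drm_order: hypothetical helper, reads journal lines on stdin and
# prints one verdict line about drm vs. libvirtd ordering.
check_drm_order() {
    awk '
        # remember the line number of the last drm initialization
        /\[drm\] Initialized/        { last_drm = NR }
        # remember the line number of the first libvirtd start
        /Starting libvirtd\.service/ { if (!libvirt) libvirt = NR }
        END {
            if (!libvirt)
                print "libvirtd start not found"
            else if (!last_drm)
                print "no drm initialization found"
            else if (last_drm < libvirt)
                print "ok: all drm initialized before libvirtd"
            else
                print "race: libvirtd started before drm finished"
        }
    '
}

# Usage on a real system (current boot):
#   journalctl -b 0 | check_drm_order
```

On my system above this would report the "ok" case; I'd expect yours to
report "race", but that is exactly what I'd like to confirm.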


## 2 Waiting

[Service]
ExecStartPre=/bin/sleep 10


I understand that this fixes your issue, so please keep it until we've found
something better.
But that cannot be a generic change we'd apply.
It is 10 seconds for you, maybe someone else needs 12 or 123456 - we could
never set this right, and it would slow down everyone who does not even need
it, for nothing.

Yet, as a documented workaround it is nice.
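
For a local workaround, a slightly less wasteful variant would be a bounded
poll instead of a fixed sleep. This is only a sketch under assumptions:
wait_for_path is a made-up helper name, and it only checks that the path
exists, not that the specific card a guest needs is ready.

```shell
#!/bin/sh
# wait_for_path PATH TIMEOUT_SECONDS (hypothetical helper)
# Returns 0 as soon as PATH exists, 1 if TIMEOUT_SECONDS elapse first.
wait_for_path() {
    path=$1
    remaining=$2
    while [ ! -e "$path" ]; do
        if [ "$remaining" -le 0 ]; then
            return 1    # gave up, path never showed up
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage, e.g. from a script called via ExecStartPre=:
#   wait_for_path /dev/dri 10
```

That returns immediately on systems where /dev/dri is already there, but it
still makes every system without a GPU wait for the full timeout, so it
shares the "not generic" problem of the sleep.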


## 3 The Unit

[Unit]
After=multi-user.target dev-dri.device

This looks much better, but AFAICS it has no effect.
My system has dri entries:

$ ll /dev/dri/
total 0
drwxr-xr-x   3 root root        180 Mai 21 08:11 ./
drwxr-xr-x  22 root root       6060 Jul 18 02:07 ../
drwxr-xr-x   2 root root        160 Mai 21 08:11 by-path/
crw-rw----+  1 root video  226,   0 Mai 24 03:46 card0
crw-rw----+  1 root video  226,   1 Mai 24 03:46 card1
crw-rw----+  1 root video  226,   2 Mai 24 03:46 card2
crw-rw----+  1 root video  226,   3 Mai 24 03:46 card3
crw-rw----+  1 root video  226,   4 Mai 24 03:46 card4
crw-rw----+  1 root render 226, 128 Mai 24 03:46 renderD128

But there are no such device units defined:

$ systemctl list-units --all --type device | grep dri
<nothing>

That is because the closest thing to a matching udev rule is:
$ cat /lib/udev/rules.d/60-drm.rules
# do not edit this file, it will be overwritten on update

ACTION!="remove", SUBSYSTEM=="drm", SUBSYSTEMS=="pci|usb|platform", IMPORT{builtin}="path_id"

# by-path
KERNEL=="card*",     ENV{ID_PATH}=="?*",                   SYMLINK+="dri/by-path/$env{ID_PATH}-card"
KERNEL=="card*",     ENV{ID_PATH_WITH_USB_REVISION}=="?*", SYMLINK+="dri/by-path/$env{ID_PATH_WITH_USB_REVISION}-card"
KERNEL=="controlD*", ENV{ID_PATH}=="?*",                   SYMLINK+="dri/by-path/$env{ID_PATH}-control"
KERNEL=="controlD*", ENV{ID_PATH_WITH_USB_REVISION}=="?*", SYMLINK+="dri/by-path/$env{ID_PATH_WITH_USB_REVISION}-control"
KERNEL=="renderD*",  ENV{ID_PATH}=="?*",                   SYMLINK+="dri/by-path/$env{ID_PATH}-render"
KERNEL=="renderD*",  ENV{ID_PATH_WITH_USB_REVISION}=="?*", SYMLINK+="dri/by-path/$env{ID_PATH_WITH_USB_REVISION}-render"

And there is nothing in there that would create dev-dri.device.
The rules would need a TAG+="systemd" entry for that, similar to the
discussions in [1][2].
But even if we had that, AFAIU there would be dev-dri-card0.device, but not
just dev-dri.device.
And even then, a particular guest config might need dev-dri-card7.device,
which initializes even later - so just waiting on DRI in general sounds neat.

Yet on the other hand, just like selecting the right timeout on the
sleep - this is dependent on the system config, hardware and needs :-/
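
To illustrate the TAG+="systemd" idea, a local experiment could look roughly
like the sketch below. Everything here is an assumption on my side: the
helper name drm_systemd_rule, the rule file name in the usage comment, and
that tagging the card and render nodes is what one would want. I have not
tried this on an affected system.

```shell
#!/bin/sh
# drm_systemd_rule: hypothetical helper that prints a udev rule tagging
# DRM card and render nodes, so that systemd creates .device units for them.
drm_systemd_rule() {
    printf '%s\n' 'SUBSYSTEM=="drm", KERNEL=="card*|renderD*", TAG+="systemd"'
}

# Usage (as root; the rule file name is made up for this example):
#   drm_systemd_rule > /etc/udev/rules.d/61-drm-systemd.rules
#   udevadm control --reload
#   udevadm trigger --subsystem-match=drm
#   systemctl list-units --all --type device | grep dri
```

If that works as I understand it, the last command should then list units
such as dev-dri-card0.device, which a local After= could refer to. It still
would not give us a single dev-dri.device to wait on.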


## 4 What now?

I'd appreciate it if you could:
- share the initialization order your system really has
- explain if you found something about dev-dri.device that I do not know yet
- explain in more detail which hardware in general, and which GPUs in
  particular, you are waiting on
- share how you configured the guest (is it passthrough, is it GL rendering,
  ...?)

That would help us understand the situation a bit better; if we are
lucky it might even allow us to recreate it.

Still, my expectation is that with all that we will eventually need to reach
out to the project at [3] or [4].
Due to the "at system config file time we never know if and what we need to
wait on" problem described above, I'd expect that this might need something
completely different. For example libvirt itself (it knows what a guest needs,
as it knows its definition) waiting for that device, if and only as needed.
Or I might be overlooking something obvious which the subject matter experts
there might know and share.
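
To make the "libvirt knows what a guest needs" point a bit more concrete,
here is a rough sketch of deriving the needed render nodes from a guest
definition. This is not how libvirt does or would do it; guest_rendernodes is
a made-up name, and it only catches guests that name a rendernode explicitly
in their <gl> element, one per line of XML.

```shell
#!/bin/sh
# guest_rendernodes: hypothetical helper, reads domain XML on stdin and
# prints every explicitly configured rendernode path, one per line.
guest_rendernodes() {
    sed -n "s/.*rendernode=['\"]\([^'\"]*\)['\"].*/\1/p"
}

# Usage (GUEST is a placeholder for the domain name):
#   virsh dumpxml GUEST | guest_rendernodes
```

A guest that enables GL without naming a rendernode prints nothing here, so a
real solution would need more knowledge than such a text match has.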

But for now, I'd appreciate it if you could satisfy my curiosity by
providing the above.

[1]: https://github.com/systemd/systemd/issues/25408
[2]: https://github.com/joukewitteveen/xlogin/issues/15
[3]: https://gitlab.com/libvirt/libvirt
[4]: https://listman.redhat.com/mailman/listinfo/libvir-list

P.S. @Sergio, who usually looks at these - sorry for stealing those
interesting cases in my morning, bad timezone luck for you :-P. Do not
be concerned, you might deal with it long enough down the road :-)

** Bug watch added: github.com/systemd/systemd/issues #25408
   https://github.com/systemd/systemd/issues/25408

** Bug watch added: github.com/joukewitteveen/xlogin/issues #15
   https://github.com/joukewitteveen/xlogin/issues/15

** Changed in: libvirt (Ubuntu)
       Status: New => Incomplete

https://bugs.launchpad.net/bugs/2073442

Title:
  Failed to autostart VM: cannot open directory '/dev/dri': No such file
  or directory
