I omitted one piece of information about running with --cgroupns=private, thinking it was unrelated, but it now appears it may be related (and perhaps highlights a variant of the issue that appears on first boot, not only on container restart). Again (and this is part of what makes me think it's related), I can reproduce this on a CentOS host but not on Ubuntu (still with SELinux in 'permissive' mode):
[root@localhost ~]# podman run -it --name ubuntu --privileged --cgroupns private ubuntu-systemd
systemd 245.4-4ubuntu3.19 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTI)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to <daca3bb894b7>.
Couldn't move remaining userspace processes, ignoring: Input/output error
Failed to create compat systemd cgroup /system.slice: No such file or directory
Failed to create compat systemd cgroup /system.slice/system-getty.slice: No such file or directory
[ OK ] Created slice system-getty.slice.
Failed to create compat systemd cgroup /system.slice/system-modprobe.slice: No such file or directory
[ OK ] Created slice system-modprobe.slice.
Failed to create compat systemd cgroup /user.slice: No such file or directory
[ OK ] Created slice User and Session Slice.
[ OK ] Started Dispatch Password Requests to Console Directory Watch.
[ OK ] Started Forward Password Requests to Wall Directory Watch.

The first warning ("Couldn't move remaining userspace processes") comes from one of the same areas of code I linked in my first email: https://github.com/systemd/systemd/blob/v245/src/core/cgroup.c#L2967. I see the same thing with '--cap-add sys_admin' instead of '--privileged', and again with both docker and podman.

Thanks,
Lewis

On Tue, 10 Jan 2023 at 15:28, Lewis Gaul <lewis.g...@gmail.com> wrote:

> I'm aware of the higher level of collaboration between podman and systemd
> compared to docker, hence primarily raising this issue from a podman angle.
>
> In privileged mode all mounts are read-write, so yes the container has
> write access to the cgroup filesystem. (Podman also ensures write access to
> the systemd cgroup subsystem mount in non-privileged mode by default.)
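As an aside, the "Failed to create compat systemd cgroup" paths can be scraped out of a captured boot log to compare runs. A minimal sketch (the two sample lines are copied from the boot transcript above; a real run would feed in the full captured log):

```shell
# Extract the cgroup paths that systemd failed to create from a saved boot log.
# The sample lines below are copied from the boot transcript in this thread;
# substitute the full captured log on a real system.
log='Failed to create compat systemd cgroup /system.slice: No such file or directory
Failed to create compat systemd cgroup /user.slice: No such file or directory'
printf '%s\n' "$log" | sed -n 's/^Failed to create compat systemd cgroup \(.*\): No such file or directory$/\1/p'
```

For the sample above this prints /system.slice and /user.slice, i.e. the compat directories systemd could not create.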
>
> On first boot PID 1 can be found in
> /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/init.scope/cgroup.procs,
> whereas when the container restarts the 'init.scope/' directory does not
> exist and PID 1 is instead found in the parent (container root) cgroup
> /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/cgroup.procs
> (also reflected by /proc/1/cgroup). This is strange because systemd must be
> the one to create this cgroup dir on the initial boot, so I'm not sure why
> it wouldn't on subsequent boots.
>
> I can confirm that the container has permissions, since executing a 'mkdir'
> in /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/ inside the
> container succeeds after the restart, so I have no idea why systemd is not
> creating the 'init.scope/' dir. I notice that inside the container's
> systemd cgroup mount 'system.slice/' does exist, but 'user.slice/' also
> does not (both exist on a normal boot). Is there any way I can find systemd
> logs that might indicate why the cgroup dir creation is failing?
>
> One final datapoint: the same is seen when using a private cgroup
> namespace (via 'podman run --cgroupns=private'), although the error is
> then, as expected, "Failed to attach 1 to compat systemd cgroup
> /init.scope: No such file or directory".
>
> I could raise this with the podman team, but it seems more in the systemd
> area, given that it's a systemd warning and I would expect systemd to be
> creating this cgroup dir.
>
> Thanks,
> Lewis
>
> On Tue, 10 Jan 2023 at 14:48, Lennart Poettering <lenn...@poettering.net>
> wrote:
>
>> On Di, 10.01.23 13:18, Lewis Gaul (lewis.g...@gmail.com) wrote:
>>
>> > Following 'setenforce 0' I still see the same issue (I was also
>> > suspecting SELinux!).
>> >
>> > A few additional data points:
>> > - this was not seen when using systemd v230 inside the container
>> > - this is also seen on CentOS 8.4
>> > - this is seen under docker even if the container's cgroup driver is
>> >   changed from 'cgroupfs' to 'systemd'
>>
>> Docker is garbage. They are hostile towards running systemd inside
>> containers.
>>
>> Podman upstream is a lot friendlier, and apparently what everyone in OCI
>> is going towards these days.
>>
>> I don't have much experience with podman though, and in particular not
>> with old versions. The next step would probably be to look at what
>> precisely causes the permission issue, via strace.
>>
>> But did you make sure your container actually gets write access to the
>> cgroup trees?
>>
>> Anyway, I'd recommend asking the podman community for help with this.
>>
>> Lennart
>>
>> --
>> Lennart Poettering, Berlin
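To follow up on the strace suggestion, here is a rough sketch of the approach I plan to try. To be clear, the PID lookup, the container name 'ubuntu', and the sample trace line are illustrative assumptions, not output from a real run:

```shell
# Sketch of the strace approach suggested above (hedged: the commands in the
# comments are untested here, and the sample line is made up, not real output).
#
# On the host, attach to the container's systemd (PID 1 in the container) and
# watch for cgroup directory creation, e.g.:
#   pid=$(podman inspect --format '{{.State.Pid}}' ubuntu)
#   strace -f -e trace=mkdir,mkdirat -p "$pid" 2>trace.log
#
# Then filter the capture for failing cgroup-related mkdir calls:
sample='mkdir("/sys/fs/cgroup/systemd/machine.slice/libpod-abc.scope/init.scope", 0755) = -1 ENOENT (No such file or directory)'
printf '%s\n' "$sample" | grep -E 'mkdir.*cgroup.*= -1'
```

If the restart case really is a cgroup-creation failure rather than a permissions one, the errno surfaced here (ENOENT vs EACCES/EPERM) should make that distinction visible.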