I omitted one piece of information about running with --cgroupns=private,
thinking it was unrelated, but it now appears that it may be related (and
perhaps highlights a variant of the issue that appears on first boot, not
only on container restart). What makes me think it's related: again, I can
reproduce this on a CentOS host but not on an Ubuntu one (still with
SELinux in 'permissive' mode).

[root@localhost ~]# podman run -it --name ubuntu --privileged --cgroupns private ubuntu-systemd
systemd 245.4-4ubuntu3.19 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTI)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to <daca3bb894b7>.


*Couldn't move remaining userspace processes, ignoring: Input/output error*
*Failed to create compat systemd cgroup /system.slice: No such file or directory*
*Failed to create compat systemd cgroup /system.slice/system-getty.slice: No such file or directory*
[  OK  ] Created slice system-getty.slice.

*Failed to create compat systemd cgroup /system.slice/system-modprobe.slice: No such file or directory*
[  OK  ] Created slice system-modprobe.slice.

*Failed to create compat systemd cgroup /user.slice: No such file or directory*
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.

The first warning comes from one of the same areas of code I linked in my
first email:
https://github.com/systemd/systemd/blob/v245/src/core/cgroup.c#L2967.

I see the same thing with '--cap-add sys_admin' in place of '--privileged',
and again it occurs with both docker and podman.
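
For example, an equivalent reproduction (assuming the same 'ubuntu-systemd'
image as above):

[root@localhost ~]# podman run -it --name ubuntu --cap-add sys_admin --cgroupns private ubuntu-systemd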

Thanks,
Lewis

On Tue, 10 Jan 2023 at 15:28, Lewis Gaul <lewis.g...@gmail.com> wrote:

> I'm aware of the higher level of collaboration between podman and systemd
> compared to docker, hence I'm primarily raising this issue from a podman
> angle.
>
> In privileged mode all mounts are read-write, so yes the container has
> write access to the cgroup filesystem. (Podman also ensures write access to
> the systemd cgroup subsystem mount in non-privileged mode by default).
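>
> For example, write access can be checked from inside the container along
> these lines (illustrative; 'rw' in the mount options indicates the mount
> is writable):
>
>   grep /sys/fs/cgroup /proc/mounts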
>
> On first boot PID 1 can be found in
> /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/init.scope/cgroup.procs,
> whereas when the container restarts the 'init.scope/' directory does not
> exist and PID 1 is instead found in the parent (container root) cgroup
> /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/cgroup.procs
> (also reflected by /proc/1/cgroup). This is strange because systemd must be
> the one creating this cgroup dir on the initial boot, so I'm not sure why
> it doesn't on subsequent boots.
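>
> A quick way to see the difference (illustrative; exact output will vary,
> and <ctr-id> stands for the real container ID):
>
>   cat /proc/1/cgroup
>   # first boot:    1:name=systemd:/machine.slice/libpod-<ctr-id>.scope/init.scope
>   # after restart: 1:name=systemd:/machine.slice/libpod-<ctr-id>.scope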
>
> I can confirm that the container has permissions, since executing a 'mkdir'
> in /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/ inside the
> container succeeds after the restart, so I have no idea why systemd is not
> creating the 'init.scope/' dir. I notice that inside the container's
> systemd cgroup mount 'system.slice/' does exist, whereas 'user.slice/'
> (like 'init.scope/') does not, although all of these exist on a normal
> first boot. Is there any way I can find systemd logs that might indicate
> why the cgroup dir creation is failing?
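>
> For reference, the permission check I mean is along these lines (the
> 'test' directory name is just for illustration):
>
>   mkdir /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/test
>   # succeeds after restart, so write permission is not the problem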
>
> One final datapoint: the same is seen when using a private cgroup
> namespace (via 'podman run --cgroupns=private'), although the error is
> then, as expected, "Failed to attach 1 to compat systemd cgroup
> /init.scope: No such file or directory".
>
> I could raise this with the podman team, but it seems more in the systemd
> area, given that it's a systemd warning and I would expect systemd to be
> the one creating this cgroup dir.
>
> Thanks,
> Lewis
>
> On Tue, 10 Jan 2023 at 14:48, Lennart Poettering <lenn...@poettering.net>
> wrote:
>
>> On Di, 10.01.23 13:18, Lewis Gaul (lewis.g...@gmail.com) wrote:
>>
>> > Following 'setenforce 0' I still see the same issue (I was also
>> suspecting
>> > SELinux!).
>> >
>> > A few additional data points:
>> > - this was not seen when using systemd v230 inside the container
>> > - this is also seen on CentOS 8.4
>> > - this is seen under docker even if the container's cgroup driver is
>> > changed from 'cgroupfs' to 'systemd'
>>
>> docker is garbage. They are hostile towards running systemd inside
>> containers.
>>
>> podman upstream is a lot friendlier, and apparently what everyone in OCI
>> is going towards these days.
>>
>> I don't have much experience with podman though, and in particular not
>> with old versions. The next step would probably be to look at what
>> precisely causes the permission issue, via strace.
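>>
>> (For example, something along these lines, assuming strace is available
>> in the container and systemd is PID 1 there:
>>
>>   strace -f -e trace=mkdir,mkdirat -p 1
>>
>> would show which cgroup directory creations fail and with what errno.)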
>>
>> but did you make sure your container actually gets write access to the
>> cgroup trees?
>>
>> Anyway, I'd recommend asking the podman community for help about this.
>>
>> Lennart
>>
>> --
>> Lennart Poettering, Berlin
>>
>
