On Fri, Sep 29, 2023, 12:54 Lewis Gaul <lewis.g...@gmail.com> wrote:
> Hi systemd team,
>
> I've encountered an issue when running systemd inside a container using
> cgroups v2: if a container exec process is created at the wrong moment
> during early startup, systemd will fail to move all processes into a
> child cgroup, and will therefore fail to enable controllers due to the
> "no internal processes" rule introduced in cgroups v2. In other words, a
> systemd container is started and very soon afterwards a process is
> created via e.g. 'podman exec systemd-ctr cmd', where the exec process
> is placed in the container's namespaces (although it is not a child of
> the container's PID 1). This is not a totally crazy thing to be doing -
> it was hit when testing a systemd container, using a container exec
> "probe" to check when the container is ready.
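For context, the "no internal processes" rule means a non-root cgroup cannot both contain member processes and enable controllers for its children: the kernel rejects the write to cgroup.subtree_control with EBUSY while cgroup.procs is non-empty. So a stray exec process left behind in the container's top-level cgroup is enough to block systemd's controller setup. A minimal sketch of that condition (the helper name and the fake cgroupfs layout are illustrative, not systemd's actual code):

```python
import os
import tempfile

def would_hit_no_internal_processes(cgroup_dir):
    """Return True if enabling controllers via cgroup.subtree_control
    in this cgroup would be rejected by the kernel (EBUSY) because the
    cgroup still has member processes. Helper name is hypothetical."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return bool(f.read().split())

# Demo against a fake cgroupfs layout; a real check would point at the
# container's cgroup under /sys/fs/cgroup.
fake = tempfile.mkdtemp()
with open(os.path.join(fake, "cgroup.procs"), "w") as f:
    f.write("123\n")  # the stray 'podman exec' process left behind

stuck = would_hit_no_internal_processes(fake)
```
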
Wouldn't it be better to have the container inform the host via NOTIFY_SOCKET (the Type=notify mechanism)? I believe systemd has had support for sending readiness notifications from init to a container manager for quite a while. (Alternatively, connect to the container's systemd or D-Bus Unix socket and query it directly that way, but NOTIFY_SOCKET would avoid the need to time it correctly.)

Other than that: I'm not a container expert, but this does seem like a self-inflicted problem to me. If you spawn processes unknown to systemd, it makes sense that systemd will fail to handle them.
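The NOTIFY_SOCKET mechanism is just a Unix datagram socket: the container manager binds it and exports its path into the container's environment, and systemd (or any sd_notify(3)-compatible sender) writes "READY=1" to it once boot has finished, so no polling or exec probe is needed. A minimal sketch of both ends (the socket path is made up here; with podman the wiring is normally handled by something like its --sdnotify option rather than by hand):

```python
import os
import socket
import tempfile

# "Host" side: bind a datagram socket whose path would be exported to
# the container as $NOTIFY_SOCKET by the container manager.
notify_path = os.path.join(tempfile.mkdtemp(), "notify")
server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
server.bind(notify_path)

# "Container" side: systemd sends READY=1 to $NOTIFY_SOCKET when the
# boot transaction is complete (simulated here with a plain sendto).
client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
client.sendto(b"READY=1", notify_path)

# The manager now knows the container is ready, with no timing games.
msg, _ = server.recvfrom(4096)
ready = b"READY=1" in msg.split(b"\n")
```
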