Hello all, In my efforts to make user LXC containers work I noticed that under a "real" desktop (not just nspawn with VT login or ssh logins) my carefully set up cgroups in the non-systemd controllers get reverted. I. e. I put the session leader (and all other pids) of logind sessions (/user.slice/user-1000.slice/session-XX.scope) into all cgroup controllers, but a second after they are all back in the / cgroups (or sometimes in /user.slice). After some days of debugging (during which I learned a lot about the innards of systemd :-) I finally found out why:
During unit cgroup initialization (unit_realize_cgroup), sibling cgroups are queued instead of initialized immediately. unit_create_cgroups() makes an attempt to avoid initializing and migrating a cgroup more than once: path = unit_default_cgroup_path(u); [...] r = hashmap_put(u->manager->cgroup_unit, path, u); if (r < 0) { log_error(r == -EEXIST ? "cgroup %s exists already: %s" : "hashmap_put failed for %s: %s", path, strerror(-r)); return r; } But this doesn't work: "path" always gets allocated freshly in that function, so the pointer is never in the hashmap and the -EEXISTS never actually hits. This causes -.slice to be initialized and recursively migrated a gazillion times, which explains the random punting of sub-cgroup PIDs back to /. I fixed this with an explicit hashmap_get() call, which compares values instead of pointers. I also added some debugging to that function: log_debug("unit_create_cgroups %s: path=%s realized %i hashmap %p", u->id, path, u->cgroup_realized, hashmap_get(u->manager->cgroup_unit, path)); (right before the hashmap_put() above), which illustrates the problem: systemd[1]: Starting Root Slice. systemd[1]: unit_create_cgroups -.slice: path= realized 0 hashmap (nil) systemd[1]: Created slice Root Slice. [...] pid1 systemd[1]: unit_create_cgroups session-c1.scope: path=/user.slice/user-1000.slice/session-c1.scope realized 0 hashmap (nil) systemd[1]: Started Session c1 of user martin. [... later on when most things have been started ...] systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850 systemd[1]: unit_create_cgroups -.slice: cgroup exists already systemd[1]: Failed to realize cgroups for queued unit user.slice: File exists systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850 systemd[1]: unit_create_cgroups -.slice: cgroup exists already systemd[1]: Failed to realize cgroups for queued unit grub-common.slice: File exists systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850 systemd[1]: unit_create_cgroups -.slice: cgroup exists already systemd[1]: Failed to realize cgroups for queued unit networking.slice: File exists ... and so on, basically once for each .service. Initializing -.slice so many times is certainly an unintended effect of the peer cgroup creation. Thus the patch fixes the multiple initialization/creation, but a proper fix should also quiesce these error messages. The condition could be checked explicitly, i. e. we skip the "Failed to realize..." error for EEXISTS, or we generally tone this down to log_debug. I'm open to suggestions. And of course the log_debug should be removed; it's nice to illustrate the problem, but doesn't really belong into production code. Thanks, Martin -- Martin Pitt | http://www.piware.de Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
From fd2f8a444c9c644a39dd3b619934e8768e03c2a3 Mon Sep 17 00:00:00 2001 From: Martin Pitt <martin.p...@ubuntu.com> Date: Mon, 1 Dec 2014 10:50:06 +0100 Subject: [PATCH] Do not realize and migrate cgroups multiple times unit_create_cgroups() tries to check if a cgroup already exists. But has the destination path is always allocated dynamically as a new string, that pointer will never already be in the hashmap, thus hashmap_put() will never actually fail with EEXISTS. Thus check for the existance of the cgroup path explicitly. Before this, "-.slice" got initialized and its child PIDs migrated many times through queuing the realization of sibling units; thiss caused any cgroup controller layout from sub-cgroups to be reverted and their pids moved back to the root cgroup in all controllers (except systemd). --- src/core/cgroup.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/core/cgroup.c b/src/core/cgroup.c index 8820bee..3d36080 100644 --- a/src/core/cgroup.c +++ b/src/core/cgroup.c @@ -614,6 +614,13 @@ static int unit_create_cgroups(Unit *u, CGroupControllerMask mask) { if (!path) return log_oom(); + log_debug("unit_create_cgroups %s: path=%s realized %i hashmap %p", u->id, path, u->cgroup_realized, hashmap_get(u->manager->cgroup_unit, path)); + + if (hashmap_get(u->manager->cgroup_unit, path)) { + log_error("unit_create_cgroups %s: cgroup %s exists already", u->id, u->cgroup_path); + return -EEXIST; + } + r = hashmap_put(u->manager->cgroup_unit, path, u); if (r < 0) { log_error(r == -EEXIST ? "cgroup %s exists already: %s" : "hashmap_put failed for %s: %s", path, strerror(-r)); -- 2.1.3
signature.asc
Description: Digital signature
_______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel