Hello all,

In my efforts to make user LXC containers work I noticed that under a
"real" desktop (not just nspawn with VT login or ssh logins) my
carefully set up cgroups in the non-systemd controllers get reverted.
I. e. I put the session leader (and all other pids) of logind sessions
(/user.slice/user-1000.slice/session-XX.scope) into all cgroup
controllers, but a second after they are all back in the / cgroups (or
sometimes in /user.slice). After some days of debugging (during which
I learned a lot about the innards of systemd :-) I finally found out
why:

During unit cgroup initialization (unit_realize_cgroup), sibling
cgroups are queued instead of initialized immediately.
unit_create_cgroups() makes an attempt to avoid initializing and
migrating a cgroup more than once:

       path = unit_default_cgroup_path(u);
       [...]
       r = hashmap_put(u->manager->cgroup_unit, path, u);
        if (r < 0) {
                log_error(r == -EEXIST ? "cgroup %s exists already: %s" : 
"hashmap_put failed for %s: %s", path, strerror(-r));
                return r;
        }

But this doesn't work: "path" always gets allocated freshly in that
function, so the pointer is never in the hashmap and the -EEXISTS
never actually hits. This causes -.slice to be initialized and
recursively migrated a gazillion times, which explains the random
punting of sub-cgroup PIDs back to /.

I fixed this with an explicit hashmap_get() call, which compares
values instead of pointers.

I also added some debugging to that function:

  log_debug("unit_create_cgroups %s: path=%s realized %i hashmap %p", u->id, 
path, u->cgroup_realized, hashmap_get(u->manager->cgroup_unit, path));

(right before the hashmap_put() above), which illustrates the
problem:

systemd[1]: Starting Root Slice.
systemd[1]: unit_create_cgroups -.slice: path= realized 0 hashmap (nil)
systemd[1]: Created slice Root Slice.
[...]
pid1 systemd[1]: unit_create_cgroups session-c1.scope: 
path=/user.slice/user-1000.slice/session-c1.scope realized 0 hashmap (nil)
systemd[1]: Started Session c1 of user martin.
[... later on when most things have been started ...]
systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
systemd[1]: unit_create_cgroups -.slice: cgroup  exists already
systemd[1]: Failed to realize cgroups for queued unit user.slice: File exists
systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
systemd[1]: unit_create_cgroups -.slice: cgroup  exists already
systemd[1]: Failed to realize cgroups for queued unit grub-common.slice: File 
exists
systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
systemd[1]: unit_create_cgroups -.slice: cgroup  exists already
systemd[1]: Failed to realize cgroups for queued unit networking.slice: File 
exists

... and so on, basically once for each .service.

Initializing -.slice so many times is certainly an unintended effect
of the peer cgroup creation. Thus the patch fixes the multiple
initialization/creation, but a proper fix should also quiesce these
error messages. The condition could be checked explicitly, i. e. we
skip the "Failed to realize..." error for EEXISTS, or we generally
tone this down to log_debug. I'm open to suggestions. And of course
the log_debug should be removed; it's nice to illustrate the problem,
but doesn't really belong into production code.

Thanks,

Martin

-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)
From fd2f8a444c9c644a39dd3b619934e8768e03c2a3 Mon Sep 17 00:00:00 2001
From: Martin Pitt <martin.p...@ubuntu.com>
Date: Mon, 1 Dec 2014 10:50:06 +0100
Subject: [PATCH] Do not realize and migrate cgroups multiple times

unit_create_cgroups() tries to check if a cgroup already exists. But has the
destination path is always allocated dynamically as a new string, that pointer
will never already be in the hashmap, thus hashmap_put() will never actually
fail with EEXISTS. Thus check for the existance of the cgroup path explicitly.

Before this, "-.slice" got initialized and its child PIDs migrated many times
through queuing the realization of sibling units; thiss caused any cgroup
controller layout from sub-cgroups to be reverted and their pids moved back to
the root cgroup in all controllers (except systemd).
---
 src/core/cgroup.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index 8820bee..3d36080 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -614,6 +614,13 @@ static int unit_create_cgroups(Unit *u, CGroupControllerMask mask) {
         if (!path)
                 return log_oom();
 
+        log_debug("unit_create_cgroups %s: path=%s realized %i hashmap %p", u->id, path, u->cgroup_realized, hashmap_get(u->manager->cgroup_unit, path));
+
+        if (hashmap_get(u->manager->cgroup_unit, path)) {
+                log_error("unit_create_cgroups %s: cgroup %s exists already", u->id, u->cgroup_path);
+                return -EEXIST;
+        }
+
         r = hashmap_put(u->manager->cgroup_unit, path, u);
         if (r < 0) {
                 log_error(r == -EEXIST ? "cgroup %s exists already: %s" : "hashmap_put failed for %s: %s", path, strerror(-r));
-- 
2.1.3

Attachment: signature.asc
Description: Digital signature

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

Reply via email to