[Ubuntu-ha] [Bug 1828228] Re: corosync fails to start in container (armhf) bump some limits

Rafael David Tinoco Sat, 20 Jul 2019 14:41:07 -0700

This "bug" happens because of "unprivileged" containers:


root@corosync:~# corosync -f
Jul 20 21:26:32 notice  [MAIN  ] Corosync Cluster Engine 3.0.1 starting up
Jul 20 21:26:32 info    [MAIN  ] Corosync built-in features: dbus monitoring 
watchdog augeas systemd xmlconf snmp pie relro bindnow
Jul 20 21:26:32 warning [MAIN  ] Could not set SCHED_RR at priority 99: 
Operation not permitted (1)
Jul 20 21:26:32 warning [MAIN  ] Could not set priority -2147483648: Permission 
denied (13)
Jul 20 21:26:32 notice  [TOTEM ] Initializing transport (Kronosnet).
Jul 20 21:26:33 crit    [TOTEM ] knet_handle_new failed: File name too long (36)
Jul 20 21:26:33 error   [KNET  ] transport: Failed to set socket buffer via 
force option 33: Operation not permitted
Jul 20 21:26:33 error   [KNET  ] transport: Unable to set local socketpair 
receive buffer: File name too long
Jul 20 21:26:33 error   [KNET  ] handle: Unable to initialize internal 
hostsockpair: File name too long
Jul 20 21:26:33 error   [MAIN  ] Can't initialize TOTEM layer
Jul 20 21:26:33 error   [MAIN  ] Corosync Cluster Engine exiting with status 15 
at main.c:1529.

connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 
ENOENT (No such file or directory)
connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 
ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/fs/cgroup/cpu/cpu.rt_runtime_us", O_RDONLY) = -1 ENOENT 
(No such file or directory)
sched_setscheduler(0, SCHED_RR, [99])   = -1 EPERM (Operation not permitted)
setpriority(PRIO_PGRP, 0, -2147483648)  = -1 EACCES (Permission denied)
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, 
rlim_max=RLIM64_INFINITY}, NULL) = -1 EPERM (Operation not permitted)
[pid   694] setsockopt(11, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = -1 EPERM 
(Operation not permitted)
[pid   694] epoll_ctl(0, EPOLL_CTL_DEL, 11, 0xff968fb8) = -1 EINVAL (Invalid 
argument)
[pid   694] epoll_ctl(0, EPOLL_CTL_DEL, 0, 0xff968fb8) = -1 EINVAL (Invalid 
argument)
[pid   694] close(0)                    = -1 EBADF (Bad file descriptor)
[pid   694] close(0)                    = -1 EBADF (Bad file descriptor)
[pid   695] madvise(0xf6055000, 8368128, MADV_DONTNEED) = -1 EINVAL (Invalid 
argument)

----

I was able to reproduce the exact same issue using lxd on armhf with
unprivileged containers. The cause is easy to confirm by running:

root@corosync:~# ulimit -l unlimited
-bash: ulimit: max locked memory: cannot modify limit: Operation not permitted

as root, which shows that "root" in the container lacks the
"cap_sys_resource" capability. Kronosnet initialization also fails
because the {r,w}mem_max values are too low.
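
The memlock limits in effect can also be read without trying to change
them. The following is a minimal sketch, assuming a Linux /proc with the
usual "Max locked memory" row in /proc/self/limits; the function name
memlock_limits is made up for illustration and is not part of corosync
or lxd.

```shell
#!/bin/sh
# Hypothetical helper: print the soft and hard "Max locked memory"
# limits from a limits file in the /proc/<pid>/limits format.
memlock_limits() {
    # Row layout: Max locked memory <soft> <hard> bytes
    awk '/^Max locked memory/ { print $4, $5 }' "$1"
}

# Default to the limits of the current process; a different file
# (for example /proc/<pid>/limits of corosync) can be passed as $1.
limits_file=${1:-/proc/self/limits}
if [ -r "$limits_file" ]; then
    echo "memlock (soft hard): $(memlock_limits "$limits_file")"
else
    echo "cannot read $limits_file"
fi
```

If the hard value printed here is not "unlimited", the prlimit64 call in
the trace above has to raise the hard limit, which is the step that
needs cap_sys_resource.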

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1828228

Title:
  corosync fails to start in container (armhf) bump some limits

Status in Auto Package Testing:
  New
Status in corosync package in Ubuntu:
  In Progress
Status in pacemaker package in Ubuntu:
  In Progress

Bug description:
  Currently pacemaker v2 fails to start in armhf containers (and by
  extension corosync too).

  I found that it is reproducible locally, and that I had to bump a few
  limits to get it going.

  Specifically I did:

  1) bump memlock limits
  2) bump rmem_max limits

  = 1) Bump memlock limits =

  I have no idea which of these finally worked, or which is
  sufficient. A bit of a whack-a-mole.

  cat >>/etc/security/limits.conf <<EOF
  * soft memlock unlimited
  * hard memlock unlimited
  EOF

  lxc config set nice-mako limits.kernel.memlock 33554432

  mkdir -p /etc/systemd/system/snap.lxd.daemon.service.d/
  cat >/etc/systemd/system/snap.lxd.daemon.service.d/override.conf <<EOF
  [Service]
  LimitMEMLOCK=6553600000
  EOF
  systemctl daemon-reload
  systemctl restart snap.lxd.daemon.service

  
  = 2) Bump rmem_max values =

  Observed:
  # strace -s99999 -f /usr/sbin/corosync 2>&1 | grep sockop
  [pid   447] setsockopt(12, SOL_SOCKET, SO_RCVBUF, [8388608], 4) = 0
  [pid   447] getsockopt(12, SOL_SOCKET, SO_RCVBUF, [425984], [4]) = 0
  [pid   447] setsockopt(12, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)

  Bumped mem_max using:
  sudo sysctl -w net.core.wmem_max=8388608
  sudo sysctl -w net.core.rmem_max=8388608

  (Not sure if the desired size depends on the machine/container I am
  running on)
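
  The comparison behind this step can be sketched as a small script. It
  is only an illustration: 8388608 is the size corosync requested in the
  strace output above, the function names are hypothetical, and the
  files under /proc/sys/net/core may not be visible from inside a
  container, so it is probably best run on the host. The getsockopt
  result of 425984 is consistent with an rmem_max of 212992, since Linux
  reports double the value that was actually set.

```shell
#!/bin/sh
# Size requested by corosync in the strace output (assumption: this
# is the only size that matters for this check).
REQUIRED=8388608

# Succeeds if the current limit ($1) is at least the required size ($2).
buffer_limit_ok() {
    [ "$1" -ge "$2" ]
}

# Print one status line for a sysctl name ($1) and its value ($2).
report_buffer_limit() {
    if buffer_limit_ok "$2" "$REQUIRED"; then
        echo "$1=$2 ok"
    else
        echo "$1=$2 too low (need $REQUIRED)"
    fi
}

# Check both socket buffer ceilings if they are readable here.
for name in rmem_max wmem_max; do
    file=/proc/sys/net/core/$name
    if [ -r "$file" ]; then
        report_buffer_limit "$name" "$(cat "$file")"
    else
        echo "$name: not readable here"
    fi
done
```

  A "too low" line for either value suggests that the sysctl -w commands
  above are still needed on that machine.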

  
  Can we check these values on our armhf containers and/or bump them?
  Or can we mark the pacemaker v2.0 autopkgtest as ignored on armhf?

To manage notifications about this bug go to:
https://bugs.launchpad.net/auto-package-testing/+bug/1828228/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp
