This "bug" happens because of "unprivileged" containers:
root@corosync:~# corosync -f
Jul 20 21:26:32 notice [MAIN ] Corosync Cluster Engine 3.0.1 starting up
Jul 20 21:26:32 info [MAIN ] Corosync built-in features: dbus monitoring
watchdog augeas systemd xmlconf snmp pierelro bindnow
Jul 20 21:26:32 warning [MAIN ] Could not set SCHED_RR at priority 99:
Operation not permitted (1)
Jul 20 21:26:32 warning [MAIN ] Could not set priority -2147483648: Permission
denied (13)
Jul 20 21:26:32 notice [TOTEM ] Initializing transport (Kronosnet).
Jul 20 21:26:33 crit [TOTEM ] knet_handle_new failed: File name too long (36)
Jul 20 21:26:33 error [KNET ] transport: Failed to set socket buffer via
force option 33: Operation not permitted
Jul 20 21:26:33 error [KNET ] transport: Unable to set local socketpair
receive buffer: File name too long
Jul 20 21:26:33 error [KNET ] handle: Unable to initialize internal
hostsockpair: File name too long
Jul 20 21:26:33 error [MAIN ] Can't initialize TOTEM layer
Jul 20 21:26:33 error [MAIN ] Corosync Cluster Engine exiting with status 15
at main.c:1529.
connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1
ENOENT (No such file or directory)
connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1
ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/fs/cgroup/cpu/cpu.rt_runtime_us", O_RDONLY) = -1 ENOENT
(No such file or directory)
sched_setscheduler(0, SCHED_RR, [99]) = -1 EPERM (Operation not permitted)
setpriority(PRIO_PGRP, 0, -2147483648) = -1 EACCES (Permission denied)
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY,
rlim_max=RLIM64_INFINITY}, NULL) = -1 EPERM (Operation not permitted)
[pid 694] setsockopt(11, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = -1 EPERM
(Operation not permitted)
[pid 694] epoll_ctl(0, EPOLL_CTL_DEL, 11, 0xff968fb8) = -1 EINVAL (Invalid
argument)
[pid 694] epoll_ctl(0, EPOLL_CTL_DEL, 0, 0xff968fb8) = -1 EINVAL (Invalid
argument)
[pid 694] close(0) = -1 EBADF (Bad file descriptor)
[pid 694] close(0) = -1 EBADF (Bad file descriptor)
[pid 695] madvise(0xf6055000, 8368128, MADV_DONTNEED) = -1 EINVAL (Invalid
argument)
----
I was able to reproduce the exact same issue using LXD on armhf with
unprivileged containers. The issue is easy to confirm by
running:
root@corosync:~# ulimit -l unlimited
-bash: ulimit: max locked memory: cannot modify limit: Operation not permitted
as root and checking that "root" does not have the "cap_sys_resource"
capability. Kronosnet initialization also fails, because
of low {r,w}mem_max values.
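
Both conditions can be checked from inside the container in one go. The
following is only a sketch: the 8388608 threshold is taken from the
SO_RCVBUFFORCE call in the strace output above, and the helper name
buffer_limit_ok is made up for illustration.

```shell
# Buffer size requested in the strace output (8 MiB); treating it as the
# minimum acceptable {r,w}mem_max is an assumption of this sketch.
WANTED=8388608

# Hypothetical helper: succeeds when the given limit is at least $WANTED.
buffer_limit_ok() {
    [ "$1" -ge "$WANTED" ]
}

# Try to raise the memlock limit in a subshell, so the calling shell keeps
# its current limit whether or not the attempt succeeds.
if (ulimit -l unlimited) 2>/dev/null; then
    echo "memlock: can be raised to unlimited"
else
    echo "memlock: cannot be raised (current: $(ulimit -l))"
fi

# Compare the socket buffer ceilings against the requested size.
for name in rmem_max wmem_max; do
    file=/proc/sys/net/core/$name
    if [ ! -r "$file" ]; then
        echo "$name: not readable from here"
        continue
    fi
    value=$(cat "$file")
    if buffer_limit_ok "$value"; then
        echo "$name: $value (ok)"
    else
        echo "$name: $value (below $WANTED)"
    fi
done
```

In the failing container described here, the first check should print the
"cannot be raised" line.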
--
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1828228
Title:
corosync fails to start in container (armhf) bump some limits
Status in Auto Package Testing:
New
Status in corosync package in Ubuntu:
In Progress
Status in pacemaker package in Ubuntu:
In Progress
Bug description:
Currently pacemaker v2 fails to start in armhf containers (and by
extension corosync too).
I found that it is reproducible locally, and that I had to bump a few
limits to get it going.
Specifically I did:
1) bump memlock limits
2) bump rmem_max limits
= 1) Bump memlock limits =
I have no idea which one of these finally worked, and/or which is
sufficient. It was a bit of whack-a-mole.
cat >>/etc/security/limits.conf <<EOF
* soft memlock unlimited
* hard memlock unlimited
EOF
lxc config set nice-mako limits.kernel.memlock 33554432
mkdir -p /etc/systemd/system/snap.lxd.daemon.service.d/
cat >/etc/systemd/system/snap.lxd.daemon.service.d/override.conf <<EOF
[Service]
LimitMEMLOCK=6553600000
EOF
systemctl daemon-reload
systemctl restart snap.lxd.daemon.service
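
To narrow down which of these changes actually took effect, the memlock
limit of a running process can be read from /proc. This is a rough sketch,
not part of the original steps; parse_memlock is a hypothetical helper name.

```shell
# Print "soft hard" from the "Max locked memory" row of a
# /proc/<pid>/limits style table read on stdin.
parse_memlock() {
    awk '/^Max locked memory/ { print $4, $5 }'
}

# Limits of the current shell. Replace "self" with the PID of corosync or
# of the LXD daemon to inspect those processes instead.
if [ -r /proc/self/limits ]; then
    echo "memlock (soft hard, bytes): $(parse_memlock < /proc/self/limits)"
fi
```

Note that /proc reports the limit in bytes, while "ulimit -l" reports it in
KiB, so the 33554432 set through lxc above would appear as 32768 in
"ulimit -l".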
= 2) Bump rmem_max values =
Observed:
# strace -s99999 -f /usr/sbin/corosync 2>&1 | grep sockop
[pid 447] setsockopt(12, SOL_SOCKET, SO_RCVBUF, [8388608], 4) = 0
[pid 447] getsockopt(12, SOL_SOCKET, SO_RCVBUF, [425984], [4]) = 0
[pid 447] setsockopt(12, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = -1
EPERM (Operation not permitted)
Bumped mem_max using:
sudo sysctl -w net.core.wmem_max=8388608
sudo sysctl -w net.core.rmem_max=8388608
(Not sure if the desired size depends on the machine/container I am
running on.)
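
The 425984 returned by getsockopt in the strace output is consistent with
how Linux handles SO_RCVBUF: the request is capped at net.core.rmem_max and
the kernel then doubles it for bookkeeping overhead (see socket(7)). A small
sketch of that arithmetic follows. It assumes rmem_max was at the common
default of 212992 on that machine, which the report does not state, and
effective_rcvbuf is a name invented for illustration.

```shell
# Approximate the value getsockopt(SO_RCVBUF) reports after a
# setsockopt(SO_RCVBUF) request: cap at rmem_max, then double.
# (The kernel's lower bound on buffer size is ignored here.)
effective_rcvbuf() {
    requested=$1
    rmem_max=$2
    if [ "$requested" -gt "$rmem_max" ]; then
        requested=$rmem_max
    fi
    echo $(( requested * 2 ))
}

effective_rcvbuf 8388608 212992     # assumed default rmem_max: prints 425984
effective_rcvbuf 8388608 8388608    # after the sysctl bump: prints 16777216
```

With rmem_max raised to 8388608 the plain SO_RCVBUF request is no longer
truncated, so the privileged SO_RCVBUFFORCE fallback should not be needed.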
Can we check these values on our armhf containers and/or bump
them? Or can we mark the pacemaker v2.0 autopkgtest as ignored on armhf?