A first check with head-to-head restarts confirms my earlier assumption. On Zesty it is:
1. not failing
2. taking longer on each restart
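The "taking longer on each restart" observation can be checked per restart instead of only as a loop total. Below is a minimal sketch. `time_each` is a hypothetical helper and is not part of systemd or keepalived. It assumes GNU `date`, whose `%N` format gives nanoseconds.

```shell
#!/bin/sh
# time_each COUNT CMD [ARGS...]
# Hypothetical helper: runs CMD COUNT times back to back, with no pause
# between runs, and prints the wall-clock duration of each run in ms.
# Assumes GNU date (the %N nanosecond field is not POSIX).
time_each() {
  count=$1
  shift
  i=1
  while [ "$i" -le "$count" ]; do
    start=$(date +%s%N)
    # The command's own output and exit status are ignored; only timing is reported.
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo "run $i: $(( (end - start) / 1000000 )) ms"
    i=$((i + 1))
  done
}
```

For example, `time_each 5 sudo systemctl restart keepalived` stays within the default start limit of 5 starts per 10 seconds. It prints one line per restart, so a growing delay on Zesty would show up as increasing numbers.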
I wanted to go further, but a trivial test comparing with this:

$ time for i in $(seq 1 200); do sudo systemctl restart keepalived; sudo systemctl status keepalived | egrep 'Main.*exited'; done

failed because the default start limit kicked in. Well, let's start softer and go with reload first.

$ time for i in $(seq 1 200); do sudo systemctl reload keepalived; sudo systemctl status keepalived | egrep 'Main.*exited'; done

That works fine on both (but then the HUP just forces a reload and the PIDs stay), so by design this does not run into the same fault.

OK, so with that, let's obey the default start limit of 5 starts per 10 seconds and compare again.

Xenial:
$ time for i in $(seq 1 5); do sudo systemctl restart keepalived; sudo systemctl status keepalived | egrep 'Main.*exited'; done
   Main PID: 8800 (code=exited, status=0/SUCCESS)
   Main PID: 8836 (code=exited, status=0/SUCCESS)

real 0m0.156s
user 0m0.008s
sys 0m0.000s

Note: 2 cases are all we can get. Restart #1 works, #2 comes too early and triggers the issue, #3 cleans up, #4 triggers it again, #5 cleans up again.

Zesty:
$ time for i in $(seq 1 5); do sudo systemctl restart keepalived; sudo systemctl status keepalived | egrep 'Main.*exited'; done

real 0m2,258s
user 0m0,012s
sys 0m0,000s

So that seems to be the repro we need:
- head-to-head restarts with no time in between
- showing the error on Xenial
- showing it is not occurring on Zesty
- showing that something on Zesty makes it "wait", which is what avoids the issue

-- 
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to keepalived in Ubuntu.
https://bugs.launchpad.net/bugs/1644530

Title:
  keepalived fails to restart cleanly due to the wrong systemd settings

Status in keepalived package in Ubuntu:
  Fix Released
Status in keepalived source package in Xenial:
  Confirmed

Bug description:
  Because the "PIDFile=" directive is missing in the systemd unit file, keepalived sometimes fails to kill all old processes.
  The old processes remain with the old settings and cause unexpected behavior.
  The details of this bug are described in this upstream ticket:
  https://github.com/acassen/keepalived/issues/443

  The official systemd unit file has been available since version 1.2.24, via this commit:
  https://github.com/acassen/keepalived/commit/635ab69afb44cd8573663e62f292c6bb84b44f15

  It includes the "PIDFile" directive correctly:
  PIDFile=/var/run/keepalived.pid

  We should go the same way.

  I am using Ubuntu 16.04.1, kernel 4.4.0-45-generic.

  Package: keepalived
  Version: 1.2.19-1

  =======================================================================
  How to reproduce:
  I used two instances of Ubuntu 16.04.2 on DigitalOcean.

  Configurations
  --------------

  MASTER server's /etc/keepalived/keepalived.conf:

  vrrp_script chk_nothing {
      script "/bin/true"
      interval 2
  }

  vrrp_instance G1 {
      interface eth1
      state BACKUP
      priority 100
      virtual_router_id 123
      unicast_src_ip <primary IP>
      unicast_peer {
          <secondary IP>
      }
      track_script {
          chk_nothing
      }
  }

  BACKUP server's /etc/keepalived/keepalived.conf:

  vrrp_script chk_nothing {
      script "/bin/true"
      interval 2
  }

  vrrp_instance G1 {
      interface eth1
      state MASTER
      priority 200
      virtual_router_id 123
      unicast_src_ip <secondary IP>
      unicast_peer {
          <primary IP>
      }
      track_script {
          chk_nothing
      }
  }

  Procedures
  ----------
  1) Start keepalived on both servers
  $ sudo systemctl start keepalived.service

  2) Restart keepalived on either one
  $ sudo systemctl restart keepalived.service

  3) Check status and PID
  $ systemctl status -n0 keepalived.service

  Result
  ------
  0) Before restart
  Main PID is 3402 and the subprocesses' PIDs are 3403-3406. So far so good.
  root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
  ● keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2017-03-04 01:37:12 UTC; 14min ago
    Process: 3402 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 3403 (keepalived)
      Tasks: 3
     Memory: 1.7M
        CPU: 1.900s
     CGroup: /system.slice/keepalived.service
             ├─3403 /usr/sbin/keepalived
             ├─3405 /usr/sbin/keepalived
             └─3406 /usr/sbin/keepalived

  1) First restart
  Now Main PID is 3403, which was one of the previous subprocesses and has actually exited. Something is wrong. Still, the previous processes have all exited, so we are not likely to see any weird behavior here.

  root@ubuntu-2gb-sgp1-01:~# systemctl restart keepalived
  root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
  ● keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2017-03-04 01:51:45 UTC; 1s ago
    Process: 4782 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 3403 (code=exited, status=0/SUCCESS)
      Tasks: 3
     Memory: 1.7M
        CPU: 11ms
     CGroup: /system.slice/keepalived.service
             ├─4783 /usr/sbin/keepalived
             ├─4784 /usr/sbin/keepalived
             └─4785 /usr/sbin/keepalived

  2) Second restart
  Now Main PID is 4783 and the subprocesses' PIDs are 4783-4785. This is a problem because 4783 is an old process that should have exited before the new processes started. As a result, keepalived keeps running with the old settings while users believe it is using the new ones.
  root@ubuntu-2gb-sgp1-01:~# systemctl restart keepalived
  root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
  ● keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2017-03-04 01:51:49 UTC; 1s ago
    Process: 4796 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 4783 (keepalived)
      Tasks: 3
     Memory: 1.7M
        CPU: 6ms
     CGroup: /system.slice/keepalived.service
             ├─4783 /usr/sbin/keepalived
             ├─4784 /usr/sbin/keepalived
             └─4785 /usr/sbin/keepalived
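The failure in the second restart is that PIDs from before the restart (4783-4785) are still present afterwards. This can be checked mechanically by comparing the keepalived PIDs before and after a restart. Below is a minimal sketch. `stale_pids` is a hypothetical helper, not part of keepalived or systemd.

```shell
#!/bin/sh
# stale_pids BEFORE AFTER
# Hypothetical helper: BEFORE and AFTER are whitespace- or newline-separated
# PID lists of the keepalived processes, sampled before and after a restart.
# Prints every PID that appears in both lists, i.e. processes that survived
# the restart. Empty output means all old processes were replaced.
stale_pids() {
  stale=""
  for new in $2; do
    for old in $1; do
      if [ "$new" = "$old" ]; then
        stale="$stale $new"
      fi
    done
  done
  # Unquoted on purpose: word splitting drops the leading space.
  echo $stale
}
```

A possible usage, assuming `pgrep -x keepalived` matches only the daemon's processes:

  before=$(pgrep -x keepalived)
  sudo systemctl restart keepalived
  after=$(pgrep -x keepalived)
  stale_pids "$before" "$after"

If PIDs are reused between the two samples, the helper will report a false positive, but that is unlikely over such a short window. Sampling `after` again a second or two later separates a process that was merely slow to exit from one that stayed around, as 4783 did above.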

