** Description changed:

  [impact]

- ip addresses managed by keepalived are lost across networkd restarts
+ - ALL related HA software has a small problem if interfaces are being
+ managed by systemd-networkd: nic restarts/reconfigs are always going to
+ wipe all interface aliases when HA software is not expecting it to (no
+ coordination between them).
+
+ - keepalived, smb ctdb, pacemaker, all suffer from this. Pacemaker is
+ smarter in this case because it has a service monitor that will restart
+ the virtual IP resource, in affected node & nic, before considering a
+ real failure, but other HA services might consider it a real failure
+ when it is not.

  [test case]

- see original description below
+ - comment #14 is a full test case: set up a 3-node pacemaker cluster, as
+ in that example, and cause a networkd service restart; it will trigger a
+ failure for the virtual IP resource monitor.
+
+ - another example is given in the original description for keepalived.
+ Both suffer from the same issue (and other HA software as well).

  [regression potential]

- this backports KeepConfiguration parameter, which adds some significant
- complexity to networkd's configuration and behavior, which could lead to
- regressions in correctly configuring the network at networkd start, or
- incorrectly maintaining configuration at networkd restart, or losing
- network state at networkd stop. Any regressions are most likely to
- occur during networkd start, restart, or stop, and most likely to
- involve missing or incorrect ip address(es).
+ - this backports the KeepConfiguration parameter, which adds some
+ significant complexity to networkd's configuration and behavior, which
+ could lead to regressions in correctly configuring the network at
+ networkd start, or incorrectly maintaining configuration at networkd
+ restart, or losing network state at networkd stop.
+
+ - Any regressions are most likely to occur during networkd start,
+ restart, or stop, and most likely to involve missing or incorrect ip
+ address(es).
+
+ - the change is based on upstream patches adding the exact feature we
+ needed to fix this issue & it will be integrated with a netplan change
+ to add the needed stanza to the systemd nic configuration file
+ (KeepConfiguration=)

  [other info]

  original description:

  ---

  Configure netplan for interfaces, for example (a working config with IP
  addresses obfuscated)

  network:
      ethernets:
          eth0:
              addresses: [192.168.0.5/24]
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
                addresses: [10.22.11.1]
          eth2:
              addresses:
                - 12.13.14.18/29
                - 12.13.14.19/29
              gateway4: 12.13.14.17
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
                addresses: [10.22.11.1]
          eth3:
              addresses: [10.22.11.6/24]
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
                addresses: [10.22.11.1]
          eth4:
              addresses: [10.22.14.6/24]
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
                addresses: [10.22.11.1]
          eth7:
              addresses: [9.5.17.34/29]
              dhcp4: false
              optional: true
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
                addresses: [10.22.11.1]
      version: 2

  Configure keepalived (again, a working config with IP addresses
  obfuscated)

  global_defs # Block id
  {
      notification_email {
          sysadm...@blah.com
      }
      notification_email_from keepali...@system3.hq.blah.com
      smtp_server 10.22.11.7 # IP
      smtp_connect_timeout 30 # integer, seconds
      router_id system3 # string identifying the machine,
                        # (doesn't have to be hostname).
      vrrp_mcast_group4 224.0.0.18 # optional, default 224.0.0.18
      vrrp_mcast_group6 ff02::12 # optional, default ff02::12
      enable_traps # enable SNMP traps
  }

  vrrp_sync_group collection {
      group {
          wan
          lan
          phone
      }
  }

  vrrp_instance wan {
      state MASTER
      interface eth2
      virtual_router_id 77
      priority 150
      advert_int 1
      smtp_alert
      authentication {
          auth_type PASS
          auth_pass BlahBlah
      }
      virtual_ipaddress {
          12.13.14.20
      }
  }

  vrrp_instance lan {
      state MASTER
      interface eth3
      virtual_router_id 78
      priority 150
      advert_int 1
      smtp_alert
      authentication {
          auth_type PASS
          auth_pass MoreBlah
      }
      virtual_ipaddress {
          10.22.11.13/24
      }
  }

  vrrp_instance phone {
      state MASTER
      interface eth4
      virtual_router_id 79
      priority 150
      advert_int 1
      smtp_alert
      authentication {
          auth_type PASS
          auth_pass MostBlah
      }
      virtual_ipaddress {
          10.22.14.3/24
      }
  }

  At boot the affected interfaces have:

  5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
      inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
         valid_lft forever preferred_lft forever
      inet 10.22.14.3/24 scope global secondary eth4
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
         valid_lft forever preferred_lft forever
  7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
      inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
         valid_lft forever preferred_lft forever
      inet 10.22.11.13/24 scope global secondary eth3
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:2629/64 scope link
         valid_lft forever preferred_lft forever
  9: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
      inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.13.14.20/32 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:262b/64 scope link
         valid_lft forever preferred_lft forever

  Run 'netplan try' (didn't even make any changes to the configuration)
  and the keepalived addresses disappear, never to return. The affected
  interfaces then have:

  5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
      inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
         valid_lft forever preferred_lft forever
  7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
      inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:2629/64 scope link
         valid_lft forever preferred_lft forever
  9: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
      inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:262b/64 scope link
         valid_lft forever preferred_lft forever
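The before/after comparison in the original description can be automated when reproducing the issue. The following is only a minimal sketch, not part of the bug report or of any of the packages involved: it assumes two saved snapshots of `ip addr` output (one taken at boot, one after 'netplan try' or a networkd restart) and reports which IPv4 addresses went missing per interface. The function names `addresses_by_interface` and `lost_addresses` are hypothetical, and the sample data is the eth4 output quoted above.

```python
import re

# Interface header line, e.g. "5: eth4: <BROADCAST,...> mtu 1500 ..."
IFACE_RE = re.compile(r"^\s*\d+:\s+([^:@\s]+)")
# IPv4 address line, e.g. "inet 10.22.14.3/24 scope global secondary eth4".
# "inet6" lines do not match because whitespace must follow "inet".
INET_RE = re.compile(r"^\s*inet\s+(\S+)")


def addresses_by_interface(ip_addr_output):
    """Map each interface name to the set of IPv4 addresses listed for it."""
    result = {}
    current = None
    for line in ip_addr_output.splitlines():
        iface = IFACE_RE.match(line)
        if iface:
            current = iface.group(1)
            result.setdefault(current, set())
            continue
        inet = INET_RE.match(line)
        if inet and current is not None:
            result[current].add(inet.group(1))
    return result


def lost_addresses(before, after):
    """Return {interface: [addresses present in `before` but not in `after`]}."""
    old = addresses_by_interface(before)
    new = addresses_by_interface(after)
    lost = {}
    for iface, addrs in old.items():
        missing = addrs - new.get(iface, set())
        if missing:
            lost[iface] = sorted(missing)
    return lost


# Sample snapshots copied from the eth4 output in the report.
BEFORE = """\
5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
    inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
       valid_lft forever preferred_lft forever
    inet 10.22.14.3/24 scope global secondary eth4
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
       valid_lft forever preferred_lft forever
"""

AFTER = """\
5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
    inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    for iface, addrs in lost_addresses(BEFORE, AFTER).items():
        print(f"{iface}: lost {', '.join(addrs)}")
```

With the eth4 snapshots above, the only address reported as lost is the keepalived virtual IP 10.22.14.3/24; the netplan-configured address 10.22.14.6/24 survives.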
--
https://bugs.launchpad.net/bugs/1815101

Title:
  [master] Restarting systemd-networkd breaks keepalived, heartbeat,
  corosync, pacemaker (interface aliases are restarted)