[Bug 1815101] Re: [master] Restarting systemd-networkd breaks keepalived, heartbeat, corosync, pacemaker (interface aliases are restarted)

Rafael David Tinoco Thu, 07 Nov 2019 04:56:10 -0800

** Description changed:

  [impact]
  
- ip addresses managed by keepalived are lost across networkd restarts
+ - ALL related HA software has a small problem if interfaces are being
+ managed by systemd-networkd: nic restarts/reconfigs are always going to
+ wipe all interfaces aliases when HA software is not expecting it to (no
+ coordination between them.
+ 
+ - keepalived, smb ctdb, pacemaker, all suffer from this. Pacemaker is
+ smarter in this case because it has a service monitor that will restart
+ the virtual IP resource, in affected node & nic, before considering a
+ real failure, but other HA service might consider a real failure when it
+ is not.
  
  [test case]
  
- see original description below
+ - comment #14 is a full test case: to have 3 node pacemaker, in that
+ example, and cause a networkd service restart: it will trigger a failure
+ for the virtual IP resource monitor.
+ 
+ - other example is given in the original description for keepalived.
+ both suffer from the same issue (and other HA softwares as well).
  
  [regression potential]
  
- this backports KeepConfiguration parameter, which adds some significant
- complexity to networkd's configuration and behavior, which could lead to
- regressions in correctly configuring the network at networkd start, or
- incorrectly maintaining configuration at networkd restart, or losing
- network state at networkd stop.  Any regressions are most likely to
- occur during networkd start, restart, or stop, and most likely to
- involve missing or incorrect ip address(es).
+ - this backports KeepConfiguration parameter, which adds some
+ significant complexity to networkd's configuration and behavior, which
+ could lead to regressions in correctly configuring the network at
+ networkd start, or incorrectly maintaining configuration at networkd
+ restart, or losing network state at networkd stop.
+ 
+ - Any regressions are most likely to occur during networkd start,
+ restart, or stop, and most likely to involve missing or incorrect ip
+ address(es).
+ 
+ - the change is based in upstream patches adding the exact feature we
+ needed to fix this issue & it will be integrated with a netplan change
+ to add the needed stanza to systemd nic configuration file
+ (KeepConfiguration=)
  
  [other info]
  
  original description:
  ---
  
  Configure netplan for interfaces, for example (a working config with IP
  addresses obfuscated)
  
  network:
      ethernets:
          eth0:
              addresses: [192.168.0.5/24]
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, 
phone.blah.com]
                addresses: [10.22.11.1]
          eth2:
              addresses:
                - 12.13.14.18/29
                - 12.13.14.19/29
              gateway4: 12.13.14.17
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, 
phone.blah.com]
                addresses: [10.22.11.1]
          eth3:
              addresses: [10.22.11.6/24]
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, 
phone.blah.com]
                addresses: [10.22.11.1]
          eth4:
              addresses: [10.22.14.6/24]
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, 
phone.blah.com]
                addresses: [10.22.11.1]
          eth7:
              addresses: [9.5.17.34/29]
              dhcp4: false
              optional: true
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, 
phone.blah.com]
                addresses: [10.22.11.1]
      version: 2
  
  Configure keepalived (again, a working config with IP addresses
  obfuscated)
  
  global_defs           # Block id
  {
  notification_email {
          sysadm...@blah.com
  }
          notification_email_from keepali...@system3.hq.blah.com
          smtp_server 10.22.11.7     # IP
          smtp_connect_timeout 30      # integer, seconds
          router_id system3          # string identifying the machine,
                                       # (doesn't have to be hostname).
          vrrp_mcast_group4 224.0.0.18 # optional, default 224.0.0.18
          vrrp_mcast_group6 ff02::12   # optional, default ff02::12
          enable_traps                 # enable SNMP traps
  }
  vrrp_sync_group collection {
          group {
                  wan
                  lan
                  phone
          }
  vrrp_instance wan {
          state MASTER
          interface eth2
          virtual_router_id 77
          priority 150
          advert_int 1
          smtp_alert
          authentication {
                  auth_type PASS
                  auth_pass BlahBlah
          }
          virtual_ipaddress {
          12.13.14.20
          }
  }
  vrrp_instance lan {
          state MASTER
          interface eth3
          virtual_router_id 78
          priority 150
          advert_int 1
          smtp_alert
          authentication {
                  auth_type PASS
                  auth_pass MoreBlah
          }
          virtual_ipaddress {
                  10.22.11.13/24
          }
  }
  vrrp_instance phone {
          state MASTER
          interface eth4
          virtual_router_id 79
          priority 150
          advert_int 1
          smtp_alert
          authentication {
                  auth_type PASS
                  auth_pass MostBlah
          }
          virtual_ipaddress {
                  10.22.14.3/24
          }
  }
  
  At boot the affected interfaces have:
  5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group 
default qlen 1000
      link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
      inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
         valid_lft forever preferred_lft forever
      inet 10.22.14.3/24 scope global secondary eth4
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
         valid_lft forever preferred_lft forever
  7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group 
default qlen 1000
      link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
      inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
         valid_lft forever preferred_lft forever
      inet 10.22.11.13/24 scope global secondary eth3
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:2629/64 scope link
         valid_lft forever preferred_lft forever
  9: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group 
default qlen 1000
      link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
      inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.13.14.20/32 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:262b/64 scope link
         valid_lft forever preferred_lft forever
  
  Run 'netplan try' (didn't even make any changes to the configuration) and the 
keepalived addresses disappear never to return, the affected interfaces have:
  5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group 
default qlen 1000
      link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
      inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
         valid_lft forever preferred_lft forever
  7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group 
default qlen 1000
      link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
      inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:2629/64 scope link
         valid_lft forever preferred_lft forever
  9: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group 
default qlen 1000
      link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
      inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:262b/64 scope link
         valid_lft forever preferred_lft forever


-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1815101

Title:
  [master] Restarting systemd-networkd breaks keepalived, heartbeat,
  corosync, pacemaker (interface aliases are restarted)

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-keepalived/+bug/1815101/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1815101] Re: [master] Restarting systemd-networkd breaks keepalived, heartbeat, corosync, pacemaker (interface aliases are restarted)

Reply via email to