Public bug reported:

**Environment**

Queens
OVS
GTW DVR Mode: dvr_snat
CMP DVR Mode: dvr
No L3 HA

Use Case: Centralized FIPs (aka Floating IPs against unbound ports)
https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/presentation-media/Neutron-Port-Binding-and-Impact-of-unbound-ports-on-DVR-Routers-with-FloatingIP.pdf

**How to reproduce**

1. Create a VM normally.

2. Create an allowed-address-pair port against the VM port.

   openstack port list --server <server_name>  # Get port id
   openstack port create --security-group <sec_group> --fixed-ip subnet=<subnet>,ip-address=<ip_address> --network <network name> <port name>
   openstack port set --allowed-address ip-address=<ip_address> <server port>

3. Assign a floating IP to the port.

   openstack floating ip set --port <port_name> <floating_ip>

4. Inside the deployed VM, create an IP alias for the new IP address.

   ip addr add <ip_address>/24 dev ens3

5. Detect which gtw node is hosting the centralized FIP.

   neutron l3-agent-list-hosting-router <router>

6. Perform a manual failover:

   neutron l3-agent-router-remove <hosting-l3-agent> <router>
   neutron l3-agent-router-add <new-l3-agent> <router>

   (Or) perform an automatic failover (on the hosting gtw):

   shutdown -h now

7. Confirm that the router has failed over to the new node.

   neutron l3-agent-list-hosting-router <router>

**Expected Result**

Connection to the floating IP address recovers automatically.

**Actual Result**

Connection does not recover. The problem reproduces 100% of the time.

**How to recover**

Restart "neutron-l3-agent" on the hosting node (after failover). Connectivity recovers within a few seconds.

   systemctl restart neutron-l3-agent

**Additional information**

After failover, the SNAT namespace does not have the sysctl settings that should be applied upon namespace creation. We have also confirmed that setting them manually fixes the issue.
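For reference, the manual fix can be sketched roughly as follows. This is only an illustration, not the exact commands that were run: the function name `snat_sysctl_fix` and the `DRY_RUN` variable are hypothetical, and the target values (ip_forward=1, arp_ignore=1, arp_announce=2) are an assumption about what namespaces.py applies on namespace creation, so please check them against your own tree. The IPv6 forwarding setting is left out because it was already 1 in the output for this report.

```shell
#!/bin/sh
# Sketch: re-apply sysctl settings inside a DVR SNAT namespace after failover.
# ASSUMPTION: the values below are what namespaces.py is believed to set when
# it creates a router namespace; verify before relying on them.
# snat_sysctl_fix and DRY_RUN are hypothetical names used for illustration.

snat_sysctl_fix() {
    if [ -z "$1" ]; then
        echo "usage: snat_sysctl_fix <router_id>" >&2
        return 1
    fi
    ns="snat-$1"
    for setting in \
        net.ipv4.ip_forward=1 \
        net.ipv4.conf.all.arp_ignore=1 \
        net.ipv4.conf.all.arp_announce=2
    do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only print the command so it can be reviewed first.
            echo "ip netns exec ${ns} sysctl -w ${setting}"
        else
            # Needs root on the gtw node that now hosts the router.
            ip netns exec "${ns}" sysctl -w "${setting}"
        fi
    done
}
```

Calling `snat_sysctl_fix 8737216a-b561-434f-a023-1d9cae2ce04e` prints the three commands for review. Running it as root with `DRY_RUN=0` on the new hosting gtw applies them.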
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/namespaces.py#L91-L107

The following are the sysctl values after failover:

---
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv4.conf.all.arp_ignore
net.ipv4.conf.all.arp_ignore = 0
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv4.conf.all.arp_announce
net.ipv4.conf.all.arp_announce = 0
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv6.conf.all.forwarding
net.ipv6.conf.all.forwarding = 1
root@gtw03:~#
---

We believe this is caused by the following commit, which performs this initialization only when neutron-l3-agent starts.
https://github.com/openstack/neutron/commit/9d5e80e935049d08e0fcefc0c823fb67c793a51b

** Affects: neutron
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1881995

Title:
  Centralized SNAT failover does not recover until "systemctl restart neutron-l3-agent" on transferred node

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1881995/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp