Reviewed:  https://review.opendev.org/c/openstack/neutron/+/952988
Committed: https://opendev.org/openstack/neutron/commit/9001ddcb1c74b018fc3d47ddb04a76edf38168e1
Submitter: "Zuul (22348)"
Branch:    master
commit 9001ddcb1c74b018fc3d47ddb04a76edf38168e1
Author: elajkat <[email protected]>
Date:   Fri Jun 6 15:18:55 2025 +0200

    Delete tunnel endpoints if endpoint agent is deleted

    Closes-Bug: #2084446
    Change-Id: I64ad033e600c2c87af4716736c10e11c143afea2
    Signed-off-by: lajoskatona <[email protected]>

** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2084446

Title:
  Scaling down neutron-openvswitch-agent doesn't remove its tunnel endpoints

Status in neutron:
  Fix Released

Bug description:
  When scaling down a node running a neutron L2 agent (I tested it with neutron-openvswitch-agent), its tunnel endpoints are not cleaned from the database. For VXLAN tunnels the entry lives in the "ml2_vxlan_endpoints" table; each tunnel type has its own endpoints table. An additional issue is that even after I manually removed the endpoint entry from that table, the other running agents still kept the tunnel to that endpoint in their br-tun bridge, even after I restarted such an agent.

  To show exactly what the issue is, here is what I did step by step:

  1. Deployed a multinode devstack with compute-1 and compute-2 nodes,
  2. Tunnels in br-tun were created by neutron-openvswitch-agent on both nodes,
  3. Stopped neutron-openvswitch-agent on compute-1 and then deleted it from the neutron DB with the API command "openstack network agent delete <agent_id>",
  4. On compute-2 the tunnel to compute-1 was still present in br-tun,
  5. In the neutron DB the "ml2_vxlan_endpoints" table still contained the endpoint for compute-1,
  6. Manually removed the endpoint from the "ml2_vxlan_endpoints" table using the query "DELETE FROM ml2_vxlan_endpoints WHERE host='devstack-ubuntu-compute-1';",
  7. Restarted neutron-openvswitch-agent on compute-2, but even after that the tunnel to compute-1 was still there.
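  The fix referenced above makes the agent-delete path also remove the agent's rows from the tunnel endpoint tables. A minimal sketch of that behaviour (not neutron's actual code), using an in-memory SQLite table in place of the real neutron database; the delete_agent_endpoints helper and the sample IPs/hosts are illustrative assumptions, while the table and column layout follows "ml2_vxlan_endpoints":

```python
# Sketch: when a tunnelling L2 agent is deleted via the API, its endpoint
# rows should be removed from the per-tunnel-type endpoint tables, keyed
# on the agent's host. An in-memory SQLite DB stands in for neutron's DB;
# the helper and sample data below are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ml2_vxlan_endpoints "
    "(ip_address TEXT PRIMARY KEY, udp_port INTEGER, host TEXT)"
)
conn.executemany(
    "INSERT INTO ml2_vxlan_endpoints VALUES (?, ?, ?)",
    [("10.0.0.11", 4789, "devstack-ubuntu-compute-1"),
     ("10.0.0.12", 4789, "devstack-ubuntu-compute-2")],
)

def delete_agent_endpoints(conn, host):
    """Drop tunnel endpoint rows owned by a deleted agent's host."""
    conn.execute("DELETE FROM ml2_vxlan_endpoints WHERE host = ?", (host,))
    conn.commit()

# Simulate "openstack network agent delete" for the compute-1 agent.
delete_agent_endpoints(conn, "devstack-ubuntu-compute-1")

remaining = [row[2] for row in conn.execute("SELECT * FROM ml2_vxlan_endpoints")]
print(remaining)  # only compute-2's endpoint remains
```

  With the endpoint row gone, other agents can be told (via the usual tunnel RPC notifications) to tear down their side of the tunnel instead of keeping a stale port in br-tun.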
  To get rid of the stale endpoint to compute-1 on compute-2 I had to delete br-tun and then restart neutron-openvswitch-agent on compute-2.

  A stale tunnel is usually not a big issue, but in some cases it can cause serious problems. For example, when scaling down networker nodes in a cluster that uses L3 HA, the old node may be removed from the OpenStack cluster but, for some reason, still be up and running in the datacenter. In that case keepalived processes for some HA routers may still be running there, and since the old node still has connectivity to the new networker nodes through the VXLAN tunnels, it may end up being the active keepalived node, so in the neutron API the router is visible as 'standby' on all known L3 agents.

  To manage notifications about this bug go to:
  https://bugs.launchpad.net/neutron/+bug/2084446/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

