Bug closed due to lack of activity; please feel free to reopen if needed.

** Changed in: neutron
       Status: In Progress => Won't Fix
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1533455

Title:
  Stale processes lives after a fanout deleting HA router RPC between L3 agents

Status in neutron:
  Won't Fix

Bug description:
  Stale processes stay alive after a fanout RPC that deletes an HA router
  is sent to the L3 agents. The race happens between the L3 agents after
  the fanout delete RPC.

  Race scenario:
  1. HA router X was scheduled to L3 agent A and L3 agent B.
  2. X on agent A is in the master state.
  3. A delete-X RPC is fanned out.
  4. Agent A deletes all HA attributes and processes of X, including
     keepalived.
  5. (race) Agent B is not yet ready to process the delete RPC, for
     example because many delete RPCs are waiting in the router update
     queue, or because something else delays agent B in processing the RPC.
  6. (race) X on agent B is in the backup state. Because of step 4 it no
     longer receives VRRP advertisements from X on agent A, so X sets its
     state to master.
  8. (race) enqueue_state_change is called for X on agent B.
  9. (race) Agent B can now process the delete RPC.
  10. (race) X is still in agent B's router_info, so the metadata-proxy
      is spawned.
  11. (race) Agent B runs the delete process for HA router X: gateway,
      floating IPs, etc.
  12. (race) Agent B removes X from router_info.
  13. The metadata-proxy for router X on agent B stays alive.

  If you use Rally to run create_and_delete_routers, you will find stale
  metadata-proxy processes on the L3 agent side after the test.

  The only way to decide whether to spawn the metadata-proxy is to try to
  get the router from the agent's router_info dict. But
  enqueue_state_change and the router delete processing can run
  concurrently.
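  The check-then-act pattern described above can be illustrated with a
  small toy model. This is only a sketch, not Neutron code: ToyAgent,
  state_change_to_master, delete_router, the pause hook and run are
  hypothetical names made up for illustration, and the interleaving is
  simplified to "the delete runs between the router_info check and the
  proxy spawn". It also shows one possible mitigation, assuming both
  paths can be serialized with a single lock.

```python
import contextlib
import threading


class ToyAgent:
    """Toy stand-in for an L3 agent. Hypothetical; not Neutron code."""

    def __init__(self, serialize):
        self.router_info = {"X": "ha-router"}  # routers known to the agent
        self.proxies = set()                   # router IDs with a live proxy
        self.serialize = serialize
        self.lock = threading.Lock()

    def _guard(self):
        # Real lock when serializing, otherwise a no-op context manager.
        return self.lock if self.serialize else contextlib.nullcontext()

    def state_change_to_master(self, router_id, pause=lambda: None):
        with self._guard():
            if router_id not in self.router_info:  # check
                return
            pause()                                # window for a delete
            self.proxies.add(router_id)            # act: "spawn" the proxy

    def delete_router(self, router_id):
        with self._guard():
            self.proxies.discard(router_id)        # stop the proxy, if any
            self.router_info.pop(router_id, None)  # forget the router


def run(serialize):
    """Force a delete into the check/spawn window; return leftover proxies."""
    agent = ToyAgent(serialize)
    deleter = threading.Thread(target=agent.delete_router, args=("X",))

    def pause():
        deleter.start()
        if not serialize:
            deleter.join()  # unlocked: the delete completes inside the window

    agent.state_change_to_master("X", pause)
    deleter.join()          # locked: the delete runs after the spawn finishes
    return sorted(agent.proxies)


if __name__ == "__main__":
    print("without lock:", run(serialize=False))
    print("with lock:   ", run(serialize=True))
```

  Without the lock, the toy delete finishes before the proxy is added, so
  a proxy entry is left behind for a router that is no longer in
  router_info. With the lock, the delete waits for the state change
  handler and then cleans up the proxy it just created.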
  Here are some statistics after running Rally create_and_delete_routers:

  yulong@network2:/opt/openstack/neutron$ ~/ha_resource_state.sh
  neutron-keepalived-state-change count: 0
  neutron-ns-metadata-proxy count: 2
  keepalived process count: 0
  HA router master state count: 0
  IP monitor count: 9
  external pids: 2
  -rwxr-xr-x 1 root root 5 Mar 7 17:21 /opt/openstack/data/neutron/external/pids/5a83fe00-37c9-45fa-b299-2a1c49ce4bcc.pid
  -rwxr-xr-x 1 root root 5 Mar 7 17:20 /opt/openstack/data/neutron/external/pids/d9e2bdd3-63ac-4302-bb06-2f66e0308292.pid
  HA interface ip:
  all metadata-proxy router id:
  d9e2bdd3-63ac-4302-bb06-2f66e0308292
  5a83fe00-37c9-45fa-b299-2a1c49ce4bcc
  all ovs ha ports: 0
  all router namespace: 0

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1533455/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp