Public bug reported:

Network rescheduling is triggered when the neutron server discovers that agents are down. Some bare-metal and node management systems reboot those same nodes at the same moment. When the two actions coincide, the server sends RPC notifications to agents that have just been rebooted, which leaves stale RPC messages waiting when the DHCP agents return to service. These messages were sent to the agent before the node was rebooted but were never processed, because the agent was shut down at the time.
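To illustrate the scenario above, here is a minimal sketch of one possible guard, not neutron's actual implementation. It assumes that each notification carries a server-side send timestamp and that the server and agent clocks are comparable. The `Notification` and `StaleMessageGuard` names and the `sent_at` field are hypothetical and exist only for this example.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    """Hypothetical RPC notification; not a neutron or oslo.messaging class."""
    method: str       # e.g. "network_create_end" or "network_delete_end"
    network_id: str
    sent_at: float    # assumed send time on the server, seconds since epoch


class StaleMessageGuard:
    """Rejects notifications that were sent before the agent (re)started."""

    def __init__(self, started_at=None):
        # Record the agent start time once, at process start-up.
        self.started_at = time.time() if started_at is None else started_at

    def is_stale(self, msg):
        # A message sent before start-up was queued while the agent was down.
        return msg.sent_at < self.started_at

    def filter(self, messages):
        # Keep only the messages sent at or after the agent start time.
        return [m for m in messages if not self.is_stale(m)]


if __name__ == "__main__":
    guard = StaleMessageGuard(started_at=1000.0)
    queued = [
        Notification("network_create_end", "net-a", sent_at=990.0),
        Notification("network_delete_end", "net-b", sent_at=995.0),
        Notification("network_create_end", "net-c", sent_at=1005.0),
    ]
    for msg in guard.filter(queued):
        print(msg.method, msg.network_id)
```

In this sketch the two notifications queued before the restart are dropped and only the one sent afterwards is handled. A real fix would also need to decide how to handle clock skew between the server and the agent.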
The negative effects are as follows. When an agent receives a stale network create/end notification, it starts servicing a network even though the server may already have assigned that network to a different agent. Because the agent does not periodically audit the list of networks it is servicing, it could continue servicing a network that was not assigned to it indefinitely. Similarly, a stale delete message may be processed, causing the agent to stop servicing a network that it was actually supposed to service.

** Affects: neutron
     Importance: Undecided
     Assignee: Kailun Qin (kailun.qin)
         Status: New

** Changed in: neutron
     Assignee: (unassigned) => Kailun Qin (kailun.qin)

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1795212

Title:
  [RFE] Prevent DHCP agent from processing stale RPC messages when restarting up

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1795212/+subscriptions