[Yahoo-eng-team] [Bug 1974057] [NEW] [neutron-dynamic-routing] Plugin RPC queue should be consumed by RPC workers
Public bug reported:

Currently, the RPC queue of the BGP service plugin is consumed directly in the plugin thread. This may lead to unprocessed data in TCP queues due to infrequent polling, AMQP connection drops due to missed heartbeats, and other unwanted behavior. Instead, the RPC queue should be consumed by RPC workers, the same way it is already done in other service plugins, such as the L3 plugin and the metering plugin.

** Affects: neutron
   Importance: Undecided
   Assignee: Renat Nurgaliyev (rnurgaliyev)
   Status: In Progress

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1974057

Title: [neutron-dynamic-routing] Plugin RPC queue should be consumed by RPC workers

Status in neutron: In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1974057/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
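For illustration, here is a toy sketch in plain Python (not Neutron code; all names are invented) of the pattern the fix moves to: instead of the plugin thread polling the queue whenever it happens to get around to it, dedicated worker threads drain it continuously, so messages never pile up while the plugin thread is busy.

```python
# Toy illustration, NOT Neutron code: dedicated worker threads draining a
# message queue, analogous to how RPC workers consume a plugin's RPC topic.
# All names here are invented for the sketch.
import queue
import threading

def run_with_workers(messages, n_workers=2):
    """Consume every message from dedicated worker threads and return the
    processed messages once the queue has been fully drained."""
    q = queue.Queue()
    processed = []
    lock = threading.Lock()

    def worker():
        while True:
            msg = q.get()
            if msg is None:              # sentinel: stop this worker
                q.task_done()
                return
            with lock:                   # handler runs off the main thread
                processed.append(msg)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for m in messages:                   # the "RPC" messages arrive
        q.put(m)
    q.join()                             # wait until everything is consumed
    for _ in threads:                    # shut the workers down cleanly
        q.put(None)
    for t in threads:
        t.join()
    return processed
```

With consumption owned by worker threads, the main (plugin) thread is free to block or run long operations without the queue backing up, which is what avoids the missed-heartbeat and TCP-backlog symptoms described above.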
[Yahoo-eng-team] [Bug 1973347] [NEW] OVN revision_number infinite update loop
Public bug reported:

After the change described in https://mail.openvswitch.org/pipermail/ovs-dev/2022-May/393966.html was merged and released in stable OVN 22.03, an endless loop of revision_number updates in the external_ids of ports and router_ports can occur. We have confirmed the bug in Ussuri and Yoga. When the problem happens, the Neutron log looks like this:

2022-05-13 09:30:56.318 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: router_ports) to 4815
2022-05-13 09:30:56.366 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:56.367 25 ... Running txn n=1 command(idx=1): SetLSwitchPortCommand(...)
2022-05-13 09:30:56.367 25 ... Running txn n=1 command(idx=2): PgDelPortCommand(...)
2022-05-13 09:30:56.467 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: ports) to 4815
2022-05-13 09:30:56.880 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:56.881 25 ... Running txn n=1 command(idx=1): UpdateLRouterPortCommand(...)
2022-05-13 09:30:56.881 25 ... Running txn n=1 command(idx=2): SetLRouterPortInLSwitchPortCommand(...)
2022-05-13 09:30:56.984 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: router_ports) to 4816
2022-05-13 09:30:57.057 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:57.057 25 ... Running txn n=1 command(idx=1): SetLSwitchPortCommand(...)
2022-05-13 09:30:57.058 25 ... Running txn n=1 command(idx=2): PgDelPortCommand(...)
2022-05-13 09:30:57.159 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: ports) to 4816
2022-05-13 09:30:57.523 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:57.523 25 ... Running txn n=1 command(idx=1): UpdateLRouterPortCommand(...)
2022-05-13 09:30:57.524 25 ... Running txn n=1 command(idx=2): SetLRouterPortInLSwitchPortCommand(...)
2022-05-13 09:30:57.627 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: router_ports) to 4817
2022-05-13 09:30:57.674 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:57.674 25 ... Running txn n=1 command(idx=1): SetLSwitchPortCommand(...)
2022-05-13 09:30:57.675 25 ... Running txn n=1 command(idx=2): PgDelPortCommand(...)
2022-05-13 09:30:57.765 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: ports) to 4817

(Full version here: https://pastebin.com/raw/NLP1b6Qm.) In our lab environment we have confirmed that the problem is gone after the mentioned change is rolled back.

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1973347

Title: OVN revision_number infinite update loop

Status in neutron: New
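The ping-pong visible in the log reduces to a schematic sketch (plain Python, not actual Neutron/OVN maintenance code; the bump arithmetic here is invented, only the non-convergence matters): the same Neutron resource UUID is tracked both as `ports` and `router_ports`, and each side treats the other's newer revision as a reason to rewrite and bump again.

```python
# Schematic illustration, NOT Neutron/OVN code, of the feedback loop in
# the log above: the same UUID is tracked under two resource types, and
# each maintenance pass sees the sibling's newer revision as staleness.
revision = {"ports": 4815, "router_ports": 4815}

def reconcile(kind, sibling):
    # If the sibling carries an equal-or-newer revision, this resource is
    # considered out of date: it is rewritten and its counter is bumped.
    if revision[sibling] >= revision[kind]:
        revision[kind] = revision[sibling] + 1
        return True          # a transaction was issued
    return False

# Every cycle issues transactions on both sides, so the counters climb
# forever (4816, 4817, ...) instead of converging.
for _ in range(3):
    reconcile("router_ports", "ports")
    reconcile("ports", "router_ports")
```

The loop never reaches a fixed point because each bump makes the sibling resource look stale in turn, which matches the ever-increasing revision numbers in the log.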
[Yahoo-eng-team] [Bug 1920065] [NEW] Automatic rescheduling of BGP speakers on DrAgents
Public bug reported:

When a dynamic routing agent becomes unreachable, neutron takes these actions:

1. Remove all BGP speakers from unreachable agents.
2. Schedule all unassigned BGP speakers on available DrAgents.

This behavior can be undesirable in the following cases:

1. Speakers are removed from a DrAgent even if there is no other alive agent running. Sometimes I'd prefer them to stay configured exactly where they are, and come back once the DrAgent is back online, e.g. after the server is restarted. This sometimes leads to situations, especially when there is only one active DrAgent, where speakers are not configured on any DrAgent at all.

2. Sometimes it is desirable to let the operator control which components run where. For example, not every node running a DrAgent has reachability to all iBGP peers, and the network designer places route reflectors, DrAgents, and BGP speakers in their appropriate places, keeping high availability and other concerns in mind. In these setups it can be better to let the speaker fail on the DrAgent which is down. Moving a speaker to another DrAgent also means that the source IP address of the BGP session changes, which is not predictable and can be awkward to reconfigure on the other side of the BGP peering.

These situations may happen after the following change was introduced: https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/478455

My proposal is to add a configuration flag to control this behavior: https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/780675

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1920065

Title: Automatic rescheduling of BGP speakers on DrAgents

Status in neutron: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1920065/+subscriptions
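The proposed toggle can be sketched roughly as follows (hypothetical Python; the function and parameter names are invented here, and the real option is whatever the linked review defines): with rescheduling disabled, speakers keep their binding to a dead DrAgent and simply resume when it returns.

```python
# Hypothetical sketch of the proposed behavior flag; all names here are
# invented, and the actual option is defined by the review linked above.
def handle_dead_agents(assignments, alive_agents, auto_reschedule):
    """assignments: {speaker: agent}; alive_agents: set of live DrAgents.

    With auto_reschedule=False, speakers stay bound where they are and
    come back when the agent revives; with True, they are moved to a
    live agent, or left unscheduled if no live agent exists.
    """
    if not auto_reschedule:
        return dict(assignments)         # leave speakers bound as-is
    live = sorted(alive_agents)
    result = {}
    for speaker, agent in assignments.items():
        if agent in alive_agents:
            result[speaker] = agent      # agent is fine, keep the binding
        elif live:
            result[speaker] = live[0]    # naive choice of a live agent
        # else: no live agent at all -> speaker becomes unscheduled
    return result
```

Keeping the binding when the flag is off also keeps the BGP session's source IP address stable, which addresses the peering-reconfiguration concern in point 2 above.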