Public bug reported:

Hi,
Looking at the external networks from the edge environment, I see that these fields are None:

| provider:network_type     | None |
| provider:physical_network | None |

Instead, we have this:

| segments | [{'provider:network_type': 'flat', 'provider:physical_network': 'leaf0', 'provider:segmentation_id': None}, {'provider:network_type': 'flat', 'provider:physical_network': 'leaf1', 'provider:segmentation_id': None}, {'provider:network_type': 'flat', 'provider:physical_network': 'leaf2', 'provider:segmentation_id': None}] |

When building the list of candidate nodes to schedule the gateway router ports to, the ML2/OVN driver tries to check whether the physical network is present on the nodes, see [0][1]. To do that it uses the "provider:network_type" and "provider:physical_network" fields (see [1]). So the physnet attribute is now None (see [0]), and when it reaches the get_candidates_for_scheduling() method [2] the list of candidates is empty because no gateway node matches this physnet. This is also the method where the candidates are filtered by AZ.

Now, the reason it does not fail, and the gw port still gets scheduled to some gw node, is that once the scheduler code finds the candidate list empty, it simply fetches the list of gw chassis, without any regard to physnets [3], and uses it as the candidates.

As you can see, the code is messy and a future refactor may be needed. For this specific problem I would recommend a simpler fix: when the physnet is None, get_candidates_for_scheduling() should consider all GW chassis regardless of physnet and then filter those chassis by their AZ. That fix is simpler and backportable.
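The flow described above, and the proposed fix, can be modelled roughly as follows. This is a simplified, self-contained sketch, not neutron's code: `Chassis`, `get_physnet()`, `candidates_current()`, `candidates_proposed()` and `schedule()` are hypothetical names. The flat/vlan check in `get_physnet()` and the AZ rule (a chassis qualifies if it shares at least one AZ with the router) are assumptions; the real logic lives in get_candidates_for_scheduling() [2] and the OVN L3 scheduler [3].

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chassis:
    """Hypothetical stand-in for an OVN gateway chassis."""
    name: str
    physnets: set = field(default_factory=set)  # physnets from bridge mappings
    azs: set = field(default_factory=set)       # availability zones of the chassis


def get_physnet(network: dict) -> Optional[str]:
    # Reads only the top-level provider attributes (flat/vlan check is an
    # assumption), so a multi-segment network yields None.
    if network.get("provider:network_type") in ("flat", "vlan"):
        return network.get("provider:physical_network")
    return None


def _filter_by_az(cands, router_azs):
    # Assumed AZ rule: keep chassis sharing at least one AZ with the router.
    if not router_azs:
        return cands
    return [c for c in cands if c.azs & router_azs]


def candidates_current(physnet, chassis, router_azs):
    # Current behaviour: with physnet None no chassis matches, so the
    # list is empty before the AZ filter even runs.
    cands = [c for c in chassis if physnet is not None and physnet in c.physnets]
    return _filter_by_az(cands, router_azs)


def candidates_proposed(physnet, chassis, router_azs):
    # Proposed fix: with no physnet, start from all gateway chassis,
    # then apply the AZ filter as usual.
    if physnet is None:
        cands = list(chassis)
    else:
        cands = [c for c in chassis if physnet in c.physnets]
    return _filter_by_az(cands, router_azs)


def schedule(candidates, chassis):
    # Scheduler fallback from the report: an empty candidate list is
    # replaced by all gateway chassis, ignoring physnets (and, here, AZs).
    return candidates or list(chassis)


gw_chassis = [
    Chassis("gw-leaf0", {"leaf0"}, {"az0"}),
    Chassis("gw-leaf1", {"leaf1"}, {"az1"}),
    Chassis("gw-leaf2", {"leaf2"}, {"az2"}),
]
segmented_net = {
    "provider:network_type": None,
    "provider:physical_network": None,
    "segments": [
        {"provider:network_type": "flat", "provider:physical_network": p,
         "provider:segmentation_id": None}
        for p in ("leaf0", "leaf1", "leaf2")
    ],
}
physnet = get_physnet(segmented_net)
router_azs = {"az1"}

current = schedule(candidates_current(physnet, gw_chassis, router_azs), gw_chassis)
proposed = schedule(candidates_proposed(physnet, gw_chassis, router_azs), gw_chassis)
print([c.name for c in current])   # ['gw-leaf0', 'gw-leaf1', 'gw-leaf2']
print([c.name for c in proposed])  # ['gw-leaf1']
```

In this model the current path only lands on an AZ-compatible chassis by chance, because the fallback ignores AZs, while the proposed variant keeps the AZ filter even when the physnet cannot be resolved.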
[0] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1370
[1] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1314-L1317
[2] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1291-L1296
[3] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/scheduler/l3_ovn_scheduler.py#L62

** Affects: neutron
     Importance: High
     Assignee: Lucas Alvares Gomes (lucasagomes)
         Status: Confirmed

** Tags: ovn

** Changed in: neutron
       Status: New => Confirmed

** Changed in: neutron
   Importance: Undecided => High

** Changed in: neutron
     Assignee: (unassigned) => Lucas Alvares Gomes (lucasagomes)

** Tags added: ovn

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1939144

Title:
  [OVN] Router Availability Zones doesn't work with segmented networks

Status in neutron:
  Confirmed