Many fixes related to the reported issue have landed in neutron and OVS since this bug was filed. Some that I quickly found:

- https://patchwork.ozlabs.org/project/openvswitch/patch/20220819230810.2626573-1-i.maxim...@ovn.org/
- https://review.opendev.org/c/openstack/ovsdbapp/+/856200
- https://review.opendev.org/c/openstack/ovsdbapp/+/862524
- https://review.opendev.org/c/openstack/neutron/+/857775
- https://review.opendev.org/c/openstack/neutron/+/871825

Closing this based on the above and comment #5. If the issue is still seen with python-ovs>=2.17 and the above fixes included, please feel free to reopen it and attach the ovsdb-server SB logs along with the neutron server and metadata agent debug logs.

** Bug watch added: Red Hat Bugzilla #2214289
   https://bugzilla.redhat.com/show_bug.cgi?id=2214289

** Changed in: neutron
       Status: New => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1996594

Title:
  OVN metadata randomly stops working

Status in neutron:
  Fix Released

Bug description:
  We found that OVN metadata randomly stops working while OVN is writing a snapshot.

  1. At 12:30:35, OVN started transferring leadership in order to write a snapshot:

  $ find sosreport-juju-2752e1-*/var/log/ovn/* |xargs zgrep -i -E 'Transferring leadership'
  sosreport-juju-2752e1-6-lxd-24-xxx-2022-08-18-entowko/var/log/ovn/ovsdb-server-sb.log:2022-08-18T12:30:35.322Z|80962|raft|INFO|Transferring leadership to write a snapshot.
  sosreport-juju-2752e1-6-lxd-24-xxx-2022-08-18-entowko/var/log/ovn/ovsdb-server-sb.log:2022-08-18T17:52:53.024Z|82382|raft|INFO|Transferring leadership to write a snapshot.
  sosreport-juju-2752e1-7-lxd-27-xxx-2022-08-18-hhxxqci/var/log/ovn/ovsdb-server-sb.log:2022-08-18T12:30:35.330Z|92698|raft|INFO|Transferring leadership to write a snapshot.

  2. At 12:30:36, neutron-ovn-metadata-agent reported an OVSDB error:

  $ find sosreport-srv1*/var/log/neutron/* |xargs zgrep -i -E 'OVSDB Error'
  sosreport-srv1xxx2d-xxx-2022-08-18-cuvkufw/var/log/neutron/neutron-ovn-metadata-agent.log:2022-08-18 12:30:36.103 75556 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: no error details available
  sosreport-srv1xxx6d-xxx-2022-08-18-bgnovqu/var/log/neutron/neutron-ovn-metadata-agent.log:2022-08-18 12:30:36.104 2171 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: no error details available

  3. At 12:57:53, we saw the error 'No port found in network'. From this point on, OVN metadata randomly fails to work:

  2022-08-18 12:57:53.800 3730 ERROR neutron.agent.ovn.metadata.server [-] No port found in network 63e2c276-60dd-40e3-baa1-c16342eacce2 with IP address 100.94.98.135

  After the problem occurs, restarting neutron-ovn-metadata-agent or restarting the haproxy instance as follows works around it:

  /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec ovnmeta-63e2c276-60dd-40e3-baa1-c16342eacce2 haproxy -f /var/lib/neutron/ovn-metadata-proxy/63e2c276-60dd-40e3-baa1-c16342eacce2.conf

  LP bug #1990978 [1] tries to reduce the frequency of leadership transfers, which should help with this problem. However, it only makes the problem less frequent; it does not eliminate it. I wonder whether we need to add some retry logic on the neutron side.

  NOTE: The OpenStack version we are using is focal-xena, and the openvswitch version is 2.16.0-0ubuntu2.1~cloud0.

  [1] https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1990978
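
On the retry idea raised in the bug description: below is a minimal sketch of what retrying a failed port lookup with backoff could look like. It is not the code from neutron or ovsdbapp, and it is not taken from any of the fixes linked in this thread. The names `retry_lookup` and `PortNotFound`, the parameters, and their defaults are all hypothetical and chosen only for illustration. It assumes the lookup returns None while the agent's view of the SB database is stale, and that the view recovers on its own after a short wait, which this report does not establish.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class PortNotFound(Exception):
    """Hypothetical error: no port matched after all attempts."""


def retry_lookup(lookup: Callable[[], Optional[T]],
                 attempts: int = 3,
                 delay: float = 0.5,
                 backoff: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call lookup() until it returns a non-None value or attempts run out.

    The wait between attempts grows by `backoff` each time. `sleep` is
    injectable so the wait can be replaced in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        result = lookup()
        if result is not None:
            return result
        # Do not wait after the final failed attempt.
        if attempt < attempts:
            sleep(delay)
            delay *= backoff
    raise PortNotFound(f"lookup failed after {attempts} attempts")


# Simulated lookup: the first two calls see a stale view (None),
# the third call finds the port. Waits are recorded instead of slept.
answers = iter([None, None, "port-1"])
waits = []
port = retry_lookup(lambda: next(answers), attempts=3, sleep=waits.append)
```

In the simulated run the lookup succeeds on the third call after two recorded waits. Since the workaround in this report is restarting the agent or haproxy, a retry like this would only help where the failure is transient; it would not replace reconnect handling in the IDL layer.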