Reviewed: https://review.opendev.org/c/openstack/neutron/+/936428 Committed: https://opendev.org/openstack/neutron/commit/865097c6896a87fa5b6d508058db9ff1e6c423d1 Submitter: "Zuul (22348)" Branch: master
commit 865097c6896a87fa5b6d508058db9ff1e6c423d1 Author: Rodolfo Alonso Hernandez <[email protected]> Date: Fri Nov 29 09:30:23 2024 +0000 [OVN] Improve initial hash ring setup When using WSGI module, the multiprocess event is not shared between the WSGI workers. This new implementation retrieves the WSGI start time to define the OVN hash ring register creation time. That will be used to filter out the stale registers. Depends-On: https://review.opendev.org/c/openstack/devstack/+/936669 Closes-Bug: #2083570 Change-Id: Id9f851f33c2cb3d2c2759a3c66adf2599a3122fe ** Changed in: neutron Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/2083570 Title: Neutron times out handling a request to create a network Status in neutron: Fix Released Bug description: This happened in gate: https://369ed7cdb441e6ba648b-4c5da0a4d8de7235fc811d3d8ea842f8.ssl.cf5.rackcdn.com/930608/3/gate/neutron- ovn-tempest-ipv6-only-ovs-release/16831fa/job-output.txt In devstack log, we see this: 2024-10-02 19:07:29.822345 | controller | ++ lib/tempest:configure_tempest:384 : openstack --os-cloud devstack-admin --os-region RegionOne network create --share shared 2024-10-02 19:12:30.075768 | controller | Error while executing command: HttpException: 500, : 500 Internal Server Error: Internal Server Error: The server encountered an internal error or: misconfiguration and was unable to complete: your request.: Please contact the server administrator at: webmaster@localhost to inform them of the time this error occurred,: and the actions you performed just before this error.: More information about this error may be available: in the server error log.: Apache/2.4.52 (Ubuntu) Server at 2001:4802:7805:104:be76:4eff:fe20:23a0 Port 80 The req-id is: req-77c1f0b0-0b12-4b4b-9caa-f1d749596242 In neutron-api log, we see this as the last message for the req-id: Oct 02 19:07:30.194086 np0038691690 devstack@neutron- api.service[59580]: DEBUG neutron_lib.callbacks.manager [None req-77c1f0b0-0b12-4b4b-9caa-f1d749596242 admin admin] Publish callbacks ['neutron.services.auto_allocate.db._ensure_external_network_default_value_callback-8743187362991'] for network (13143b5f-8c77-4b54-be12-2807c0430033), precommit_create {{(pid=59580) _notify_loop /opt/stack/data/venv/lib/python3.10/site- packages/neutron_lib/callbacks/manager.py:184}} Nothing after it related to the req-id. There are also these errors for hash ring / db locks after the last req-id tagged message: Oct 02 19:07:59.684841 np0038691690 [email protected][59580]: DEBUG futurist.periodics [-] Submitting periodic callback 'neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.maintenance.HashRingHealthCheckPeriodics.touch_hash_ring_nodes' {{(pid=59580) _process_scheduled /opt/stack/data/venv/lib/python3.10/site-packages/futurist/periodics.py:638}} Oct 02 19:08:09.697938 np0038691690 [email protected][59580]: DEBUG dbcounter [-] [59580] Writing DB stats neutron:UPDATE=1 {{(pid=59580) stat_writer /opt/stack/data/venv/lib/python3.10/site-packages/dbcounter.py:115}} Oct 02 19:08:20.609536 np0038691690 [email protected][59580]: DEBUG neutron_lib.db.api [None req-fbfa6ef6-a02f-4aa1-bbc0-cfa1aab9022e None None] Retry wrapper got retriable exception: (pymysql.err.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction') Oct 02 19:08:20.609536 np0038691690 [email protected][59580]: [SQL: UPDATE ovn_hash_ring SET updated_at=%(updated_at)s WHERE ovn_hash_ring.node_uuid = %(node_uuid_1)s] Oct 02 19:08:20.609536 np0038691690 [email protected][59580]: [parameters: {'updated_at': datetime.datetime(2024, 10, 2, 19, 7, 30, 112762), 'node_uuid_1': '90677d5a-7f61-461d-8b8c-b9fad0a372e0'}] Oct 02 19:08:20.609536 np0038691690 [email protected][59580]: (Background on this error at: https://sqlalche.me/e/20/e3q8) {{(pid=59580) wrapped /opt/stack/data/venv/lib/python3.10/site-packages/neutron_lib/db/api.py:185}} Oct 02 19:08:20.610666 np0038691690 [email protected][59580]: DEBUG oslo_db.api [None req-fbfa6ef6-a02f-4aa1-bbc0-cfa1aab9022e None None] Performing DB retry for function neutron.db.ovn_hash_ring_db._touch {{(pid=59580) wrapper /opt/stack/data/venv/lib/python3.10/site-packages/oslo_db/api.py:155}} Oct 02 19:08:31.116859 np0038691690 [email protected][59580]: DEBUG dbcounter [-] [59580] Writing DB stats neutron:UPDATE=1 {{(pid=59580) stat_writer /opt/stack/data/venv/lib/python3.10/site-packages/dbcounter.py:115}} Oct 02 19:08:50.611310 np0038691690 [email protected][59580]: DEBUG neutron_lib.db.api [-] Retry wrapper got retriable exception: (pymysql.err.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction') Oct 02 19:08:50.611310 np0038691690 [email protected][59580]: [SQL: UPDATE ovn_hash_ring SET updated_at=%(updated_at)s WHERE ovn_hash_ring.hostname = %(hostname_1)s AND ovn_hash_ring.group_name = %(group_name_1)s] Oct 02 19:08:50.611310 np0038691690 [email protected][59580]: [parameters: {'updated_at': datetime.datetime(2024, 10, 2, 19, 7, 59, 685059), 'hostname_1': 'np0038691690', 'group_name_1': 'mechanism_driver'}] Oct 02 19:08:50.611310 np0038691690 [email protected][59580]: (Background on this error at: https://sqlalche.me/e/20/e3q8) {{(pid=59580) wrapped /opt/stack/data/venv/lib/python3.10/site-packages/neutron_lib/db/api.py:185}} Oct 02 19:08:50.612564 np0038691690 [email protected][59580]: DEBUG oslo_db.api [-] Performing DB retry for function neutron.db.ovn_hash_ring_db._touch {{(pid=59580) wrapper /opt/stack/data/venv/lib/python3.10/site-packages/oslo_db/api.py:155}} Oct 02 19:09:01.122051 np0038691690 [email protected][59580]: DEBUG dbcounter [-] [59580] Writing DB stats neutron:UPDATE=1 {{(pid=59580) stat_writer /opt/stack/data/venv/lib/python3.10/site-packages/dbcounter.py:115}} Oct 02 19:09:11.618081 np0038691690 [email protected][59580]: DEBUG neutron_lib.db.api [None req-fbfa6ef6-a02f-4aa1-bbc0-cfa1aab9022e None None] Retry wrapper got retriable exception: (pymysql.err.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction') Oct 02 19:09:11.618081 np0038691690 [email protected][59580]: [SQL: UPDATE ovn_hash_ring SET updated_at=%(updated_at)s WHERE ovn_hash_ring.node_uuid = %(node_uuid_1)s] Oct 02 19:09:11.618081 np0038691690 [email protected][59580]: [parameters: {'updated_at': datetime.datetime(2024, 10, 2, 19, 8, 21, 110834), 'node_uuid_1': '90677d5a-7f61-461d-8b8c-b9fad0a372e0'}] Oct 02 19:09:11.618081 np0038691690 [email protected][59580]: (Background on this error at: https://sqlalche.me/e/20/e3q8) {{(pid=59580) wrapped /opt/stack/data/venv/lib/python3.10/site-packages/neutron_lib/db/api.py:185}} Oct 02 19:09:11.619845 np0038691690 [email protected][59580]: DEBUG oslo_db.api [None req-fbfa6ef6-a02f-4aa1-bbc0-cfa1aab9022e None None] Performing DB retry for function neutron.db.ovn_hash_ring_db._touch {{(pid=59580) wrapper /opt/stack/data/venv/lib/python3.10/site-packages/oslo_db/api.py:155}} Oct 02 19:09:21.740920 np0038691690 [email protected][59580]: DEBUG dbcounter [-] [59580] Writing DB stats neutron:UPDATE=1 {{(pid=59580) stat_writer /opt/stack/data/venv/lib/python3.10/site-packages/dbcounter.py:115}} Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers [None req-1ed43a5e-ac35-4256-850c-3d3d414dd16e None None] Mechanism driver 'ovn' failed in update_port_postcommit: ovsdbapp.exceptions.TimeoutException: Commands [CheckRevisionNumberCommand(_result=None, name=1ce51282-34b7-4e53-8b1c-811d585a1a7d, resource={'id': '1ce51282-34b7-4e53-8b1c-811d585a1a7d', 'name': '', 'network_id': 'a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'tenant_id': '', 'mac_address': 'fa:16:3e:6e:77:38', 'admin_state_up': True, 'status': 'ACTIVE', 'device_id': '02ca23c9-f068-4884-b5f1-89209b252b98', 'device_owner': 'network:router_gateway', 'standard_attr_id': 23, 'fixed_ips': [{'subnet_id': '8a949128-8d38-454f-b7ed-931aafdeccdf', 'ip_address': '172.24.5.37'}, {'subnet_id': 'b02a7718-b0a3-4b32-afcb-384ab2c14812', 'ip_address': '2001:db8::7e'}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': [], 'description': '', 'binding:vnic_type': 'nor mal', 'binding:profile': {}, 'binding:host_id': 'np0038691690', 'binding:vif_type': 'ovs', 'binding:vif_details': {'port_filter': True, 'connectivity': 'l2', 'bridge_name': 'br-int', 'datapath_type': 'system'}, 'port_security_enabled': False, 'tags': [], 'created_at': '2024-10-02T19:06:13Z', 'updated_at': '2024-10-02T19:06:21Z', 'revision_number': 6, 'project_id': '', 'network': {'id': 'a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'name': 'public', 'tenant_id': 'ae9f99f870e44c829d566f4038cd9fcd', 'admin_state_up': True, 'mtu': 1430, 'status': 'ACTIVE', 'subnets': ['8a949128-8d38-454f-b7ed-931aafdeccdf', 'b02a7718-b0a3-4b32-afcb-384ab2c14812'], 'standard_attr_id': 18, 'shared': False, 'availability_zone_hints': [], 'availability_zones': [], 'ipv4_address_scope': None, 'ipv6_address_scope': None, 'router:external': True, 'vlan_transparent': None, 'description': '', 'port_security_enabled': True, 'is_default': True, 'tags': [], 'created_at': '2024-10-02T19:06:07Z', 'updated_at': '2024-10-02T 19:06:21Z', 'revision_number': 3, 'project_id': 'ae9f99f870e44c829d566f4038cd9fcd', 'provider:network_type': 'flat', 'provider:physical_network': 'public', 'provider:segmentation_id': None}}, resource_type=ports, if_exists=True), SetLSwitchPortCommand(_result=None, lport=1ce51282-34b7-4e53-8b1c-811d585a1a7d, external_ids_update=None, columns={'external_ids': {'neutron:port_name': '', 'neutron:device_id': '02ca23c9-f068-4884-b5f1-89209b252b98', 'neutron:project_id': '', 'neutron:cidrs': '172.24.5.37/24 2001:db8::7e/64', 'neutron:device_owner': 'network:router_gateway', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:network_name': 'neutron-a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'neutron:security_group_ids': '', 'neutron:revision_number': '6', 'neutron:vnic_type': 'normal', 'neutron:port_capabilities': '', 'neutron:mtu': '', 'neutron:host_id': 'np0038691690'}, 'parent_name': [], 'tag': [], 'options': {'requested-chassis': 'np0038691690', 'mcast_f lood_reports': 'true', 'exclude-lb-vips-from-garp': 'true', 'nat-addresses': 'router', 'router-port': 'lrp-1ce51282-34b7-4e53-8b1c-811d585a1a7d'}, 'enabled': True, 'port_security': [], 'type': 'router'}, if_exists=False), PgDelPortCommand(_result=None, port_group=neutron_pg_drop, lsp=['1ce51282-34b7-4e53-8b1c-811d585a1a7d'], if_exists=False)] exceeded timeout 180 seconds, cause: Result queue is empty Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers Traceback (most recent call last): Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 54, in commit Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers result = self.results.get(timeout=self.timeout) Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/queue.py", line 320, in get Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers return waiter.wait() Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/queue.py", line 139, in wait Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers return get_hub().switch() Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/hub.py", line 310, in switch Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers return self.greenlet.switch() Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers _queue.Empty Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers The above exception was the direct cause of the following exception: Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers Traceback (most recent call last): Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/managers.py", line 497, in _call_on_drivers Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context) Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 886, in update_port_postcommit Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers self._ovn_update_port(context.plugin_context, port, original_port, Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 766, in _ovn_update_port Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers self._ovn_client.update_port(plugin_context, port, Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py", line 663, in update_port Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers with self._nb_idl.transaction(check_error=True, Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__ Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers next(self.gen) Oct 02 19:09:25.055794 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py", line 274, in transaction Oct 02 19:09:25.058874 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers with super(OvsdbNbOvnIdl, self).transaction(*args, **kwargs) as t: Oct 02 19:09:25.058874 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__ Oct 02 19:09:25.058874 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers next(self.gen) Oct 02 19:09:25.058874 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/ovsdbapp/api.py", line 114, in transaction Oct 02 19:09:25.058874 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers with self.create_transaction( Oct 02 19:09:25.058874 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/ovsdbapp/api.py", line 71, in __exit__ Oct 02 19:09:25.058874 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers self.result = self.commit() Oct 02 19:09:25.058874 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 56, in commit Oct 02 19:09:25.058874 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers raise exceptions.TimeoutException( Oct 02 19:09:25.058874 np0038691690 [email protected][59580]: ERROR neutron.plugins.ml2.managers ovsdbapp.exceptions.TimeoutException: Commands [CheckRevisionNumberCommand(_result=None, name=1ce51282-34b7-4e53-8b1c-811d585a1a7d, resource={'id': '1ce51282-34b7-4e53-8b1c-811d585a1a7d', 'name': '', 'network_id': 'a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'tenant_id': '', 'mac_address': 'fa:16:3e:6e:77:38', 'admin_state_up': True, 'status': 'ACTIVE', 'device_id': '02ca23c9-f068-4884-b5f1-89209b252b98', 'device_owner': 'network:router_gateway', 'standard_attr_id': 23, 'fixed_ips': [{'subnet_id': '8a949128-8d38-454f-b7ed-931aafdeccdf', 'ip_address': '172.24.5.37'}, {'subnet_id': 'b02a7718-b0a3-4b32-afcb-384ab2c14812', 'ip_address': '2001:db8::7e'}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': [], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {}, 'binding:host_id': 'np0038691690', 'binding:vif_type': 'ovs', 'binding:vif_details': { 'port_filter': True, 'connectivity': 'l2', 'bridge_name': 'br-int', 'datapath_type': 'system'}, 'port_security_enabled': False, 'tags': [], 'created_at': '2024-10-02T19:06:13Z', 'updated_at': '2024-10-02T19:06:21Z', 'revision_number': 6, 'project_id': '', 'network': {'id': 'a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'name': 'public', 'tenant_id': 'ae9f99f870e44c829d566f4038cd9fcd', 'admin_state_up': True, 'mtu': 1430, 'status': 'ACTIVE', 'subnets': ['8a949128-8d38-454f-b7ed-931aafdeccdf', 'b02a7718-b0a3-4b32-afcb-384ab2c14812'], 'standard_attr_id': 18, 'shared': False, 'availability_zone_hints': [], 'availability_zones': [], 'ipv4_address_scope': None, 'ipv6_address_scope': None, 'router:external': True, 'vlan_transparent': None, 'description': '', 'port_security_enabled': True, 'is_default': True, 'tags': [], 'created_at': '2024-10-02T19:06:07Z', 'updated_at': '2024-10-02T19:06:21Z', 'revision_number': 3, 'project_id': 'ae9f99f870e44c829d566f4038cd9fcd', 'provider:network_type': 'flat' , 'provider:physical_network': 'public', 'provider:segmentation_id': None}}, resource_type=ports, if_exists=True), SetLSwitchPortCommand(_result=None, lport=1ce51282-34b7-4e53-8b1c-811d585a1a7d, external_ids_update=None, columns={'external_ids': {'neutron:port_name': '', 'neutron:device_id': '02ca23c9-f068-4884-b5f1-89209b252b98', 'neutron:project_id': '', 'neutron:cidrs': '172.24.5.37/24 2001:db8::7e/64', 'neutron:device_owner': 'network:router_gateway', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:network_name': 'neutron-a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'neutron:security_group_ids': '', 'neutron:revision_number': '6', 'neutron:vnic_type': 'normal', 'neutron:port_capabilities': '', 'neutron:mtu': '', 'neutron:host_id': 'np0038691690'}, 'parent_name': [], 'tag': [], 'options': {'requested-chassis': 'np0038691690', 'mcast_flood_reports': 'true', 'exclude-lb-vips-from-garp': 'true', 'nat-addresses': 'router', 'router-port': 'lrp-1ce51282 -34b7-4e53-8b1c-811d585a1a7d'}, 'enabled': True, 'port_security': [], 'type': 'router'}, if_exists=False), PgDelPortCommand(_result=None, port_group=neutron_pg_drop, lsp=['1ce51282-34b7-4e53-8b1c-811d585a1a7d'], if_exists=False)] exceeded timeout 180 seconds, cause: Result queue is empty I don't know if hash ring error is related to the original request lost / timed out. I don't see any errors in ovsdb-server log for NB. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/2083570/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : [email protected] Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp

