[Yahoo-eng-team] [Bug 2025410] [NEW] After `openstack port set --no-fixed-ip`, the port's old IP/MAC ARP entries remain in the qrouter namespace and are never deleted
Public bug reported:

neutron branch: stable/victoria

When a fixed IP is removed from a port with `openstack port set --no-fixed-ip`, the ARP entry for that IP/MAC is left behind in the qrouter namespace instead of being deleted, so stale permanent entries accumulate.

Steps to reproduce the problem:

```
# openstack port show ba20b562-7320-4fb4-99f1-eae617720bf1
+-----------------------+------------------------------------------------------------------------------+
| Field                 | Value                                                                        |
+-----------------------+------------------------------------------------------------------------------+
| admin_state_up        | UP                                                                           |
| allowed_address_pairs |                                                                              |
| binding_host_id       | hci-002                                                                      |
| binding_profile       |                                                                              |
| binding_vif_details   | bridge_name='br-int', connectivity='l2', datapath_type='system',            |
|                       | ovs_hybrid_plug='False', port_filter='True'                                  |
| binding_vif_type      | ovs                                                                          |
| binding_vnic_type     | normal                                                                       |
| device_owner          | compute:az-box-1                                                             |
| extra_dhcp_opts       |                                                                              |
| fixed_ips             | ip_address='192.168.1.43', subnet_id='f5acaafe-554e-47d4-9107-6566b5d50bb3' |
| id                    | ba20b562-7320-4fb4-99f1-eae617720bf1                                         |
| ip_allocation         | None                                                                         |
| mac_address           | fa:16:3e:4c:fd:ce                                                            |
...

openstack port set ba20b562-7320-4fb4-99f1-eae617720bf1 --no-fixed-ip
openstack port set ba20b562-7320-4fb4-99f1-eae617720bf1 --fixed-ip ip-address=192.168.1.143
openstack port set ba20b562-7320-4fb4-99f1-eae617720bf1 --no-fixed-ip
openstack port set ba20b562-7320-4fb4-99f1-eae617720bf1 --fixed-ip ip-address=192.168.1.243

[root@hci-002 ~]# ip netns exec qrouter-ffd819cb-349e-4b31-845a-7b7d97461f32 arp -n | grep fa:16:3e:4c:fd:ce
192.168.1.43    ether   fa:16:3e:4c:fd:ce   CM   qr-8ce629a5-03
192.168.1.143   ether   fa:16:3e:4c:fd:ce   CM   qr-8ce629a5-03
192.168.1.243   ether   fa:16:3e:4c:fd:ce   CM   qr-8ce629a5-03
```

Only the entry for 192.168.1.243 (the current fixed IP) should still be present; the entries for the removed IPs are never cleaned up.

** Affects: neutron
   Importance: Undecided
   Status: New
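A minimal manual workaround sketch, assuming the qrouter namespace, qr- interface, and addresses from the output above (adjust to your router); until the agent cleans these up itself, the stale permanent neighbour entries can be removed by hand:

```
# delete the neighbour entries left behind for the removed fixed IPs
# (run as root on the node hosting the qrouter namespace)
ip netns exec qrouter-ffd819cb-349e-4b31-845a-7b7d97461f32 ip neigh del 192.168.1.43 dev qr-8ce629a5-03
ip netns exec qrouter-ffd819cb-349e-4b31-845a-7b7d97461f32 ip neigh del 192.168.1.143 dev qr-8ce629a5-03

# verify that only the port's current fixed IP is still resolved
ip netns exec qrouter-ffd819cb-349e-4b31-845a-7b7d97461f32 ip neigh show | grep fa:16:3e:4c:fd:ce
```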
[Yahoo-eng-team] [Bug 1988382] [NEW] L3 agent (agent_mode=dvr_snat) restart removes the rfp port from the fip namespace, leaving floating IPs unreachable
Public bug reported:

stable/victoria

An OpenStack network node (agent_mode=dvr_snat) and a compute node are the same host. A VM on this node has a floating IP bound, but the snat_port of the VM's router is hosted on a different network node; the VM can reach north-south traffic via its floating IP. If the l3-agent on this node is restarted, external_gateway_removed is called during the restart, which tears down the DVR plumbing and makes the floating IPs on this node unreachable.

https://github.com/openstack/neutron/blob/stable/victoria/neutron/agent/l3/dvr_edge_router.py #39

    def external_gateway_added(self, ex_gw_port, interface_name):
        ...
        elif self.snat_namespace.exists():
            # This is the case where the snat was moved manually or
            # rescheduled to a different agent when the agent was dead.
            LOG.debug("SNAT was moved or rescheduled to a different host "
                      "and does not match with the current host. This is "
                      "a stale namespace %s and will be cleared from the "
                      "current dvr_snat host.", self.snat_namespace.name)
            self.external_gateway_removed(ex_gw_port, interface_name)

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1988382
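A quick way to confirm the symptom after the l3-agent restart, assuming the standard DVR veth naming (the rfp- end sits in the qrouter namespace and the fpr- end in the fip namespace); the IDs below are placeholders to substitute:

```
# on a healthy node both ends of the veth pair exist; after the faulty restart
# one or both are gone
ip netns exec qrouter-<router-id> ip -o link show | grep rfp-
ip netns exec fip-<external-net-id> ip -o link show | grep fpr-
```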
[Yahoo-eng-team] [Bug 1987377] [NEW] neutron-metadata-agent memory usage keeps increasing
Public bug reported:

env:
branch: stable/victoria

The memory footprint drops after restarting the metadata agent, but the longer the agent runs, the larger it grows, until the process is killed by the OOM killer.

$ kubectl top pod neutron-metadata-agent-default-6nz79 -n openstack
NAME                                   CPU(cores)   MEMORY(bytes)
neutron-metadata-agent-default-6nz79   4m           7121Mi

$ kubectl top pod -n openstack neutron-metadata-agent-default-7znzp
NAME                                   CPU(cores)   MEMORY(bytes)
neutron-metadata-agent-default-7znzp   3m           24321Mi

Tasks:  12 total,   1 running,  11 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.5 us,  1.3 sy,  0.0 ni, 94.2 id,  0.0 wa,  0.0 hi,  0.3 si,  0.7 st
KiB Mem : 32885820 total,  3087452 free, 28965316 used,   833052 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  3446688 avail Mem

    PID USER     PR  NI    VIRT     RES    SHR S  %CPU %MEM     TIME+ COMMAND
      1 neutron  20   0    1020       4      0 S   0.0  0.0   0:00.07 pause
 314636 neutron  20   0  193348   83160   4328 S   0.0  0.3  14:47.42 neutron-metadat
 314649 neutron  20   0 3246420  2.988g    972 S   0.0  9.5   8:37.78 neutron-metadat
 314650 neutron  20   0 3225648  2.970g   3184 S   0.0  9.5   8:36.11 neutron-metadat
 314651 neutron  20   0 3228576  2.970g      0 S   0.0  9.5   8:37.24 neutron-metadat
 314652 neutron  20   0 3223508  2.966g   1316 S   0.0  9.5   8:35.71 neutron-metadat
 314653 neutron  20   0 3216512  2.959g    844 S   0.0  9.4   8:37.38 neutron-metadat
 314654 neutron  20   0 3265104  3.006g    976 S   0.0  9.6   8:40.20 neutron-metadat
 314655 neutron  20   0 3180172  2.924g    280 S   0.0  9.3   8:33.43 neutron-metadat
 377345 neutron  20   0  193348   83388   4556 S   0.0  0.3   0:00.01 neutron-metadat

** Affects: neutron
   Importance: Undecided
   Status: New
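A simple way to confirm the growth trend, a sketch that just samples the workers' resident set size over time (the interval is arbitrary, and it assumes the workers are identifiable by their command line):

```
# print PID, RSS (KiB) and uptime of every metadata-agent process once a minute
while true; do
    date
    ps -eo pid,rss,etime,args --sort=-rss | grep '[n]eutron-metadata-agent'
    sleep 60
done
```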
[Yahoo-eng-team] [Bug 1978088] [NEW] After ovs-agent restart, table=21 and table=22 OpenFlow entries on br-tun are missing
Public bug reported:

In the following scenario (especially at large scale, when many ovs-agents are restarted at the same time), OpenFlow entries on br-tun go missing and are never recovered automatically.

As a simple example, restart two ovs-agents at the same time:

```
network.local_ip=30.0.1.6,output="vxlan-1e000106"
compute1.local_ip=30.0.1.7,output="vxlan-1e000107"
compute2.local_ip=30.0.1.8,output="vxlan-1e000108"

network.port=('192.168.1.2')
compute1.port=('192.168.1.11')
compute2.port=('192.168.1.141')

// iter_num=0 of compute1
DEBUG neutron.plugins.ml2.db [req-f8093da8-9f1a-4da2-a27f-03f1b4d50dfd - - - - -] For port cb7fad87-7dc7-4008-a349-3a17e3b8be71, host compute1, got binding levels [PortBindingLevel(driver='openvswitch',host='compute1',level=0,port_id=cb7fad87-7dc7-4008-a349-3a17e3b8be71,segment=NetworkSegment(0bcd776d-92cd-4d96-9e54-92350700c4ca),segment_id=0bcd776d-92cd-4d96-9e54-92350700c4ca)] get_binding_level_objs /usr/lib/python3.6/site-packages/neutron/plugins/ml2/db.py:78
DEBUG neutron.plugins.ml2.drivers.l2pop.mech_driver [req-f8093da8-9f1a-4da2-a27f-03f1b4d50dfd - - - - -] host: compute1, agent_active_ports: 3, refresh_tunnels: True update_port_up

// rpc-1 Notify l2population agent compute1 at q-agent-notifier the message add_fdb_entries with {'8883e077-aadb-4b79-9315-3c029e94a857': {'segment_id': 22, 'network_type': 'vxlan', 'ports': {'30.0.1.6': [('00:00:00:00:00:00', '0.0.0.0'), PortInfo(mac_address='fa:16:3e:db:75:11', ip_address='192.168.1.2')], '30.0.1.8': [('00:00:00:00:00:00', '0.0.0.0'), PortInfo(mac_address='fa:16:3e:45:eb:6a', ip_address='192.168.1.141')]}}} _notification_host

// rpc-2 Fanout notify l2population agents at q-agent-notifier the message add_fdb_entries with {'8883e077-aadb-4b79-9315-3c029e94a857': {'segment_id': 22, 'network_type': 'vxlan', 'ports': {'30.0.1.7': [('00:00:00:00:00:00', '0.0.0.0'), PortInfo(mac_address='fa:16:3e:21:34:43', ip_address='192.168.1.11')]}}} _notification_fanout

// iter_num>0 of compute1
DEBUG neutron.plugins.ml2.db [req-f8093da8-9f1a-4da2-a27f-03f1b4d50dfd - - - - -] For port cb7fad87-7dc7-4008-a349-3a17e3b8be71, host compute1, got binding levels [PortBindingLevel(driver='openvswitch',host='compute1',level=0,port_id=cb7fad87-7dc7-4008-a349-3a17e3b8be71,segment=NetworkSegment(0bcd776d-92cd-4d96-9e54-92350700c4ca),segment_id=0bcd776d-92cd-4d96-9e54-92350700c4ca)] get_binding_level_objs /usr/lib/python3.6/site-packages/neutron/plugins/ml2/db.py:78
2022-06-09 17:45:39.546 833566 DEBUG neutron.plugins.ml2.drivers.l2pop.mech_driver [req-f8093da8-9f1a-4da2-a27f-03f1b4d50dfd - - - - -] host: compute1, agent_active_ports: 3, refresh_tunnels: False update_port_up
...
// iter_num=0 of compute2
DEBUG neutron.plugins.ml2.db [req-2e977b20-4438-4928-85bb-59de4c7389f6 - - - - -] For port ccca9701-19c0-4590-92d0-5fbd909d4eeb, host compute2, got binding levels [PortBindingLevel(driver='openvswitch',host='compute2',level=0,port_id=ccca9701-19c0-4590-92d0-5fbd909d4eeb,segment=NetworkSegment(0bcd776d-92cd-4d96-9e54-92350700c4ca),segment_id=0bcd776d-92cd-4d96-9e54-92350700c4ca)] get_binding_level_objs /usr/lib/python3.6/site-packages/neutron/plugins/ml2/db.py:78
DEBUG neutron.plugins.ml2.drivers.l2pop.mech_driver [req-2e977b20-4438-4928-85bb-59de4c7389f6 - - - - -] host: compute2, agent_active_ports: 3, refresh_tunnels: True update_port_up

// rpc-3 Notify l2population agent compute2 at q-agent-notifier the message add_fdb_entries with {'8883e077-aadb-4b79-9315-3c029e94a857': {'segment_id': 22, 'network_type': 'vxlan', 'ports': {'30.0.1.6': [('00:00:00:00:00:00', '0.0.0.0'), PortInfo(mac_address='fa:16:3e:db:75:11', ip_address='192.168.1.2')], '30.0.1.7': [('00:00:00:00:00:00', '0.0.0.0'), PortInfo(mac_address='fa:16:3e:21:34:43', ip_address='192.168.1.11')]}}} _notification_host

// rpc-4 Fanout notify l2population agents at q-agent-notifier the message add_fdb_entries with {'8883e077-aadb-4b79-9315-3c029e94a857': {'segment_id': 22, 'network_type': 'vxlan', 'ports': {'30.0.1.8': [('00:00:00:00:00:00', '0.0.0.0'), PortInfo(mac_address='fa:16:3e:45:eb:6a', ip_address='192.168.1.141')]}}} _notification_fanout
```

1. After iter_num=0, cleanup_stale_flows clears the stale flows from table=21 and table=22.
2. If compute1 receives rpc-4 first, tunnels_missing=False.
3. rpc-1 times out and is never received by compute1.
4. As a result, the table=22,priority=1 flood flow is missing output="vxlan-1e000106", and the table=21,priority=1 ARP responder entry for 192.168.1.2 is missing.
5. The missing flows are never re-added, so VMs on this network cannot communicate at layer 2 with the ports on the network node.

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1978088
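A quick check for the symptom on an affected node, assuming the default br-tun bridge name and the ovs-agent's usual tunnel-bridge layout (table 21 = ARP responder, table 22 = flood to tunnels):

```
# every remote VTEP should appear as an output:"vxlan-..." action in the
# table=22 flood flow, and every remote port should have an ARP responder
# entry in table=21; on a broken node some of these are absent
ovs-ofctl dump-flows br-tun table=21
ovs-ofctl dump-flows br-tun table=22
```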
[Yahoo-eng-team] [Bug 1976439] [NEW] The database ml2_port_binding_levels and ml2_distributed_port_bindings tables have a lot of redundant data
Public bug reported:

ENV: stable/victoria

This is a large-scale cloud deployment with DVR enabled and some huge virtual routers. When nodes are removed from the cluster, the ml2_port_binding_levels and ml2_distributed_port_bindings tables are left with a lot of redundant data. When neutron agents are restarted, especially l3-agents, the neutron server then triggers many slow DB queries, and the agent restart takes too long to be operable.

For example, the "xxx" nodes below have been removed from the cluster and no longer host any qrouter namespaces:

```
MariaDB [neutron]> select count(*) from ml2_port_binding_levels;
+----------+
| count(*) |
+----------+
|   163986 |
+----------+

MariaDB [neutron]> select count(*) from ml2_distributed_port_bindings;
+----------+
| count(*) |
+----------+
|   119797 |
+----------+

MariaDB [neutron]> select count(*) from ml2_distributed_port_bindings where host like("%xxx%");
+----------+
| count(*) |
+----------+
|    78920 |
+----------+

MariaDB [neutron]> select count(*) from ml2_port_binding_levels where host like("%xxx%");
+----------+
| count(*) |
+----------+
|    79482 |
+----------+

MariaDB [neutron]> select count(distinct host) from ml2_port_binding_levels where host like("%xxx%");
+----------------------+
| count(distinct host) |
+----------------------+
|                  385 |
+----------------------+

MariaDB [neutron]> select count(*) from routers;
+----------+
| count(*) |
+----------+
|     7543 |
+----------+
```

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1976439
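A sketch of how the stale rows could be spotted generically, assuming removed hosts have also been dropped from the agents table (otherwise the host-name pattern above is the better filter); this only reports rows, it does not delete anything:

```
# count binding-level rows whose host no longer has any registered agent
mysql -e "
  SELECT b.host, COUNT(*) AS stale_rows
    FROM neutron.ml2_port_binding_levels b
    LEFT JOIN neutron.agents a ON a.host = b.host
   WHERE a.host IS NULL
   GROUP BY b.host
   ORDER BY stale_rows DESC;"
```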
[Yahoo-eng-team] [Bug 1976355] [NEW] remove eager subquery load for PortBindingLevel
Public bug reported:

ENV: stable/victoria

We have enabled DVR and have some huge virtual routers with around 60 router interfaces scheduled on around 800 compute nodes. In this large-scale deployment, restarting neutron agents, especially l3-agents, makes the neutron server trigger too many slow DB queries, and the agent restart takes too long to be operable.

Error log of an l3-agent restart:

```
(pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
[SQL: 'SELECT ml2_port_binding_levels.port_id AS ml2_port_binding_levels_port_id, ml2_port_binding_levels.host AS ml2_port_binding_levels_host, ml2_port_binding_levels.level AS ml2_port_binding_levels_level, ml2_port_binding_levels.driver AS ml2_port_binding_levels_driver, ml2_port_binding_levels.segment_id AS ml2_port_binding_levels_segment_id, ports_1.id AS ports_1_id
FROM (SELECT routers.id AS routers_id
FROM routers LEFT OUTER JOIN (SELECT routerl3agentbindings.router_id AS router_id, count(routerl3agentbindings.router_id) AS count
FROM routerl3agentbindings INNER JOIN router_extra_attributes ON routerl3agentbindings.router_id = router_extra_attributes.router_id INNER JOIN routers ON routers.id = router_extra_attributes.router_id GROUP BY routerl3agentbindings.router_id) AS anon_2 ON routers.id = anon_2.router_id) AS anon_1 INNER JOIN routerports AS routerports_1 ON anon_1.routers_id = routerports_1.router_id INNER JOIN ports AS ports_1 ON ports_1.id = routerports_1.port_id INNER JOIN ml2_port_binding_levels ON ports_1.id = ml2_port_binding_levels.port_id ORDER BY ports_1.id']
(Background on this error at: http://sqlalche.me/e/e3q8)
```

as well as

```console
SELECT ml2_port_binding_levels.port_id AS ml2_port_binding_levels_port_id, ml2_port_binding_levels.host AS ml2_port_binding_levels_host, ml2_port_binding_levels.level AS ml2_port_binding_levels_level, ml2_port_binding_levels.driver AS ml2_port_binding_levels_driver, ml2_port_binding_levels.segment_id AS ml2_port_binding_levels_segment_id, ports_1.id AS ports_1_id FROM (SELECT routers.id AS routers_id FROM routers LEFT OUTER JOIN (SELECT routerl3agentbindings.router_id AS router_id, count(routerl3agentbindings.router_id) AS count FROM routerl3agentbindings INNER JOIN router_extra_attributes ON routerl3agentbindings.router_id = router_extra_attributes.router_id INNER JOIN routers ON routers.id = router_extra_attributes.router_id GROUP BY routerl3agentbindings.router_id) AS anon_2 ON routers.id = anon_2.router_id) AS anon_1 INNER JOIN routerports AS routerports_1 ON anon_1.routers_id = routerports_1.router_id INNER JOIN ports AS ports_1 ON ports_1.id = routerports_1.port_id INNER JOIN ml2_port_binding_levels ON ports_1.id = ml2_port_binding_levels.port_id;

SELECT ml2_port_binding_levels.port_id AS ml2_port_binding_levels_port_id, ml2_port_binding_levels.host AS ml2_port_binding_levels_host, ml2_port_binding_levels.level AS ml2_port_binding_levels_level, ml2_port_binding_levels.driver AS ml2_port_binding_levels_driver, ml2_port_binding_levels.segment_id AS ml2_port_binding_levels_segment_id, ports_1.id AS ports_1_id FROM (SELECT DISTINCT routerports.port_id AS routerports_port_id FROM routerports WHERE routerports.router_id IN ('6e4ed0f5-e1b0-4cf1-931d-b30c93433719') AND routerports.port_type IN ('network:router_interface', 'network:ha_router_replicated_interface', 'network:router_interface_distributed')) AS anon_1 INNER JOIN ports AS ports_1 ON ports_1.id = anon_1.routerports_port_id INNER JOIN ml2_port_binding_levels ON ports_1.id = ml2_port_binding_levels.port_id ORDER BY ports_1.id;
```

from `show processlist`. We see an excessive number of slow queries against ml2_port_binding_levels, which looks unnecessary.

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1976355
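One way to capture the offending statements with their full text and timing, a sketch assuming the MariaDB server allows changing these globals at runtime (the 5-second threshold is arbitrary):

```
# enable the slow query log so the eager-loaded PortBindingLevel queries are recorded
mysql -e "SET GLOBAL slow_query_log = ON;
          SET GLOBAL long_query_time = 5;
          SHOW VARIABLES LIKE 'slow_query_log_file';"
```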
[Yahoo-eng-team] [Bug 1976345] [NEW] The sync_routers interface takes too long
Public bug reported:

ENV: stable/victoria

In a large-scale cloud deployment, when a neutron l3-agent is restarted, the neutron server side calls the _routers_to_sync helper twice per sync_routers call. This makes the agent restart take too long to be operable.

<_routers_to_sync> Elapsed: 32.242s
<_routers_to_sync> Elapsed: 52.720s
Elapsed: 85.427s

https://github.com/openstack/neutron/blob/master/neutron/api/rpc/handlers/l3_rpc.py #128

Can it be changed to:

    if extensions.is_extension_supported(
            self.plugin, constants.PORT_BINDING_EXT_ALIAS):
        self._ensure_host_set_on_ports(context, host, routers)
        # refresh the data structure after ports are bound
        routers = self._routers_to_sync(context, router_ids, host)
    else:
        routers = self._routers_to_sync(context, router_ids, host)

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1976345
[Yahoo-eng-team] [Bug 1966383] [NEW] ovs-agent: value -1 is not a valid port number when _set_port_filters
Public bug reported:

env:
neutron version: victoria
/etc/neutron/plugins/ml2/openvswitch_agent.ini: firewall_driver=openvswitch

To reproduce the problem:

$ openstack server delete xxx

Reason for the error: the port update and port removal events arrive very close together. By the time the update event is processed, the OVS port has already been deleted. When get_or_create_ofport reads the port information from ovsdb, the port row still exists but the returned ofport is -1. Using ofport=-1 in _update_flows_for_port or _set_port_filters to program the OpenFlow rules then fails with the error below.

error log:

INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-984032db-a239-47ef-b80e-9064f5788a8b - - - - -] Port 61a04bae-b608-47cf-bb77-3e3a63def50c updated. Details: {'device': '61a04bae-b608-47cf-bb77-3e3a63def50c', 'device_id': '9a158b27-8ae8-4afd-ba8e-8e56247ac868', 'network_id': 'b225d077-0558-4b64-b32a-36217767e54f', 'port_id': '61a04bae-b608-47cf-bb77-3e3a63def50c', 'mac_address': 'fa:16:3e:ae:77:28', 'admin_state_up': True, 'network_type': 'vxlan', 'segmentation_id': 57, 'physical_network': None, 'fixed_ips': [{'subnet_id': '557d4d98-5e51-480b-8db5-a7f6adc615d7', 'ip_address': '172.31.16.184'}], 'device_owner': 'compute:az-x86-up-1', 'allowed_address_pairs': [], 'port_security_enabled': True, 'qos_policy_id': None, 'network_qos_policy_id': None, 'profile': {}, 'vif_type': 'ovs', 'vnic_type': 'normal', 'security_groups': ['05a2d0ca-df71-4ef4-92d3-1464a94c8e11']}
INFO neutron.agent.securitygroups_rpc [None req-984032db-a239-47ef-b80e-9064f5788a8b - - - - -] Refresh firewall rules
ERROR neutron.agent.linux.utils [None req-984032db-a239-47ef-b80e-9064f5788a8b - - - - -] Exit code: 1; Cmd: ['ovs-ofctl', 'add-flows', '-O', 'OpenFlow10', 'br-int', '--bundle', '-']; Stdin:
hard_timeout=0,idle_timeout=0,priority=100,table=60,in_port=-1,cookie=6616235460093723383,actions=set_field:-1->reg5,set_field:9->reg6,resubmit(,71)
hard_timeout=0,idle_timeout=0,priority=90,table=60,dl_dst=fa:16:3e:ae:77:28,dl_vlan=0x9,cookie=6616235460093723383,actions=set_field:-1->reg5,set_field:9->reg6,strip_vlan,resubmit(,81)
hard_timeout=0,idle_timeout=0,priority=95,table=71,in_port=-1,dl_type=0x86dd,nw_proto=58,icmp_type=130,reg5=-1,cookie=6616235460093723383,actions=resubmit(,94)
hard_timeout=0,idle_timeout=0,priority=95,table=71,in_port=-1,dl_type=0x86dd,nw_proto=58,icmp_type=133,reg5=-1,cookie=6616235460093723383,actions=resubmit(,94)
hard_timeout=0,idle_timeout=0,priority=95,table=71,in_port=-1,dl_type=0x86dd,nw_proto=58,icmp_type=135,reg5=-1,cookie=6616235460093723383,actions=resubmit(,94)
hard_timeout=0,idle_timeout=0,priority=95,table=71,in_port=-1,dl_type=0x86dd,nw_proto=58,icmp_type=136,reg5=-1,cookie=6616235460093723383,actions=resubmit(,94)
hard_timeout=0,idle_timeout=0,priority=95,table=71,in_port=-1,dl_src=fa:16:3e:ae:77:28,dl_type=0x0806,arp_spa=172.31.16.184,reg5=-1,cookie=6616235460093723383,actions=resubmit(,94)
hard_timeout=0,idle_timeout=0,priority=77,dl_type=0x0800,table=72,nw_proto=6,ct_state=+new-est,reg5=-1,cookie=6616235460093723383,actions=resubmit(,73);
Stdout: ; Stderr: 2022-03-25T01:41:27Z|1|ofp_port|WARN|Negative value -1 is not a valid port number.
ovs-ofctl: -:1: -1: invalid or unknown port for in_port
...

** Affects: neutron
   Importance: Undecided
   Status: New
https://bugs.launchpad.net/bugs/1966383
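For what it's worth, a sketch of how the transient state described above can be observed from ovsdb during the instance delete; the tap device name below is illustrative and assumes the usual neutron naming of tap plus the first characters of the port UUID:

```
# while the port is being torn down, its Interface row can still exist in ovsdb
# but with no OpenFlow port number assigned, reported as ofport = -1
ovs-vsctl --columns=name,ofport list Interface tap61a04bae-b6
```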
[Yahoo-eng-team] [Bug 1955640] [NEW] Performance of mariadb's neutron.agents table
Public bug reported:

The neutron.agents table in MariaDB only has its primary key on id and the unique index on (agent_type, host). A query that filters or joins on agents.host alone cannot use that composite index, so it scans the whole table. Many neutron queries use agents.host as the only join/filter key, so the index is never hit and query efficiency is poor.

eg:
```
def get_dvr_active_network_ports(context, network_id):
    query = context.session.query(ml2_models.DistributedPortBinding,
                                  agent_model.Agent)
    query = query.join(agent_model.Agent,
                       agent_model.Agent.host ==
                       ml2_models.DistributedPortBinding.host)

MariaDB [neutron]> show index from agents;
+--------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name                    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| agents |          0 | PRIMARY                     |            1 | id          | A         |          20 |     NULL | NULL   |      | BTREE      |         |               |
| agents |          0 | uniq_agents0agent_type0host |            1 | agent_type  | A         |          10 |     NULL | NULL   |      | BTREE      |         |               |
| agents |          0 | uniq_agents0agent_type0host |            2 | host        | A         |          20 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

explain SELECT ports.project_id AS ports_project_id, ports.id AS ports_id, ports.name AS ports_name, ports.network_id AS ports_network_id, ports.mac_address AS ports_mac_address, ports.admin_state_up AS ports_admin_state_up, ports.status AS ports_status, ports.device_id AS ports_device_id, ports.device_owner AS ports_device_owner, ports.ip_allocation AS ports_ip_allocation, ports.standard_attr_id AS ports_standard_attr_id,
anon_1.ml2_port_bindings_port_id AS anon_1_ml2_port_bindings_port_id,
standardattributes_1.id AS standardattributes_1_id, standardattributes_1.resource_type AS standardattributes_1_resource_type, standardattributes_1.description AS standardattributes_1_description, standardattributes_1.revision_number AS standardattributes_1_revision_number, standardattributes_1.created_at AS standardattributes_1_created_at, standardattributes_1.updated_at AS standardattributes_1_updated_at,
ml2_port_bindings_1.port_id AS ml2_port_bindings_1_port_id, ml2_port_bindings_1.host AS ml2_port_bindings_1_host, ml2_port_bindings_1.vnic_type AS ml2_port_bindings_1_vnic_type, ml2_port_bindings_1.profile AS ml2_port_bindings_1_profile, ml2_port_bindings_1.vif_type AS ml2_port_bindings_1_vif_type, ml2_port_bindings_1.vif_details AS ml2_port_bindings_1_vif_details, ml2_port_bindings_1.status AS ml2_port_bindings_1_status,
subports_1.port_id AS subports_1_port_id, subports_1.trunk_id AS subports_1_trunk_id, subports_1.segmentation_type AS subports_1_segmentation_type, subports_1.segmentation_id AS subports_1_segmentation_id,
standardattributes_2.id AS standardattributes_2_id, standardattributes_2.resource_type AS standardattributes_2_resource_type, standardattributes_2.description AS standardattributes_2_description, standardattributes_2.revision_number AS standardattributes_2_revision_number, standardattributes_2.created_at AS standardattributes_2_created_at, standardattributes_2.updated_at AS standardattributes_2_updated_at,
trunks_1.project_id AS trunks_1_project_id, trunks_1.id AS trunks_1_id, trunks_1.admin_state_up AS trunks_1_admin_state_up, trunks_1.name AS trunks_1_name, trunks_1.port_id AS trunks_1_port_id, trunks_1.status AS trunks_1_status, trunks_1.standard_attr_id AS trunks_1_standard_attr_id,
portsecuritybindings_1.port_id AS portsecuritybindings_1_port_id, portsecuritybindings_1.port_security_enabled AS portsecuritybindings_1_port_security_enabled,
qos_port_policy_bindings_1.policy_id AS qos_port_policy_bindings_1_policy_id,
qos_port_policy_bindings_1.port_id AS qos_port_policy_bindings_1_port_id, portdnses_1.port_id AS portdnses_1_port_id, portdnses_1.current_dns_name AS portdnses_1_current_dns_name, portdnses_1.current_dns_domain AS portdnses_1_current_dns_domain, portdnses_1.previous_dns_name AS portdnses_1_previous_dns_name, portdnses_1.previous_dns_domain AS portdnses_1_previous_dns_domain, portdnses_1.dns_name AS portdnses_1_dns_name, portdnses_1.dns_domain AS portdnses_1_dns_domain, securitygroupportbindings_1.port_id AS securitygroupportbindings_1_port_id,
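If adding a dedicated index turns out to be the right mitigation, a minimal sketch of what it could look like; the index name ix_agents_host is made up here, and a real fix would ship as an alembic migration in neutron rather than manual DDL:

```
MariaDB [neutron]> CREATE INDEX ix_agents_host ON agents (host);
MariaDB [neutron]> SHOW INDEX FROM agents;
```

With a plain secondary index on host, joins and filters that use agents.host alone can be satisfied by an index lookup instead of a full table scan.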
[Yahoo-eng-team] [Bug 1955639] [NEW] Performance of mariadb's neutron.agents table
Public bug reported:

(The description is identical to bug 1955640 above: the agents table only has its primary key and the (agent_type, host) unique index, so queries that filter on agents.host alone cannot use an index and scan the whole table. See that report for the index listing and the EXPLAIN output.)

** Affects: neutron
   Importance: Undecided
   Status: New