[Yahoo-eng-team] [Bug 2025410] [NEW] After `openstack port set --no-fixed-ip`, the port's old IP/MAC ARP entries remain in the qrouter namespace and are never deleted

2023-06-29 Thread liujinxin
Public bug reported:

neutron branch: stable/victoria

Steps to reproduce the problem:

```

# openstack port show ba20b562-7320-4fb4-99f1-eae617720bf1
+-------------------------+------------------------------------------------------------------------------+
| Field                   | Value                                                                        |
+-------------------------+------------------------------------------------------------------------------+
| admin_state_up          | UP                                                                           |
| allowed_address_pairs   |                                                                              |
| binding_host_id         | hci-002                                                                      |
| binding_profile         |                                                                              |
| binding_vif_details     | bridge_name='br-int', connectivity='l2', datapath_type='system',            |
|                         | ovs_hybrid_plug='False', port_filter='True'                                  |
| binding_vif_type        | ovs                                                                          |
| binding_vnic_type       | normal                                                                       |
| device_owner            | compute:az-box-1                                                             |
| extra_dhcp_opts         |                                                                              |
| fixed_ips               | ip_address='192.168.1.43', subnet_id='f5acaafe-554e-47d4-9107-6566b5d50bb3'  |
| id                      | ba20b562-7320-4fb4-99f1-eae617720bf1                                         |
| ip_allocation           | None                                                                         |
| mac_address             | fa:16:3e:4c:fd:ce                                                            |
...

openstack port set ba20b562-7320-4fb4-99f1-eae617720bf1 --no-fixed-ip
openstack port set ba20b562-7320-4fb4-99f1-eae617720bf1 --fixed-ip ip-address=192.168.1.143

openstack port set ba20b562-7320-4fb4-99f1-eae617720bf1 --no-fixed-ip
openstack port set ba20b562-7320-4fb4-99f1-eae617720bf1 --fixed-ip ip-address=192.168.1.243

[root@hci-002 ~]# ip netns exec qrouter-ffd819cb-349e-4b31-845a-7b7d97461f32 arp -n | grep fa:16:3e:4c:fd:ce
192.168.1.43     ether   fa:16:3e:4c:fd:ce   CM    qr-8ce629a5-03
192.168.1.143    ether   fa:16:3e:4c:fd:ce   CM    qr-8ce629a5-03
192.168.1.243    ether   fa:16:3e:4c:fd:ce   CM    qr-8ce629a5-03
```
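
The entries for 192.168.1.43 and 192.168.1.143 belong to fixed IPs that were already removed from the port, so they should have been deleted from the router namespace; only 192.168.1.243 is still assigned. As a manual workaround (a sketch based on the reproduction above; adjust the namespace, addresses and qr- device to your environment), the stale neighbor entries can be removed by hand:

```
# Delete the permanent ARP entries left behind for the old fixed IPs.
ip netns exec qrouter-ffd819cb-349e-4b31-845a-7b7d97461f32 \
    ip neigh del 192.168.1.43 dev qr-8ce629a5-03
ip netns exec qrouter-ffd819cb-349e-4b31-845a-7b7d97461f32 \
    ip neigh del 192.168.1.143 dev qr-8ce629a5-03
```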

** Affects: neutron
 Importance: Undecided
 Status: New

[Yahoo-eng-team] [Bug 1988382] [NEW] L3 agent (agent_mode=dvr_snat) restart removes the rfp port from the fip namespace, leaving the FIP unreachable

2022-09-01 Thread liujinxin
Public bug reported:

stable/victoria

The OpenStack network node (agent_mode=dvr_snat) and the compute node are the
same host. A VM on this node is bound to a FIP, but the snat_port of that VM's
router lives on a different network node, and the VM reaches north-south
traffic through the FIP. If the l3-agent is restarted, external_gateway_removed
is called during the restart, which removes the rfp port from the fip
namespace and leaves the FIP on that node unreachable.

https://github.com/openstack/neutron/blob/stable/victoria/neutron/agent/l3/dvr_edge_router.py (line 39):

def external_gateway_added(self, ex_gw_port, interface_name):
    ...
    elif self.snat_namespace.exists():
        # This is the case where the snat was moved manually or
        # rescheduled to a different agent when the agent was dead.
        LOG.debug("SNAT was moved or rescheduled to a different host "
                  "and does not match with the current host. This is "
                  "a stale namespace %s and will be cleared from the "
                  "current dvr_snat host.", self.snat_namespace.name)
        self.external_gateway_removed(ex_gw_port, interface_name)
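
A quick way to confirm the symptom after the restart (a sketch; `<router-id>` and `<external-net-id>` are placeholders for the real UUIDs) is to check whether the veth pair that connects the qrouter and fip namespaces is still present:

```
# DVR north-south traffic for a FIP relies on the rfp-/fpr- veth pair between
# the qrouter and fip namespaces; if the restart tore it down, the FIP on this
# node stops answering until the pair is rebuilt.
ip netns exec qrouter-<router-id> ip -o link show | grep rfp-
ip netns exec fip-<external-net-id> ip -o link show | grep fpr-
```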

** Affects: neutron
 Importance: Undecided
 Status: New



[Yahoo-eng-team] [Bug 1987377] [NEW] neutron-metadata-agent memory usage keeps increasing

2022-08-23 Thread liujinxin
Public bug reported:

env: 
branch: stable/victoria

Restarting the metadata-agent frees the memory, but the longer it runs, the
larger the footprint grows, until the process is killed by the OOM killer.


kubectl top pod neutron-metadata-agent-default-6nz79  -nopenstack
NAME   CPU(cores)   MEMORY(bytes)
neutron-metadata-agent-default-6nz79   4m   7121Mi

kubectl top pod -nopenstack neutron-metadata-agent-default-7znzp
NAME   CPU(cores)   MEMORY(bytes)
neutron-metadata-agent-default-7znzp   3m   24321Mi


Tasks:  12 total,   1 running,  11 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.5 us,  1.3 sy,  0.0 ni, 94.2 id,  0.0 wa,  0.0 hi,  0.3 si,  0.7 st
KiB Mem : 32885820 total,  3087452 free, 28965316 used,   833052 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  3446688 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
      1 neutron   20   0    1020      4      0 S   0.0  0.0   0:00.07 pause
 314636 neutron   20   0  193348  83160   4328 S   0.0  0.3  14:47.42 neutron-metadat
 314649 neutron   20   0 3246420 2.988g    972 S   0.0  9.5   8:37.78 neutron-metadat
 314650 neutron   20   0 3225648 2.970g   3184 S   0.0  9.5   8:36.11 neutron-metadat
 314651 neutron   20   0 3228576 2.970g      0 S   0.0  9.5   8:37.24 neutron-metadat
 314652 neutron   20   0 3223508 2.966g   1316 S   0.0  9.5   8:35.71 neutron-metadat
 314653 neutron   20   0 3216512 2.959g    844 S   0.0  9.4   8:37.38 neutron-metadat
 314654 neutron   20   0 3265104 3.006g    976 S   0.0  9.6   8:40.20 neutron-metadat
 314655 neutron   20   0 3180172 2.924g    280 S   0.0  9.3   8:33.43 neutron-metadat
 377345 neutron   20   0  193348  83388   4556 S   0.0  0.3   0:00.01 neutron-metadat
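
A simple way to watch the growth over time (a sketch; the 60-second interval is arbitrary and not part of the report):

```
# Sample the RSS of every metadata-agent worker once a minute; a footprint
# that only ever grows across hours of sampling points at a leak rather than
# normal warm-up or caching.
while true; do
    date
    ps -eo pid,rss,cmd | grep '[n]eutron-metadata-agent'
    sleep 60
done
```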

** Affects: neutron
 Importance: Undecided
 Status: New

[Yahoo-eng-team] [Bug 1978088] [NEW] After ovs-agent restart, flows in table=21 and table=22 on br-tun are missing

2022-06-09 Thread liujinxin
Public bug reported:

In the following scenario (especially at large scale, when many ovs-agents are
restarted at the same time), flows go missing from the br-tun OpenFlow tables
and are never recovered automatically.

As a simple example, restart two ovs-agents at the same time:
```
network.local_ip=30.0.1.6,output="vxlan-1e000106"
compute1.local_ip=30.0.1.7,output="vxlan-1e000107"
compute2.local_ip=30.0.1.8,output="vxlan-1e000108"

network.port=('192.168.1.2')
compute1.port=('192.168.1.11')
compute2.port=('192.168.1.141')


// iter_num=0 of compute1
DEBUG neutron.plugins.ml2.db [req-f8093da8-9f1a-4da2-a27f-03f1b4d50dfd - - - - 
-] For port cb7fad87-7dc7-4008-a349-3a17e3b8be71, host compute1, got binding 
levels 
[PortBindingLevel(driver='openvswitch',host='compute1',level=0,port_id=cb7fad87-7dc7-4008-a349-3a17e3b8be71,segment=NetworkSegment(0bcd776d-92cd-4d96-9e54-92350700c4ca),segment_id=0bcd776d-92cd-4d96-9e54-92350700c4ca)]
 get_binding_level_objs 
/usr/lib/python3.6/site-packages/neutron/plugins/ml2/db.py:78
DEBUG neutron.plugins.ml2.drivers.l2pop.mech_driver 
[req-f8093da8-9f1a-4da2-a27f-03f1b4d50dfd - - - - -] host: compute1, 
agent_active_ports: 3, refresh_tunnels: True update_port_up

// rpc-1
Notify l2population agent compute1 at q-agent-notifier the message 
add_fdb_entries with {'8883e077-aadb-4b79-9315-3c029e94a857': {'segment_id': 
22, 'network_type': 'vxlan', 'ports': {'30.0.1.6': [('00:00:00:00:00:00', 
'0.0.0.0'), PortInfo(mac_address='fa:16:3e:db:75:11', 
ip_address='192.168.1.2')], '30.0.1.8': [('00:00:00:00:00:00', '0.0.0.0'), 
PortInfo(mac_address='fa:16:3e:45:eb:6a', ip_address='192.168.1.141')]}}} 
_notification_host

// rpc-2
Fanout notify l2population agents at q-agent-notifier the message 
add_fdb_entries with {'8883e077-aadb-4b79-9315-3c029e94a857': {'segment_id': 
22, 'network_type': 'vxlan', 'ports': {'30.0.1.7': [('00:00:00:00:00:00', 
'0.0.0.0'), PortInfo(mac_address='fa:16:3e:21:34:43', 
ip_address='192.168.1.11')]}}} _notification_fanout

// iter_num>0 of compute1
DEBUG neutron.plugins.ml2.db [req-f8093da8-9f1a-4da2-a27f-03f1b4d50dfd - - - - 
-] For port cb7fad87-7dc7-4008-a349-3a17e3b8be71, host compute1, got binding 
levels 
[PortBindingLevel(driver='openvswitch',host='compute1',level=0,port_id=cb7fad87-7dc7-4008-a349-3a17e3b8be71,segment=NetworkSegment(0bcd776d-92cd-4d96-9e54-92350700c4ca),segment_id=0bcd776d-92cd-4d96-9e54-92350700c4ca)]
 get_binding_level_objs 
/usr/lib/python3.6/site-packages/neutron/plugins/ml2/db.py:78
2022-06-09 17:45:39.546 833566 DEBUG 
neutron.plugins.ml2.drivers.l2pop.mech_driver 
[req-f8093da8-9f1a-4da2-a27f-03f1b4d50dfd - - - - -] host: compute1, 
agent_active_ports: 3, refresh_tunnels: False update_port_up 

...


// iter_num=0 of compute2
DEBUG neutron.plugins.ml2.db [req-2e977b20-4438-4928-85bb-59de4c7389f6 - - - - 
-] For port ccca9701-19c0-4590-92d0-5fbd909d4eeb, host compute2, got binding 
levels 
[PortBindingLevel(driver='openvswitch',host='compute2',level=0,port_id=ccca9701-19c0-4590-92d0-5fbd909d4eeb,segment=NetworkSegment(0bcd776d-92cd-4d96-9e54-92350700c4ca),segment_id=0bcd776d-92cd-4d96-9e54-92350700c4ca)]
 get_binding_level_objs 
/usr/lib/python3.6/site-packages/neutron/plugins/ml2/db.py:78
DEBUG neutron.plugins.ml2.drivers.l2pop.mech_driver 
[req-2e977b20-4438-4928-85bb-59de4c7389f6 - - - - -] host: compute2, 
agent_active_ports: 3, refresh_tunnels: True update_port_up

// rpc-3
Notify l2population agent compute2 at q-agent-notifier the message 
add_fdb_entries with {'8883e077-aadb-4b79-9315-3c029e94a857': {'segment_id': 
22, 'network_type': 'vxlan', 'ports': {'30.0.1.6': [('00:00:00:00:00:00', 
'0.0.0.0'), PortInfo(mac_address='fa:16:3e:db:75:11', 
ip_address='192.168.1.2')], '30.0.1.7': [('00:00:00:00:00:00', '0.0.0.0'), 
PortInfo(mac_address='fa:16:3e:21:34:43', ip_address='192.168.1.11')]}}} 
_notification_host

// rpc-4
Fanout notify l2population agents at q-agent-notifier the message 
add_fdb_entries with {'8883e077-aadb-4b79-9315-3c029e94a857': {'segment_id': 
22, 'network_type': 'vxlan', 'ports': {'30.0.1.8': [('00:00:00:00:00:00', 
'0.0.0.0'), PortInfo(mac_address='fa:16:3e:45:eb:6a', 
ip_address='192.168.1.141')]}}} _notification_fanout

```

1. After iter_num=0, cleanup_stale_flows clears the stale flows from table=21 and table=22.
2. If compute1 receives rpc-4 first, tunnels_missing=False.
3. rpc-1 is never received (it times out).
4. As a result, the table=22,priority=1 flood flow is missing output="vxlan-1e000106", and the table=21,priority=1 ARP responder entry for 192.168.1.2 is missing.
5. The missing flows stay missing, so VMs on this network cannot reach the VMs on the network node at layer 2 (see the check below).
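
A quick check on the affected compute node (a sketch; br-tun and the table numbers are the ovs-agent defaults, and the grep patterns come from the example above):

```
# The ARP responder for the network node's port should be in table=21 and the
# flood flow should list the tunnel port to the network node in table=22; if
# either is absent it is not re-installed until another l2pop event or agent
# restart happens to cover it.
ovs-ofctl dump-flows br-tun table=21 | grep 192.168.1.2
ovs-ofctl dump-flows br-tun table=22 | grep vxlan-1e000106
```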

** Affects: neutron
 Importance: Undecided
 Status: New

[Yahoo-eng-team] [Bug 1976439] [NEW] The database ml2_port_binding_levels and ml2_distributed_port_bindings tables have a lot of redundant data

2022-05-31 Thread liujinxin
Public bug reported:

ENV: stable/victoria


This is a large-scale cloud deployment with DVR enabled and some huge virtual
routers. When nodes are removed from the cluster, the ml2_port_binding_levels
and ml2_distributed_port_bindings tables are left with a lot of redundant
rows. When neutron agents are restarted, especially l3-agents, the neutron
server then triggers too many slow DB queries, and the agent restart takes too
long to be operable.

For example, the hosts matching "%xxx%" below have been removed from the
cluster, and no qrouter namespaces remain on them:

```
MariaDB [neutron]> select count(*) from ml2_port_binding_levels;
+----------+
| count(*) |
+----------+
|   163986 |
+----------+
MariaDB [neutron]> select count(*) from ml2_distributed_port_bindings;
+----------+
| count(*) |
+----------+
|   119797 |
+----------+
MariaDB [neutron]> select count(*) from ml2_distributed_port_bindings where host like("%xxx%");
+----------+
| count(*) |
+----------+
|    78920 |
+----------+
MariaDB [neutron]> select count(*) from ml2_port_binding_levels where host like("%xxx%");
+----------+
| count(*) |
+----------+
|    79482 |
+----------+

MariaDB [neutron]> select count(distinct host) from ml2_port_binding_levels where host like("%xxx%");
+----------------------+
| count(distinct host) |
+----------------------+
|                  385 |
+----------------------+

MariaDB [neutron]> select count(*) from routers;
+----------+
| count(*) |
+----------+
|     7543 |
+----------+
```
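
A hedged way to enumerate the stale rows (it assumes a host with no row left in the agents table has been removed from the cluster, which matches the situation described above):

```
# Read-only check: binding rows that reference hosts with no registered agent
# are the redundant data; this is not an official neutron cleanup command.
mysql neutron -e "
    SELECT host, COUNT(*) AS n
    FROM ml2_port_binding_levels
    WHERE host NOT IN (SELECT host FROM agents)
    GROUP BY host
    ORDER BY n DESC;"
```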

** Affects: neutron
 Importance: Undecided
 Status: New


[Yahoo-eng-team] [Bug 1976355] [NEW] remove eager subquery load for PortBindingLevel

2022-05-31 Thread liujinxin
Public bug reported:

ENV: stable/victoria

We have enabled DVR and have some huge virtual routers with around
60 router interfaces scheduled on around 800 compute nodes.

In a large-scale cloud deployment, restarting neutron agents, especially
l3-agents, triggers too many slow DB queries on the neutron server side, and
this makes the agent restart time too long to be operable.

Error log of l3-agent restart:

```
(pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during 
query') [SQL: 'SELECT ml2_port_binding_levels.port_id AS 
ml2_port_binding_levels_port_id, ml2_port_binding_levels.host AS 
ml2_port_binding_levels_host, ml2_port_binding_levels.level AS 
ml2_port_binding_levels_level, ml2_port_binding_levels.driver AS 
ml2_port_binding_levels_driver, ml2_port_binding_levels.segment_id AS 
ml2_port_binding_levels_segment_id, ports_1.id AS ports_1_id \nFROM (SELECT 
routers.id AS routers_id \nFROM routers LEFT OUTER JOIN (SELECT 
routerl3agentbindings.router_id AS router_id, 
count(routerl3agentbindings.router_id) AS count \nFROM routerl3agentbindings 
INNER JOIN router_extra_attributes ON routerl3agentbindings.router_id = 
router_extra_attributes.router_id INNER JOIN routers ON routers.id = 
router_extra_attributes.router_id GROUP BY routerl3agentbindings.router_id) AS 
anon_2 ON routers.id = anon_2.router_id) AS anon_1 INNER JOIN routerports AS 
routerports_1 ON anon_1.routers_id = routerports_1.router_id INNER JOIN ports 
AS ports_1 ON ports_1.id = routerports_1.port_id INNER JOIN 
ml2_port_binding_levels ON ports_1.id = ml2_port_binding_levels.port_id ORDER 
BY ports_1.id'] (Background on this error at: http://sqlalche.me/e/e3q8)
```
as well as

```console
SELECT ml2_port_binding_levels.port_id AS ml2_port_binding_levels_port_id, 
ml2_port_binding_levels.host AS ml2_port_binding_levels_host, 
ml2_port_binding_levels.level AS ml2_port_binding_levels_level, 
ml2_port_binding_levels.driver AS ml2_port_binding_levels_driver, 
ml2_port_binding_levels.segment_id AS ml2_port_binding_levels_segment_id, 
ports_1.id AS ports_1_id  FROM (SELECT routers.id AS routers_id  FROM routers 
LEFT OUTER JOIN (SELECT routerl3agentbindings.router_id AS router_id, 
count(routerl3agentbindings.router_id) AS count  FROM routerl3agentbindings 
INNER JOIN router_extra_attributes ON routerl3agentbindings.router_id = 
router_extra_attributes.router_id INNER JOIN routers ON routers.id = 
router_extra_attributes.router_id GROUP BY routerl3agentbindings.router_id) AS 
anon_2 ON routers.id = anon_2.router_id) AS anon_1 INNER JOIN routerports AS 
routerports_1 ON anon_1.routers_id = routerports_1.router_id INNER JOIN ports 
AS ports_1 ON ports_1.id = routerports_1.port_id INNER JOIN 
ml2_port_binding_levels ON ports_1.id = ml2_port_binding_levels.port_id;


SELECT ml2_port_binding_levels.port_id AS ml2_port_binding_levels_port_id, 
ml2_port_binding_levels.host AS ml2_port_binding_levels_host, 
ml2_port_binding_levels.level AS ml2_port_binding_levels_level, 
ml2_port_binding_levels.driver AS ml2_port_binding_levels_driver, 
ml2_port_binding_levels.segment_id AS ml2_port_binding_levels_segment_id, 
ports_1.id AS ports_1_id
FROM (SELECT DISTINCT routerports.port_id AS routerports_port_id
FROM routerports
WHERE routerports.router_id IN ('6e4ed0f5-e1b0-4cf1-931d-b30c93433719') AND 
routerports.port_type IN ('network:router_interface', 
'network:ha_router_replicated_interface', 
'network:router_interface_distributed')) AS anon_1 INNER JOIN ports AS ports_1 
ON ports_1.id = anon_1.routerports_port_id INNER JOIN ml2_port_binding_levels 
ON ports_1.id = ml2_port_binding_levels.port_id ORDER BY ports_1.id;


```

These queries were taken from `show processlist`. We saw an excessive number
of slow queries against ml2_port_binding_levels, which is odd because the
eager load does not look necessary.
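
For completeness, one way to capture these statements for analysis instead of catching them in `show processlist` (a sketch; the 1-second threshold and log path are arbitrary):

```
# Runtime-only settings (reset on mysqld restart): log every statement slower
# than one second so the eager-load queries can be collected and EXPLAINed.
mysql -e "SET GLOBAL slow_query_log = ON;"
mysql -e "SET GLOBAL long_query_time = 1;"
mysql -e "SET GLOBAL slow_query_log_file = '/var/lib/mysql/neutron-slow.log';"
```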

** Affects: neutron
 Importance: Undecided
 Status: New

[Yahoo-eng-team] [Bug 1976345] [NEW] The sync_routers interface takes too long

2022-05-31 Thread liujinxin
Public bug reported:

ENV: stable/victoria

In a large-scale cloud deployment, when the neutron l3-agent is restarted, the
_routers_to_sync method is called twice on the neutron server side, and this
makes the agent restart time too long to be operable.

<_routers_to_sync> Elapsed: 32.242s
<_routers_to_sync> Elapsed: 52.720s
<sync_routers>     Elapsed: 85.427s

(The two _routers_to_sync calls account for essentially all of the time spent
in sync_routers: 32.2s + 52.7s of the 85.4s total.)

https://github.com/openstack/neutron/blob/master/neutron/api/rpc/handlers/l3_rpc.py (line 128)

Can it be changed to:

    if extensions.is_extension_supported(
            self.plugin, constants.PORT_BINDING_EXT_ALIAS):
        self._ensure_host_set_on_ports(context, host, routers)
        # refresh the data structure after ports are bound
        routers = self._routers_to_sync(context, router_ids, host)
    else:
        routers = self._routers_to_sync(context, router_ids, host)

** Affects: neutron
 Importance: Undecided
 Status: New


[Yahoo-eng-team] [Bug 1966383] [NEW] ovs-agent: value -1 is not a valid port number when _set_port_filters

2022-03-25 Thread liujinxin
Public bug reported:

env:
   neutron version: victoria
   /etc/neutron/plugins/ml2/openvswitch_agent.ini:firewall_driver=openvswitch

reproduce the problem:
$ openstack server delete xxx

Reason for the error:
The port update and port removal events arrive very close together. By the
time the update event is processed, the OVS port has already been deleted;
when get_or_create_ofport reads the port information from ovsdb, the row still
exists but the ofport obtained is -1. Calling _update_flows_for_port or
_set_port_filters with ofport=-1 to program the OpenFlow tables then fails
with the error below.


error log:
INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None 
req-984032db-a239-47ef-b80e-9064f5788a8b - - - - -] Port 
61a04bae-b608-47cf-bb77-3e3a63def50c updated. Details: {'device': 
'61a04bae-b608-47cf-bb77-3e3a63def50c', 'device_id': 
'9a158b27-8ae8-4afd-ba8e-8e56247ac868', 'network_id': 
'b225d077-0558-4b64-b32a-36217767e54f', 'port_id': 
'61a04bae-b608-47cf-bb77-3e3a63def50c', 'mac_address': 'fa:16:3e:ae:77:28', 
'admin_state_up': True, 'network_type': 'vxlan', 'segmentation_id': 57, 
'physical_network': None, 'fixed_ips': [{'subnet_id': 
'557d4d98-5e51-480b-8db5-a7f6adc615d7', 'ip_address': '172.31.16.184'}], 
'device_owner': 'compute:az-x86-up-1', 'allowed_address_pairs': [], 
'port_security_enabled': True, 'qos_policy_id': None, 'network_qos_policy_id': 
None, 'profile': {}, 'vif_type': 'ovs', 'vnic_type': 'normal', 
'security_groups': ['05a2d0ca-df71-4ef4-92d3-1464a94c8e11']}
INFO neutron.agent.securitygroups_rpc [None 
req-984032db-a239-47ef-b80e-9064f5788a8b - - - - -] Refresh firewall rules

ERROR neutron.agent.linux.utils [None req-984032db-a239-47ef-b80e-9064f5788a8b 
- - - - -] Exit code: 1; Cmd: ['ovs-ofctl', 'add-flows', '-O', 'OpenFlow10', 
'br-int', '--bundle', '-']; Stdin: 
hard_timeout=0,idle_timeout=0,priority=100,table=60,in_port=-1,cookie=6616235460093723383,actions=set_field:-1->reg5,set_field:9->reg6,resubmit(,71)
hard_timeout=0,idle_timeout=0,priority=90,table=60,dl_dst=fa:16:3e:ae:77:28,dl_vlan=0x9,cookie=6616235460093723383,actions=set_field:-1->reg5,set_field:9->reg6,strip_vlan,resubmit(,81)
hard_timeout=0,idle_timeout=0,priority=95,table=71,in_port=-1,dl_type=0x86dd,nw_proto=58,icmp_type=130,reg5=-1,cookie=6616235460093723383,actions=resubmit(,94)
hard_timeout=0,idle_timeout=0,priority=95,table=71,in_port=-1,dl_type=0x86dd,nw_proto=58,icmp_type=133,reg5=-1,cookie=6616235460093723383,actions=resubmit(,94)
hard_timeout=0,idle_timeout=0,priority=95,table=71,in_port=-1,dl_type=0x86dd,nw_proto=58,icmp_type=135,reg5=-1,cookie=6616235460093723383,actions=resubmit(,94)
hard_timeout=0,idle_timeout=0,priority=95,table=71,in_port=-1,dl_type=0x86dd,nw_proto=58,icmp_type=136,reg5=-1,cookie=6616235460093723383,actions=resubmit(,94)
hard_timeout=0,idle_timeout=0,priority=95,table=71,in_port=-1,dl_src=fa:16:3e:ae:77:28,dl_type=0x0806,arp_spa=172.31.16.184,reg5=-1,cookie=6616235460093723383,actions=resubmit(,94)
hard_timeout=0,idle_timeout=0,priority=77,dl_type=0x0800,table=72,nw_proto=6,ct_state=+new-est,reg5=-1,cookie=6616235460093723383,actions=resubmit(,73);
 Stdout: ; Stderr: 2022-03-25T01:41:27Z|1|ofp_port|WARN|Negative value -1 
is not a valid port number.
ovs-ofctl: -:1: -1: invalid or unknown port for in_port
...
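
The race can also be observed from the OVSDB side (a sketch; the tap device name is illustrative, derived from the port UUID prefix):

```
# While the port is being torn down, its Interface row can still be present in
# OVSDB while the datapath port is already gone, which is exactly when ofport
# reads back as -1.
ovs-vsctl --columns=name,ofport list Interface
ovs-vsctl --columns=ofport list Interface tap61a04bae-b6
```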

** Affects: neutron
 Importance: Undecided
 Status: New


[Yahoo-eng-team] [Bug 1955640] [NEW] Performance of mariadb's neutron.agents table

2021-12-23 Thread liujinxin
Public bug reported:

In MariaDB, the neutron.agents table only has the primary key and the unique
(agent_type, host) index. A query that filters on agents.host alone cannot use
that composite index (host is its second column) and ends up scanning the
whole table. Many neutron code paths query by agents.host, so these lookups
are inefficient (see the sketch at the end of this report).
eg:
```
def get_dvr_active_network_ports(context, network_id):
    query = context.session.query(ml2_models.DistributedPortBinding,
                                  agent_model.Agent)
    query = query.join(agent_model.Agent,
                       agent_model.Agent.host ==
                       ml2_models.DistributedPortBinding.host)
MariaDB [neutron]> show index from agents;
+--------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name                    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| agents |          0 | PRIMARY                     |            1 | id          | A         |          20 | NULL     | NULL   |      | BTREE      |         |               |
| agents |          0 | uniq_agents0agent_type0host |            1 | agent_type  | A         |          10 | NULL     | NULL   |      | BTREE      |         |               |
| agents |          0 | uniq_agents0agent_type0host |            2 | host        | A         |          20 | NULL     | NULL   |      | BTREE      |         |               |
+--------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

explain SELECT ports.project_id AS ports_project_id, ports.id AS ports_id, 
ports.name AS ports_name, ports.network_id AS ports_network_id, 
ports.mac_address AS ports_mac_address, ports.admin_state_up AS 
ports_admin_state_up, ports.status AS ports_status, ports.device_id AS 
ports_device_id, ports.device_owner AS ports_device_owner, ports.ip_allocation 
AS ports_ip_allocation, ports.standard_attr_id AS ports_standard_attr_id, 
anon_1.ml2_port_bindings_port_id AS anon_1_ml2_port_bindings_port_id, 
standardattributes_1.id AS standardattributes_1_id, 
standardattributes_1.resource_type AS standardattributes_1_resource_type, 
standardattributes_1.description AS standardattributes_1_description, 
standardattributes_1.revision_number AS standardattributes_1_revision_number, 
standardattributes_1.created_at AS standardattributes_1_created_at, 
standardattributes_1.updated_at AS standardattributes_1_updated_at, 
ml2_port_bindings_1.port_id AS ml2_port_bindings_1_port_id, 
ml2_port_bindings_1.host AS ml2_port_bindings_1_host, 
ml2_port_bindings_1.vnic_type AS ml2_port_bindings_1_vnic_type, 
ml2_port_bindings_1.profile AS ml2_port_bindings_1_profile, 
ml2_port_bindings_1.vif_type AS ml2_port_bindings_1_vif_type, 
ml2_port_bindings_1.vif_details AS ml2_port_bindings_1_vif_details, 
ml2_port_bindings_1.status AS ml2_port_bindings_1_status, subports_1.port_id AS 
subports_1_port_id, subports_1.trunk_id AS subports_1_trunk_id, 
subports_1.segmentation_type AS subports_1_segmentation_type, 
subports_1.segmentation_id AS subports_1_segmentation_id, 
standardattributes_2.id AS standardattributes_2_id, 
standardattributes_2.resource_type AS standardattributes_2_resource_type, 
standardattributes_2.description AS standardattributes_2_description, 
standardattributes_2.revision_number AS standardattributes_2_revision_number, 
standardattributes_2.created_at AS standardattributes_2_created_at, 
standardattributes_2.updated_at AS standardattributes_2_updated_at, 
trunks_1.project_id AS trunks_1_project_id, trunks_1.id AS trunks_1_id, 
trunks_1.admin_state_up AS trunks_1_admin_state_up, trunks_1.name AS 
trunks_1_name, trunks_1.port_id AS trunks_1_port_id, trunks_1.status AS 
trunks_1_status, trunks_1.standard_attr_id AS trunks_1_standard_attr_id, 
portsecuritybindings_1.port_id AS portsecuritybindings_1_port_id, 
portsecuritybindings_1.port_security_enabled AS 
portsecuritybindings_1_port_security_enabled, 
qos_port_policy_bindings_1.policy_id AS qos_port_policy_bindings_1_policy_id, 
qos_port_policy_bindings_1.port_id AS qos_port_policy_bindings_1_port_id, 
portdnses_1.port_id AS portdnses_1_port_id, portdnses_1.current_dns_name AS 
portdnses_1_current_dns_name, portdnses_1.current_dns_domain AS 
portdnses_1_current_dns_domain, portdnses_1.previous_dns_name AS 
portdnses_1_previous_dns_name, portdnses_1.previous_dns_domain AS 
portdnses_1_previous_dns_domain, portdnses_1.dns_name AS portdnses_1_dns_name, 
portdnses_1.dns_domain AS portdnses_1_dns_domain, 
securitygroupportbindings_1.port_id AS securitygroupportbindings_1_port_id, 
