[Yahoo-eng-team] [Bug 2059128] [NEW] Internal Server Error when attempting to use an incorrect URL within the metadata API
Public bug reported:

When trying to GET a non-existent metadata key from within the VM, like '/latest/meta-data/hostname/abc', the Nova metadata service responds with a 500 HTTP status code.

Inside a VM:

$ curl http://169.254.169.254/latest/meta-data/hostname/abc
500 Internal Server Error
500 Internal Server Error
An unknown error has occurred. Please try your request again.
$

The nova metadata service logs:

CRITICAL nova [None req-3286f047-98c4-41c8-a11b-02a140fd2e4d None None] Unhandled error: TypeError: string indices must be integers
ERROR nova Traceback (most recent call last):
ERROR nova   File "/usr/local/lib/python3.9/site-packages/paste/urlmap.py", line 216, in __call__
ERROR nova     return app(environ, start_response)
ERROR nova   File "/usr/local/lib/python3.9/site-packages/webob/dec.py", line 129, in __call__
ERROR nova     resp = self.call_func(req, *args, **kw)
ERROR nova   File "/usr/local/lib/python3.9/site-packages/webob/dec.py", line 193, in call_func
ERROR nova     return self.func(req, *args, **kwargs)
ERROR nova   File "/usr/local/lib/python3.9/site-packages/oslo_middleware/base.py", line 124, in __call__
ERROR nova     response = req.get_response(self.application)
ERROR nova   File "/usr/local/lib/python3.9/site-packages/webob/request.py", line 1313, in send
ERROR nova     status, headers, app_iter = self.call_application(
ERROR nova   File "/usr/local/lib/python3.9/site-packages/webob/request.py", line 1278, in call_application
ERROR nova     app_iter = application(self.environ, start_response)
ERROR nova   File "/usr/local/lib/python3.9/site-packages/webob/dec.py", line 129, in __call__
ERROR nova     resp = self.call_func(req, *args, **kw)
ERROR nova   File "/usr/local/lib/python3.9/site-packages/webob/dec.py", line 193, in call_func
ERROR nova     return self.func(req, *args, **kwargs)
ERROR nova   File "/usr/local/lib/python3.9/site-packages/oslo_middleware/base.py", line 124, in __call__
ERROR nova     response = req.get_response(self.application)
ERROR nova   File "/usr/local/lib/python3.9/site-packages/webob/request.py", line 1313, in send
ERROR nova     status, headers, app_iter = self.call_application(
ERROR nova   File "/usr/local/lib/python3.9/site-packages/webob/request.py", line 1278, in call_application
ERROR nova     app_iter = application(self.environ, start_response)
ERROR nova   File "/usr/local/lib/python3.9/site-packages/webob/dec.py", line 129, in __call__
ERROR nova     resp = self.call_func(req, *args, **kw)
ERROR nova   File "/usr/local/lib/python3.9/site-packages/webob/dec.py", line 193, in call_func
ERROR nova     return self.func(req, *args, **kwargs)
ERROR nova   File "/opt/stack/nova/nova/api/metadata/handler.py", line 129, in __call__
ERROR nova     data = meta_data.lookup(req.path_info)
ERROR nova   File "/opt/stack/nova/nova/api/metadata/base.py", line 576, in lookup
ERROR nova     data = self.get_ec2_item(path_tokens[1:])
ERROR nova   File "/opt/stack/nova/nova/api/metadata/base.py", line 308, in get_ec2_item
ERROR nova     return find_path_in_tree(data, path_tokens[1:])
ERROR nova   File "/opt/stack/nova/nova/api/metadata/base.py", line 737, in find_path_in_tree
ERROR nova     data = data[path_tokens[i]]
ERROR nova TypeError: string indices must be integers
[pid: 156048|app: 0|req: 5/9] 10.136.16.184 () {40 vars in 687 bytes} [Tue Mar 26 04:37:44 2024] GET /latest/meta-data/hostname/abc => generated 0 bytes in 82 msecs (HTTP/1.1 500) 0 headers in 0 bytes (0 switches on core 0)

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2059128
Title: Internal Server Error when attempting to use an incorrect URL within the metadata API
Status in OpenStack Compute (nova): New
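The traceback ends in find_path_in_tree indexing into a leaf value: once the walk reaches the hostname string, the next token 'abc' is used as a string index and raises TypeError. The following is a minimal illustrative sketch (hypothetical, not Nova's actual code) of that traversal, with a guard that turns "extra tokens past a leaf" into a KeyError so the handler could answer 404 instead of crashing with a 500:

```python
def find_path_in_tree(data, path_tokens):
    # Walk the metadata tree one token at a time. Reaching a leaf value
    # (e.g. the hostname string) while tokens remain means the requested
    # path does not exist; raise KeyError so the caller can answer 404
    # instead of failing with "string indices must be integers".
    for token in path_tokens:
        if not isinstance(data, dict):
            raise KeyError(token)
        data = data[token]
    return data

# Toy metadata tree mirroring the reproducer URL.
tree = {"meta-data": {"hostname": "vm1.example"}}
```

With this guard, '/latest/meta-data/hostname' still resolves to the hostname string, while '/latest/meta-data/hostname/abc' raises KeyError rather than TypeError.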
[Yahoo-eng-team] [Bug 2059032] [NEW] Neutron metadata service returns http code 500 if nova metadata service is down
Public bug reported:

We discovered that if the nova metadata service is down, the neutron metadata service starts returning stack traces with a 500 HTTP code to the user.

Demo on a newly installed devstack:

$ systemctl stop devstack@n-api-meta.service

Then inside a VM:

$ curl http://169.254.169.254/latest/meta-data/hostname
500 Internal Server Error
500 Internal Server Error
An unknown error has occurred. Please try your request again.
$

Stack trace:

ERROR neutron.agent.metadata.agent Traceback (most recent call last):
ERROR neutron.agent.metadata.agent   File "/opt/stack/neutron/neutron/agent/metadata/agent.py", line 85, in __call__
ERROR neutron.agent.metadata.agent     res = self._proxy_request(instance_id, tenant_id, req)
ERROR neutron.agent.metadata.agent   File "/opt/stack/neutron/neutron/agent/metadata/agent.py", line 249, in _proxy_request
ERROR neutron.agent.metadata.agent     resp = requests.request(method=req.method, url=url,
ERROR neutron.agent.metadata.agent   File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 59, in request
ERROR neutron.agent.metadata.agent     return session.request(method=method, url=url, **kwargs)
ERROR neutron.agent.metadata.agent   File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
ERROR neutron.agent.metadata.agent     resp = self.send(prep, **send_kwargs)
ERROR neutron.agent.metadata.agent   File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
ERROR neutron.agent.metadata.agent     r = adapter.send(request, **kwargs)
ERROR neutron.agent.metadata.agent   File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 519, in send
ERROR neutron.agent.metadata.agent     raise ConnectionError(e, request=request)
ERROR neutron.agent.metadata.agent requests.exceptions.ConnectionError: HTTPConnectionPool(host='10.136.16.184', port=8775): Max retries exceeded with url: /latest/meta-data/hostname (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
INFO eventlet.wsgi.server [-] :::192.168.100.14, "GET /latest/meta-data/hostname HTTP/1.1" status: 500 len: 362 time: 0.1392403

Also, in our installation the nova service is behind nginx. If we stop the nova metadata service we also get a 500 HTTP code, but with another traceback:

2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent [-] Unexpected error.: Exception: Unexpected response code: 502
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/metadata/agent.py", line 93, in __call__
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent     res = self._proxy_request(instance_id, tenant_id, req)
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/metadata/agent.py", line 288, in _proxy_request
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent     resp.status_code)
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent Exception: Unexpected response code: 502
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent
2024-03-25 20:27:01.988 24 INFO eventlet.wsgi.server [-] 10.197.115.207, "GET /latest/meta-data/hostname HTTP/1.1" status: 500 len: 362 time: 0.1369441

It seems to me that it would also be better to handle nginx-like gateway errors a bit more correctly. These 500 HTTP codes worry us because we are building an alerting system, and one of its criteria is 500 codes.

** Affects: neutron
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2059032
Title: Neutron metadata service returns http code 500 if nova metadata service is down
Status in neutron: New
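A hedged sketch of the proxy behaviour the report asks for: connection failures to the upstream nova metadata service, and nginx-style 502/504 replies, are translated into a 503 for the instance instead of an unhandled exception that surfaces as a 500. The function name and mapping are illustrative, not neutron's actual code (which uses requests; stdlib urllib is used here to keep the sketch self-contained):

```python
import urllib.error
import urllib.request

def upstream_status(url, timeout=2):
    # Translate "nova metadata API is unreachable" into 503 Service
    # Unavailable rather than letting the connection error escape as an
    # unhandled exception (which the WSGI stack reports as 500), and map
    # gateway errors from a fronting proxy (502/504) to 503 as well.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code          # upstream answered with an error code
    except urllib.error.URLError:
        return 503                 # upstream is down / connection refused
    if status in (502, 504):
        return 503                 # normalize gateway errors
    return status
```

An alerting pipeline keyed on 5xx codes can then distinguish "metadata backend unavailable" (503) from genuine internal errors (500).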
[Yahoo-eng-team] [Bug 2038931] [NEW] ovsfw: OVS br-int rule disappears from the table=60 after stop/start VM
Public bug reported:

I found out that after VM creation and after a VM stop/start the set of OVS rules in br-int table=60 (TRANSIENT_TABLE) is different.

I have a flat network and create a VM in it. After the VM stop/start, the set of rules in table 60 for this VM differs from the one present right after VM creation. Here is a demo:

[root@devstack0 ~]# openstack server create test-vm --image cirros-0.6.2-x86_64-disk --network public --flavor m1.tiny -c id
+-------+--------------------------------------+
| Field | Value                                |
+-------+--------------------------------------+
| id    | 84c7ed9c-c78e-4d15-8a09-6eb18b0f872a |
+-------+--------------------------------------+
[root@devstack0 ~]# openstack port list --device-id 84c7ed9c-c78e-4d15-8a09-6eb18b0f872a -c ID -c mac_address
+--------------------------------------+-------------------+
| ID                                   | MAC Address       |
+--------------------------------------+-------------------+
| 4fd0022b-223d-43ac-9134-1623b38ee2a6 | fa:16:3e:4b:db:3e |
+--------------------------------------+-------------------+
[root@devstack0 ~]#

Table 60 contains two rules with dl_dst=fa:16:3e:4b:db:3e after the VM is created:

[root@devstack0 neutron]# ovs-ofctl dump-flows br-int table=60 | grep fa:16:3e:4b:db:3e
 cookie=0x1a51dc2aa3392248, duration=23.420s, table=60, n_packets=0, n_bytes=0, idle_age=1961, priority=90,vlan_tci=0x/0x1fff,dl_dst=fa:16:3e:4b:db:3e actions=load:0x1c->NXM_NX_REG5[],load:0x2->NXM_NX_REG6[],resubmit(,81)
 cookie=0x1a51dc2aa3392248, duration=23.420s, table=60, n_packets=25, n_bytes=2450, idle_age=678, priority=90,dl_vlan=2,dl_dst=fa:16:3e:4b:db:3e actions=load:0x1c->NXM_NX_REG5[],load:0x2->NXM_NX_REG6[],strip_vlan,resubmit(,81)
[root@devstack0 neutron]#

Stop/start the VM and check again:

[root@devstack0 ~]# openstack server stop test-vm
[root@devstack0 ~]# openstack server start test-vm
[root@devstack0 neutron]# ovs-ofctl dump-flows br-int table=60 | grep fa:16:3e:4b:db:3e
 cookie=0x1a51dc2aa3392248, duration=14.201s, table=60, n_packets=25, n_bytes=2450, idle_age=697, priority=90,dl_vlan=2,dl_dst=fa:16:3e:4b:db:3e actions=load:0x1d->NXM_NX_REG5[],load:0x2->NXM_NX_REG6[],strip_vlan,resubmit(,81)
[root@devstack0 neutron]#

You can see that the rule [1] has disappeared. And there is a neutron-openvswitch-agent message 'Initializing port that was already initialized' while the VM is starting:

Oct 10 08:50:05 devstack0 neutron-openvswitch-agent[232791]: INFO neutron.agent.securitygroups_rpc [None req-df876af2-5007-42ae-ae4e-8c968f59fb5c None None] Preparing filters for devices {'4fd0022b-223d-43ac-9134-1623b38ee2a6'}
Oct 10 08:50:05 devstack0 neutron-openvswitch-agent[232791]: INFO neutron.agent.linux.openvswitch_firewall.firewall [None req-df876af2-5007-42ae-ae4e-8c968f59fb5c None None] Initializing port 4fd0022b-223d-43ac-9134-1623b38ee2a6 that was already initialized.

I get this behavior on devstack with neutron from the master branch. It looks like the rule disappears because the OVS interface under the OVS port is recreated after the VM stop/start, and a new OFPort object is created with network_type=None (as well as physical_network=None). Compare with a few lines above, where the OFPort object is created with network_type/physical_network set [2].

I actually discovered this behavior while testing my neutron port-check plugin [3]:

[root@devstack0 ~]# openstack port check 4fd0022b-223d-43ac-9134-1623b38ee2a6 -c firewall
+----------+------------------------------------------------------------------------------------------------------------------------------------------+
| Field    | Value                                                                                                                                    |
+----------+------------------------------------------------------------------------------------------------------------------------------------------+
| firewall | - No flow: table=60, priority=90,vlan_tci=(0, 8191),eth_dst=fa:16:3e:4b:db:3e actions=set_field:29->reg5,set_field:2->reg6,resubmit(,81) |
+----------+------------------------------------------------------------------------------------------------------------------------------------------+
[root@devstack0 ~]#

[1] https://opendev.org/openstack/neutron/src/commit/78027da56ccb25d19ac2c3bc1c174acb2150e6a5/neutron/agent/linux/openvswitch_firewall/firewall.py#L915
[2] https://opendev.org/openstack/neutron/src/commit/78027da56ccb25d19ac2c3bc1c174acb2150e6a5/neutron/agent/linux/openvswitch_firewall/firewall.py#L724
[3] https://github.com/antonkurbatov/neutron-portcheck

** Affects: neutron
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2038931 Title: ovsfw: OVS br-int rule disappears from the table=60 after stop/start VM Status in neutron:
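The report's diagnosis is that after stop/start the OFPort is rebuilt with network_type=None, so the untagged-traffic rule (the vlan_tci match) is never reinstalled. A toy model, purely illustrative and not neutron's firewall code, of how the generated flow set depends on network_type:

```python
def transient_table_flows(mac, vlan_tag, network_type):
    # table=60 always gets the dl_vlan match; the extra untagged match
    # (vlan_tci=... in the report's dump) is only generated when the port
    # is known to sit on a flat/vlan physical network. With
    # network_type=None that second rule silently disappears.
    flows = [{"table": 60, "priority": 90, "dl_vlan": vlan_tag, "dl_dst": mac}]
    if network_type in ("flat", "vlan"):
        flows.append({"table": 60, "priority": 90,
                      "vlan_tci": "0x0000/0x1fff", "dl_dst": mac})
    return flows
```

This mirrors the observed symptom: two table=60 rules right after creation (network_type known), one rule after the interface is recreated with network_type=None.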
[Yahoo-eng-team] [Bug 2024381] [NEW] keepalived fails to start after updating DVR-HA internal network MTU
Public bug reported:

We hit an issue where keepalived stops running after updating the MTU of the internal network of a DVR-HA router. It turned out that the regenerated keepalived config references an interface from the qrouter namespace, although the keepalived process itself runs in the snat namespace.

Here is a simple demo on the latest master branch:

$ openstack network create net1
$ openstack subnet create sub1 --network net1 --subnet-range 192.168.100.0/24
$ openstack router create r1 --distributed --ha
$ openstack router add subnet r1 sub1

The keepalived process is running and the config looks like:

$ ps axf | grep -w pid.keepalived
...
 130250 ?  S  0:00  \_ keepalived -P -f /opt/stack/data/neutron/ha_confs/f7df848f-f168-4305-8ba2-a31902bdbbfd/keepalived.conf -p /opt/stack/data/neutron/ha_confs/f7df848f-f168-4305-8ba2-a31902bdbbfd.pid.keepalived -r /opt/stack/data/neutron/ha_confs/f7df848f-f168-4305-8ba2-a31902bdbbfd.pid.keepalived-vrrp -D
$ cat /opt/stack/data/neutron/ha_confs/f7df848f-f168-4305-8ba2-a31902bdbbfd/keepalived.conf
global_defs {
    notification_email_from neutron@openstack.local
    router_id neutron
}
vrrp_instance VR_60 {
    state BACKUP
    interface ha-77ee55dc-5c
    virtual_router_id 60
    priority 50
    garp_master_delay 60
    nopreempt
    advert_int 2
    track_interface {
        ha-77ee55dc-5c
    }
    virtual_ipaddress {
        169.254.0.60/24 dev ha-77ee55dc-5c
    }
$

Now update the MTU of the internal network:

$ openstack network set net1 --mtu 1400
$ ps axf | grep -w pid.keepalived
 131097 pts/0  S+  0:00  |  \_ grep --color=auto -w pid.keepalived
$
$ ip netns exec snat-f7df848f-f168-4305-8ba2-a31902bdbbfd keepalived -t -f /opt/stack/data/neutron/ha_confs/f7df848f-f168-4305-8ba2-a31902bdbbfd/keepalived.conf
(/opt/stack/data/neutron/ha_confs/f7df848f-f168-4305-8ba2-a31902bdbbfd/keepalived.conf: Line 20) WARNING - interface qr-035f8095-76 for ip address 192.168.100.1/24 doesn't exist
(/opt/stack/data/neutron/ha_confs/f7df848f-f168-4305-8ba2-a31902bdbbfd/keepalived.conf: Line 21) WARNING - interface qr-035f8095-76 for ip address fe80::f816:3eff:fe88:e922/64 doesn't exist
Non-existent interface specified in configuration
$
$ cat /opt/stack/data/neutron/ha_confs/f7df848f-f168-4305-8ba2-a31902bdbbfd/keepalived.conf
global_defs {
    notification_email_from neutron@openstack.local
    router_id neutron
}
vrrp_instance VR_60 {
    state BACKUP
    interface ha-77ee55dc-5c
    virtual_router_id 60
    priority 50
    garp_master_delay 60
    nopreempt
    advert_int 2
    track_interface {
        ha-77ee55dc-5c
    }
    virtual_ipaddress {
        169.254.0.60/24 dev ha-77ee55dc-5c
    }
    virtual_ipaddress_excluded {
        192.168.100.1/24 dev qr-035f8095-76
        fe80::f816:3eff:fe88:e922/64 dev qr-035f8095-76 scope link
    }
}$
$ ip netns exec snat-f7df848f-f168-4305-8ba2-a31902bdbbfd ip link
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
10: ha-77ee55dc-5c: mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:46:30:c4 brd ff:ff:ff:ff:ff:ff
$

** Affects: neutron
     Importance: Undecided
         Status: In Progress

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.

https://bugs.launchpad.net/bugs/2024381
Title: keepalived fails to start after updating DVR-HA internal network MTU
Status in neutron: In Progress
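The failure mode is visible in the `keepalived -t` output: the regenerated config lists virtual addresses on qr- interfaces that only exist in the qrouter namespace, while the keepalived process runs in the snat namespace, where keepalived refuses to start. A small hypothetical checker for that mismatch (helper name and input shapes are illustrative):

```python
def missing_vip_interfaces(vip_entries, namespace_interfaces):
    # Each entry looks like '192.168.100.1/24 dev qr-035f8095-76', as in
    # keepalived.conf's virtual_ipaddress/virtual_ipaddress_excluded
    # sections. Report every entry whose device is absent from the set of
    # interfaces present in the namespace keepalived actually runs in.
    missing = []
    for entry in vip_entries:
        dev = entry.split(" dev ")[1].split()[0]
        if dev not in namespace_interfaces:
            missing.append(entry)
    return missing
```

Run against the bug's data, only the qr- entry is flagged, matching the two WARNING lines and the "Non-existent interface specified in configuration" failure.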
[Yahoo-eng-team] [Bug 2008270] [NEW] Neutron allows you to delete router_ha_interface ports, which can lead to issues
Public bug reported:

We ran into a problem with a customer when some external integration tried to remove all ports using the neutron API, including router ports. It seems that only the router ports with the router_ha_interface device owner are allowed to be deleted; all other router ports cannot be deleted directly through the API.

Here is a simple example that demonstrates the doubling of ARP responses if such a port is deleted:

[root@dev0 ~]# openstack router create r1 --ha --external-gateway public -c id
+-------+--------------------------------------+
| Field | Value                                |
+-------+--------------------------------------+
| id    | 5d9d6fee-6652-4843-9f7c-54c11899d721 |
+-------+--------------------------------------+
[root@dev0 ~]# neutron l3-agent-list-hosting-router r1
neutron CLI is deprecated and will be removed in the Z cycle. Use openstack CLI instead.
+--------------------------------------+------+----------------+-------+----------+
| id                                   | host | admin_state_up | alive | ha_state |
+--------------------------------------+------+----------------+-------+----------+
| 9dd0920a-cb0c-47f1-a976-3e208e3e2e6c | dev0 | True           | :-)   | active   |
| 6fa92056-ca25-42e0-aee4-c4e744008239 | dev2 | True           | :-)   | standby  |
| 8fbda128-dc9c-4b3b-be1b-bb3f11ad1447 | dev1 | True           | :-)   | standby  |
+--------------------------------------+------+----------------+-------+----------+
[root@dev0 ~]# openstack port list --device-id 5d9d6fee-6652-4843-9f7c-54c11899d721 -c id -c device_owner -c fixed_ips --long
| ID                                   | Device Owner                | Fixed IP Addresses                                                             |
| 555a9272-c9df-4a05-9f08-752c91c5a4c9 | network:router_ha_interface | ip_address='169.254.192.147', subnet_id='20c159f7-13f8-4093-9a4a-8380bdcfea60' |
| 6a196ff7-f3d4-4bee-aed0-b5d7ba727741 | network:router_ha_interface | ip_address='169.254.193.243', subnet_id='20c159f7-13f8-4093-9a4a-8380bdcfea60' |
| 7a849dcc-eac4-4d5b-a547-7ce3986ffb95 | network:router_ha_interface | ip_address='169.254.192.155', subnet_id='20c159f7-13f8-4093-9a4a-8380bdcfea60' |
| d77e624d-87a2-4135-9118-3d8e78539cee | network:router_gateway      | ip_address='10.136.17.172', subnet_id='ee15c548-e497-449e-b46d-50e9ccc0f70c'   |
[root@dev0 ~]#
[root@dev0 ~]# ip netns exec snat-5d9d6fee-6652-4843-9f7c-54c11899d721 ip a
...
25: ha-555a9272-c9: mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:7d:cf:a0 brd ff:ff:ff:ff:ff:ff
    inet 169.254.192.147/18 brd 169.254.255.255 scope global ha-555a9272-c9
       valid_lft forever preferred_lft forever
    inet 169.254.0.189/24 scope global ha-555a9272-c9
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe7d:cfa0/64 scope link
       valid_lft forever preferred_lft forever
28: qg-d77e624d-87: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:a8:54:29 brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.172/20 scope global qg-d77e624d-87
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fea8:5429/64 scope link nodad
       valid_lft forever preferred_lft forever
[root@dev0 ~]#
[root@dev0 ~]# openstack port delete 555a9272-c9df-4a05-9f08-752c91c5a4c9
[root@dev0 ~]# neutron l3-agent-list-hosting-router r1
neutron CLI is deprecated and will be removed in the Z cycle. Use openstack CLI instead.
+--------------------------------------+------+----------------+-------+----------+
| id                                   | host | admin_state_up | alive | ha_state |
+--------------------------------------+------+----------------+-------+----------+
| 6fa92056-ca25-42e0-aee4-c4e744008239 | dev2 | True           | :-)   | active   |
| 8fbda128-dc9c-4b3b-be1b-bb3f11ad1447 | dev1 | True           | :-)   | standby  |
+--------------------------------------+------+----------------+-------+----------+
[root@dev0 ~]#
[root@dev0 ~]# ip netns exec snat-5d9d6fee-6652-4843-9f7c-54c11899d721 ip a s qg-d77e624d-87
28: qg-d77e624d-87: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:a8:54:29 brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.172/20 scope global qg-d77e624d-87
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fea8:5429/64 scope link nodad
       valid_lft forever preferred_lft forever
[root@dev0 ~]# ssh dev2 ip netns exec
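A sketch of the guard the report implies is missing: refusing direct API deletion of router-owned ports, including network:router_ha_interface. The constant set and function are hypothetical, not neutron's actual code:

```python
# Device owners that mark a port as belonging to a router. The report says
# network:router_ha_interface is the one NOT currently protected.
ROUTER_OWNED = {
    "network:router_interface",
    "network:router_gateway",
    "network:router_ha_interface",
}

def ensure_port_deletable(device_owner):
    # Refuse direct deletion of any router-owned port; HA interface ports
    # should be removed by the L3 HA machinery (e.g. on router deletion),
    # not by an external 'port delete' call.
    if device_owner in ROUTER_OWNED:
        raise ValueError(
            "port with device_owner %s is owned by a router and cannot be "
            "deleted directly" % device_owner)
```

With such a check, the external integration's bulk port cleanup would skip HA interface ports the same way it already has to skip gateway and interface ports.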
[Yahoo-eng-team] [Bug 2003532] [NEW] Floating IP stuck in snat-ns after binding host to associated fixed IP
Public bug reported:

We encountered a problem where the floating IP is not removed from the snat namespace when the FIP is moving from the centralized to the distributed state (i.e. when a host is bound to the associated fixed IP address). This happens when the fixed IP was originally created with a non-empty device_owner field.

Steps to reproduce. Create a router, a port on a private network, and a FIP with this port as a fixed IP port:

[root@devstack0 ~]# openstack router create --distributed r1 --external-gateway public
[root@devstack0 ~]# openstack router add subnet r1 private
[root@devstack0 ~]# openstack port create my-port --network private --device-owner compute:nova
+--------------+-------------------------------------------------------------------------------+
| Field        | Value                                                                         |
+--------------+-------------------------------------------------------------------------------+
| device_owner | compute:nova                                                                  |
| fixed_ips    | ip_address='192.168.10.133', subnet_id='8ec1cd23-363a-474c-a53f-bab4692c312f' |
+--------------+-------------------------------------------------------------------------------+
[root@devstack0 ~]# openstack floating ip create public --port my-port -c floating_ip_address
+---------------------+---------------+
| Field               | Value         |
+---------------------+---------------+
| floating_ip_address | 10.136.17.171 |
+---------------------+---------------+
[root@devstack0 ~]#

The FIP is added to the snat namespace:

[root@devstack0 ~]# ip netns exec snat-b961c902-8cd9-4c5c-a03c-6595368a2314 ip a
...
38: qg-6a663b96-e1: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:bf:85:ab brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.175/20 brd 10.136.31.255 scope global qg-6a663b96-e1
       valid_lft forever preferred_lft forever
    inet 10.136.17.171/32 brd 10.136.17.171 scope global qg-6a663b96-e1
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:febf:85ab/64 scope link
       valid_lft forever preferred_lft forever
...
[root@devstack0 ~]#

Create a VM with `my-port` and boot it on another node:

[root@devstack0 ~]# openstack server create vm --port my-port --image cirros-0.5.2-x86_64-disk --flavor 1 --host devstack2

Check the FIP state on the node with the VM (OK):

[root@devstack2 ~]# ip netns exec qrouter-b961c902-8cd9-4c5c-a03c-6595368a2314 ip rule
...
65426: from 192.168.10.133 lookup 16
3232238081: from 192.168.10.1/24 lookup 3232238081
[root@devstack2 ~]#

Check the FIP on the node with the snat namespace (not OK, it's still here):

[root@devstack0 ~]# ip netns exec snat-b961c902-8cd9-4c5c-a03c-6595368a2314 ip a
...
38: qg-6a663b96-e1: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:bf:85:ab brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.175/20 brd 10.136.31.255 scope global qg-6a663b96-e1
       valid_lft forever preferred_lft forever
    inet 10.136.17.171/32 brd 10.136.17.171 scope global qg-6a663b96-e1
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:febf:85ab/64 scope link
       valid_lft forever preferred_lft forever
...
[root@devstack0 ~]#

We found that the FIP status "moving" notification is not sent to the snat nodes in this scenario, see [1]. There was also a small discussion about why the notification should be sent only when changing from an empty to a non-empty device_owner [2]. It looks like such behavior can be considered a bug.

[1] https://opendev.org/openstack/neutron/src/commit/c1eff1dd440b2243a4a31cf3c3af06a01e899f1d/neutron/db/l3_dvrscheduler_db.py#L647
[2] https://review.opendev.org/c/openstack/neutron/+/609924/10/neutron/db/l3_dvrscheduler_db.py#503

** Affects: neutron
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.

https://bugs.launchpad.net/bugs/2003532
Title: Floating IP stuck in snat-ns after binding host to associated fixed IP
Status in neutron: New
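The report argues that the centralized-to-distributed notification should not depend on the port's original device_owner: what matters is that a host becomes bound to the fixed IP's port. A hypothetical predicate expressing that condition (names are illustrative, not neutron's code):

```python
def should_notify_fip_moved(old_host, new_host):
    # Fire whenever the fixed IP's port gains (or changes) a binding host:
    # that is the moment the FIP becomes distributed and must be withdrawn
    # from the snat namespace, regardless of whether device_owner was
    # already non-empty when the port was created.
    return bool(new_host) and new_host != old_host
```

Under this rule, the reproduction above (port created with device_owner=compute:nova, later bound to devstack2) would still trigger the snat-side cleanup.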
[Yahoo-eng-team] [Bug 2003359] [NEW] DVR HA router gets stuck in backup state
Public bug reported:

We found an issue where a newly created HA DVR router gets stuck in the backup state and never transitions to the primary state.

Preconditions:
1) there is no router on the specific external network yet
2) the router needs to go through a quick creation->deletion, and then the next creation of a router can get stuck in the backup state

The reason for this behavior is a fip namespace that is not removed on the agent even though the floatingip_agent_gateway port was removed. Below is a demo with which I managed to reproduce this behavior on a single-node devstack setup.

Create a router and quickly delete it while the l3 agent processes the external GW addition:

[root@devstack ~]# r_id=$(openstack router create r1 --distributed --ha -c id -f value); sleep 30  # give time to process
[root@devstack ~]# count_fip_requests() { journalctl -u devstack@q-l3.service | grep 'FloatingIP agent gateway port received' | wc -l; }
[root@devstack ~]# # add an external gateway and then delete the router while the agent processes the gw
[root@devstack ~]# fip_requests=$(count_fip_requests); openstack router set $r_id --external-gateway public; while :; do [[ $fip_requests == $(count_fip_requests) ]] && { echo "waiting before deletion..."; sleep 1; } || break; done; openstack router delete $r_id
waiting before deletion...
waiting before deletion...
[root@devstack ~]#

As a result, the fip namespace is not deleted even though the floatingip_agent_gateway port was removed:

[root@devstack ~]# ip netns
fip-8d4bc2d5-c6e7-44d0-99f7-1333bafa991f (id: 1)
[root@devstack ~]# openstack port list --network public -c ID -c device_owner -c status --long
[root@devstack ~]#

Re-create the router, this time together with the external gw:

[root@devstack ~]# openstack router create r1 --ha --distributed --external-gateway public

In the logs one can see a traceback showing that the creation of this router initially failed, followed by a successful creation:

ERROR neutron.agent.l3.dvr_fip_ns Traceback (most recent call last):
ERROR neutron.agent.l3.dvr_fip_ns   File "/opt/stack/neutron/neutron/agent/l3/dvr_fip_ns.py", line 152, in create_or_update_gateway_port
ERROR neutron.agent.l3.dvr_fip_ns     self._update_gateway_port(
ERROR neutron.agent.l3.dvr_fip_ns   File "/opt/stack/neutron/neutron/agent/l3/dvr_fip_ns.py", line 323, in _update_gateway_port
ERROR neutron.agent.l3.dvr_fip_ns     self.driver.set_onlink_routes(
ERROR neutron.agent.l3.dvr_fip_ns   File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 193, in set_onlink_routes
ERROR neutron.agent.l3.dvr_fip_ns     onlink = device.route.list_onlink_routes(constants.IP_VERSION_4)
ERROR neutron.agent.l3.dvr_fip_ns   File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 633, in list_onlink_routes
ERROR neutron.agent.l3.dvr_fip_ns     routes = self.list_routes(ip_version, scope='link')
ERROR neutron.agent.l3.dvr_fip_ns   File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 629, in list_routes
ERROR neutron.agent.l3.dvr_fip_ns     return list_ip_routes(self._parent.namespace, ip_version, scope=scope,
ERROR neutron.agent.l3.dvr_fip_ns   File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 1585, in list_ip_routes
ERROR neutron.agent.l3.dvr_fip_ns     routes = privileged.list_ip_routes(namespace, ip_version, device=device,
ERROR neutron.agent.l3.dvr_fip_ns   File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 333, in wrapped_f
ERROR neutron.agent.l3.dvr_fip_ns     return self(f, *args, **kw)
ERROR neutron.agent.l3.dvr_fip_ns   File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 423, in __call__
ERROR neutron.agent.l3.dvr_fip_ns     do = self.iter(retry_state=retry_state)
ERROR neutron.agent.l3.dvr_fip_ns   File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 360, in iter
ERROR neutron.agent.l3.dvr_fip_ns     return fut.result()
ERROR neutron.agent.l3.dvr_fip_ns   File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 439, in result
ERROR neutron.agent.l3.dvr_fip_ns     return self.__get_result()
ERROR neutron.agent.l3.dvr_fip_ns   File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 391, in __get_result
ERROR neutron.agent.l3.dvr_fip_ns     raise self._exception
ERROR neutron.agent.l3.dvr_fip_ns   File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 426, in __call__
ERROR neutron.agent.l3.dvr_fip_ns     result = fn(*args, **kwargs)
ERROR neutron.agent.l3.dvr_fip_ns   File "/usr/local/lib/python3.9/site-packages/oslo_privsep/priv_context.py", line 271, in _wrap
ERROR neutron.agent.l3.dvr_fip_ns     return self.channel.remote_call(name, args, kwargs,
ERROR neutron.agent.l3.dvr_fip_ns   File "/usr/local/lib/python3.9/site-packages/oslo_privsep/daemon.py", line 215, in remote_call
ERROR neutron.agent.l3.dvr_fip_ns     raise exc_type(*result[2])
ERROR neutron.agent.l3.dvr_fip_ns neutron.privileged.agent.linux.ip_lib.NetworkInterfaceNotFound: Network
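Once the race has happened, the leftover can be detected: a fip-<net-id> namespace with no matching floatingip_agent_gateway port on that network is stale. A hypothetical helper modelling that check (naming convention per the demo output; not neutron's code):

```python
def stale_fip_namespaces(namespaces, gateway_port_network_ids):
    # fip- namespaces are named after the external network ID. If no
    # floatingip_agent_gateway port exists for that network any more, the
    # namespace is the stale leftover that makes the next router creation
    # fail with NetworkInterfaceNotFound before eventually succeeding.
    prefix = "fip-"
    return [ns for ns in namespaces
            if ns.startswith(prefix)
            and ns[len(prefix):] not in gateway_port_network_ids]
```

In the demo, `openstack port list --network public` is empty while `ip netns` still shows the fip- namespace, which is exactly the stale combination.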
[Yahoo-eng-team] [Bug 2000078] [NEW] neutron-remove-duplicated-port-bindings doesn't remove binding_levels
Public bug reported: I'm trying to do an INACTIVE port binding cleanup using the neutron-remove-duplicated-port-bindings tool from #1979072. But I found an issue with this helper tool: it doesn't remove entries from the ml2_port_binding_levels table, and those leftover entries still block new port bindings to the host.

Demo:

1) Create a VM and bind a port to another host:

$ openstack port create my-port --network private --device-owner compute:test
-> port ID: 075c4058-2933-4f6f-90a9-f754e81cef52
$ curl -k -H "x-auth-token: $t" -H "Content-Type: application/json" -X POST http://10.136.16.186:9696/networking/v2.0/ports/075c4058-2933-4f6f-90a9-f754e81cef52/bindings -d '{"binding": {"host": "ak-dev2"}}'

MariaDB [neutron]> select port_id,host,vif_type,status from ml2_port_bindings where port_id='075c4058-2933-4f6f-90a9-f754e81cef52';
+--------------------------------------+---------+----------+----------+
| port_id                              | host    | vif_type | status   |
+--------------------------------------+---------+----------+----------+
| 075c4058-2933-4f6f-90a9-f754e81cef52 | ak-dev1 | ovs      | ACTIVE   |
| 075c4058-2933-4f6f-90a9-f754e81cef52 | ak-dev2 | ovs      | INACTIVE |
+--------------------------------------+---------+----------+----------+
2 rows in set (0.000 sec)

MariaDB [neutron]> select * from ml2_port_binding_levels where port_id='075c4058-2933-4f6f-90a9-f754e81cef52';
+--------------------------------------+---------+-------+-------------+--------------------------------------+
| port_id                              | host    | level | driver      | segment_id                           |
+--------------------------------------+---------+-------+-------------+--------------------------------------+
| 075c4058-2933-4f6f-90a9-f754e81cef52 | ak-dev1 |     0 | openvswitch | 2250e731-0046-46ae-8cf0-8da7fd3aad98 |
| 075c4058-2933-4f6f-90a9-f754e81cef52 | ak-dev2 |     0 | openvswitch | 2250e731-0046-46ae-8cf0-8da7fd3aad98 |
+--------------------------------------+---------+-------+-------------+--------------------------------------+
2 rows in set (0.000 sec)

2) Remove the INACTIVE port binding via neutron-remove-duplicated-port-bindings:

$ neutron-remove-duplicated-port-bindings --config-file /etc/neutron/neutron.conf

MariaDB [neutron]> select port_id,host,vif_type,status from ml2_port_bindings where port_id='075c4058-2933-4f6f-90a9-f754e81cef52';
+--------------------------------------+---------+----------+--------+
| port_id                              | host    | vif_type | status |
+--------------------------------------+---------+----------+--------+
| 075c4058-2933-4f6f-90a9-f754e81cef52 | ak-dev1 | ovs      | ACTIVE |
+--------------------------------------+---------+----------+--------+
1 row in set (0.000 sec)

MariaDB [neutron]> select * from ml2_port_binding_levels where port_id='075c4058-2933-4f6f-90a9-f754e81cef52';
+--------------------------------------+---------+-------+-------------+--------------------------------------+
| port_id                              | host    | level | driver      | segment_id                           |
+--------------------------------------+---------+-------+-------------+--------------------------------------+
| 075c4058-2933-4f6f-90a9-f754e81cef52 | ak-dev1 |     0 | openvswitch | 2250e731-0046-46ae-8cf0-8da7fd3aad98 |
| 075c4058-2933-4f6f-90a9-f754e81cef52 | ak-dev2 |     0 | openvswitch | 2250e731-0046-46ae-8cf0-8da7fd3aad98 |
+--------------------------------------+---------+-------+-------------+--------------------------------------+
2 rows in set (0.000 sec)

3) Create the port binding again. It fails:

$ curl -k -H "x-auth-token: $t" -H "Content-Type: application/json" -X POST http://10.136.16.186:9696/networking/v2.0/ports/075c4058-2933-4f6f-90a9-f754e81cef52/bindings -d '{"binding": {"host": "ak-dev2"}}'
{"NeutronError": {"type": "NeutronDbObjectDuplicateEntry", "message": "Failed to create a duplicate PortBindingLevel: for attribute(s) ['PRIMARY'] with value(s) 075c4058-2933-4f6f-90a9-f754e81cef52-ak-dev2-0", "detail": ""}}

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/278

Title: neutron-remove-duplicated-port-bindings doesn't remove binding_levels
Status in neutron: New
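The missing step is removing the matching ml2_port_binding_levels rows together with the INACTIVE bindings. A minimal sketch of the intended cleanup, simulated with plain dicts instead of the real Neutron DB models (the helper name and data layout here are illustrative, not the tool's actual code):

```python
# Simulated rows mirroring ml2_port_bindings and ml2_port_binding_levels.
PORT = "075c4058-2933-4f6f-90a9-f754e81cef52"
bindings = [
    {"port_id": PORT, "host": "ak-dev1", "status": "ACTIVE"},
    {"port_id": PORT, "host": "ak-dev2", "status": "INACTIVE"},
]
levels = [
    {"port_id": PORT, "host": "ak-dev1", "level": 0},
    {"port_id": PORT, "host": "ak-dev2", "level": 0},
]

def remove_inactive_bindings(bindings, levels):
    """Drop INACTIVE bindings and, crucially, their binding-level rows."""
    stale = {(b["port_id"], b["host"])
             for b in bindings if b["status"] == "INACTIVE"}
    bindings[:] = [b for b in bindings
                   if (b["port_id"], b["host"]) not in stale]
    # Without this second pass the ak-dev2 level row survives, and a later
    # rebind hits NeutronDbObjectDuplicateEntry on the PRIMARY key
    # (port_id, host, level).
    levels[:] = [lv for lv in levels
                 if (lv["port_id"], lv["host"]) not in stale]

remove_inactive_bindings(bindings, levels)
```

After the cleanup both tables contain only the ak-dev1 rows, so a new binding to ak-dev2 can be created again.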
[Yahoo-eng-team] [Bug 1999678] [NEW] Static route can get stuck in the router snat namespace
Public bug reported: I ran into a problem where a static route just gets stuck in the snat namespace, even when removing all static routes from a distributed router with HA enabled.

Here is a simple demo from my devstack setup:

[root@node0 ~]# openstack network create private
[root@node0 ~]# openstack subnet create private --network private --subnet-range 192.168.10.0/24 --dhcp --gateway 192.168.10.1
[root@node0 ~]# openstack router create r1 --external-gateway public --distributed --ha
[root@node0 ~]# openstack router add subnet r1 private
[root@node0 ~]# openstack router set r1 --route destination=8.8.8.0/24,gateway=192.168.10.100 --route destination=8.8.8.0/24,gateway=192.168.10.200

After the multipath route was added, the snat-ns routes look like this:

[root@node0 ~]# ip netns exec snat-dcbec74b-2003-4447-8854-524d918260ac ip r
default via 10.136.16.1 dev qg-94c43336-56 proto keepalived
8.8.8.0/24 via 192.168.10.200 dev sg-dcf4a20b-8a proto keepalived
8.8.8.0/24 via 192.168.10.100 dev sg-dcf4a20b-8a proto keepalived
8.8.8.0/24 via 192.168.10.100 dev sg-dcf4a20b-8a proto static
10.136.16.0/20 dev qg-94c43336-56 proto kernel scope link src 10.136.17.171
169.254.0.0/24 dev ha-11b5b7d3-4e proto kernel scope link src 169.254.0.21
169.254.192.0/18 dev ha-11b5b7d3-4e proto kernel scope link src 169.254.195.228
192.168.10.0/24 dev sg-dcf4a20b-8a proto kernel scope link src 192.168.10.228
[root@node0 ~]#

Note that there is only one 'static' route added by neutron and no multipath route, plus two routes with 'proto keepalived' that have been added by the keepalived process.
Now delete all routes and check the routes inside the snat-ns; the static route is still there:

[root@node0 ~]# openstack router set r1 --no-route
[root@node0 ~]# ip netns exec snat-dcbec74b-2003-4447-8854-524d918260ac ip r
default via 10.136.16.1 dev qg-94c43336-56 proto keepalived
8.8.8.0/24 via 192.168.10.100 dev sg-dcf4a20b-8a proto static
10.136.16.0/20 dev qg-94c43336-56 proto kernel scope link src 10.136.17.171
169.254.0.0/24 dev ha-11b5b7d3-4e proto kernel scope link src 169.254.0.21
169.254.192.0/18 dev ha-11b5b7d3-4e proto kernel scope link src 169.254.195.228
192.168.10.0/24 dev sg-dcf4a20b-8a proto kernel scope link src 192.168.10.228
[root@node0 ~]#

** Affects: neutron
   Importance: Undecided
   Status: In Progress

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1999678

Title: Static route can get stuck in the router snat namespace
Status in neutron: In Progress
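The leftover comes from mixed route ownership: for the ECMP destination, keepalived manages the 'proto keepalived' copies while one copy is installed with 'proto static'. If cleanup only touches one owner's entries, the static copy survives. A small sketch of that mismatch, using tuples for the routing table (purely illustrative, not the l3-agent's actual code):

```python
# Simulated snat-namespace table: (destination, nexthop, proto).
routes = [
    ("8.8.8.0/24", "192.168.10.200", "keepalived"),
    ("8.8.8.0/24", "192.168.10.100", "keepalived"),
    ("8.8.8.0/24", "192.168.10.100", "static"),
]

def flush_destination(routes, dest, only_proto=None):
    """Remove routes to `dest`; restricting by proto reproduces the bug."""
    routes[:] = [r for r in routes
                 if r[0] != dest or (only_proto and r[2] != only_proto)]

# Cleaning up only the keepalived-owned routes leaves the static one stuck:
flush_destination(routes, "8.8.8.0/24", only_proto="keepalived")
leftover = [r for r in routes if r[0] == "8.8.8.0/24"]
```

A fix needs to flush the destination regardless of which proto the kernel recorded for each copy (calling `flush_destination` without `only_proto` above removes all three).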
[Yahoo-eng-team] [Bug 1998343] [NEW] Unittest test_distributed_port_binding_deleted_by_port_deletion fails: DeprecationWarning('ssl.PROTOCOL_TLS is deprecated')
Public bug reported: I got an error in the test_distributed_port_binding_deleted_by_port_deletion test on my CI run [1]. Also I found the same failure in another CI run [2] FAIL: neutron.tests.unit.plugins.ml2.test_db.Ml2DvrDBTestCase.test_distributed_port_binding_deleted_by_port_deletion tags: worker-0 -- stderr: {{{ /home/zuul/src/opendev.org/openstack/neutron/.tox/shared/lib/python3.10/site-packages/ovs/stream.py:794: DeprecationWarning: ssl.PROTOCOL_TLS is deprecated ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23) /home/zuul/src/opendev.org/openstack/neutron/.tox/shared/lib/python3.10/site-packages/ovs/stream.py:794: DeprecationWarning: ssl.PROTOCOL_TLS is deprecated ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23) }}} Traceback (most recent call last): File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, in func return f(self, *args, **kwargs) File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/unit/plugins/ml2/test_db.py", line 535, in test_distributed_port_binding_deleted_by_port_deletion self.assertEqual( File "/home/zuul/src/opendev.org/openstack/neutron/.tox/shared/lib/python3.10/site-packages/testtools/testcase.py", line 393, in assertEqual self.assertThat(observed, matcher, message) File "/home/zuul/src/opendev.org/openstack/neutron/.tox/shared/lib/python3.10/site-packages/testtools/testcase.py", line 480, in assertThat raise mismatch_error testtools.matchers._impl.MismatchError: [] != []: Warnings: {message : DeprecationWarning('ssl.PROTOCOL_TLS is deprecated'), category : 'DeprecationWarning', filename : '/home/zuul/src/opendev.org/openstack/neutron/.tox/shared/lib/python3.10/site-packages/ovs/stream.py', lineno : 794, line : None} I have spent some time and seem to have found the reason for this behavior on python 3.10. 
First of all, since python3.10 we get a warning when using ssl.PROTOCOL_TLS [3]:

[root@node0 neutron]# python
Python 3.10.8+ (heads/3.10-dirty:ca3c480, Nov 30 2022, 12:16:40) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>> ssl.SSLContext(ssl.PROTOCOL_SSLv23)
<stdin>:1: DeprecationWarning: ssl.PROTOCOL_TLS is deprecated
>>>

I also found that the `test_ssl_connection` test case affects catching warnings in the test_distributed_port_binding_deleted_by_port_deletion test case. I was then able to reproduce the issue like this:

[root@node0 neutron]# cat run_list.txt
neutron.tests.unit.agent.ovsdb.native.test_connection.ConfigureSslConnTestCase.test_ssl_connection
neutron.tests.unit.plugins.ml2.test_db.Ml2DvrDBTestCase.test_distributed_port_binding_deleted_by_port_deletion

[root@node0 neutron]# git diff
diff --git a/neutron/tests/unit/plugins/ml2/test_db.py b/neutron/tests/unit/plugins/ml2/test_db.py
index 578a01a..d837871 100644
--- a/neutron/tests/unit/plugins/ml2/test_db.py
+++ b/neutron/tests/unit/plugins/ml2/test_db.py
@@ -531,6 +531,8 @@ class Ml2DvrDBTestCase(testlib_api.SqlTestCase):
             router_id='router_id',
             status=constants.PORT_STATUS_DOWN).create()
         with warnings.catch_warnings(record=True) as warning_list:
+            import time
+            time.sleep(0.1)
             port.delete()
             self.assertEqual(
                 [], warning_list,

[root@node0 neutron]# source .tox/shared/bin/activate
(shared) [root@node0 neutron]# stestr run --concurrency=1 --load-list ./run_list.txt
...
neutron.tests.unit.plugins.ml2.test_db.Ml2DvrDBTestCase.test_distributed_port_binding_deleted_by_port_deletion -- Captured traceback: ~~~ Traceback (most recent call last): File "/root/github/neutron/neutron/tests/base.py", line 182, in func return f(self, *args, **kwargs) File "/root/github/neutron/neutron/tests/unit/plugins/ml2/test_db.py", line 537, in test_distributed_port_binding_deleted_by_port_deletion self.assertEqual( File "/root/github/neutron/.tox/shared/lib/python3.10/site-packages/testtools/testcase.py", line 393, in assertEqual self.assertThat(observed, matcher, message) File "/root/github/neutron/.tox/shared/lib/python3.10/site-packages/testtools/testcase.py", line 480, in assertThat raise mismatch_error testtools.matchers._impl.MismatchError: [] != []: Warnings: {message : DeprecationWarning('ssl.PROTOCOL_TLS is deprecated'), category : 'DeprecationWarning', filename : '/root/github/neutron/.tox/shared/lib/python3.10/site-packages/ovs/stream.py', lineno : 794, line : None} == Totals == Ran: 2 tests in 1.3571 sec. - Passed: 1 - Skipped: 0 - Expected Fail: 0 - Unexpected Success: 0 - Failed: 1 Sum of execute time for each test: 1.3053 sec. [1]
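In other words, the assertion records every warning emitted inside the catch_warnings window, including ones triggered by unrelated background activity (here, ovs/stream.py lazily building its SSLContext). A hedged sketch of a more robust assertion that ignores the known third-party deprecation instead of requiring an empty list (illustrative only, not the actual neutron test code):

```python
import warnings

with warnings.catch_warnings(record=True) as warning_list:
    warnings.simplefilter("always")
    # Unrelated library activity landing inside the window, like
    # ovs/stream.py calling ssl.SSLContext(ssl.PROTOCOL_TLS):
    warnings.warn("ssl.PROTOCOL_TLS is deprecated", DeprecationWarning)
    # ... the code actually under test runs here and emits nothing ...

# Asserting `warning_list == []` is fragile; drop the known unrelated
# deprecation before checking what the code under test emitted.
relevant = [w for w in warning_list
            if "PROTOCOL_TLS" not in str(w.message)]
```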
[Yahoo-eng-team] [Bug 1998110] [NEW] Tempest test test_resize_server_revert: failed to build and is in ERROR status: Virtual Interface creation failed
Public bug reported: In my CI run I got an error in the test_resize_server_revert test case [1]:

{3} tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert [401.454625s] ... FAILED

Captured traceback:
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 430, in test_resize_server_revert
    waiters.wait_for_server_status(self.client, self.server_id, 'ACTIVE')
  File "/opt/stack/tempest/tempest/common/waiters.py", line 101, in wait_for_server_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (ServerActionsTestJSON:test_resize_server_revert) Server e69e6d33-c494-415a-9cb8-b597af2ea052 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: REVERT_RESIZE. Current task state: resize_reverting.

Captured traceback-1:
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/api/compute/base.py", line 228, in server_check_teardown
    waiters.wait_for_server_status(cls.servers_client,
  File "/opt/stack/tempest/tempest/common/waiters.py", line 81, in wait_for_server_status
    raise exceptions.BuildErrorException(details, server_id=server_id)
tempest.exceptions.BuildErrorException: Server e69e6d33-c494-415a-9cb8-b597af2ea052 failed to build and is in ERROR status
Details: Fault: {'code': 500, 'created': '2022-11-23T21:46:15Z', 'message': 'Virtual Interface creation failed'}.

The test checks the following:
1) resize to a new flavor;
2) wait for the VM to reach VERIFY_RESIZE status;
3) revert the resize;
4) wait for the VM to reach ACTIVE status <- this is where it fails.
The test did a resize with a change of node: the VM was on node 0032209120 -> the resize moved it to node 0032209122 -> revert resize.

The `resize revert` (step 3) started here:

Nov 23 21:41:05.514686 ubuntu-jammy-rax-dfw-0032209120 devstack@n-api.service[54681]: DEBUG nova.api.openstack.wsgi [None req-83266751-d6d9-4a35-89fc-b4c97c1b481d tempest-ServerActionsTestJSON-1939410532 tempest-ServerActionsTestJSON-1939410532-project] Action: 'action', calling method: >, body: {"revertResize": {}} {{(pid=54681) _process_stack /opt/stack/nova/nova/api/openstack/wsgi.py:511}}

Nova got an unexpected network-vif-plugged event:

Nov 23 21:41:12.404453 ubuntu-jammy-rax-dfw-0032209122 nova-compute[31414]: WARNING nova.compute.manager [req-b389f403-c195-4fa0-b578-7b687f85b79d req-c9eab04d-708d-4666-b4ce-f7bb760c7aa6 service nova] [instance: e69e6d33-c494-415a-9cb8-b597af2ea052] Received unexpected event network-vif-plugged-775d8945-1367-4e08-8306-9c683e1891cf for instance with vm_state resized and task_state resize_reverting.

Nova then prepares to receive the network-vif-plugged notification (note the timestamp: after the event already arrived):

Nov 23 21:41:13.497369 ubuntu-jammy-rax-dfw-0032209122 nova-compute[31414]: DEBUG nova.compute.manager [None req-83266751-d6d9-4a35-89fc-b4c97c1b481d tempest-ServerActionsTestJSON-1939410532 tempest-ServerActionsTestJSON-1939410532-project] [instance: e69e6d33-c494-415a-9cb8-b597af2ea052] Preparing to wait for external event network-vif-plugged-775d8945-1367-4e08-8306-9c683e1891cf {{(pid=31414) prepare_for_instance_event /opt/stack/nova/nova/compute/manager.py:281}}

So, there is an unexpected network-vif-plugged event.
I believe that the trigger of this event is the `resize` operation from step 1: Nova does not wait for network interfaces to be plugged when resizing a VM (vifs_already_plugged=True), so a VM can switch to the VERIFY_RESIZE status without waiting for the port processing by Neutron [2].

At the same time, on the Neutron server side:

Binding the port to node 0032209120 in the `resize` operation (step 1):

Nov 23 21:40:57.981780 ubuntu-jammy-rax-dfw-0032209120 neutron-server[55724]: DEBUG neutron.api.v2.base [req-b1a98064-7e8e-4ad3-84cf-09e3bf12727e req-036d315c-21e6-47d5-be1d-44a4efc8a3e9 service neutron] Request body: {'port': {'binding:host_id': 'ubuntu-jammy-rax-dfw-0032209120', 'device_owner': 'compute:nova'}} {{(pid=55724) prepare_request_body /opt/stack/neutron/neutron/api/v2/base.py:731}}

Binding the port to node 0032209122 in the `resize revert` operation (step 3):

Nov 23 21:41:10.832391 ubuntu-jammy-rax-dfw-0032209120 neutron-server[55723]: DEBUG neutron.api.v2.base [req-83266751-d6d9-4a35-89fc-b4c97c1b481d req-268a8b14-b6b9-438d-bc3f-446f5eaad88d service neutron] Request body: {'port': {'binding:host_id': 'ubuntu-jammy-rax-dfw-0032209122', 'device_owner': 'compute:nova'}} {{(pid=55723) prepare_request_body /opt/stack/neutron/neutron/api/v2/base.py:731}}

Provisioning completed by L2 from the `resize` operation (step 1):

Nov 23 21:41:10.950190 ubuntu-jammy-rax-dfw-0032209120 neutron-server[55725]: DEBUG neutron.db.provisioning_blocks [None req-793235de-b92d-459f-b016-a3d9ba1a1ddd None None] Provisioning complete for port
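The failure pattern is a classic prepare/wait race: the external event is delivered before the waiter registers for it, so it is logged as unexpected and dropped, and the later wait times out. A toy model of that sequence (an assumed simplification of nova's prepare_for_instance_event machinery, not its real code):

```python
import threading

class ExternalEvents:
    """Toy prepare/deliver/wait pattern for external events."""
    def __init__(self):
        self._events = {}
        self._lock = threading.Lock()

    def prepare(self, name):
        with self._lock:
            self._events[name] = threading.Event()

    def deliver(self, name):
        with self._lock:
            ev = self._events.get(name)
        if ev is None:
            return "unexpected"  # 'Received unexpected event ...'
        ev.set()
        return "delivered"

    def wait(self, name, timeout):
        return self._events[name].wait(timeout)

ev = ExternalEvents()
# Neutron finishes plugging before nova calls prepare():
early = ev.deliver("network-vif-plugged-775d8945")
ev.prepare("network-vif-plugged-775d8945")
# The event was already dropped as 'unexpected', so the wait times out,
# which surfaces as 'Virtual Interface creation failed'.
got = ev.wait("network-vif-plugged-775d8945", timeout=0.01)
```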
[Yahoo-eng-team] [Bug 1997492] [NEW] Neutron server doesn't wait for port DHCP provisioning while VM creation
Public bug reported: I found that neutron-server does not wait for successful port provisioning by the DHCP agent in the case of VM creation: no DHCP entity is added to the provisioning blocks for such a port. As a result, nova receives a notification that the port is plugged while the DHCP agent is still processing the port, or even hitting an error during processing.

Steps to reproduce on devstack from master:
- make the port_create_end method fail on the DHCP agent side [1]
- create a VM on a network with DHCP enabled

The VM is successfully created and the port is ACTIVE, while the DHCP entry for this port is never configured:

[root@node0 neutron]# git diff
diff --git a/neutron/agent/dhcp/agent.py b/neutron/agent/dhcp/agent.py
index 7349d7e297..553ba81fdc 100644
--- a/neutron/agent/dhcp/agent.py
+++ b/neutron/agent/dhcp/agent.py
@@ -676,6 +676,7 @@ class DhcpAgent(manager.Manager):
             payload.get('priority', DEFAULT_PRIORITY),
             action='_port_create', resource=created_port, obj_type='port')
+        raise Exception('fail for testing purposes')
         self._queue.add(update)

     @_wait_if_syncing

[root@node0 neutron]# openstack server create test-vm --network net1 --flavor m1.tiny --image cirros-0.5.2-x86_64-disk
[root@node0 ~]# openstack server list
+--------------------------------------+---------+--------+--------------------+--------------------------+---------+
| ID                                   | Name    | Status | Networks           | Image                    | Flavor  |
+--------------------------------------+---------+--------+--------------------+--------------------------+---------+
| cce75084-b1e0-4407-a0d6-0074ed05abad | test-vm | ACTIVE | net1=192.168.1.111 | cirros-0.5.2-x86_64-disk | m1.tiny |
+--------------------------------------+---------+--------+--------------------+--------------------------+---------+
[root@node0 ~]# openstack port list --device-id cce75084-b1e0-4407-a0d6-0074ed05abad
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                           | Status |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+
| d7e55e08-05ae-4ac4-8cd0-4f88b93c5872 |      | fa:16:3e:9e:30:b3 | ip_address='192.168.1.111', subnet_id='281f70f3-8996-436b-ab90-bff1f9dbf5f8' | ACTIVE |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+
[root@node0 ~]#
[root@node0 ~]# cat /opt/stack/data/neutron/dhcp/710bcfcd-44d9-445d-a895-8ec522f64016/addn_hosts
[root@node0 ~]#

During VM creation there are two API calls from nova:

1) Port 'create' API call:

Nov 22 16:19:40 node0 neutron-server[953593]: DEBUG neutron.api.v2.base [req-5cbe6387-fe21-4509-81f6-cfcfe268252f req-0b7496ea-3697-4bc8-abb4-95d8f23d3497 demo admin] Request body: {'port': {'device_id': 'cce75084-b1e0-4407-a0d6-0074ed05abad', 'network_id': '710bcfcd-44d9-445d-a895-8ec522f64016', 'admin_state_up': True, 'tenant_id': 'a022c969871149e9b19ec31c896a0701'}} {{(pid=953593) prepare_request_body /opt/stack/neutron/neutron/api/v2/base.py:730}}

2) Port 'update' API call:

Nov 22 16:16:11 node0 neutron-server[953593]: DEBUG neutron.api.v2.base [req-145264e0-96a0-450b-9ad5-a5181c2497b1 req-9015e2c3-7dbb-430f-9cba-c7d6972f5134 service neutron] Request body: {'port': {'device_id': '4a4f87c0-a357-49eb-8639-58b499b8ae1f', 'device_owner': 'compute:nova', 'binding:host_id': 'node1'}} {{(pid=953593) prepare_request_body /opt/stack/neutron/neutron/api/v2/base.py:730}}

For the port 'create' API call, DHCP provisioning is not set up because device_owner is absent [2]. For the port 'update' API call, DHCP provisioning is not set up because none of the fixed_ips/mac_address are updated [3].

[1] https://opendev.org/openstack/neutron/src/commit/51827d8e78db4926f3aa347c4b2237a7b210f861/neutron/agent/dhcp/agent.py#L670
[2] https://opendev.org/openstack/neutron/src/commit/51827d8e78db4926f3aa347c4b2237a7b210f861/neutron/plugins/ml2/plugin.py#L1501
[3] https://opendev.org/openstack/neutron/src/commit/51827d8e78db4926f3aa347c4b2237a7b210f861/neutron/plugins/ml2/plugin.py#L1925

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1997492

Title: Neutron server doesn't wait for port DHCP provisioning while VM creation
Status in neutron: New
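Neutron's provisioning-blocks pattern only moves a port to ACTIVE once every registered entity reports completion; the report here is that the DHCP entity is simply never registered for these ports. A toy model of the mechanism (a simplified stand-in for neutron.db.provisioning_blocks, not its real API):

```python
# port_id -> set of entities that must report completion before ACTIVE.
blocks = {}

def add_provisioning_component(port_id, entity):
    blocks.setdefault(port_id, set()).add(entity)

def provisioning_complete(port_id, entity):
    blocks.get(port_id, set()).discard(entity)
    # True means no entity is still pending: the port may go ACTIVE.
    return not blocks.get(port_id)

port = "d7e55e08-05ae-4ac4-8cd0-4f88b93c5872"

# What happens today: only L2 is registered, so L2 completion alone
# flips the port ACTIVE even though the DHCP agent may still fail.
add_provisioning_component(port, "L2")
active_today = provisioning_complete(port, "L2")

# With a DHCP block registered, L2 completion alone would not be enough:
add_provisioning_component(port, "L2")
add_provisioning_component(port, "DHCP")
active_after_l2 = provisioning_complete(port, "L2")
active_after_dhcp = provisioning_complete(port, "DHCP")
```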
[Yahoo-eng-team] [Bug 1997090] [NEW] VMs listing with sort keys throws exception when trying to compare None values
Public bug reported: The nova-api raises an exception on an attempt to get VMs sorted by e.g. the task_state key.

Steps to reproduce:
- create two VMs: vm1 in ACTIVE state (cell1) and vm2 in ERROR state (cell0)
- try to list servers sorted with sort_key=task_state

[root@node0 ~]# openstack server create vm1 --network net1 --flavor m1.tiny --image cirros-0.5.2-x86_64-disk
[root@node0 ~]# openstack server create vm2 --network net1 --flavor m1.xlarge --image cirros-0.5.2-x86_64-disk
[root@node0 ~]# openstack server list -f json --long -c ID -c 'Task State' -c 'Status'
[
  {
    "ID": "3a3927c4-9f67-4356-8a3e-a3e58cf0744e",
    "Status": "ERROR",
    "Task State": null
  },
  {
    "ID": "9af631ec-3e59-45da-bafa-85141e3707da",
    "Status": "ACTIVE",
    "Task State": null
  }
]
[root@node0 ~]#
[root@node0 ~]# curl -k -H "x-auth-token: $s" 'http://10.136.16.186/compute/v2.1/servers/detail?sort_key=task_state'
{"computeFault": {"code": 500, "message": "Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.\n"}}
[root@node0 ~]#

Traceback:

Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi [None req-59ce5d12-1c84-4c45-8b10-da863b721d6f demo admin] Unexpected exception in API method: TypeError: '<' not supported between instances of 'NoneType' and 'NoneType' Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi Traceback (most recent call last): Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/wsgi.py", line 664, in wrapped Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi return f(*args, **kwargs) Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/validation/__init__.py", line 192, in wrapper Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi return func(*args, **kwargs) Nov
18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/validation/__init__.py", line 192, in wrapper Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi return func(*args, **kwargs) Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/validation/__init__.py", line 192, in wrapper Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi return func(*args, **kwargs) Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi [Previous line repeated 2 more times] Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/compute/servers.py", line 143, in detail Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi servers = self._get_servers(req, is_detail=True) Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/compute/servers.py", line 327, in _get_servers Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi instance_list = self.compute_api.get_all(elevated or context, Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 3140, in get_all Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi insts, down_cell_uuids = instance_list.get_instance_objects_sorted( Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/instance_list.py", line 176, in get_instance_objects_sorted Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi instance_list = instance_obj._make_instance_list(ctx, Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi 
File "/opt/stack/nova/nova/objects/instance.py", line 1287, in _make_instance_list Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi for db_inst in db_inst_list: Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/multi_cell_list.py", line 411, in get_records_sorted Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi item = next(feeder) Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi File "/usr/lib64/python3.9/heapq.py", line 353, in merge Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi _heapify(h) Nov 18 09:59:09 node0 devstack@n-api.service[1156072]: ERROR nova.api.openstack.wsgi File
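The crash comes from comparing sort keys across cells when both values are None. A hedged sketch of the usual fix, mapping None to a sortable sentinel in the sort key (illustrative only, not nova's actual code):

```python
rows = [
    {"id": "3a3927c4", "status": "ERROR", "task_state": None},
    {"id": "9af631ec", "status": "ACTIVE", "task_state": None},
]

# Direct key comparison is what blows up inside sorted()/heapq:
try:
    sorted(rows, key=lambda r: r["task_state"])
except TypeError as exc:
    error = str(exc)  # "'<' not supported between instances of ..."

def none_safe_key(row):
    v = row["task_state"]
    # (False, value) sorts real values first; (True, "") groups the Nones
    # together without ever comparing None with '<'.
    return (v is None, v if v is not None else "")

ordered = sorted(rows, key=none_safe_key)
```

Python's sort is stable, so rows with equal (None) keys keep their original relative order.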
[Yahoo-eng-team] [Bug 1996788] [NEW] The virtual network is broken on the node after neutron-openvswitch-agent is restarted if RPC requests return an error for a while.
Public bug reported: We ran into a problem in our openstack cluster where traffic does not go through the virtual network on the node on which the neutron-openvswitch-agent was restarted. We were updating from one version of OpenStack to another, and by chance we had an inconsistency between the DB and neutron-server: any port select from the DB returned an error. For a while, the neutron-openvswitch-agent (just after restart) couldn't get any information via RPC in its rpc_loop iterations due to the DB/neutron-server inconsistency. But after updating the database, we got a broken virtual network on the node where the neutron-openvswitch-agent was restarted.

It seems to me that I have found the problem place in the logic of neutron-ovs-agent. To demonstrate, it is easiest to emulate the RPC request failure from neutron-ovs-agent to neutron-server. Here are the steps to reproduce on a devstack setup from the master branch. Two nodes: node0 is controller, node1 is compute.

0) Prepare a vxlan based network and a VM:

[root@node0 ~]# openstack network create net1
[root@node0 ~]# openstack subnet create sub1 --network net1 --subnet-range 192.168.1.0/24
[root@node0 ~]# openstack server create vm1 --network net1 --flavor m1.tiny --image cirros-0.5.2-x86_64-disk --host node1

Just after creating the VM, there is a message in the devstack@q-agt logs:

Nov 16 09:53:35 node1 neutron-openvswitch-agent[374810]: INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-77753b72-cb23-4dae-b68a-7048b63faf8b None None] Assigning 1 as local vlan for net-id=710bcfcd-44d9-445d-a895-8ec522f64016, seg-id=466

So, the local vlan used on node1 for the network is `1`.

A ping from node0 to the VM on node1 works fine:

[root@node0 ~]# ip netns exec qdhcp-710bcfcd-44d9-445d-a895-8ec522f64016 ping 192.168.1.211
PING 192.168.1.211 (192.168.1.211) 56(84) bytes of data.
64 bytes from 192.168.1.211: icmp_seq=1 ttl=64 time=1.86 ms
64 bytes from 192.168.1.211: icmp_seq=2 ttl=64 time=0.891 ms

1) To be clear: I'm not claiming that patched code is expected to work; I just want to emulate a problem that is hard to reproduce the normal way, but which can happen. So, emulate that the get_resource_by_id call (RPC-based under the hood) returns an error just after the neutron-ovs-agent restart:

[root@node1 neutron]# git diff
diff --git a/neutron/agent/rpc.py b/neutron/agent/rpc.py
index 9a133afb07..299eb25981 100644
--- a/neutron/agent/rpc.py
+++ b/neutron/agent/rpc.py
@@ -327,6 +327,11 @@ class CacheBackedPluginApi(PluginApi):

     def get_device_details(self, context, device, agent_id, host=None,
                            agent_restarted=False):
+        import time
+        if not hasattr(self, '_stime'):
+            self._stime = time.time()
+        if self._stime + 5 > time.time():
+            raise Exception('Emulate RPC error in get_resource_by_id call')
         port_obj = self.remote_resource_cache.get_resource_by_id(
             resources.PORT, device, agent_restarted)
         if not port_obj:

Restart the neutron-openvswitch-agent and try to ping after 1-2 mins:

[root@node1 ~]# systemctl restart devstack@q-agt
[root@node0 ~]# ip netns exec qdhcp-710bcfcd-44d9-445d-a895-8ec522f64016 ping -c 2 192.168.1.234
PING 192.168.1.234 (192.168.1.234) 56(84) bytes of data.

--- 192.168.1.234 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1058ms
[root@node0 ~]#

The ping doesn't work.
Just after the neutron-ovs-agent restart, once RPC starts working correctly, there are these logs:

Nov 16 09:55:13 node1 neutron-openvswitch-agent[375032]: INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-135ae96d-905e-485f-8c1f-b0a70616b4c7 None None] Assigning 2 as local vlan for net-id=710bcfcd-44d9-445d-a895-8ec522f64016, seg-id=466
Nov 16 09:55:13 node1 neutron-openvswitch-agent[375032]: INFO neutron.agent.securitygroups_rpc [None req-135ae96d-905e-485f-8c1f-b0a70616b4c7 None None] Preparing filters for devices {'40d82f69-274f-4de5-84d9-6290159f288b'}
Nov 16 09:55:13 node1 neutron-openvswitch-agent[375032]: INFO neutron.agent.linux.openvswitch_firewall.firewall [None req-135ae96d-905e-485f-8c1f-b0a70616b4c7 None None] Initializing port 40d82f69-274f-4de5-84d9-6290159f288b that was already initialized.

So, `Assigning 2 as local vlan` followed by `Initializing port ... that was already initialized.`

2) Using pyrasite, the eventlet backdoor was set up, and I can see that in the internal structure inside the OVSFirewallDriver the `vlan_tag` of the port is still `1` instead of `2`:

>>> import gc
>>> from neutron.agent.linux.openvswitch_firewall.firewall import OVSFirewallDriver
>>> for ob in gc.get_objects():
...     if isinstance(ob, OVSFirewallDriver):
...         break
...
>>> ob.sg_port_map.ports['40d82f69-274f-4de5-84d9-6290159f288b'].vlan_tag
1
>>>

So, the OVSFirewallDriver still thinks that
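The stale value points at a cache that skips re-initialization when a port is already present. A toy model of that pattern (an assumed simplification; the real sg_port_map logic in neutron is more involved):

```python
class PortMap:
    """Toy stand-in for the firewall driver's port cache."""
    def __init__(self):
        self.ports = {}

    def init_port(self, port_id, vlan_tag):
        if port_id in self.ports:
            # 'Initializing port ... that was already initialized':
            # bailing out here keeps the pre-restart vlan_tag, so the
            # firewall keeps building flows for the wrong local vlan.
            return
        self.ports[port_id] = {"vlan_tag": vlan_tag}

m = PortMap()
m.init_port("40d82f69", 1)  # first rpc_loop pass, local vlan was 1
m.init_port("40d82f69", 2)  # after RPC recovers, local vlan is now 2
stale = m.ports["40d82f69"]["vlan_tag"]
```

A fix would need the re-initialization path to refresh the cached vlan_tag instead of returning early.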
[Yahoo-eng-team] [Bug 1995872] [NEW] A stuck INACTIVE port binding causes wrong l2pop fdb entries to be sent
Public bug reported:

We are testing the network availability of VMs in case of HA events, and we ran into a problem where aborting a live migration of a VM can break communication with that VM at the OVS rules level. The cause of the wrong OVS rules is a stuck INACTIVE port binding in the neutron `ml2_port_bindings` table.

Steps to reproduce:

Install a 3-node cluster via devstack from the `master` branch. The mechanism driver is `openvswitch` with `l2population` enabled:

[root@node0 ~]# grep -r l2population /etc/neutron/*
/etc/neutron/plugins/ml2/ml2_conf.ini:mechanism_drivers = openvswitch,l2population
[root@node0 ~]#

0) Preparation:
- create a vxlan-based internal network,
- start 3 VMs, one per node: vm0 -> node0, vm1 -> node1, vm2 -> node2

[root@node0 ~]# for i in {0..2}; do openstack server create vm$i --network vxlan-net --flavor m1.tiny --image cirros-0.5.2-x86_64-disk; done
[root@node0 ~]# for i in {0..2}; do openstack server migrate vm$i --host node$i --live-migration; done

1) Abort the `vm1` live migration from node1 -> node0:

[root@node0 ~]# openstack server migrate vm1 --host node0 --live-migration; sleep 1; ssh root@node1 systemctl stop devstack@n-cpu.service
[root@node0 ~]# openstack server list
| ID                                   | Name | Status    | Networks                | Image                    | Flavor  |
| 56ec7007-5470-42df-863e-8ae7d6a0110f | vm1  | MIGRATING | vxlan-net=192.168.0.169 | cirros-0.5.2-x86_64-disk | m1.tiny |
| 5bc93710-8da8-4b12-b1f0-767cf1768d27 | vm2  | ACTIVE    | vxlan-net=192.168.0.82  | cirros-0.5.2-x86_64-disk | m1.tiny |
| 6f93f40f-0065-413c-81e6-724a21b3756b | vm0  | ACTIVE    | vxlan-net=192.168.0.135 | cirros-0.5.2-x86_64-disk | m1.tiny |
[root@node0 ~]# ssh root@node1 systemctl start devstack@n-cpu.service
[root@node0 ~]# openstack server list
| ID                                   | Name | Status | Networks                | Image                    | Flavor  |
| 56ec7007-5470-42df-863e-8ae7d6a0110f | vm1  | ACTIVE | vxlan-net=192.168.0.169 | cirros-0.5.2-x86_64-disk | m1.tiny |
| 5bc93710-8da8-4b12-b1f0-767cf1768d27 | vm2  | ACTIVE | vxlan-net=192.168.0.82  | cirros-0.5.2-x86_64-disk | m1.tiny |
| 6f93f40f-0065-413c-81e6-724a21b3756b | vm0  | ACTIVE | vxlan-net=192.168.0.135 | cirros-0.5.2-x86_64-disk | m1.tiny |
[root@node0 ~]#

The VM failed to migrate and is still on node1:

[root@node0 ~]# openstack server show vm1 -c OS-EXT-SRV-ATTR:host
| Field                | Value |
| OS-EXT-SRV-ATTR:host | node1 |
[root@node0 ~]# ssh node1 virsh list
 Id   Name              State
 3    instance-0009     running
[root@node0 ~]#

Now there are two port bindings, ACTIVE and INACTIVE, for the `vm1` port:

MariaDB [neutron]> select port_id,host,vif_type,profile from ml2_port_bindings where port_id='3be55a45-83c6-42b7-82fc-fb6c4855f255';
| port_id                              | host  | vif_type | profile                     |
| 3be55a45-83c6-42b7-82fc-fb6c4855f255 | node0 | ovs      | {"os_vif_delegation": true} |
| 3be55a45-83c6-42b7-82fc-fb6c4855f255 | node1 | ovs      | {"migrating_to": "node0"}   |

2) Restart the neutron-openvswitch-agent on node2, which forces neutron-server to repopulate the neighbor fdb entries:

[root@node0 ~]# ssh node2 systemctl restart devstack@q-agt.service

Now a ping from vm2 to vm1 doesn't work:

[root@node0 ~]# ip netns exec qdhcp-f0f8f0b6-3cd3-4ae5-b5cf-25f2834bcdb2 ssh cirros@192.168.0.82
sign_and_send_pubkey: no mutual signature supported
cirros@192.168.0.82's password:
$ ping 192.168.0.169
PING 192.168.0.169 (192.168.0.169): 56 data bytes
^C
--- 192.168.0.169 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
$

This is because the br-tun rules on node2 send traffic for `vm1` to node0 and not to node1, where the VM is
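The stuck binding shows up as two rows for one port in `ml2_port_bindings`. A minimal sketch of detecting that condition from query results like the ones above (the row layout mirrors the MariaDB output; this is an illustrative helper, not Neutron code):

```python
from collections import defaultdict

def find_duplicate_bindings(rows):
    """rows: iterable of (port_id, host, vif_type, profile) tuples,
    as returned by the ml2_port_bindings query above."""
    by_port = defaultdict(list)
    for port_id, host, _vif_type, _profile in rows:
        by_port[port_id].append(host)
    # A healthy port has exactly one binding; more than one means a
    # stale (e.g. INACTIVE) binding was left behind.
    return {port: hosts for port, hosts in by_port.items() if len(hosts) > 1}

rows = [
    ("3be55a45-83c6-42b7-82fc-fb6c4855f255", "node0", "ovs", '{"os_vif_delegation": true}'),
    ("3be55a45-83c6-42b7-82fc-fb6c4855f255", "node1", "ovs", '{"migrating_to": "node0"}'),
]
dups = find_duplicate_bindings(rows)
```

A port flagged here is a candidate for the wrong l2pop fdb entries described above, since the server may pick the stale host when repopulating neighbors.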
[Yahoo-eng-team] [Bug 1990561] [NEW] Network filtering by provider attributes has a race condition with network removal
Public bug reported:

I ran into a problem where the list of networks filtered by segment ID does not match the expected list. An important condition is the parallel removal of another network. Note that the demo filters by segment 100 while the network is created with segment 200, so the listing should never return anything. Here is the demo:

Console 1:
$ while :; do openstack network create test-net --provider-segment 200 --provider-network-type vxlan >/dev/null; openstack network delete test-net; done

Console 2:
$ for i in {0..1000}; do net=$(openstack network list --provider-segment 100); [[ -n "${net}" ]] && echo "${net}" && echo "Iter=$i" && break; done
| ID                                   | Name     | Subnets |
| 64ccd339-c669-4b8b-9d11-758e98295955 | test-net |         |
Iter=81
$

The log file has this message:

2022-09-22 20:13:15.706 25 DEBUG neutron.plugins.ml2.managers [None req-4c379e00-4794-4625-afe7-64643aa801cf 4f5e975fb1044192a4930fd01ca7d9d7 1958e62e718f468299ae302a12364c08 - default default] Network 64ccd339-c669-4b8b-9d11-758e98295955 has no segments extend_network_with_provider_segments /usr/lib64/python3.6/site-packages/neutron/plugins/ml2/managers.py:169

So, it looks like there is a race condition.

OS version: Xena

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1990561

Title: Network filtering by provider attributes has a race condition with network removal
Status in neutron: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1990561/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
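The "has no segments" DEBUG line suggests a two-step listing: fetch the base network rows, then extend each with its provider segments in a second query, then filter. The sketch below is purely illustrative (not Neutron's actual code) and shows how a network deleted between the two steps can leak into the filtered result if the filter passes networks it cannot classify:

```python
# Illustrative race sketch: segments_by_net plays the role of the second
# query; an empty lookup simulates the network being deleted mid-listing.
def list_networks(base_rows, segments_by_net, wanted_segment):
    result = []
    for net in base_rows:
        segs = segments_by_net.get(net["id"], [])  # empty if deleted mid-way
        if not segs:
            # buggy behavior: an unclassifiable network leaks into the result
            result.append(net)
        elif wanted_segment in segs:
            result.append(net)
    return result

# test-net was created with segment 200, then deleted between the queries,
# yet a filter for segment 100 returns it:
nets = list_networks([{"id": "64ccd339", "name": "test-net"}], {}, wanted_segment=100)
```

Under this model the fix is to drop (or re-check) networks whose extension step finds no segments, rather than passing them through the filter.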
[Yahoo-eng-team] [Bug 1967142] [NEW] No way to set quotas for neutron-vpnaas resources using openstack CLI tool
Public bug reported:

I can't find a way to set VPN quotas using the CLI tools: neither the openstack CLI nor the deprecated neutron CLI has this feature. I can only update VPN quotas with a direct API request (e.g. via curl), and can only list VPN quotas with the neutron CLI tool.

[root@node4578 ~]# curl -ks -H "x-auth-token: $token" -X PUT https://192.168.1.10:9696/v2.0/quotas/e28d46f9ce084b21a163f72ce1a49adf -d '{"quota": {"ipsec_site_connection": 5}}'
{"quota": {"subnet": -1, "ikepolicy": -1, "subnetpool": -1, "network": -1, "ipsec_site_connection": 5, "endpoint_group": -1, "ipsecpolicy": -1, "security_group_device": -1, "security_group_rule": -1, "vpnservice": -1, "floatingip": -1, "security_group": -1, "router": -1, "rbac_policy": -1, "port": -1}}
[root@node4578 ~]#

[root@node4578 ~]# neutron quota-show e28d46f9ce084b21a163f72ce1a49adf
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
| Field                 | Value |
| endpoint_group        | -1    |
| floatingip            | -1    |
| ikepolicy             | -1    |
| ipsec_site_connection | 5     |
| ipsecpolicy           | -1    |
| network               | -1    |
| port                  | -1    |
| rbac_policy           | -1    |
| router                | -1    |
| security_group        | -1    |
| security_group_device | -1    |
| security_group_rule   | -1    |
| subnet                | -1    |
| subnetpool            | -1    |
| vpnservice            | -1    |

[root@node4578 ~]# openstack quota list --network --detail --project e28d46f9ce084b21a163f72ce1a49adf
| Resource             | In Use | Reserved | Limit |
| subnets              | 0      | 0        | -1    |
| routers              | 0      | 0        | -1    |
| security_group_rules | 0      | 0        | -1    |
| subnet_pools         | 0      | 0        | -1    |
| security_groups      | 0      | 0        | -1    |
| rbac_policies        | 0      | 0        | -1    |
| floating_ips         | 0      | 0        | -1    |
| networks             | 0      | 0        | -1    |
| ports                | 0      | 0        | -1    |
[root@node4578 ~]#

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1967142

Title: No way to set quotas for neutron-vpnaas resources using openstack CLI tool
Status in neutron: New
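The curl workaround above can be wrapped in a small helper that only assembles the request pieces (method, URL, headers, body) for the `PUT /v2.0/quotas/<project_id>` call; sending them with curl or any HTTP client is left to the caller. The endpoint and project id come from the example above, the token is a placeholder:

```python
import json

def build_vpn_quota_request(endpoint, project_id, token, quotas):
    """Assemble the pieces of the quota-update request shown above."""
    url = f"{endpoint}/v2.0/quotas/{project_id}"
    headers = {"x-auth-token": token, "Content-Type": "application/json"}
    body = json.dumps({"quota": quotas})
    return "PUT", url, headers, body

method, url, headers, body = build_vpn_quota_request(
    "https://192.168.1.10:9696",
    "e28d46f9ce084b21a163f72ce1a49adf",
    "<token>",
    {"ipsec_site_connection": 5},
)
```

This is exactly the request shape the curl command sends; only the transport differs.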
[Yahoo-eng-team] [Bug 1959697] [NEW] VM gets wrong ipv6 address from dhcp-agent after ipv6 address on port was changed
Public bug reported:

I ran into a problem where the neutron dhcp-agent still replies with a confirmation of the old address. Simple steps to reproduce:

- create a port with an IPv6 address in a dhcpv6-stateful subnet
- create a VM with cloud-init inside
- change the IPv6 port address
- reboot the VM

Here are my commands:

$ openstack subnet create --subnet-range 2001:db8:123::/64 --ip-version 6 --ipv6-address-mode dhcpv6-stateful --network public subv6
$ openstack subnet list --network public
| ID                                   | Name  | Network                              | Subnet            |
| 6d9a7fb5-5c1b-4759-b32b-5720b5cedbf4 | subv4 | f1f3d967-26db-41b3-b6f6-1d5356e33a84 | 10.136.16.0/22    |
| 76db898c-6a7a-4301-9253-23241cafaa83 | subv6 | f1f3d967-26db-41b3-b6f6-1d5356e33a84 | 2001:db8:123::/64 |
$
$ openstack port create my-port --network public --fixed-ip ip-address=10.136.17.163 --fixed-ip ip-address=2001:db8:123::111
$ openstack server create test --flavor m1.small --port my-port --image CentOS-7-x86_64-GenericCloud-2009.qcow2 --key-name key --use-config-drive

Check the IPv6 address inside the VM (it's correct):

[centos@test ~]$ ip a s eth0
2: eth0: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:2e:66:ac brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.163/22 brd 10.136.19.255 scope global dynamic eth0
       valid_lft 86371sec preferred_lft 86371sec
    inet6 2001:db8:123::111/128 scope global dynamic
       valid_lft 7473sec preferred_lft 7173sec
    inet6 fe80::f816:3eff:fe2e:66ac/64 scope link
       valid_lft forever preferred_lft forever
[centos@test ~]$

Change the IPv6 address and reboot the VM:

$ openstack port set my-port --no-fixed-ip --fixed-ip ip-address=10.136.17.163 --fixed-ip ip-address=2001:db8:123::222
$ openstack server reboot test

[centos@test ~]$ ip a s eth0
2: eth0: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:2e:66:ac brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.163/22 brd 10.136.19.255 scope global dynamic eth0
       valid_lft 86382sec preferred_lft 86382sec
    inet6 2001:db8:123::111/128 scope global dynamic
       valid_lft 7482sec preferred_lft 7182sec
    inet6 fe80::f816:3eff:fe2e:66ac/64 scope link
       valid_lft forever preferred_lft forever
[centos@test ~]$

^^ You can see the VM got the old IPv6 address, and all its traffic is then blocked by the port-security feature. If I remove the lease file and re-spawn a dhclient, everything is fine:

[centos@test ~]$ ps axf | grep dhcl
  780 ?     Ss  0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid -H test eth0
  868 ?     Ss  0:00 /sbin/dhclient -6 -1 -lf /var/lib/dhclient/dhclient6--eth0.lease -pf /var/run/dhclient6-eth0.pid eth0 -H test
 1371 pts/0 S+  0:00 \_ grep --color=auto dhcl
[centos@test ~]$ sudo kill -9 868
[centos@test ~]$ sudo ip addr del 2001:db8:123::111/128 dev eth0
[centos@test ~]$ sudo rm -rf /var/lib/dhclient/dhclient6--eth0.lease
[centos@test ~]$ sudo /sbin/dhclient -6 -1 -lf /var/lib/dhclient/dhclient6--eth0.lease -pf /var/run/dhclient6-eth0.pid eth0 -H test
[centos@test ~]$ ip a s eth0
2: eth0: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:2e:66:ac brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.163/22 brd 10.136.19.255 scope global dynamic eth0
       valid_lft 86319sec preferred_lft 86319sec
    inet6 2001:db8:123::222/128 scope global dynamic
       valid_lft 7481sec preferred_lft 7181sec
    inet6 fe80::f816:3eff:fe2e:66ac/64 scope link
       valid_lft forever preferred_lft forever
[centos@test ~]$

I found some logic for removing dhcpv6 leases here:
https://opendev.org/openstack/neutron/src/commit/e7b70521d0e230143a80974e7e4795a2acafcc9b/neutron/agent/linux/dhcp.py#L600
but it looks like it doesn't help in the case of a DHCPCONFIRM client request. In the dnsmasq logs I see the following DHCPCONFIRM -> DHCPREPLY message exchange after the VM came back from the reboot (see also https://datatracker.ietf.org/doc/html/rfc3315#page-50):

Feb  1 16:49:12 dnsmasq-dhcp[1360521]: DHCPREQUEST(tapc233cb5c-8f) 10.136.17.163 fa:16:3e:2e:66:ac
Feb  1 16:49:12 dnsmasq-dhcp[1360521]: DHCPACK(tapc233cb5c-8f) 10.136.17.163 fa:16:3e:2e:66:ac host-10-136-17-163
Feb  1 16:49:15 dnsmasq-dhcp[1360521]: DHCPCONFIRM(tapc233cb5c-8f) 00:01:00:01:29:8c:20:5e:fa:16:3e:2e:66:ac
Feb  1 16:49:15 dnsmasq-dhcp[1360521]: DHCPREPLY(tapc233cb5c-8f) 2001:db8:123::111 00:01:00:01:29:8c:20:5e:fa:16:3e:2e:66:ac host-2001-db8-123--222

** Affects: neutron
   Importance: Undecided
   Status: New
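The dnsmasq log above shows per-address hostnames like host-10-136-17-163 and host-2001-db8-123--222 (note how the DHCPREPLY confirms the old ::111 address while the hostname already reflects the new ::222 one). A sketch of how such a name is derived from a fixed IP, matching the names seen in the log (a simple '.'/':' to '-' replacement; this mirrors what the agent appears to do, not a verified copy of Neutron's code):

```python
def dhcp_hostname(ip_address: str) -> str:
    # dnsmasq host entries in the log use 'host-' plus the address with
    # dots/colons replaced by dashes ('::' yields a double dash).
    return "host-%s" % ip_address.replace(".", "-").replace(":", "-")

print(dhcp_hostname("10.136.17.163"))      # host-10-136-17-163
print(dhcp_hostname("2001:db8:123::222"))  # host-2001-db8-123--222
```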
[Yahoo-eng-team] [Bug 1958643] [NEW] Unicast RA messages for a VM are filtered out by ovs rules
Public bug reported:

I ran into a problem where unicast RA messages are not accepted by the openflow rules. In my configuration I'm using the radvd daemon to send RA messages in my IPv6 network. Here is the radvd config with the `clients` directive to turn off multicast messages:

[root@radvd ~]# cat /etc/radvd.conf
interface br-eth0
{
    AdvSendAdvert on;
    MinRtrAdvInterval 3;
    MaxRtrAdvInterval 5;
    prefix 2001:db8:123::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
        AdvRouterAddr off;
    };
    clients
    {
        fe80::f816:3eff:fed7:358a;
    };
};
[root@radvd ~]#

I use a devstack installation with Neutron from the master branch. I've created a virtual flat network with dual stack: IPv4 and IPv6 subnets. The IPv6 subnet uses the SLAAC address mode. Then I created a VM to test IPv6 address assignment inside the VM, but the RA message doesn't reach it.

VM/port/security group rules:

[root@devstack ~]# openstack server list
| ID                                   | Name | Status | Networks                                                 | Image                                   | Flavor   |
| 332942be-0869-403f-9aba-386f88b9bc9d | test | ACTIVE | public=10.136.17.163, 2001:db8:123:0:f816:3eff:fed7:358a | CentOS-7-x86_64-GenericCloud-2009.qcow2 | m1.small |
[root@devstack ~]#
[root@devstack ~]# openstack port show 664489d1-f15f-4990-99eb-b53ad21f673a
| Field                 | Value                                                                                                         |
| admin_state_up        | UP                                                                                                            |
| allowed_address_pairs |                                                                                                               |
| binding_host_id       | devstack                                                                                                      |
| binding_profile       |                                                                                                               |
| binding_vif_details   | bridge_name='br-int', connectivity='l2', datapath_type='system', ovs_hybrid_plug='False', port_filter='True'  |
| binding_vif_type      | ovs                                                                                                           |
| binding_vnic_type     | normal                                                                                                        |
| created_at            | 2022-01-21T11:32:19Z                                                                                          |
| data_plane_status     | None                                                                                                          |
| description           |                                                                                                               |
| device_id             | 332942be-0869-403f-9aba-386f88b9bc9d                                                                          |
| device_owner          | compute:nova
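The `clients` directive above lists the VM port's link-local address, which is derived from the port's MAC via the standard modified EUI-64 rule (RFC 4291): flip the universal/local bit of the first MAC octet and insert ff:fe in the middle. A small sketch of that derivation:

```python
import ipaddress

def linklocal_from_mac(mac: str) -> str:
    """Modified EUI-64 link-local address (fe80::/64) for a MAC."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    groups = [(eui64[i] << 8) | eui64[i + 1] for i in range(0, 8, 2)]
    addr = "fe80::" + ":".join("%x" % g for g in groups)
    return str(ipaddress.ip_address(addr))  # normalize/compress

print(linklocal_from_mac("fa:16:3e:d7:35:8a"))  # fe80::f816:3eff:fed7:358a
```

This reproduces the fe80::f816:3eff:fed7:358a address in the radvd config, consistent with the VM's SLAAC global address 2001:db8:123:0:f816:3eff:fed7:358a shown in the server list.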
[Yahoo-eng-team] [Bug 1938191] [NEW] L3 agent fails to process a DVR router external network change
Public bug reported:

I ran into a problem where the L3 agent fails to process an external network change on a router and hits the retry limit. I'm using a devstack deployment over the master branch.

* Pre-conditions:
- L3 agent in DVR mode
- mechanism driver is openvswitch

* Step-by-step reproduction:
- create two external networks and three internal ones
- create three routers and add the corresponding internal networks
- connect external networks to the routers (according to the scheme: net1->r1, net2->r2, net1->r3)
- switch the external network of the r3 router from net1 to net2

Here are the CLI commands:

openstack network create phys-net1 --external
openstack network create phys-net2 --external
openstack network create priv-net1
openstack network create priv-net2
openstack network create priv-net3
openstack subnet create --network phys-net1 --subnet-range 192.168.1.0/24 phys-sub1
openstack subnet create --network phys-net2 --subnet-range 192.168.2.0/24 phys-sub2
openstack subnet create --network priv-net1 --subnet-range 192.168.10.0/24 priv-sub1
openstack subnet create --network priv-net2 --subnet-range 192.168.20.0/24 priv-sub2
openstack subnet create --network priv-net3 --subnet-range 192.168.30.0/24 priv-sub3
openstack router create r1
openstack router create r2
openstack router create r3
openstack router add subnet r1 priv-sub1
openstack router add subnet r2 priv-sub2
openstack router add subnet r3 priv-sub3
openstack router set r1 --external-gateway phys-net1
openstack router set r2 --external-gateway phys-net2
openstack router set r3 --external-gateway phys-net1
# Switch r3 external network from phys-net1 to phys-net2:
openstack router set r3 --external-gateway phys-net2

After the switch, one can observe unsuccessful attempts to process the change in the l3-agent logs, together with the message (see the router processing logs below):

'Hit retry limit with router update for , action 3'

The state of resources and net devices:

[root@devstack ~]# openstack router list
| ID                                   | Name | Status | State | Project                          | Distributed | HA    |
| 6cb4a81f-9b5a-4f98-9ef2-705b369d4240 | r2   | ACTIVE | UP    | f3f8c288836f47ca930e13620f27a8c8 | True        | False |
| 9e15faf3-8478-4b2a-83f1-ad2cc8cd9de4 | r3   | ACTIVE | UP    | f3f8c288836f47ca930e13620f27a8c8 | True        | False |
| c37e75aa-4bc1-4d56-95a1-3045d8817c26 | r1   | ACTIVE | UP    | f3f8c288836f47ca930e13620f27a8c8 | True        | False |
[root@devstack ~]# openstack network list
| ID                                   | Name      | Subnets                              |
| 34cf22a5-8368-4935-a5a6-47bf2763d6a1 | priv-net2 | 2f067140-d6a8-4341-ac53-aef48be15877 |
| 86f5bceb-a945-48c0-ad50-ae3e395fd21f | phys-net1 | d03016ee-5724-47ea-891c-018cdd8338f1 |
| 8bbaff79-4e40-4341-b48d-76b8a62f80cd | priv-net1 | ef7dca63-29f8-4483-af7f-8ab9661232f2 |
| a3704615-3e3e-4a03-a425-5851a381e702 | phys-net2 | 647ed571-c6ee-4f7f-8ecf-8a78b5f0b534 |
| f142ca45-9cce-4619-9964-ad68b64aa0a2 | priv-net3 | e386cfdd-d52c-4830-a90b-bdc5cb656ad7 |
[root@devstack ~]# openstack router show r3 -c external_gateway_info
| Field                 | Value                                                                                                                                                                                    |
| external_gateway_info | {"network_id": "a3704615-3e3e-4a03-a425-5851a381e702", "external_fixed_ips": [{"subnet_id": "647ed571-c6ee-4f7f-8ecf-8a78b5f0b534", "ip_address": "192.168.2.42"}], "enable_snat": true} |
[root@devstack ~]#
[root@devstack ~]# ip netns
snat-9e15faf3-8478-4b2a-83f1-ad2cc8cd9de4 (id: 12)
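The 'Hit retry limit with router update ... action 3' message comes from a resync loop that re-queues a failed router update a bounded number of times before giving up. A generic sketch of that pattern (names and the limit are illustrative, not Neutron's actual code):

```python
def process_with_retries(update, handler, max_retries=5):
    """Re-run handler(update) until it succeeds or the limit is hit."""
    for _attempt in range(max_retries + 1):
        try:
            return handler(update)
        except Exception:
            continue  # simulate re-queueing the router update
    raise RuntimeError("Hit retry limit with router update for %s" % update)

attempts = []

def always_fails(update):
    # Stand-in for router processing that keeps raising, as in this bug.
    attempts.append(update)
    raise ValueError("processing failed")

try:
    process_with_retries("r3", always_fails, max_retries=3)
except RuntimeError:
    pass  # the agent logs the retry-limit message at this point
```

The point of the bug is that the underlying processing error never clears, so every re-queue fails and the limit is inevitably reached.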
[Yahoo-eng-team] [Bug 1929438] [NEW] Cannot provision flat network after reconfiguring physical bridges
Public bug reported:

I ran into a problem where the network inside a newly created VM is not working.

* Pre-conditions:
- the neutron ovs agent has not yet seen any ports from the VM network;
- any other bridge (other than the one for the network in which the VM is created) is recreated on the node.

* Step-by-step reproduction:

The bridge mapping from ml2_conf.ini looks like:

[ovs]
bridge_mappings = Public:br-eth0,Test:br-test

The 'Test:br-test' mapping is a test bridge to demonstrate the problem. I created it with the ovs-vsctl tool: ovs-vsctl add-br br-test.

1) Recreate this test bridge, which triggers _reconfigure_physical_bridges:

[root@sqvm2-2009 ~]# ovs-vsctl del-br br-test; ovs-vsctl add-br br-test
[root@sqvm2-2009 ~]#

2) Create the first VM on the 'public' network that is mapped to the 'Public' bridge and try to ping it:

[root@sqvm2-2009 ~]# openstack server create test-vm --image cirros --flavor 100 --network public --boot-from-volume 1
[root@sqvm2-2009 ~]# openstack server list
| ID                                   | Name    | Status | Networks            | Image                    | Flavor |
| 68c32b4d-8f90-4ced-8ca4-67a9e4ff255b | test-vm | ACTIVE | public=10.34.111.12 | N/A (booted from volume) | tiny   |
[root@sqvm2-2009 ~]# virsh console 68c32b4d-8f90-4ced-8ca4-67a9e4ff255b
Connected to domain instance-0005
Escape character is ^]
login as 'cirros' user. default password: 'gocubsgo'. use 'sudo' for root.
cirros login: cirros
Password:
$ sudo ip addr add 10.34.111.12/18 dev eth0
$ ip a s eth0
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether fa:16:3e:67:f8:4e brd ff:ff:ff:ff:ff:ff
    inet 10.34.111.12/18 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe67:f84e/64 scope link
       valid_lft forever preferred_lft forever
$
[root@sqvm2-2009 ~]# ping 10.34.111.12
PING 10.34.111.12 (10.34.111.12) 56(84) bytes of data.
From 10.34.66.138 icmp_seq=1 Destination Host Unreachable
From 10.34.66.138 icmp_seq=2 Destination Host Unreachable
From 10.34.66.138 icmp_seq=3 Destination Host Unreachable
From 10.34.66.138 icmp_seq=4 Destination Host Unreachable
^C
--- 10.34.111.12 ping statistics ---
5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4000ms
pipe 4
[root@sqvm2-2009 ~]#

* Actual result:

The VM is not pingable, but it should be. During port processing in the neutron-openvswitch-agent rpc_loop, one can see the logs:

2021-05-24 17:28:29.776 13744 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-7f3667e0-56a5-4830-8376-10577a2ee167 - - - - -] Port c010955a-4782-418d-a612-0bfbd66b3c09 updated. Details: {'device': 'c010955a-4782-418d-a612-0bfbd66b3c09', 'device_id': '68c32b4d-8f90-4ced-8ca4-67a9e4ff255b', 'network_id': '568fb8ce-8f1b-456e-8a31-330ef19f2f5c', 'port_id': 'c010955a-4782-418d-a612-0bfbd66b3c09', 'mac_address': 'fa:16:3e:67:f8:4e', 'admin_state_up': True, 'network_type': 'flat', 'segmentation_id': None, 'physical_network': 'Public', 'fixed_ips': [{'subnet_id': 'b6f963e3-ad77-4bde-8431-049f87871422', 'ip_address': '10.34.111.12'}], 'device_owner': 'compute:nova', 'allowed_address_pairs': [], 'port_security_enabled': True, 'qos_policy_id': None, 'network_qos_policy_id': None, 'profile': {}, 'vif_type': 'ovs', 'vnic_type': 'normal', 'security_groups': ['09948793-2e11-4d89-ad1f-0c0d0eef80f0'], 'migrating_to': None}
2021-05-24 17:28:29.776 13744 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-7f3667e0-56a5-4830-8376-10577a2ee167 - - - - -] Assigning 2 as local vlan for net-id=568fb8ce-8f1b-456e-8a31-330ef19f2f5c
2021-05-24 17:28:29.777 13744 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-7f3667e0-56a5-4830-8376-10577a2ee167 - - - - -] Cannot provision flat network for net-id=568fb8ce-8f1b-456e-8a31-330ef19f2f5c - no bridge for physical_network Public

* Version: Stein release. The issue is also reproducible on the master branch.

* Attachments: full neutron-openvswitch-agent service logs attached.

** Affects: neutron
   Importance: Undecided
   Status: New

** Attachment added: "neutron-openvswitch-agent.log"
   https://bugs.launchpad.net/bugs/1929438/+attachment/5499913/+files/neutron-openvswitch-agent.log

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1929438

Title: Cannot provision flat network after reconfiguring physical bridges
Status in neutron: New
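The agent's 'no bridge for physical_network Public' error means the physical network from the port details was not found in the agent's current view of its bridge mappings. A sketch of parsing the ml2_conf.ini value shown above and performing that lookup (illustrative, not the agent's actual code):

```python
def parse_bridge_mappings(value: str) -> dict:
    """Parse 'physnet:bridge,physnet:bridge' into {physnet: bridge}."""
    mappings = {}
    for pair in value.split(","):
        physnet, bridge = pair.split(":")
        mappings[physnet.strip()] = bridge.strip()
    return mappings

mappings = parse_bridge_mappings("Public:br-eth0,Test:br-test")
# The flat-network provisioning step fails when this lookup misses,
# e.g. if the agent dropped a mapping after a bridge was recreated:
bridge = mappings.get("Public")
```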
[Yahoo-eng-team] [Bug 1808541] [NEW] Openflow entries are not totally removed for stopped VM
Public bug reported:

I am using the Queens release, and the VMs' tap interfaces are plugged into the ovs br-int. I'm seeing a case where openflow entries are not completely removed when I stop my VM (name='my-vm'). It is only reproducible when there is some other activity on the node for different VMs: in my case I attach a new network to another VM (name='vm-other'); ovs-agent logs are attached. I managed to simulate the issue with the following steps:

1) grep the current openflow entries for my-vm by mac:

# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
 cookie=0xf4d7d970f5382f3d, duration=93.162s, table=60, n_packets=146, n_bytes=21001, idle_age=4, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45 actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
 cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=2, n_bytes=84, idle_age=4, priority=95,arp,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,arp_spa=10.94.152.212 actions=NORMAL
 cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=28, n_bytes=2448, idle_age=9, priority=65,ip,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,nw_src=10.94.152.212 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
 cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=0, n_bytes=0, idle_age=93, priority=65,ipv6,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,ipv6_src=fe80::f816:3eff:feec:d345 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
 cookie=0xf4d7d970f5382f3d, duration=93.162s, table=73, n_packets=0, n_bytes=0, idle_age=3401, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45 actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#

2) Find the qemu and libvirtd processes:

# ps ax | grep libvirt
 4887 pts/6 S+  0:00 grep --color=auto libvirt
 3934 ?     Sl  0:18 /usr/libexec/qemu-kvm -name guest=instance-0012,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-131-instance-0012/master-key.aes -machine pc-i440fx-vz7.8.0,accel=kvm,usb=off,dump-guest-core=off -cpu Westmere-IBRS,vme=on,ss=on,pcid=on,x2apic=on,tsc-deadline=on,hypervisor=on,arat=on,tsc_adjust=on,ssbd=on,stibp=on,pdpe1gb=on,rdtscp=on,aes=off,+kvmclock -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -numa node,nodeid=0,cpus=0,mem=512 -uuid 89fccc31-96a6-47ce-abd1-e40fba7274e6 -smbios type=1,manufacturer=Virtuozzo Infrastructure Platform,product=OpenStack Compute,version=17.0.6-1.vl7,serial=71f55add-ef93-4ec2-a4dd-ab8098b6312d,uuid=89fccc31-96a6-47ce-abd1-e40fba7274e6,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-131-instance-0012/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot strict=on -device nec-usb-x,id=usb,bus=pci.0,addr=0x4 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/mnt/vstorage/vols/datastores/cinder/volume-e30d1874-d68e-4578-bf89-aa599e8383c7/volume-e30d1874-d68e-4578-bf89-aa599e8383c7,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,l2-cache-size=128M,discard=unmap,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,logical_block_size=512,physical_block_size=4096,serial=e30d1874-d68e-4578-bf89-aa599e8383c7 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,host_mtu=1500,netdev=hostnet0,id=net0,mac=fa:16:3e:ec:d3:45,bus=pci.0,addr=0x3 -chardev pty,id=charserial0,logfile=/mnt/vstorage/vols/datastores/nova/instances/89fccc31-96a6-47ce-abd1-e40fba7274e6/console.log,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-0012.sock,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.1.instance-0012.sock,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 10.10.1.237:0 -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -device vmcoreinfo -device pvpanic,ioport=1285 -msg timestamp=on
24553 ?     Ssl 11:44 /usr/sbin/libvirtd --listen

Note: "-netdev tap,fd=28", so the net device is passed to qemu as a file descriptor, and AFAIU the tap interface is auto-removed (by the kernel) when the qemu process exits.

3) SIGSTOP libvirtd to emulate the port-deletion delay that libvirtd performs when a guest is stopped (I believe libvirtd removes the port when the guest is stopped with something like 'ovs-vsctl --timeout=5 -- --if-exists del-port taped0487c9-23'):

# kill -SIGSTOP 24553
#

4) Kill the guest:

# kill -9 3934
#

Ovs agent logs right after killing:

2018-12-14
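Step 1 above greps the dump-flows output for the VM's MAC; the leftover entries after stopping the VM can be found the same way. A small sketch of the same filtering in Python, fed with two sample flow lines shortened from the output above (the second MAC is a made-up example for contrast):

```python
def flows_for_mac(dump_flows_output: str, mac: str):
    """Return the flow lines that mention the given MAC (dl_src or dl_dst)."""
    return [line for line in dump_flows_output.splitlines() if mac in line]

sample = """\
 cookie=0xf4d7d970f5382f3d, table=60, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45 actions=resubmit(,81)
 cookie=0xf4d7d970f5382f3d, table=73, priority=100,reg6=0x9,dl_dst=fa:16:3e:11:22:33 actions=resubmit(,81)
"""
leftover = flows_for_mac(sample, "fa:16:3e:ec:d3:45")
```

After a clean VM stop this list should be empty for the stopped VM's MAC; a non-empty list is the residue this bug reports.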