[Yahoo-eng-team] [Bug 2052937] Re: Policy: binding operations are prohibited for service role
** Changed in: neutron
       Status: Invalid => Triaged

https://bugs.launchpad.net/bugs/2052937

Title:
  Policy: binding operations are prohibited for service role

Status in neutron:
  Triaged

Bug description:
  Create/update port binding:* policies are admin only, which prevents,
  for example, the ironic service user with the service role from
  managing baremetal ports:

  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron [None req-6737aef3-c823-4f7c-95ec-1c9f38b14faa a4dbb0dc59024c199843cea86603308b 9fd64a4cbd774756869cb3968de2e9b6 - - default default] Unable to clear binding profile for neutron port 291dbb7b-5cc8-480d-b39d-eb849bcb4a64. Error: ForbiddenException: 403: Client Error for url: http://192.0.2.10:9696/v2.0/ports/291dbb7b-5cc8-480d-b39d-eb849bcb4a64, ((rule:update_port and rule:update_port:binding:host_id) and rule:update_port:binding:profile) is disallowed by policy: openstack.exceptions.ForbiddenException: ForbiddenException: 403: Client Error for url: http://192.0.2.10:9696/v2.0/ports/291dbb7b-5cc8-480d-b39d-eb849bcb4a64, ((rule:update_port and rule:update_port:binding:host_id) and rule:update_port:binding:profile) is disallowed by policy
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron Traceback (most recent call last):
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/ironic/common/neutron.py", line 130, in unbind_neutron_port
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron     update_neutron_port(context, port_id, attrs_unbind, client)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/ironic/common/neutron.py", line 109, in update_neutron_port
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron     return client.update_port(port_id, **attrs)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/network/v2/_proxy.py", line 2992, in update_port
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron     return self._update(_port.Port, port, if_revision=if_revision, **attrs)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/proxy.py", line 61, in check
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron     return method(self, expected, actual, *args, **kwargs)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/network/v2/_proxy.py", line 202, in _update
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron     return res.commit(self, base_path=base_path, if_revision=if_revision)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/resource.py", line 1803, in commit
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron     return self._commit(
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/resource.py", line 1848, in _commit
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron     self._translate_response(response, has_body=has_body)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/resource.py", line 1287, in _translate_response
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron     exceptions.raise_from_response(response, error_message=error_message)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/exceptions.py", line 250, in raise_from_response
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron     raise cls(
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron openstack.exceptions.ForbiddenException: ForbiddenException: 403: Client Error for url: http://192.0.2.10:9696/v2.0/ports/291dbb7b-5cc8-480d-b39d-eb849bcb4a64, ((rule:update_port and rule:update_port:binding:host_id) and rule:update_port:binding:profile) is disallowed by policy

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2052937/+subscriptions
[Yahoo-eng-team] [Bug 2052937] Re: Policy: binding operations are prohibited for service role
Hi Bartosz,

Yes, by default this is prohibited. However, oslo.policy based policies
are configurable. For example, in my devstack I don't have ironic
deployed, but I reproduced the problem using the unprivileged 'demo'
user:

$ source openrc demo demo
$ openstack network create net0
$ openstack subnet create --network net0 --subnet-range 10.0.0.0/24 subnet0
$ openstack port create --network net0 port0
$ openstack port set --host devstack0 port0
ForbiddenException: 403: Client Error for url: http://192.168.122.225:9696/networking/v2.0/ports/4d6fa1c1-bbb0-4298-a901-c3dec7f1b1f1, (rule:update_port and rule:update_port:binding:host_id) is disallowed by policy

While in q-svc logs I had this:

febr 13 14:03:42 devstack0 neutron-server[5814]: DEBUG neutron.policy [None req-9fa226e6-2ae5-4abe-9b70-efc749ef4913 None demo] Enforcing rules: ['update_port', 'update_port:binding:host_id'] {{(pid=5814) log_rule_list /opt/stack/neutron/neutron/policy.py:457}}
febr 13 14:03:42 devstack0 neutron-server[5814]: DEBUG neutron.policy [None req-9fa226e6-2ae5-4abe-9b70-efc749ef4913 None demo] Failed policy enforce for 'update_port' {{(pid=5814) enforce /opt/stack/neutron/neutron/policy.py:530}}

The non-default policy configuration is looked up by oslo.policy in
/etc/neutron/policy.{json,yaml}. Today I believe the yaml format is
preferred, but for some reason devstack still created the old json
format for me. So first I migrated the one-line json file to yaml:

$ cat /etc/neutron/policy.json
{"context_is_admin": "role:admin or user_name:neutron"}
$ cat /etc/neutron/policy.yaml
"context_is_admin": "role:admin or user_name:neutron"

I believe all of this was deployment (here: devstack) specific. I also
told oslo.policy running in neutron-server to use the yaml formatted
file:

/etc/neutron/neutron.conf:
[oslo_policy]
policy_file = /etc/neutron/policy.yaml

Then I changed the policy for port binding from the default:

"update_port:binding:host_id": "rule:admin_only"

to

"update_port:binding:host_id": "rule:admin_or_owner"

After this change the above "openstack port set --host" starts working,
even without restarting neutron-server.

In your environment you will of course want to use a different rule,
maybe something like this:

"update_port:binding:host_id": "(rule:admin_only) or (rule:service_api)"

Since I don't have ironic in this environment, I could not test this
rule. But please have a look at the documentation, I'm virtually sure
there's a way to set what you need:

https://docs.openstack.org/neutron/latest/configuration/policy.html
https://docs.openstack.org/neutron/latest/configuration/policy-sample.html
https://docs.openstack.org/oslo.policy/latest/

Regarding the default, I believe for most environments it is good that
only the admin can change port bindings. If you believe differently,
please share your reasons. Until then I'm marking this as not a bug.

Regards,
Bence

** Changed in: neutron
       Status: New => Invalid
https://bugs.launchpad.net/bugs/2052937

Title:
  Policy: binding operations are prohibited for service role

Status in neutron:
  Invalid

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2052937/+subscriptions
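[Editorial note: taken together, the steps in the reply above amount to the following sketch, assuming the stock /etc/neutron paths and the rule names quoted in this thread. The binding:profile rule is my extrapolation from the error message in the bug, and rule:service_api is the untested example rule from the reply; adjust both to your deployment.]

# Sketch only, under the assumptions stated above.

# 1. Migrate the legacy one-line JSON policy file to YAML and relax the
#    port binding rules (rule:service_api was suggested but not tested):
sudo tee /etc/neutron/policy.yaml <<'EOF'
"context_is_admin": "role:admin or user_name:neutron"
"update_port:binding:host_id": "(rule:admin_only) or (rule:service_api)"
"update_port:binding:profile": "(rule:admin_only) or (rule:service_api)"
EOF

# 2. Point oslo.policy (running inside neutron-server) at the YAML file,
#    in /etc/neutron/neutron.conf:
#      [oslo_policy]
#      policy_file = /etc/neutron/policy.yaml

# 3. Re-test; per the reply above, no neutron-server restart was needed:
openstack port set --host devstack0 port0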
[Yahoo-eng-team] [Bug 2051685] [NEW] After repeat of incomplete migration nova applies wrong (status=error) migration context in update_available_resource periodic job
Public bug reported:

The original problem observed in a downstream deployment was overcommit
on dedicated PCPUs and a CPUPinningInvalid exception breaking the
update_available_resource periodic job. The following reproduction is
not an end-to-end reproduction, but I hope I can demonstrate where
things go wrong.

The environment is a multi-node devstack:
devstack0 - all-in-one
devstack0a - compute

Nova is backed by libvirt/qemu/kvm.

devstack 6b0f055b
nova on devstack0 39f560d673
nova on devstack0a a72f7eaac7
libvirt 8.0.0-1ubuntu7.8
qemu 1:6.2+dfsg-2ubuntu6.16
linux 5.15.0-91-generic

# Clean up if not the first run.
openstack server list -f value -c ID | xargs -r openstack server delete --wait
openstack volume list --status available -f value -c ID | xargs -r openstack volume delete

# Create a server on devstack0.
openstack flavor create cirros256-pinned --public --vcpus 1 --ram 256 --disk 1 --property hw_rng:allowed=True --property hw:cpu_policy=dedicated
openstack server create --flavor cirros256-pinned --image cirros-0.6.2-x86_64-disk --boot-from-volume 1 --nic net-id=private --availability-zone :devstack0 vm0 --wait

# Start a live migration to devstack0a, but simulate a failure. In my
# environment a complete live migration takes around 20 seconds. Using
# 'sleep 3' it usually breaks in the 'preparing' status.
# As far as I understand, other kinds of migration (like cold migration)
# are also affected.
openstack server migrate --live-migration vm0 --wait & sleep 2 ; ssh devstack0a sudo systemctl stop devstack@n-cpu

$ openstack server migration list --server vm0 --sort-column 'Created At'
  Id:             33
  UUID:           c7a42f9e-dfee-4a2c-b42a-a73b1a19c0c9
  Source Node:    devstack0
  Dest Node:      devstack0a
  Source Compute: devstack0
  Dest Compute:   devstack0a
  Dest Host:      192.168.122.79
  Status:         preparing
  Server UUID:    a2b43180-8ad9-4c12-ad47-12b8dd7a7384
  Old Flavor:     11
  New Flavor:     11
  Type:           live-migration
  Created At:     2024-01-29T12:41:40.00
  Updated At:     2024-01-29T12:41:42.00

# After some timeout (around 60 s) the migration goes to 'error' status.
$ openstack server migration list --server vm0 --sort-column 'Created At'
  Id:             33
  UUID:           c7a42f9e-dfee-4a2c-b42a-a73b1a19c0c9
  Source Node:    devstack0
  Dest Node:      devstack0a
  Source Compute: devstack0
  Dest Compute:   devstack0a
  Dest Host:      192.168.122.79
  Status:         error
  Server UUID:    a2b43180-8ad9-4c12-ad47-12b8dd7a7384
  Old Flavor:     11
  New Flavor:     11
  Type:           live-migration
  Created At:     2024-01-29T12:41:40.00
  Updated At:     2024-01-29T12:42:42.00

# Wait before restarting n-cpu on devstack0a. I don't think I fully
# understand the factors of when the migration finally ends up in
# 'failed' or in 'error' status. Currently it seems to me that if I
# restart n-cpu too quickly, the migration goes to the 'failed' state
# right after restart.
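[Editorial note: a small convenience for reproducing the timing above; this polls the migration status until it flips to 'error'. The column names are taken from the listing above and may differ across client versions, so treat this as a sketch.]

# Poll the migration status every 5 seconds; per the report the record
# should reach 'error' roughly 60 s after the simulated failure.
watch -n 5 "openstack server migration list --server vm0 -c Id -c Status -f value"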
[Yahoo-eng-team] [Bug 2051351] [NEW] explicity_egress_direct prevents learning of local MACs and causes flooding of ingress packets, firewall_driver = openvswitch
Public bug reported:

I believe this issue was already reported earlier:
https://bugs.launchpad.net/neutron/+bug/1884708

That bug has a fix committed:
https://review.opendev.org/c/openstack/neutron/+/738551

However, I believe the above change fixed only part of the issue (with
firewall_driver=noop). The same problem is still not fixed with
firewall_driver=openvswitch. First I re-opened bug #1884708, but then I
realized that nobody will notice a several year old bug's status
change, so I opened this new bug report instead.

Reproduction:

# config
ml2_conf.ini:
[securitygroup]
firewall_driver = openvswitch
[agent]
explicitly_egress_direct = True
[ovs]
bridge_mappings = physnet0:br-physnet0,...

# a random IP on net0 we can ping
sudo ip link set up dev br-physnet0
sudo ip link add link br-physnet0 name br-physnet0.100 type vlan id 100
sudo ip link set up dev br-physnet0.100
sudo ip address add dev br-physnet0.100 10.0.100.1/24

# code
devstack 6b0f055b
neutron $ git log --oneline -n2
27601f8eea (HEAD, origin/bug/2048785, origin/HEAD) Set trunk parent port as access port in ovs to avoid loop
3ef02cc2fb (origin/master) Consume code from neutron-lib
openvswitch 2.17.8-0ubuntu0.22.04.1
linux 5.15.0-91-generic

# clean up first
openstack server delete vm0 --wait
openstack port delete port0
openstack network delete net1 net0

# build the environment
openstack network create net0 --provider-network-type vlan --provider-physical-network physnet0 --provider-segment 100
openstack subnet create --network net0 --subnet-range 10.0.100.0/24 subnet0
openstack port create --no-security-group --disable-port-security --network net0 --fixed-ip ip-address=10.0.100.10 port0
openstack server create --flavor cirros256 --image cirros-0.6.2-x86_64-disk --nic port-id=port0 --availability-zone :devstack0a --wait vm0

# mac addresses for reference
$ openstack port show port0 -f value -c mac_address
fa:16:3e:96:58:ab
$ ifdata -ph br-physnet0
82:E8:18:67:7E:40

# generate traffic that will keep fdb entries fresh
sudo virsh console "$( openstack server show vm0 -f value -c OS-EXT-SRV-ATTR:instance_name )"
ping 10.0.100.1

# clear all past junk
for br in br-physnet0 br-int ; do sudo ovs-appctl fdb/flush "$br" ; done

# br-int does not learn port0's mac despite the ongoing ping
for br in br-physnet0 br-int ; do echo ">>> $br <<<" ; sudo ovs-appctl fdb/show "$br" | egrep -i "$( openstack port show port0 -f value -c mac_address )|$( ifdata -ph br-physnet0 )" ; done
>>> br-physnet0 <<<
    1   100  fa:16:3e:96:58:ab    0
LOCAL   100  82:e8:18:67:7e:40    0
>>> br-int <<<
    1     4  82:e8:18:67:7e:40    0

# port and physnet bridge mac in all fdbs, egress == vnic -> physnet bridge
# in br-int we have a direct output action
$ sudo ovs-appctl ofproto/trace br-int in_port="$( sudo ovs-vsctl -- --columns=ofport find Interface name=$( echo "tap$( openstack port show port0 -f value -c id )" | cut -b1-14 ) | awk '{ print $3 }' )",dl_vlan=0,dl_dst=$( ifdata -ph br-physnet0 ),dl_src=$( openstack port show port0 -f value -c mac_address )
Flow: in_port=45,dl_vlan=0,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:96:58:ab,dl_dst=82:e8:18:67:7e:40,dl_type=0x0000

bridge("br-int")
 0. priority 0, cookie 0x2b36d6b4a42fe7b5
    goto_table:58
58. priority 0, cookie 0x2b36d6b4a42fe7b5
    goto_table:60
60. in_port=45, priority 100, cookie 0x2b36d6b4a42fe7b5
    set_field:0x2d->reg5
    set_field:0x4->reg6
    resubmit(,73)
73. reg5=0x2d, priority 80, cookie 0x2b36d6b4a42fe7b5
    resubmit(,94)
94. reg6=0x4,dl_src=fa:16:3e:96:58:ab,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 10, cookie 0x2b36d6b4a42fe7b5
    push_vlan:0x8100
    set_field:4100->vlan_vid
    output:1

bridge("br-physnet0")
 0. in_port=1,dl_vlan=4, priority 4, cookie 0x85bc1a5077d54d3f
    set_field:4196->vlan_vid
    NORMAL
     -> forwarding to learned port

Final flow: reg5=0x2d,reg6=0x4,in_port=45,dl_vlan=4,dl_vlan_pcp=0,dl_vlan1=0,dl_vlan_pcp1=0,dl_src=fa:16:3e:96:58:ab,dl_dst=82:e8:18:67:7e:40,dl_type=0x0000
Megaflow: recirc_id=0,eth,in_port=45,dl_vlan=0,dl_vlan_pcp=0,dl_src=fa:16:3e:96:58:ab,dl_dst=82:e8:18:67:7e:40,dl_type=0x0000
Datapath actions: pop_vlan,push_vlan(vid=100,pcp=0),1

# port and physnet bridge mac in all fdbs, ingress == physnet bridge -> vnic
# in br-int we have the normal action flooding, despite the ongoing ping
$ sudo ovs-appctl ofproto/trace br-physnet0 in_port=LOCAL,dl_vlan=100,dl_src=$( ifdata -ph br-physnet0 ),dl_dst=$( openstack port show port0 -f value -c mac_address )
Flow: in_port=LOCAL,dl_vlan=100,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=82:e8:18:67:7e:40,dl_dst=fa:16:3e:96:58:ab,dl_type=0x0000

bridge("br-physnet0")
 0. priority 0, cookie 0x85bc1a5077d54d3f
    NORMAL
     -> forwarding to learned port

bridge("br-int")
 0. in_port=1,dl_vlan=100, priority 3, cookie 0x2b36d6b4a42fe7b5
    set_field:4100->vlan_vid
    goto_table:58
58.
[Yahoo-eng-team] [Bug 1884708] Re: explicity_egress_direct prevents learning of local MACs and causes flooding of ingress packets
** Changed in: neutron
       Status: New => Fix Released

https://bugs.launchpad.net/bugs/1884708

Title:
  explicity_egress_direct prevents learning of local MACs and causes
  flooding of ingress packets

Status in neutron:
  Fix Released

Bug description:
  We took this bug fix:
  https://bugs.launchpad.net/neutron/+bug/1732067
  and then also backported ourselves
  https://bugs.launchpad.net/neutron/+bug/1866445
  The latter is for the iptables based firewall.

  We have VLAN based networks, and are seeing ingress packets destined
  to local MACs being flooded. We are not seeing any local MACs present
  under ovs-appctl fdb/show br-int.

  Consider the following example:

  HOST 1:
  MAC A = fa:16:3e:c1:01:43
  MAC B = fa:16:3e:de:0b:8a

  HOST 2:
  MAC C = fa:16:3e:d6:3f:31

  A is talking to C. Snooping on the qvo interface of B, we are seeing
  all the traffic destined to MAC A (along with other unicast traffic
  not destined to or sourced from MAC B). Neither MAC A nor B is
  present in the br-int FDB, despite sending heavy traffic.

  Here is the ofproto trace for such a packet. in_port 8313 is the qvo
  of MAC A:

  sudo ovs-appctl ofproto/trace br-int in_port=8313,tcp,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31
  Flow: tcp,in_port=8313,vlan_tci=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0

  bridge("br-int")
   0. in_port=8313, priority 9, cookie 0x9a67096130ac45c2
      goto_table:25
  25. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 2, cookie 0x9a67096130ac45c2
      goto_table:60
  60. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 9, cookie 0x9a67096130ac45c2
      resubmit(,61)
  61. in_port=8313,dl_src=fa:16:3e:c1:01:43,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 10, cookie 0x9a67096130ac45c2
      push_vlan:0x8100
      set_field:4098->vlan_vid
      output:1

  bridge("br-ext")
   0. in_port=2, priority 2, cookie 0xab09adf2af892674
      goto_table:1
   1. priority 0, cookie 0xab09adf2af892674
      goto_table:2
   2. in_port=2,dl_vlan=2, priority 4, cookie 0xab09adf2af892674
      set_field:4240->vlan_vid
      NORMAL
       -> forwarding to learned port

  bridge("br-vlan")
   0. priority 1, cookie 0x651552fc69601a2d
      goto_table:3
   3. priority 1, cookie 0x651552fc69601a2d
      NORMAL
       -> forwarding to learned port

  Final flow: tcp,in_port=8313,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
  Megaflow: recirc_id=0,eth,ip,in_port=8313,vlan_tci=0x0000/0x1fff,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_frag=no
  Datapath actions: push_vlan(vid=144,pcp=0),51

  Because it took the output: action from table=61, added by the
  explicitly_egress_direct fix, the local MAC is not learned. But on
  ingress, the packet is hitting table=60's NORMAL action, causing it
  to be flooded because br-int never learns where to send the local
  MAC.

  sudo ovs-appctl ofproto/trace br-int in_port=1,dl_vlan=144,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43
  Flow: in_port=1,dl_vlan=144,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000

  bridge("br-int")
   0. in_port=1,dl_vlan=144, priority 3, cookie 0x9a67096130ac45c2
      set_field:4098->vlan_vid
      goto_table:60
  60. priority 3, cookie 0x9a67096130ac45c2
      NORMAL
       -> no learned MAC for destination, flooding

  bridge("br-vlan")
   0. in_port=4, priority 2, cookie 0x651552fc69601a2d
      goto_table:1
   1. priority 0, cookie 0x651552fc69601a2d
      goto_table:2
   2. in_port=4, priority 2, cookie 0x651552fc69601a2d
      drop

  bridge("br-tun")
   0. in_port=1, priority 1, cookie 0xf1baf24d000c6f7c
      goto_table:1
   1. priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:2
   2. dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:20
  20. priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:22
  22. priority 0, cookie 0xf1baf24d000c6f7c
      drop

  Final flow: in_port=1,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
  Megaflow: recirc_id=0,eth,in_port=1,dl_vlan=144,dl_vlan_pcp=0,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
  Datapath actions:
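[Editorial note: the diagnosis above boils down to one observable: the local MAC never shows up in br-int's FDB. A quick hedged check, with the MAC value taken from the example above; substitute your own port's MAC.]

# If the MAC is absent from br-int's FDB, unicast ingress traffic to it
# falls back to flooding, which is the symptom described above.
mac=fa:16:3e:c1:01:43   # MAC A from the example; an assumption, use your port's MAC
sudo ovs-appctl fdb/show br-int | grep -i "$mac" \
  || echo "$mac not learned on br-int -> ingress packets to it are flooded"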
[Yahoo-eng-team] [Bug 1884708] Re: explicity_egress_direct prevents learning of local MACs and causes flooding of ingress packets
I'm reopening this because I believe the fix committed fixes only part
of the problem. With firewall_driver=noop the unnecessary ingress
flooding on br-int is gone. However, we still have the same unnecessary
flooding with firewall_driver=openvswitch. For details and a full
reproduction please see the comments to bug #2048785:

https://bugs.launchpad.net/neutron/+bug/2048785/comments/2
https://bugs.launchpad.net/neutron/+bug/2048785/comments/6

** Changed in: neutron
       Status: Fix Released => New

https://bugs.launchpad.net/bugs/1884708

Title:
  explicity_egress_direct prevents learning of local MACs and causes
  flooding of ingress packets

Status in neutron:
  New
[Yahoo-eng-team] [Bug 2048785] [NEW] Trunk parent port (tpt port) vlan_mode is wrong in ovs
port0b_mac="$( openstack port show port0b -f value -c mac_address )"
openstack port create --no-security-group --disable-port-security --mac-address "$port0b_mac" --network net1 --fixed-ip ip-address=10.0.101.11 port1b
openstack network trunk create --parent-port port0a trunka
openstack network trunk set --subport port=port1a,segmentation-type=vlan,segmentation-id=101 trunka
openstack network trunk create --parent-port port0b trunkb
openstack network trunk set --subport port=port1b,segmentation-type=vlan,segmentation-id=101 trunkb
openstack server create --flavor ds1G --image u1804 --nic port-id=port0a --wait vma
openstack server create --flavor ds1G --image u1804 --nic port-id=port0b --wait vmb # booted on the same compute as vma

At the moment I don't have a reproduction independent of that
environment that re-creates the same state of the bridges' FDBs and the
same kind of traffic. Anyway, in this environment colleagues observed:

* Lost frames.
* Duplicated frames arriving at the vNIC of one of the VMs.
* Unexpectedly double tagged frames on the physical bridge leaving the
  compute host.

Local analysis showed that the traffic arrived at br-int, which did not
have the dst MAC in its FDB, so it had to flood to all ports. This way
the frame ended up on both trunk bridges. One of these trunk bridges
was on the proper way to the destination address. But the other trunk
bridge, also not having the dst MAC in its FDB, had to flood to all
ports too. And this trunk bridge also flooded the frame out its tpt
port, back to br-int. But the tpt port conceptually is in a different
VLAN and the frame should never have been flooded to that port.
However, the tpt port has the wrong configuration and forwards the
traffic from the wrong VLANs.

After the looped frame got back to br-int, it reached the intended VM's
vNIC via the trunk parent (sic!) port. Which means that the latter
trunk bridge learned the traffic generator's source MAC on the wrong
port. I have a suspicion that this may have led to the unexpectedly
double tagged packets in the other direction.

** Affects: neutron
   Importance: Undecided
   Assignee: Bence Romsics (bence-romsics)
   Status: In Progress

** Tags: trunk

https://bugs.launchpad.net/bugs/2048785

Title:
  Trunk parent port (tpt port) vlan_mode is wrong in ovs

Status in neutron:
  In Progress

Bug description:
  ... therefore a forwarding loop, packet duplication, packet loss and
  double tagging are possible.

  Today a trunk bridge with one parent and one subport looks like this:

  # ovs-vsctl show
  ...
      Bridge tbr-b2781877-3
          datapath_type: system
          Port spt-28c9689e-9e
              tag: 101
              Interface spt-28c9689e-9e
                  type: patch
                  options: {peer=spi-28c9689e-9e}
          Port tap3709f1a1-a5
              Interface tap3709f1a1-a5
          Port tpt-3709f1a1-a5
              Interface tpt-3709f1a1-a5
                  type: patch
                  options: {peer=tpi-3709f1a1-a5}
          Port tbr-b2781877-3
              Interface tbr-b2781877-3
                  type: internal
  ...

  # ovs-vsctl find Port name=tpt-3709f1a1-a5 | egrep 'tag|vlan_mode|trunks'
  tag                 : []
  trunks              : []
  vlan_mode           : []

  # ovs-vsctl find Port name=spt-28c9689e-9e | egrep 'tag|vlan_mode|trunks'
  tag                 : 101
  trunks              : []
  vlan_mode           : []

  I believe the vlan_mode of the tpt port is wrong (at least when the
  port is not "vlan_transparent") and it should have the value
  "access". Even when the port is "vlan_transparent", forwarding loops
  between br-int and a trunk bridge should be prevented.

  According to:
  http://www.openvswitch.org/support/dist-docs/ovs-vswitchd.conf.db.5.txt

  """
  vlan_mode: optional string, one of access, dot1q-tunnel, native-tagged, native-untagged, or trunk
      The VLAN mode of the port, as described above. When this column
      is empty, a default mode is selected as follows:
      • If tag contains a value, the port is an access port. The
        trunks column should be empty.
      • Otherwise, the port is a trunk port. The trunks column value
        is honored if it is present.
  """

  """
  trunks: set of up to 4,096 integers, in range 0 to 4,095
      For a trunk, native-tagged, or native-untagged port, the 802.1Q
      VLAN or VLANs that this port trunks; if it is empty, then the
      port trunks all VLANs. Must be empty if this is an access port.
      A native-tagged or native-untagged port always trunks its native
      VLAN, regardless of whether trunks i
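[Editorial note: as a hedged illustration of the direction the report argues for (not the committed fix; see the "Set trunk parent port as access port in ovs to avoid loop" change referenced earlier in this digest), the tpt port's vlan_mode can be inspected and set with plain ovs-vsctl. The port name is the one from the example above, and the exact tag handling is left out.]

# Inspect the current (empty) vlan_mode of the trunk parent patch port:
sudo ovs-vsctl --columns=tag,trunks,vlan_mode find Port name=tpt-3709f1a1-a5

# Untested sketch for a non-vlan_transparent port: make it an access
# port so it stops forwarding frames from foreign VLANs back to br-int.
sudo ovs-vsctl set Port tpt-3709f1a1-a5 vlan_mode=access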
[Yahoo-eng-team] [Bug 2042598] Re: neutron_server container suspended in health:starting state
Hi,

Thanks for the report!

At first glance this looks like a deployment problem, not a neutron
bug. From the neutron perspective there's no clear error symptom
described (other than "networking does not work") and no neutron log
(the attached "log from neutron_server" stops right when neutron-server
is started). Even if there is a neutron bug, this is not enough to
identify and/or debug it.

I'm no kolla expert (not even a kolla user), but I would recommend that
you turn with your questions to kolla folks, for example on their irc
channel (#kolla on irc.oftc.net, archives:
https://meetings.opendev.org/) or on the mailing list
(https://lists.openstack.org/mailman3/lists/openstack-discuss.lists.openstack.org/).
It would also help in debugging if you collected actual neutron-server
logs to see why it did not start properly.

Hope this helps,
Bence

** Changed in: neutron
       Status: New => Invalid

https://bugs.launchpad.net/bugs/2042598

Title:
  neutron_server container suspended in health:starting state

Status in neutron:
  Invalid

Bug description:
  I installed OpenStack (zed) on a Raspberry Pi cluster with
  kolla-ansible (version tagged for zed). All containers are healthy
  except the neutron_server container, which is suspended in 'health:
  starting' state. The network related part of OpenStack does not work.
  Some other commands work as expected (e.g., I can create an image
  which is reported by openstack image list as 'active').

  There are four Raspberry Pi 4B in the cluster (2 x 4GB RAM and 2 x
  8GB RAM). They run Debian 11 (bullseye) and kolla-ansible has been
  used for the installation. Notably, I'm using a specific
  configuration of networking on my Pis to mimic two network interfaces
  on each host as kolla-ansible expects. These are provided as
  interfaces of veth pairs (more details on that below, too).

  Below, one can find:
  1. configuration commands I used to configure my Pi hosts (this panel)
  2. environment details related to the Pis (the one serving as controller in OpenStack) and kolla-ansible install information (this panel)
  3. ml2_conf.ini and nova-compute.conf configuration used in kolla-ansible
  4. kolla-ansible files: globals.yml (4.1) and inventory multinode (4.2) - changed parts - this panel - complete versions - attachments
  5. HttpException: 503 message from running init-runonce (kolla-ansible test script for new installation) (this panel)
  6. status of containers on the control node as reported by 'docker ps -a' (this panel)
  7. output from the docker neutron_server inspect command (attachment)
  8. log from the neutron_server container (attachment)

  * 1. Debian configuration on the Pis *

  Selected details of the configuration are given in the following.
  Basically, most of them are needed to configure the Pis' host
  networking using netplan. Another one relates to qemu-kvm.

  (Note: initial configs to enable ssh access should be done locally
  (keyboard, monitor) on each Pi, in particular:
  PermitRootLogin yes
  PasswordAuthentication yes
  I skip the details of enabling ssh access, though. Below, I assume
  ssh access as a regular (non-root) user.)

  === Preparation for host networking setup ===

  $ sudo apt-get remove unattended-upgrades -y
  $ sudo apt-get update -y && sudo apt-get upgrade -y

  - updating $PATH for a user
  $ sudo tee -a ~/.bashrc << EOT
  export PATH=$PATH:/usr/local/sbin:/usr/sbin:/sbin
  EOT
  $ source ~/.bashrc

  - enable systemd-networkd and configure eth0 for ssh access (needed
    to use ssh; not needed if one does stuff locally, attaching
    keyboard and monitor to each Pi)

  - enabling systemd-networkd
  $ sudo mv /etc/network/interfaces /etc/network/interfaces.save
  $ sudo mv /etc/network/interfaces.d /etc/network/interfaces.d.save
  $ sudo systemctl enable systemd-networkd && sudo systemctl start systemd-networkd
  $ sudo systemctl status systemd-networkd

  - configure eth0 (in my case, I've configured static DHCP for each Pi on my DHCP server)
  $ sudo tee /etc/systemd/network/20-wired.network << EOT
  [Match]
  Name=eth0
  [Network]
  DHCP=yes
  EOT

  - install netplan
  $ sudo apt update && sudo apt -y install netplan.io
  $ sudo reboot

  - enable ip forwarding
  $ sudo nano /etc/sysctl.conf
  ===> uncomment the line: net.ipv4.ip_forward=1
  $ sudo sysctl -p

  == Host networking setup ==

  - network setup on each Pi host - drawing:

    192.168.1.xy/24      no IP address
    +-------+            +-------+
    | veth0 |            | veth1 |   <- network-interface and network-external-interface for kolla-ansible
    +-------+            +-------+
        |                    |        veth pairs
    +---------+          +---------+
    | veth0br |          | veth1br |
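[Editorial note: to make the drawing concrete, a hedged sketch of how such veth pairs are typically created. Interface names come from the drawing; the bridge enslavement, addressing and netplan glue are deployment specific and omitted.]

# Create the two veth pairs from the drawing above; kolla-ansible then
# uses veth0/veth1 as its network_interface / neutron_external_interface.
sudo ip link add veth0 type veth peer name veth0br
sudo ip link add veth1 type veth peer name veth1br
for dev in veth0 veth0br veth1 veth1br ; do sudo ip link set up dev "$dev" ; done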
[Yahoo-eng-team] [Bug 2042089] Re: neutron : going to shared network is working, going back not
Hi,

Thanks for the report!

I'm not sure the behavior you describe is a bug. If multiple projects
are actually using a shared network, why would you expect it to be
unshared without an error? How should such a network work when it's
shared=False but it has multiple tenants on it?

Maybe I'm missing what you mean. In that case can you please give me a
series of commands, including which one should behave differently and
how?

** Changed in: neutron
       Status: New => Invalid

https://bugs.launchpad.net/bugs/2042089

Title:
  neutron : going to shared network is working, going back not

Status in neutron:
  Invalid

Bug description:
  We have admin-generated provider networks. Projects are allowed to
  create ports and instances on these networks. When we now set the
  "shared" property on these networks, we are no longer allowed to
  unset this property. We get the error: "Unable to reconfigure sharing
  settings for network net.vlan10.provider. Multiple tenants are using
  it." Once all ports and instances created by non-admin projects are
  removed, we can again unset the "shared" property.

  So, we are allowed to set a parameter which it is afterwards no
  longer possible to unset. We now have a network that is visible to
  all, and we would prefer not to have this situation. Removing the
  corresponding RBAC policy is also not allowed.

  This is an OpenStack-Ansible installation, version Yoga.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2042089/+subscriptions
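[Editorial note: for reference, a hedged sketch of the command sequence the report seems to describe. The network name is taken from the quoted error message; the port created by another project is what triggers the error.]

# Admin shares the provider network, a non-admin project attaches a
# port or instance to it, then the admin tries to unshare it again:
openstack network set --share net.vlan10.provider
# ... a non-admin project creates a port/instance on the network ...
openstack network set --no-share net.vlan10.provider
# Expected result per the report:
# "Unable to reconfigure sharing settings for network
#  net.vlan10.provider. Multiple tenants are using it."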
[Yahoo-eng-team] [Bug 1838760] Re: Security groups don't work for trunk ports with iptables_hybrid fw driver
I believe that regarding this bug report what could be done has been
done. Other fixes are not going to happen, therefore I'm setting this
to Won't Fix, to clean up the open bug list.

** Changed in: neutron
       Status: Confirmed => Won't Fix

https://bugs.launchpad.net/bugs/1838760

Title:
  Security groups don't work for trunk ports with iptables_hybrid fw
  driver

Status in neutron:
  Won't Fix

Bug description:
  When the iptables_hybrid firewall driver is used, security groups
  don't work for trunk ports, as vlan tagged packets on the qbr bridge
  aren't filtered by default at all.

  I found it when I was trying to add a new CI job
  https://review.opendev.org/#/c/670738/ and I noticed that this job
  was failing constantly on the Queens release. On Rocky and newer this
  new job is fine, and the difference between those jobs is the
  firewall_driver - since Rocky we are using the openvswitch fw driver
  instead of iptables_hybrid. I also confirmed locally that when I
  switched the firewall driver to openvswitch, the same test worked
  fine for me.

  I did some debugging on the Queens release locally and it looks like
  the flag /proc/sys/net/bridge/bridge-nf-filter-vlan-tagged should be
  set to 1 to make it possible to filter vlan tagged traffic in
  iptables, see https://ebtables.netfilter.org/documentation/bridge-nf.html
  for details. But even if this knob is switched to "1", there are
  probably bigger changes required, as the vlan header which belongs to
  those packets should be included in iptables rules to match on the
  proper packets.

  My test was done on the stable/queens branch of neutron, but I'm
  pretty sure that the same issue still exists in master. We simply
  don't see it as we are testing it with the openvswitch fw driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1838760/+subscriptions
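[Editorial note: the kernel knob mentioned in the description can be checked and flipped as below; as the report itself notes, this alone is not expected to be a complete fix, so treat it as a sketch for experimentation only.]

# Inspect and enable iptables filtering of vlan tagged bridge traffic:
cat /proc/sys/net/bridge/bridge-nf-filter-vlan-tagged   # 0 = off (default)
echo 1 | sudo tee /proc/sys/net/bridge/bridge-nf-filter-vlan-tagged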
[Yahoo-eng-team] [Bug 2028544] Re: dhcp agent binding count greather than dhcp_agents_per_network
** Changed in: neutron
       Status: New => Invalid

https://bugs.launchpad.net/bugs/2028544

Title:
  dhcp agent binding count greather than dhcp_agents_per_network

Status in neutron:
  Invalid

Bug description:
  neutron version: train
  dhcp_agents_per_network = 2

  Executing the command "neutron dhcp-agent-network-add" binds a
  network to a dhcp agent, but does not check the configured
  dhcp_agents_per_network.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2028544/+subscriptions
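[Editorial note: the reported behavior is easy to poke at with the legacy neutron CLI, as in the sketch below. Agent IDs are deployment specific, 'net0' is a hypothetical network name, and it is the scheduler, not this manual command, that honors dhcp_agents_per_network.]

# With dhcp_agents_per_network = 2, manually bind the network to three
# dhcp agents; per the report, no limit check is applied on this path.
openstack network agent list --agent-type dhcp -f value -c ID | head -3 | \
  while read agent ; do neutron dhcp-agent-network-add "$agent" net0 ; done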
[Yahoo-eng-team] [Bug 2025480] [NEW] overlapping pinned CPUs after unshelve
Public bug reported:

It seems that after unshelve, occasionally the request for a dedicated
CPU is ignored. More precisely, the first pinned CPU does not seem to
be marked as consumed, so the second may end up on the same CPU.

This was first observed on victoria (6 times out of 46 tries), but then
I was able to reproduce it on master too (6 times out of 20 tries). The
logs attached are from the victoria environment, which was a
single-host all-in-one devstack running only the vms used for this
reproduction.

stable/victoria:
devstack 3eb6e2d7
nova 1aca09b966

master:
devstack b10c0602
nova 2aea80c0af

config:
[[post-config|/etc/nova/nova.conf]]
[DEFAULT]
scheduler_default_filters = NUMATopologyFilter, ...
[compute]
cpu_dedicated_set = 0,1

Confirming this config in placement:

$ openstack --os-placement-api-version 1.17 resource provider inventory show 46b3d4de-bb45-4607-8860-040eb2dcd0d7 PCPU
+------------------+-------+
| Field            | Value |
+------------------+-------+
| allocation_ratio | 1.0   |
| min_unit         | 1     |
| max_unit         | 2     |
| reserved         | 0     |
| step_size        | 1     |
| total            | 2     |
+------------------+-------+

Reproduction steps:

openstack flavor create cirros256-pinned --public --vcpus 1 --ram 256 --disk 1 --property hw_rng:allowed=True --property hw:cpu_policy=dedicated
openstack server list -f value -c ID | xargs -r openstack server delete --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.1-x86_64-disk --nic net-id=private vm0 --wait
openstack server shelve vm0
sleep 10 # make sure shelve finished
openstack server create --flavor cirros256-pinned --image cirros-0.5.1-x86_64-disk --nic net-id=private vm1 --wait
openstack server shelve vm1
sleep 10
openstack server unshelve vm0 ; sleep 15 ; openstack server unshelve vm1 # the amount of sleep could easily be relevant
watch openstack server list # wait until both go ACTIVE

# both vms ended up on the same cpu
$ for vm in $( sudo virsh list --name ) ; do sudo virsh dumpxml $vm | xmlstarlet sel -t -v '//vcpupin/@cpuset' ; echo ; done
0
0

Data collected from the environment where the above reproduction
triggered the bug:

$ openstack server list
+--------------------------------------+------+--------+--------------------------------------------------------+--------------------------+------------------+
| ID                                   | Name | Status | Networks                                               | Image                    | Flavor           |
+--------------------------------------+------+--------+--------------------------------------------------------+--------------------------+------------------+
| 4734b8a5-a6dd-432a-86c9-ba0367bb86cc | vm1  | ACTIVE | private=10.0.0.27, fdfb:ab27:b2b2:0:f816:3eff:fe80:2fd | cirros-0.5.1-x86_64-disk | cirros256-pinned |
| e30de509-6988-4535-a6f5-520c52fba087 | vm0  | ACTIVE | private=10.0.0.6, fdfb:ab27:b2b2:0:f816:3eff:fe78:d368 | cirros-0.5.1-x86_64-disk | cirros256-pinned |
+--------------------------------------+------+--------+--------------------------------------------------------+--------------------------+------------------+

$ openstack server show vm0
  OS-DCF:diskConfig:                   MANUAL
  OS-EXT-AZ:availability_zone:         nova
  OS-EXT-SRV-ATTR:host:                devstack1v
  OS-EXT-SRV-ATTR:hypervisor_hostname: devstack1v
  OS-EXT-SRV-ATTR:instance_name:       instance-001f
  OS-EXT-STS:power_state:              Running
  OS-EXT-STS:task_state:               None
  OS-EXT-STS:vm_state:                 active
  OS-SRV-USG:launched_at:              2023-06-29T10:45:25.00
  OS-SRV-USG:terminated_at:            None
  accessIPv4:
  accessIPv6:
  addresses:                           private=10.0.0.6, fdfb:ab27:b2b2:0:f816:3eff:fe78:d368
  config_drive:
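[Editorial note: to spot the overlap quickly across all running domains, the per-VM loop from the report can be extended with a duplicate check; a sketch using the same tooling as above.]

# Print each running domain's pinned cpuset and flag any cpuset that
# appears more than once (dedicated CPUs should never be shared):
for vm in $( sudo virsh list --name ) ; do
  sudo virsh dumpxml "$vm" | xmlstarlet sel -t -v '//vcpupin/@cpuset' ; echo
done | sort | uniq -d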
[Yahoo-eng-team] [Bug 2025341] [NEW] flows lost with noop firewall driver at ovs-agent restart while the db is down
Public bug reported:

If we restart ovs-agent while neutron-server is up but the neutron DB
is down, then the agent deletes and cannot recover the per-port flows,
if we also use the noop firewall driver. Because the affected flows
include the mod_vlan_vid flows, this means traffic loss until another
agent restart (with the db up) or a full successful resync happens.

For example:

[securitygroup]
firewall_driver = noop

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait
sudo ovs-ofctl dump-flows br-int > ~/noop-db-stop.1

# execute these by hand and make sure that each command took effect before moving on to the next
sudo systemctl stop mysql
sudo systemctl restart devstack@q-agt
sudo ovs-ofctl dump-flows br-int > ~/noop-db-stop.2

# diff the flows (for the sake of simplicity this devstack environment has a single vm with a single port, started above)
a=1 ; b=2 ; base=noop-db-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )
--- /dev/fd/63 2023-06-29 08:10:00.142623814 +0000
+++ /dev/fd/62 2023-06-29 08:10:00.142623814 +0000
@@ -1,19 +1,10 @@
 table=0 priority=0 actions=resubmit(,58)
-table=0 priority=10,arp,in_port=12 actions=resubmit(,24)
-table=0 priority=10,icmp6,in_port=12,icmp_type=136 actions=resubmit(,24)
 table=0 priority=200,reg3=0 actions=set_queue:0,load:0x1->NXM_NX_REG3[0],resubmit(,0)
 table=0 priority=2,in_port=1 actions=drop
 table=0 priority=2,in_port=2 actions=drop
-table=0 priority=3,in_port=1,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,resubmit(,58)
-table=0 priority=3,in_port=2,dl_vlan=100 actions=mod_vlan_vid:3,resubmit(,58)
 table=0 priority=65535,dl_vlan=4095 actions=drop
-table=0 priority=9,in_port=12 actions=resubmit(,25)
 table=23 priority=0 actions=drop
 table=24 priority=0 actions=drop
-table=24 priority=2,arp,in_port=12,arp_spa=10.0.0.19 actions=resubmit(,25)
-table=24 priority=2,icmp6,in_port=12,icmp_type=136,nd_target=fd17:d094:5207:0:f816:3eff:fe8e:b23f actions=resubmit(,58)
-table=24 priority=2,icmp6,in_port=12,icmp_type=136,nd_target=fe80::f816:3eff:fe8e:b23f actions=resubmit(,58)
-table=25 priority=2,in_port=12,dl_src=fa:16:3e:8e:b2:3f actions=resubmit(,30)
 table=30 priority=0 actions=resubmit(,58)
 table=31 priority=0 actions=resubmit(,58)
 table=58 priority=0 actions=resubmit(,60)

The same loss of flows does not happen with the openvswitch firewall
driver:

[securitygroup]
firewall_driver = openvswitch

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait
sudo ovs-ofctl dump-flows br-int > ~/openvswitch-db-stop.1
sudo systemctl stop mysql
sudo systemctl restart devstack@q-agt
sudo ovs-ofctl dump-flows br-int > ~/openvswitch-db-stop.2

a=1 ; b=2 ; base=openvswitch-db-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )
[no diff]

The same loss of flows does not happen either if neutron-server is
down while ovs-agent restarts:

[securitygroup]
firewall_driver = noop

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait
sudo ovs-ofctl dump-flows br-int > ~/noop-server-stop.1
sudo systemctl stop devstack@q-svc
sudo systemctl restart devstack@q-agt
sudo ovs-ofctl dump-flows br-int > ~/noop-server-stop.2

a=1 ; b=2 ; base=noop-server-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )
[no diff]

devstack b10c0602
neutron 0c5d4b8728

I'll push a proposed fix soon.

** Affects: neutron
   Importance: Undecided
   Assignee: Bence Romsics (bence-romsics)
   Status: New

** Tags: ovs

https://bugs.launchpad.net/bugs/2025341

Title:
  flows lost with noop firewall driver at ovs-agent restart while the
  db is down

Status in neutron:
  New
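[Editorial note: the normalize-and-diff pipeline above is used three times; as a small convenience it can be factored into shell functions, a sketch under the same assumptions as the report, i.e. flow dumps already saved to files.]

# Strip volatile fields (cookie, counters, ages) and sort, so that two
# flow dumps can be compared structurally:
norm () {
  egrep -v '^NXST_FLOW' "$1" \
    | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' \
             -e 's/^ *//' -e 's/, +/ /g' \
    | sort
}
flowdiff () { colordiff -u <( norm "$1" ) <( norm "$2" ) ; }

# usage:
flowdiff ~/noop-db-stop.1 ~/noop-db-stop.2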
[Yahoo-eng-team] [Bug 2008712] [NEW] Security group rule deleted by cascade (because its remote group had been deleted) is not deleted in the backend
Public bug reported:

devstack 7533276c
neutron aa40aef70f

This reproduction uses the openvswitch ml2 mechanism_driver and
firewall_driver, but I believe this bug affects all mechanism_drivers.

# Choose a port number no other rule uses on the test host.
$ sudo ovs-ofctl dump-flows br-int | egrep 1234
[nothing]

# Create two security groups.
$ openstack security group create sg1
$ openstack security group create sg2

# Create a rule in sg1 that references sg2 (as remote group).
$ openstack security group rule create sg1 --ingress --ethertype IPv4 --dst-port 1234:1234 --protocol tcp --remote-group sg2

# The API returns the new rule.
$ openstack security group rule list sg1
+--------------------------------------+-------------+-----------+-----------+------------+-----------+--------------------------------------+----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Direction | Remote Security Group                | Remote Address Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+--------------------------------------+----------------------+
| 77db9548-b3ab-46ea-94a5-f00f6a4062da | None        | IPv4      | 0.0.0.0/0 |            | egress    | None                                 | None                 |
| 9b569a88-177a-4422-a0f3-6ed039e0217a | tcp         | IPv4      | 0.0.0.0/0 | 1234:1234  | ingress   | 7df90218-3d52-4156-9630-43563a3d5ba6 | None                 |
| f40d258b-4d13-4dc8-a0c4-82ccce9922e0 | None        | IPv6      | ::/0      |            | egress    | None                                 | None                 |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+--------------------------------------+----------------------+

# Make sure sg1 is used on the test host.
$ openstack server create --flavor cirros256 --image cirros-0.5.2-x86_64-disk --availability-zone :devstack0 --nic net-id=private --security-group sg1 vm1 --wait

# See if the rule is implemented in the backend.
$ sudo ovs-ofctl dump-flows br-int | egrep 1234
 cookie=0x33704a39bf5031d7, duration=55.263s, table=82, n_packets=0, n_bytes=0, idle_age=57, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x20,tp_dst=1234 actions=conjunction(22,2/2)
 cookie=0x33704a39bf5031d7, duration=55.263s, table=82, n_packets=0, n_bytes=0, idle_age=57, priority=73,ct_state=+new-est,tcp,reg5=0x20,tp_dst=1234 actions=conjunction(23,2/2)

# Delete sg2...
$ openstack security group delete sg2

# ...by cascade also delete the rule in sg1 referencing sg2. At least in the API.
$ openstack security group rule list sg1
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Direction | Remote Security Group | Remote Address Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| 77db9548-b3ab-46ea-94a5-f00f6a4062da | None        | IPv4      | 0.0.0.0/0 |            | egress    | None                  | None                 |
| f40d258b-4d13-4dc8-a0c4-82ccce9922e0 | None        | IPv6      | ::/0      |            | egress    | None                  | None                 |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+

# But the delete is not propagated to the backend.
$ sudo ovs-ofctl dump-flows br-int | egrep 1234
 cookie=0x33704a39bf5031d7, duration=112.917s, table=82, n_packets=0, n_bytes=0, idle_age=115, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x20,tp_dst=1234 actions=conjunction(22,2/2)
 cookie=0x33704a39bf5031d7, duration=112.917s, table=82, n_packets=0, n_bytes=0, idle_age=115, priority=73,ct_state=+new-est,tcp,reg5=0x20,tp_dst=1234 actions=conjunction(23,2/2)

# Clean up - even the left over backend flows.
$ openstack server delete vm1 --wait
$ sudo ovs-ofctl dump-flows br-int | egrep 1234
[nothing]
$ openstack security group delete sg2
$ openstack security group delete sg1

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/2008712

Title:
  Security group rule deleted by cascade (because its remote group had
  been deleted) is not deleted in the backend

Status in neutron:
  New
[Yahoo-eng-team] [Bug 2003553] [NEW] Some port attributes are ignored in bulk port create: allowed_address_pairs, extra_dhcp_opts
Public bug reported:

It seems the bulk port create API ignores some of the port attributes
it receives:

export TOKEN="$( openstack token issue -f value -c id )"

# bulk equivalent of
# openstack --debug port create port0 --network private --allowed-address ip-address=10.0.0.1,mac-address=01:23:45:67:89:ab
curl -s -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d "{\"ports\":[{\"name\":\"port0\",\"network_id\":\"$( openstack net show private -f value -c id )\",\"allowed_address_pairs\":[{\"ip_address\":\"10.0.0.1\",\"mac_address\":\"01:23:45:67:89:ab\"}]}]}" -X POST http://127.0.0.1:9696/networking/v2.0/ports | json_pp
...
"allowed_address_pairs" : [],
...

# bulk equivalent of
# openstack --debug port create port0 --network private --extra-dhcp-option name=domain-name-servers,value=10.0.0.1,ip-version=4
curl -s -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d "{\"ports\":[{\"name\":\"port0\",\"network_id\":\"$( openstack net show private -f value -c id )\",\"extra_dhcp_opts\":[{\"opt_name\":\"domain-name-servers\",\"opt_value\":\"10.0.0.1\",\"ip_version\":\"4\"}]}]}" -X POST http://127.0.0.1:9696/networking/v2.0/ports | json_pp
...
"extra_dhcp_opts" : [],
...

neutron b71b25820be6d61ed9f249eddf32bfa49ac76524

** Affects: neutron
   Importance: Undecided
   Assignee: Bence Romsics (bence-romsics)
   Status: New

** Tags: api

https://bugs.launchpad.net/bugs/2003553

Title:
  Some port attributes are ignored in bulk port create:
  allowed_address_pairs, extra_dhcp_opts

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2003553/+subscriptions
[Yahoo-eng-team] [Bug 2002629] Re: devstack build in the gate fails with: ovnnb_db.sock: database connection failed
Removing neutron from the affected projects, since Yatin found the
cause in devstack.

** No longer affects: neutron

https://bugs.launchpad.net/bugs/2002629

Title:
  devstack build in the gate fails with: ovnnb_db.sock: database
  connection failed

Status in devstack:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/2002629/+subscriptions
[Yahoo-eng-team] [Bug 2002629] [NEW] devstack build in the gate fails with: ovnnb_db.sock: database connection failed
Public bug reported:

Recently we seem to hit the same devstack build failure in many different gate jobs. The usual error message is:

+ lib/neutron_plugins/ovn_agent:start_ovn:714 : wait_for_db_file /var/lib/ovn/ovnsb_db.db
+ lib/neutron_plugins/ovn_agent:wait_for_db_file:175 : local count=0
+ lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnsb_db.db ']'
+ lib/neutron_plugins/ovn_agent:start_ovn:716 : is_service_enabled tls-proxy
+ functions-common:is_service_enabled:2089 : return 0
+ lib/neutron_plugins/ovn_agent:start_ovn:717 : sudo ovn-nbctl --db=unix:/var/run/ovn/ovnnb_db.sock set-ssl /opt/stack/data/CA/int-ca/private/devstack-cert.key /opt/stack/data/CA/int-ca/devstack-cert.crt /opt/stack/data/CA/int-ca/ca-chain.pem
ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
+ lib/neutron_plugins/ovn_agent:start_ovn:1 : exit_trap

A few example logs:

https://zuul.opendev.org/t/openstack/build/ec852d75c8094afcb4140871bc9ffa36
https://zuul.opendev.org/t/openstack/build/eae988aa8cd24c78894a3d3438392357

The search expression 'message:"ovnnb_db.sock: database connection failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org for the last 2 weeks.

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: gate-failure ovn

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2002629

Title:
  devstack build in the gate fails with: ovnnb_db.sock: database connection failed

Status in neutron:
  New

Bug description:
  Recently we seem to hit the same devstack build failure in many different gate jobs. The usual error message is:

  + lib/neutron_plugins/ovn_agent:start_ovn:714 : wait_for_db_file /var/lib/ovn/ovnsb_db.db
  + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 : local count=0
  + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnsb_db.db ']'
  + lib/neutron_plugins/ovn_agent:start_ovn:716 : is_service_enabled tls-proxy
  + functions-common:is_service_enabled:2089 : return 0
  + lib/neutron_plugins/ovn_agent:start_ovn:717 : sudo ovn-nbctl --db=unix:/var/run/ovn/ovnnb_db.sock set-ssl /opt/stack/data/CA/int-ca/private/devstack-cert.key /opt/stack/data/CA/int-ca/devstack-cert.crt /opt/stack/data/CA/int-ca/ca-chain.pem
  ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
  + lib/neutron_plugins/ovn_agent:start_ovn:1 : exit_trap

  A few example logs:

  https://zuul.opendev.org/t/openstack/build/ec852d75c8094afcb4140871bc9ffa36
  https://zuul.opendev.org/t/openstack/build/eae988aa8cd24c78894a3d3438392357

  The search expression 'message:"ovnnb_db.sock: database connection failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org for the last 2 weeks.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2002629/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
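The trace suggests the script waits for the database file (wait_for_db_file) but not for the control socket that the subsequent ovn-nbctl call needs. A hedged sketch of the kind of wait that would close that window (wait_for_sock_file is an assumed helper, not an existing devstack function; the actual root cause was found in devstack itself, see the follow-up above):

    # wait until a unix socket exists, with a timeout, before using it
    wait_for_sock_file () {
        local count=0
        while [ ! -S "$1" ]; do
            sleep 1
            count=$((count + 1))
            if [ "$count" -gt 60 ]; then
                echo "timed out waiting for $1" >&2
                return 1
            fi
        done
    }

    # wait for the NB control socket before the first ovn-nbctl call
    wait_for_sock_file /var/run/ovn/ovnnb_db.sock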
[Yahoo-eng-team] [Bug 1998820] [NEW] Floor division in size usage calculation leads to surprising quota limits
Public bug reported:

Colleagues working downstream found a slight discrepancy in quota enforcement while working with the new unified quota system. If we set the image_size_total quota to 1 MiB, the actual limit where quota enforcement turns on is 2 MiB - 1 byte:

openstack --os-cloud devstack-system-admin registered limit create --service glance --default-limit 1 --region RegionOne image_size_total
openstack image list -f value -c ID | xargs -r openstack image delete
openstack image create --file <( dd if=/dev/zero bs=1 count=$(( 2 * 1024 ** 2 - 1 )) ) img1 ## succeeds
openstack image create --file <( dd if=/dev/zero bs=1 count=1 ) img2 ## succeeds
openstack image list -f value -c ID | xargs -r openstack image delete
openstack image create --file <( dd if=/dev/zero bs=1 count=$(( 2 * 1024 ** 2 )) ) img1 ## succeeds
openstack image create --file <( dd if=/dev/zero bs=1 count=1 ) img2 ## HttpException: 413: ... Request Entity Too Large

This bug report is not about the size of img1 - we know that the limit is soft and img1 can go over the quota - but the success/failure of 'image create img2'. I believe the root cause is an integer/floor division when calculating the usage in megabytes. My colleagues also proposed a fix, which I am going to upload right after opening this ticket.

Environment details:

glance 199722a65
devstack 0d5c8d66

Quota setup as described in:
https://docs.openstack.org/glance/latest/admin/quotas.html

$ for opt in image_stage_total image_count_total image_count_uploading ; do openstack --os-cloud devstack-system-admin registered limit create --service glance --default-limit 99 --region RegionOne $opt ; done
$ openstack --os-cloud devstack-system-admin registered limit create --service glance --default-limit 1 --region RegionOne image_size_total
+---------------+----------------------------------+
| Field         | Value                            |
+---------------+----------------------------------+
| default_limit | 1                                |
| description   | None                             |
| id            | 828fe62d931449d08d96f725226891d4 |
| region_id     | RegionOne                        |
| resource_name | image_size_total                 |
| service_id    | 3400473cffa047edb79c67383e86072d |
+---------------+----------------------------------+

$ source openrc admin admin
$ openstack user create --password devstack glance-service
+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| domain_id           | default                          |
| enabled             | True                             |
| id                  | 43268355b8f64d399a7a35535ffee399 |
| name                | glance-service                   |
| options             | {}                               |
| password_expires_at | None                             |
+---------------------+----------------------------------+

$ openstack role add --user glance-service --user-domain Default --system all reader
$ echo $OS_AUTH_URL
http://192.168.122.218/identity
$ openstack endpoint list --service glance
+----------------------------------+-----------+--------------+--------------+---------+-----------+------------------------------+
| ID                               | Region    | Service Name | Service Type | Enabled | Interface | URL                          |
+----------------------------------+-----------+--------------+--------------+---------+-----------+------------------------------+
| 92995b7a76444502acbbecfb421d0bc1 | RegionOne | glance       | image        | True    | public    | http://192.168.122.218/image |
+----------------------------------+-----------+--------------+--------------+---------+-----------+------------------------------+

$ vi /etc/glance/glance-api.conf
[DEFAULT]
use_keystone_limits = True

[oslo_limit]
auth_url = http://192.168.122.218/identity
auth_type = password
user_domain_id = default
username = glance-service
system_scope = all
password = devstack
endpoint_id = 92995b7a76444502acbbecfb421d0bc1
region_name = RegionOne

$ sudo systemctl restart devstack@g-api.service

** Affects: glance
   Importance: Undecided
   Status: In Progress

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1998820 Title: Floor division in size usage calculation leads to surprising quota limits Status in Glance: In Progress Bug description: Colleagues working downstream found a slight discrepancy in quota enforcement while working with the new unified quota system. If we set the image_size_total quota to 1 MiB, the actual limit where quota enforcement turns on is 2 MiB - 1 byte: openstack --os-cloud devstack-system-admin registered limit create --service glance --default-limit 1 --region RegionOne image_size_total
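To illustrate the suspected root cause: converting a byte count to MiB with floor division makes any usage below a full MiB boundary round down, so enforcement only kicks in one byte short of twice a 1 MiB limit. A minimal sketch (not the glance code itself):

    import math

    UNIT = 1024 ** 2  # bytes per MiB

    def usage_mib_floor(size_bytes):
        # floor division: 2 MiB - 1 byte counts as only 1 MiB of usage
        return size_bytes // UNIT

    def usage_mib_ceil(size_bytes):
        # rounding up charges any partial MiB as a full one
        return math.ceil(size_bytes / UNIT)

    assert usage_mib_floor(2 * UNIT - 1) == 1  # still "at" a 1 MiB limit
    assert usage_mib_ceil(2 * UNIT - 1) == 2   # correctly over the limit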
[Yahoo-eng-team] [Bug 1998337] [NEW] test_dvr_router_lifecycle_ha_with_snat_with_fips fails occasionally in the gate
Public bug reported:

Opening this report to track the following test that fails occasionally in the gate:

job: neutron-functional-with-uwsgi
test: neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_router_lifecycle_ha_with_snat_with_fips

Sample traceback:

ft1.31: neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_router_lifecycle_ha_with_snat_with_fips
testtools.testresult.real._StringException: Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py", line 208, in test_dvr_router_lifecycle_ha_with_snat_with_fips
    self._dvr_router_lifecycle(enable_ha=True, enable_snat=True)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py", line 626, in _dvr_router_lifecycle
    self._assert_dvr_floating_ips(router, snat_bound_fip=snat_bound_fip,
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py", line 791, in _assert_dvr_floating_ips
    self.assertTrue(fg_port_created_successfully)
  File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true

It seems to recur occasionally, for example:

https://675daf3418638bf15806-f7e1f8eddcfdd9404f4b72ab9bb1f324.ssl.cf1.rackcdn.com/865575/1/check/neutron-functional-with-uwsgi/bd983b3/testr_results.html
https://488eb2b76bde124417ee-80e67ec01f194d5b25d665df26ee3378.ssl.cf2.rackcdn.com/839066/18/check/neutron-functional-with-uwsgi/66c7fcc/testr_results.html

There may be more that's similar:

$ logsearch log --project openstack/neutron --result FAILURE --pipeline check --job neutron-functional-with-uwsgi --limit 30 'line 208, in test_dvr_router_lifecycle_ha_with_snat_with_fips'
Builds with matching logs 5/30:
+----------------------------------+---------------------+-----------------------------------+--------+
| uuid                             | finished            | review                            | branch |
+----------------------------------+---------------------+-----------------------------------+--------+
| 1d265722d23548d6930486699202347d | 2022-11-30T13:42:28 | https://review.opendev.org/863881 | master |
| cb2a2d7161764d5f823a09528eedc44c | 2022-11-28T16:47:20 | https://review.opendev.org/865018 | master |
| 66c7fcc56a5347648732bfcb90341ef5 | 2022-11-27T00:55:10 | https://review.opendev.org/839066 | master |
| 85b3b709e9d54718a4f0847da5b4b2df | 2022-11-25T10:00:01 | https://review.opendev.org/865018 | master |
| bd983b367ac441c190e38dcf1fadc87f | 2022-11-24T16:17:06 | https://review.opendev.org/865575 | master |
+----------------------------------+---------------------+-----------------------------------+--------+

** Affects: neutron
   Importance: Medium
   Status: New

** Tags: gate-failure

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1998337 Title: test_dvr_router_lifecycle_ha_with_snat_with_fips fails occasionally in the gate Status in neutron: New Bug description: Opening this report to track the following test that fails occasionally in the gate: job neutron-functional-with-uwsgi test neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_router_lifecycle_ha_with_snat_with_fipstesttools Sample traceback: ft1.31: neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_router_lifecycle_ha_with_snat_with_fipstesttools.testresult.real._StringException: Traceback (most recent call last): File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, in func return f(self, *args, **kwargs) File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, in func return f(self, *args, **kwargs) File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py", line 208, in test_dvr_router_lifecycle_ha_with_snat_with_fips self._dvr_router_lifecycle(enable_ha=True, enable_snat=True) File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py", line 626, in _dvr_router_lifecycle self._assert_dvr_floating_ips(router, snat_bound_fip=snat_bound_fip, File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py", line 791, in
[Yahoo-eng-team] [Bug 1995732] [NEW] bulk port create: TypeError: Bad prefix type for generating IPv6 address by EUI-64
Public bug reported: source openrc admin admin export TOKEN="$( openstack token issue -f value -c id )" A single port create succeeds: curl -s -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d "{\"port\":{\"name\":\"port0\",\"network_id\":\"$( openstack net show private -f value -c id )\"}}" -X POST http://127.0.0.1:9696/networking/v2.0/ports | json_pp ... But the same request via the bulk api fails: curl -s -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d "{\"ports\":[{\"name\":\"port0-via-bulk\",\"network_id\":\"$( openstack net show private -f value -c id )\"}]}" -X POST http://127.0.0.1:9696/networking/v2.0/ports | json_pp { "NeutronError" : { "detail" : "", "message" : "Request Failed: internal server error while processing your request.", "type" : "HTTPInternalServerError" } } While in q-svc logs we have: nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation [None req-f5c79830-013a-4ae2-8c47-2102b20299e1 admin admin] POST failed.: TypeError: Bad prefix type for generating IPv6 address by EUI-64: fdd6:813:349::/64 nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation Traceback (most recent call last): nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation File "/usr/local/lib/python3.10/dist-packages/oslo_utils/netutils.py", line 210, in get_ipv6_addr_by_EUI64 nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation eui64 = int(netaddr.EUI(mac).eui64()) nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation File "/usr/local/lib/python3.10/dist-packages/netaddr/eui/__init__.py", line 389, in __init__ nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation self.value = addr nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation File "/usr/local/lib/python3.10/dist-packages/netaddr/eui/__init__.py", line 425, in _set_value nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation self._value = module.str_to_int(value) nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation File "/usr/local/lib/python3.10/dist-packages/netaddr/strategy/eui48.py", line 178, in str_to_int nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation raise TypeError('%r is not str() or unicode()!' % (addr,)) nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation TypeError: is not str() or unicode()! 
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation During handling of the above exception, another exception occurred: nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation Traceback (most recent call last): nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation File "/usr/local/lib/python3.10/dist-packages/pecan/core.py", line 693, in __call__ nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation self.invoke_controller(controller, args, kwargs, state) nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation File "/usr/local/lib/python3.10/dist-packages/pecan/core.py", line 584, in invoke_controller nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation result = controller(*args, **kwargs) nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation File "/opt/stack/neutron-lib/neutron_lib/db/api.py", line 140, in wrapped nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation with excutils.save_and_reraise_exception(): nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation File "/usr/local/lib/python3.10/dist-packages/oslo_utils/excutils.py", line 227, in __exit__ nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation self.force_reraise() nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation File "/usr/local/lib/python3.10/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation raise self.value nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR neutron.pecan_wsgi.hooks.translation File "/opt/stack/neutron-lib/neutron_lib/db/api.py", line 138, in wrapped nov 04
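The first traceback can be reproduced outside neutron; get_ipv6_addr_by_EUI64 re-raises netaddr's TypeError with the "Bad prefix type" message whenever the mac argument is not a string. In the bulk path the port's MAC is presumably still an unset sentinel object rather than a MAC string. A minimal reproduction sketch:

    from oslo_utils import netutils

    # any non-string 'mac' takes the same TypeError path as in the
    # q-svc log above; the prefix is the one from the error message
    try:
        netutils.get_ipv6_addr_by_EUI64('fdd6:813:349::/64', object())
    except TypeError as e:
        print(e)  # Bad prefix type for generating IPv6 address by EUI-64: ...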
[Yahoo-eng-team] [Bug 1992328] [NEW] volume timeouts in nova gate
Public bug reported:

I'm trying to track here a bug I have seen in the nova gate appearing randomly through rechecks.

Typical stack traces:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 90, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/admin/test_volume_swap.py", line 110, in test_volume_swap
    volume1['id'], 'available')
  File "/opt/stack/tempest/tempest/common/waiters.py", line 288, in wait_for_volume_resource_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: volume a19743a3-4651-4c7f-a9a1-823735ea84a0 failed to reach available status (current in-use) within the required time (196 s).

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 90, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/admin/test_live_migration.py", line 190, in test_live_block_migration_with_attached_volume
    self.attach_volume(server, volume, device='/dev/xvdb')
  File "/opt/stack/tempest/tempest/api/compute/base.py", line 581, in attach_volume
    volume['id'], 'in-use')
  File "/opt/stack/tempest/tempest/common/waiters.py", line 288, in wait_for_volume_resource_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: volume 92685b8f-4db0-4110-a1ac-016ea7c51d1f failed to reach in-use status (current available) within the required time (196 s).

Typical jobs and tests:

nova-multi-cell: test_volume_swap[id-1769f00d-a693-4d67-a631-6a3496773813]
nova-live-migration: test_live_block_migration_with_attached_volume[id-e19c0cc6-6720-4ed8-be83-b6603ed5c812]

Example hits (affecting multiple branches):

$ logsearch log --project openstack/nova --job nova-live-migration --result FAILURE --limit 50 "test_live_block_migration_with_attached_volume .* ... FAILED"
...
Builds with matching logs 10/50:
+----------------------------------+---------------------+----------+-----------------------------------+-----------------+
| uuid                             | finished            | pipeline | review                            | branch          |
+----------------------------------+---------------------+----------+-----------------------------------+-----------------+
| 36b367b0d0bb46d2a7fc6af4eb7739ca | 2022-10-07T19:39:47 | check    | https://review.opendev.org/860736 | stable/victoria |
| d02ed047fcfd4180902dc0bec0334c38 | 2022-10-03T10:37:00 | check    | https://review.opendev.org/854980 | stable/victoria |
| 0df9b00df16c4bbc9e49baf853fe0cf5 | 2022-09-19T09:47:02 | check    | https://review.opendev.org/854980 | stable/victoria |
| 0db0e8d510d04443a172cc43e537f973 | 2022-09-16T14:14:31 | check    | https://review.opendev.org/857877 | stable/train    |
| 6ca30836a1b34be58728dc5d69c44c21 | 2022-09-16T10:33:55 | check    | https://review.opendev.org/858051 | stable/victoria |
| 684e7c37c61745829908495ba249afb7 | 2022-09-16T10:14:07 | check    | https://review.opendev.org/854980 | stable/victoria |
| 6bcf4105d0fc476faf9ee56e7f0ed41f | 2022-09-15T14:22:01 | check    | https://review.opendev.org/857877 | stable/train    |
| 0ea47624757c48a8bcfa9fd5c35b6465 | 2022-09-13T10:33:52 | check    | https://review.opendev.org/854980 | stable/victoria |
| ca0d5f750b3040ed99c1e6ec3414d154 | 2022-09-06T17:28:41 | check    | https://review.opendev.org/836830 | master          |
| 2ce6d7aa67404587b050a6b56f4d15e6 | 2022-08-29T11:58:59 | check    | https://review.opendev.org/833090 | master          |
+----------------------------------+---------------------+----------+-----------------------------------+-----------------+

** Affects: nova
   Importance: Undecided
   Status: New

** Tags: gate-failure

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1992328 Title: volume timeouts in nova gate Status in OpenStack Compute (nova): New Bug description: I'm trying to track here a bug I have seen in nova gate appearing randomly through rechecks. Typical stack traces: Traceback (most recent call last): File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 90, in wrapper return f(*func_args, **func_kwargs) File "/opt/stack/tempest/tempest/api/compute/admin/test_volume_swap.py", line 110, in test_volume_swap volume1['id'], 'available') File "/opt/stack/tempest/tempest/common/waiters.py", line 288, in wait_for_volume_resource_status raise lib_exc.TimeoutException(message) tempest.lib.exceptions.TimeoutException: Request timed out Details: volume a19743a3-4651-4c7f-a9a1-823735ea84a0 failed to reach available status (current in-use) within the required time (196 s). Traceback (most recent call last): File
[Yahoo-eng-team] [Bug 1990842] [NEW] RFE Expose Open vSwitch other_config column in the API
Public bug reported: Some of our performance sensitive users would like to tweak Open vSwitch's Tx packet steering option under OpenStack: https://docs.openvswitch.org/en/latest/topics/userspace-tx-steering/ available since Open vSwitch v2.17.0: https://github.com/openvswitch/ovs/blob/7af5c33c1629b309cbcbe3b6c9c3bd6d3b4c0abf/NEWS#L103 https://github.com/openvswitch/ovs/commit/c18e707b2f259438633af5b23df53e1409472871 To enable that, we would like to expose some OVS interface configuration in a Neutron port's binding_profile. Consider for example: openstack port create port0 --binding-profile ovs_other_config=tx-steering:hash ... more generally: --binding-profile ovs_other_config=foo:bar,bar:baz or an alternative syntax: --binding-profile ovs:other_config='{"foo": "bar", "bar": "baz"}' Given this information, ovs-agent can set the corresponding OVS interface's other_config (using the python native interface of course, not ovs-vsctl): sudo ovs-vsctl set Interface ovs-interface-of-port0 other_config:tx-steering=hash sudo ovs-vsctl set Interface ovs-interface-of-port0 other_config:foo=bar other_config:bar=baz ** Affects: neutron Importance: Wishlist Assignee: Bence Romsics (bence-romsics) Status: New ** Tags: rfe -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1990842 Title: RFE Expose Open vSwitch other_config column in the API Status in neutron: New Bug description: Some of our performance sensitive users would like to tweak Open vSwitch's Tx packet steering option under OpenStack: https://docs.openvswitch.org/en/latest/topics/userspace-tx-steering/ available since Open vSwitch v2.17.0: https://github.com/openvswitch/ovs/blob/7af5c33c1629b309cbcbe3b6c9c3bd6d3b4c0abf/NEWS#L103 https://github.com/openvswitch/ovs/commit/c18e707b2f259438633af5b23df53e1409472871 To enable that, we would like to expose some OVS interface configuration in a Neutron port's binding_profile. Consider for example: openstack port create port0 --binding-profile ovs_other_config=tx-steering:hash ... more generally: --binding-profile ovs_other_config=foo:bar,bar:baz or an alternative syntax: --binding-profile ovs:other_config='{"foo": "bar", "bar": "baz"}' Given this information, ovs-agent can set the corresponding OVS interface's other_config (using the python native interface of course, not ovs-vsctl): sudo ovs-vsctl set Interface ovs-interface-of-port0 other_config:tx-steering=hash sudo ovs-vsctl set Interface ovs-interface-of-port0 other_config:foo=bar other_config:bar=baz To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1990842/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
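On the agent side this could look roughly like the following. A sketch only - the RFE is not implemented, and the merge semantics of other_config are up to the eventual spec; the bridge and interface names reuse the example above, and ovs_lib is neutron's existing wrapper around the native OVSDB interface:

    # sketch: how ovs-agent might apply the requested other_config,
    # using neutron's ovs_lib instead of shelling out to ovs-vsctl
    from neutron.agent.common import ovs_lib

    br = ovs_lib.OVSBridge('br-int')
    ovs_other_config = {'tx-steering': 'hash'}  # as parsed from binding_profile
    br.set_db_attribute('Interface', 'ovs-interface-of-port0',
                        'other_config', ovs_other_config)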
[Yahoo-eng-team] [Bug 1988986] [NEW] gate: keystone-protection-functional: keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials: Could not find credent
Public bug reported: Tracking a bug seen in the gate: zuul report: https://50aa58668700125588f9-69e8ab9908c85e150921aaa267a6677d.ssl.cf1.rackcdn.com/855198/1/gate/keystone-protection-functional/edeae8a/testr_results.html zuul log: https://50aa58668700125588f9-69e8ab9908c85e150921aaa267a6677d.ssl.cf1.rackcdn.com/855198/1/gate/keystone-protection-functional/edeae8a/job-output.txt pipeline: gate job: keystone-protection-functional test: keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials stack trace: 2022-09-06 16:19:59.894748 | controller | {3} keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials [0.842160s] ... FAILED 2022-09-06 16:19:59.894814 | controller | 2022-09-06 16:19:59.894840 | controller | Captured traceback: 2022-09-06 16:19:59.894859 | controller | ~~~ 2022-09-06 16:19:59.894877 | controller | Traceback (most recent call last): 2022-09-06 16:19:59.894903 | controller | 2022-09-06 16:19:59.894922 | controller | File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/keystone_tempest_plugin/tests/rbac/v3/test_credential.py", line 220, in test_identity_list_credentials 2022-09-06 16:19:59.894941 | controller | resp = self.do_request('list_credentials')['credentials'] 2022-09-06 16:19:59.894959 | controller | 2022-09-06 16:19:59.894977 | controller | File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/keystone_tempest_plugin/tests/rbac/v3/base.py", line 39, in do_request 2022-09-06 16:19:59.894994 | controller | response = getattr(client, method)(**payload) 2022-09-06 16:19:59.895012 | controller | 2022-09-06 16:19:59.895029 | controller | File "/opt/stack/tempest/tempest/lib/services/identity/v3/credentials_client.py", line 78, in list_credentials 2022-09-06 16:19:59.895047 | controller | resp, body = self.get(url) 2022-09-06 16:19:59.895064 | controller | 2022-09-06 16:19:59.895093 | controller | File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 314, in get 2022-09-06 16:19:59.895111 | controller | return self.request('GET', url, extra_headers, headers) 2022-09-06 16:19:59.895129 | controller | 2022-09-06 16:19:59.895146 | controller | File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 720, in request 2022-09-06 16:19:59.895164 | controller | self._error_checker(resp, resp_body) 2022-09-06 16:19:59.895181 | controller | 2022-09-06 16:19:59.895203 | controller | File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 826, in _error_checker 2022-09-06 16:19:59.895221 | controller | raise exceptions.NotFound(resp_body, resp=resp) 2022-09-06 16:19:59.895239 | controller | 2022-09-06 16:19:59.895256 | controller | tempest.lib.exceptions.NotFound: Object not found 2022-09-06 16:19:59.895274 | controller | Details: {'code': 404, 'message': 'Could not find credential: f5b242ff18564f548caa1072929fdac2.', 'title': 'Not Found'} ** Affects: keystone Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone). 
https://bugs.launchpad.net/bugs/1988986 Title: gate: keystone-protection-functional: keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials: Could not find credential Status in OpenStack Identity (keystone): New Bug description: Tracking a bug seen in the gate: zuul report: https://50aa58668700125588f9-69e8ab9908c85e150921aaa267a6677d.ssl.cf1.rackcdn.com/855198/1/gate/keystone-protection-functional/edeae8a/testr_results.html zuul log: https://50aa58668700125588f9-69e8ab9908c85e150921aaa267a6677d.ssl.cf1.rackcdn.com/855198/1/gate/keystone-protection-functional/edeae8a/job-output.txt pipeline: gate job: keystone-protection-functional test: keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials stack trace: 2022-09-06 16:19:59.894748 | controller | {3} keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials [0.842160s] ... FAILED 2022-09-06 16:19:59.894814 | controller | 2022-09-06 16:19:59.894840 | controller | Captured traceback: 2022-09-06 16:19:59.894859 | controller | ~~~ 2022-09-06 16:19:59.894877 | controller | Traceback (most recent call last): 2022-09-06 16:19:59.894903 | controller | 2022-09-06 16:19:59.894922 | controller | File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/keystone_tempest_plugin/tests/rbac/v3/test_credential.py", line 220, in test_identity_list_credentials 2022-09-06 16:19:59.894941 | controller | resp = self.do_request('list_credentials')['credentials'] 2022-09-06 16:19:59.894959 | controller | 2022-09-06 16:19:59.894977 | controller
[Yahoo-eng-team] [Bug 1988311] [NEW] Concurrent evacuation of vms with pinned cpus to the same host fail randomly
Public bug reported:

Reproduction: Boot two vms (each with one pinned cpu) on devstack0. Then evacuate them to devstack0a. devstack0a has two dedicated cpus, so both vms should fit. However sometimes (for example 6 out of 10 times) the evacuation of one vm fails with this error message: 'CPU set to pin [0] must be a subset of free CPU set [1]'.

devstack0 - all-in-one host
devstack0a - compute-only host

# have two dedicated cpus for pinning on the evacuation target host
devstack0a:/etc/nova/nova-cpu.conf:
[compute]
cpu_dedicated_set = 0,1

# the dedicated cpus are properly tracked in placement
$ openstack resource provider list
+--------------------------------------+------------+------------+--------------------------------------+----------------------+
| uuid                                 | name       | generation | root_provider_uuid                   | parent_provider_uuid |
+--------------------------------------+------------+------------+--------------------------------------+----------------------+
| a0574d87-42ee-4e13-b05a-639dc62c1196 | devstack0a | 2          | a0574d87-42ee-4e13-b05a-639dc62c1196 | None                 |
| 2e6fac42-d6e3-4366-a864-d5eb2bdc2241 | devstack0  | 2          | 2e6fac42-d6e3-4366-a864-d5eb2bdc2241 | None                 |
+--------------------------------------+------------+------------+--------------------------------------+----------------------+
$ openstack resource provider inventory list a0574d87-42ee-4e13-b05a-639dc62c1196
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total | used |
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| MEMORY_MB      | 1.5              | 1        | 3923     | 512      | 1         | 3923  | 0    |
| DISK_GB        | 1.0              | 1        | 28       | 0        | 1         | 28    | 0    |
| PCPU           | 1.0              | 1        | 2        | 0        | 1         | 2     | 0    |
+----------------+------------------+----------+----------+----------+-----------+-------+------+

# use vms with one pinned cpu
openstack flavor create cirros256-pinned --public --ram 256 --disk 1 --vcpus 1 --property hw_rng:allowed=True --property hw:cpu_policy=dedicated

# boot two vms (each with one pinned cpu) on devstack0
n=2 ; for i in $( seq $n ) ; do openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private --availability-zone :devstack0 --wait vm$i ; done

# kill n-cpu on devstack0
devstack0 $ sudo systemctl stop devstack@n-cpu
# and force it down, so we can start evacuating
openstack compute service set devstack0 nova-compute --down

# evacuate both vms to devstack0a concurrently
for vm in $( openstack server list --host devstack0 -f value -c ID ) ; do openstack --os-compute-api-version 2.29 server evacuate --host devstack0a $vm & done

# follow up on how the evacuation is going, check if the bug occurred, see details a bit below
for i in $( seq $n ) ; do openstack server show vm$i -f value -c OS-EXT-SRV-ATTR:host -c status ; done

# clean up
devstack0 $ sudo systemctl start devstack@n-cpu
openstack compute service set devstack0 nova-compute --up
for i in $( seq $n ) ; do openstack server delete vm$i --wait ; done

This bug is not deterministic. For example out of 10 tries (like above) I have seen 4 successes - when both vms successfully evacuated to (went to ACTIVE on) devstack0a. But in the other 6 cases only one vm evacuated successfully. The other vm went to ERROR state, with the error message: "CPU set to pin [0] must be a subset of free CPU set [1]". For example:

$ openstack server show vm2
...
| fault | {'code': 400, 'created': '2022-08-24T13:50:33Z', 'message': 'CPU set to pin [0] must be a subset of free CPU set [1]'} |
...
In n-cpu logs we see the following: aug 24 13:50:33 devstack0a nova-compute[246038]: ERROR nova.compute.manager [None req-278f5b67-a765-4231-b2b9-db3f8c7fe092 admin admin] [instance: dc3acde3-f1c6-41a9-9a12-0c278ad4b348] Setting instance vm_state to ERROR: nova.exception.CPUPinningInvalid: CPU set to pin [0] must be a subset of free CPU set [1] aug 24 13:50:33 devstack0a nova-compute[246038]: ERROR nova.compute.manager [instance: dc3acde3-f1c6-41a9-9a12-0c278ad4b348] Traceback (most recent call last): aug 24 13:50:33 devstack0a nova-compute[246038]: ERROR nova.compute.manager [instance: dc3acde3-f1c6-41a9-9a12-0c278ad4b348] File "/opt/stack/nova/nova/compute/manager.py", line 10375, in _error_out_instance_on_exception aug 24 13:50:33 devstack0a nova-compute[246038]: ERROR nova.compute.manager [instance: dc3acde3-f1c6-41a9-9a12-0c278ad4b348] yield aug 24 13:50:33 devstack0a nova-compute[246038]: ERROR nova.compute.manager [instance: dc3acde3-f1c6-41a9-9a12-0c278ad4b348] File
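Until the underlying race is fixed, one workaround is to serialize the evacuations instead of firing them concurrently. A sketch (the polling loop is mine, not part of the original reproduction):

    # evacuate one vm at a time, waiting for each to settle first
    for vm in $( openstack server list --host devstack0 -f value -c ID ) ; do
        openstack --os-compute-api-version 2.29 server evacuate --host devstack0a "$vm"
        while true ; do
            status="$( openstack server show "$vm" -f value -c status )"
            [ "$status" = "ACTIVE" ] && break
            [ "$status" = "ERROR" ] && break
            sleep 5
        done
    done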
[Yahoo-eng-team] [Bug 1988168] [NEW] Broken host:port splitting
Public bug reported:

Our users found a bug while POSTing to /v3/ec2tokens. I could simplify the reproduction to this script:

$ cat keystone-post-ec2tokens.sh
#! /bin/sh

# source openrc admin admin
# keystone-post-ec2tokens.sh http://127.0.0.1/identity/v3

keystone_base_url="${1:?}"

cleanup () {
    openstack ec2 credential delete "$access"
}
trap cleanup EXIT

#host="localhost"
host="localhost:123"
#host="1.2.3.4:123"
#host="[fc00::]:123"

access="$( openstack ec2 credential create -f value -c access )"
secret="$( openstack ec2 credential show "$access" -f value -c secret )"
signature="intentionally-invalid"

cat
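The commented-out host values in the script exercise the classic host:port splitting pitfall: naive string splitting breaks either on a missing port or on bracketed IPv6 literals. A robust splitting sketch using only the Python stdlib (function name is mine, not keystone's):

    from urllib.parse import urlsplit

    def split_host_port(netloc):
        # leverage urlsplit's netloc parsing, which already handles
        # 'localhost', 'localhost:123', '1.2.3.4:123' and '[fc00::]:123'
        parts = urlsplit('//' + netloc)
        return parts.hostname, parts.port

    assert split_host_port('localhost') == ('localhost', None)
    assert split_host_port('localhost:123') == ('localhost', 123)
    assert split_host_port('1.2.3.4:123') == ('1.2.3.4', 123)
    assert split_host_port('[fc00::]:123') == ('fc00::', 123)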
[Yahoo-eng-team] [Bug 1983570] [NEW] cannot schedule ovs sriov offload port to tunneled segment
Public bug reported:

We observed a scheduling failure when using ovs sriov offload (https://docs.openstack.org/neutron/latest/admin/config-ovs-offload.html) in combination with multisegment networks. The problem seems to affect the case when the port should be bound to a tunneled network segment (a segment that does not have a physnet). I read that nova scheduler works the same way with pci sriov passthrough, therefore I believe the same bug affects pci sriov passthrough, though I did not test that.

Due to the special hardware needs for this environment I could not reproduce this in devstack. But I hope we have collected enough information that shows the error regardless. We believe we also identified the relevant lines of code.

The overall setup includes l2gw - connecting the segments in the multisegment network. But I will ignore that here, since l2gw cannot be part of the root cause here. Neutron was configured with mechanism_drivers=sriovnicswitch,opendaylight_v2. However since the error happens before we bind the port, I believe the mechanism_driver is irrelevant as long as it allows the creation of ports with "--vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}'". For the sake of simplicity I will call these "ovs sriov offload ports".

As I understand the problem:

1) ovs sriov offload port on single segment neutron network, the segment is vxlan: works
2) normal port on no offload capable ovs (--vnic-type normal) on multisegment neutron network, one vlan, one vxlan segment, the port should be bound to the vxlan segment: works
3) ovs sriov offload port on multisegment neutron network, one vlan, one vxlan segment, the port should be bound to the vxlan segment: does not work

To reproduce:

* create a multisegment network with one vlan and one vxlan segment
* create a port on that network with "--vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}' --disable-port-security --no-security-group"
* boot a vm with that port

On the compute host on which we expect the scheduling and boot to succeed we have configuration like:

[pci]
passthrough_whitelist = [{"devname": "data2", "physical_network": null}, {"devname": "data3", "physical_network": null}]

According to https://docs.openstack.org/nova/latest/admin/pci-passthrough.html this marks the tunneled segments on this host to be passthrough (and ovs offload) capable.

The vm boot fails with:

$ openstack server show c3_ms_1
...
| fault | {'code': 500, 'created': '2022-07-16T08:12:31Z', 'message': 'Insufficient compute resources: Requested instance NUMA topology together with requested PCI devices cannot fit the given host NUMA topology; Claim pci failed.', 'details': 'Traceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2418, in _build_and_run_instance\nlimits):\n File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 360, in inner\nreturn f(*args, **kwargs)\n File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 172, in instance_claim\npci_requests, limits=limits)\n File "/usr/lib/python3.6/site-packages/nova/compute/claims.py", line 72, in __init__\nself._claim_test(compute_node, limits)\n File "/usr/lib/python3.6/site-packages/nova/compute/claims.py", line 114, in _claim_test\n"; ".join(reasons))\nnova.exception.ComputeResourcesUnavailable: Insufficient compute resources: Requested instance NUMA topology together with requested PCI devices cannot fit the given host NUMA topology; Claim pci failed.\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2271, in _do_build_and_run_instance\nfilter_properties, request_spec, accel_uuids)\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2469, in _build_and_run_instance\ninstance_uuid=instance.uuid, reason=e.format_message())\nnova.exception.RescheduledException: Build of instance 09f3f8bb-b4c0-4395-8167-c10609d32d08 was re-scheduled: Insufficient compute resources: Requested instance NUMA topology together with requested PCI devices cannot fit the given host NUMA topology; Claim pci failed.\n'} | ... In the scheduler logs we see that the scheduler uses a spec with a physnet. But the pci passthrough capability is on a device without a physnet. controlhost3:/home/ceeinfra # grep DC259-CEE3- /var/log/nova/nova-scheduler.log <180>2022-07-16T10:12:29.680009+02:00 controlhost3.dc259cee3.cloud.k2.ericsson.se nova-scheduler[67299]: 2022-07-16 10:12:29.679 76 WARNING nova.scheduler.host_manager [req-4dd7c37e-eb18-48da-9914-44a6a2a18b1d fcd3b2713191485d95befe1941f20e20 cf7024f0f2bd46a8b17fd42055a20323 - default default] Selected host: compute3.dc259cee3.cloud.k2.ericsson.se failed to consume from instance. Error: PCI device
[Yahoo-eng-team] [Bug 1966403] Re: Cres_Ubuntu 20.04, CI, Checkbox TPM test failed
Hi, Are you sure you wanted to post this bug report to the neutron project's bug tracker? ** Changed in: neutron Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1966403 Title: Cres_Ubuntu 20.04,CI,Checkbox TPM test failed Status in neutron: Invalid Bug description: Install Ubuntu 20.04 OS,and install checkbox,run the TPM test of the checkbox,There is 4 items failed. Failed Item: 1.tpm2.0_4.1.1/context_gap_max_check 2.tpm2.0_4.1.1/tpm2_getcap 3.tpm2.0_4.1.1/tpm2_nv 4.tpm2.0_4.1.1/tpm2_quote [Reproduce Steps] 1.Install Ubuntu 20.04. 2.Install checkbox. 3.Run the TPM test,issue occurred. [Result] Expected Result:the test should be pass. Actual Result: Test failed [Additional information] Test Vault ID:159637 Checkbox Test Case ID:100554 BIOS Version:0.9.39 Image/Manifest:dell-bto-focal-fossa-corsola-X212-20220302-1.iso CPU:XEON(R) PROCESSOR SAPPHIRE RAPIDS WS D-0 56c 105MB 350 W QYQU ES2 -112L SSKU, DPN:99AMTK MEM:Samsung, DIMM,16GB,4800,1RX8,16G,DDR5,R, DPN:1V1N1 GPU:GV100 Failure rate:100% To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1966403/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1955775] Re: Error when l3-agent get filter id for ip
** Changed in: neutron Status: In Progress => Won't Fix ** Changed in: neutron Status: Won't Fix => Triaged ** Changed in: neutron Importance: Undecided => High -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1955775 Title: Error when l3-agent get filter id for ip Status in neutron: Triaged Bug description: 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent Traceback (most recent call last): 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 555, in _process_router_update 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent self._process_router_if_compatible(router) 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 477, in _process_router_if_compatible 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent self._process_updated_router(router) 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 501, in _process_updated_router 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent self.l3_ext_manager.update_router(self.context, router) 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/l3_agent_extensions_manager.py", line 54, in update_router 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent extension.obj.update_router(context, data) 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 359, in inner 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent return f(*args, **kwargs) 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/extensions/qos/fip.py", line 236, in update_router 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent self.process_floating_ip_addresses(context, router_info) 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/extensions/qos/fip.py", line 218, in process_floating_ip_addresses 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent self.process_ip_rates(fip_addr, device, rates) 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/extensions/qos/fip.py", line 183, in process_ip_rates 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent rate['rate'], rate['burst']) 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/extensions/qos/fip.py", line 123, in process_ip_rate_limit 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent tc_wrapper.set_ip_rate_limit(direction, ip, rate, burst) 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/l3_tc_lib.py", line 169, in set_ip_rate_limit 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent filter_id = self._get_filterid_for_ip(qdisc_id, ip) 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/l3_tc_lib.py", line 82, in _get_filterid_for_ip 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent filterids_for_ip.append(filter_id) 2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent UnboundLocalError: local variable 
'filter_id' referenced before assignment

If some tc rules were accidentally added to the interface not through neutron - for example, the interface has two tc rules, where the first rule is "filter protocol all ..." and the second rule is "match ..." - then the first rule does not match FILTER_ID_REGEX while the second rule starts with "match", so the code will execute this statement:

filterids_for_ip.append(filter_id)

But filter_id has not been assigned at this point.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1955775/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
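The shape of the bug, reduced to a self-contained sketch (simplified from l3_tc_lib; the regex is illustrative, not neutron's actual FILTER_ID_REGEX, and the guard in the last branch is the missing piece):

    import re

    # illustrative pattern only, not neutron's actual FILTER_ID_REGEX
    FILTER_ID_PATTERN = re.compile(r'filter protocol ip .* fh (\S+)')

    def get_filterids_for_ip(tc_output, ip):
        filterids_for_ip = []
        filter_id = None  # initialized, so foreign rules cannot trip us up
        for line in tc_output.splitlines():
            match = FILTER_ID_PATTERN.search(line)
            if match:
                filter_id = match.group(1)
                continue
            if line.strip().startswith('match') and ip in line:
                if filter_id is None:
                    # a 'match' line of a rule we did not recognize, e.g.
                    # one added outside neutron: skip instead of crashing
                    continue
                filterids_for_ip.append(filter_id)
        return filterids_for_ip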
[Yahoo-eng-team] [Bug 1955765] Re: Devstack - Can no longer enable qos with neutron-qos
Hi,

There's a long history here, but I would actually recommend that you switch back to using the legacy devstack plugin.

The new neutron devstack plugin AFAICT worked quite well in a simple dev environment. Despite the legacy one being deprecated for a long time, the work on the new one stalled and it never completely replaced the legacy plugin (mostly for use cases in the gate). For a time both were maintained. And at some point we acknowledged that the new devstack plugin will never be completed and un-deprecated the legacy plugin:

https://review.opendev.org/c/openstack/devstack/+/704829

Some of these changes were clearly unexpected and probably we could have done a better job communicating which plugin is preferred. And now maybe we should deprecate the new plugin. I think I'll ask the team about that on our next meeting. But until then the best I can recommend is that you switch back to using the legacy devstack plugin.

Regards,
Bence

** Changed in: neutron
   Status: New => Opinion

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1955765

Title:
  Devstack - Can no longer enable qos with neutron-qos

Status in neutron:
  Opinion

Bug description:
  The neutron-qos functions were moved away from neutron devstack plugin with [1] and added to devstack directly with [2] and [3]. However, when one would previously enable neutron-qos in devstack with `neutron-qos`, this is no longer the case as the functions were added to the neutron-legacy file that is only sourced when legacy (quantum era) neutron services are enabled.

  [1] https://review.opendev.org/#/q/I7b70d6281d551a88080c6e727e2485079ba5c061
  [2] https://review.opendev.org/#/q/I48f65d530db53fe2c94cad57a8072e1158d738b0
  [3] https://review.opendev.org/#/q/Icf459a2f8c6ae3c3cb29b16ba0b92766af41af30

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1955765/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
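For reference, a local.conf sketch of the legacy route (the q-* service names are the ones handled by devstack's lib/neutron-legacy; whether q-qos is still wired up in your devstack checkout is an assumption to verify locally):

    [[local|localrc]]
    # enabling the legacy q-* service names makes devstack source
    # lib/neutron-legacy, which is where the qos hooks currently live
    enable_service q-svc q-agt q-dhcp q-l3 q-meta q-qos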
[Yahoo-eng-team] [Bug 1955491] Re: [DHCP] Neutron DHCP agent failing when disabling the Linux DHCP service
Rodolfo, based on your analysis I moved this report to tripleo. Of course if it also has a neutron part, just add that back please.

** Project changed: neutron => tripleo

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1955491

Title:
  [DHCP] Neutron DHCP agent failing when disabling the Linux DHCP service

Status in tripleo:
  New

Bug description:
  This issue has been detected running Neutron Train (Red Hat OSP 16.2), using TripleO as deployment tool. The services run on containers, using podman.

  The DHCP agent tries to disable the DHCP helper. That calls the driver "disable" method [1]. In Linux that will call [2], which will try to stop the running process. In devstack, this process is a "dnsmasq" instance running in the DHCP namespace. In TripleO, the DHCP agent container will spawn a sidecar container to execute the "dnsmasq" instance. That requires a specific kill script [3].

  In this deployment, the DHCP agent is returning exit code 125 when trying to disable the "dnsmasq" process (running in a container):

  neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: ; Stderr:

  This error code comes from "podman" and could be caused by the container not being present in the system. That will raise an exception [4] that will re-schedule a resync. The DHCP agent will enter an endless loop unless restarted. That will remove from "self.cache = NetworkCache()" the affected network that is triggering the exception.

  Logs DHCP agent (snippet): [4]

  Bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=2032010

  [1] https://github.com/openstack/neutron/blob/df9435a9a6fab9492c4f23d9ab0f1507841430c7/neutron/agent/dhcp/agent.py#L413-L426
  [2] https://github.com/openstack/neutron/blob/df9435a9a6fab9492c4f23d9ab0f1507841430c7/neutron/agent/linux/dhcp.py#L305-L313
  [3] https://github.com/openstack/tripleo-heat-templates/blob/25db32d4e5ed7ed4687bbb6d07a8a87ad65b71e6/deployment/neutron/kill-script
  [4] https://paste.opendev.org/show/811802/

To manage notifications about this bug go to:
https://bugs.launchpad.net/tripleo/+bug/1955491/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1952730] [NEW] Segment updates may cause unnecessary overload
Public bug reported:

When:
* the segments service plugin is enabled and
* we have many rpc worker processes (as in the sum of rpc_workers and rpc_state_report_workers, since both kinds of processes handle agent state_reports) and
* many ovs-agents report physnets and
* neutron-server is restarted,

then rpc workers may get overloaded by state_report messages. That is: they may run at 100% CPU utilization for tens of minutes and during that they are not able to process ovs-agents' state_reports in a timely manner. Which in turn causes the agent state to go down and back, maybe multiple times. Eventually, as the workers get through the initial processing, the load lessens, and the system stabilizes. The same rate of incoming state_report messages is not a problem at that point.

(Colleagues working downstream observed this on a stable/victoria base with cc 150 ovs-agents and 3 neutron-servers each having maybe rpc_workers=6 and rpc_state_report_workers=6. The relevant code did not change at all since victoria, so I believe the same would happen on master.)

I think the root cause is the following: rabbitmq dispatches the state_report messages between the workers in a round robin fashion, therefore eventually the state_reports of the same agent will hit all rpc workers.

Each worker has logic to update the host segment mapping if either the server or the agent got restarted:
https://opendev.org/openstack/neutron/src/commit/90b5456b8c11011c41f2fcd53a8943cb45fb6479/neutron/services/segments/db.py#L304-L305

Unfortunately the 'reported_hosts' set (to remember from which host the server has seen agent reports already) is private to each worker process. But right after a server (re-)start when that set is still empty, each worker will unconditionally write the received physnet-segment information into the db. This means we multiply the load on the db and rpc workers by a factor of the total rpc worker count.

Pushing a fix attempt soon.

** Affects: neutron
   Importance: High
   Assignee: Bence Romsics (bence-romsics)
   Status: In Progress

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1952730

Title:
  Segment updates may cause unnecessary overload

Status in neutron:
  In Progress

Bug description:
  When:
  * the segments service plugin is enabled and
  * we have many rpc worker processes (as in the sum of rpc_workers and rpc_state_report_workers, since both kinds of processes handle agent state_reports) and
  * many ovs-agents report physnets and
  * neutron-server is restarted,

  then rpc workers may get overloaded by state_report messages. That is: they may run at 100% CPU utilization for tens of minutes and during that they are not able to process ovs-agents' state_reports in a timely manner. Which in turn causes the agent state to go down and back, maybe multiple times. Eventually, as the workers get through the initial processing, the load lessens, and the system stabilizes. The same rate of incoming state_report messages is not a problem at that point.

  (Colleagues working downstream observed this on a stable/victoria base with cc 150 ovs-agents and 3 neutron-servers each having maybe rpc_workers=6 and rpc_state_report_workers=6. The relevant code did not change at all since victoria, so I believe the same would happen on master.)

  I think the root cause is the following: rabbitmq dispatches the state_report messages between the workers in a round robin fashion, therefore eventually the state_reports of the same agent will hit all rpc workers.
Each worker has logic to update the host segment mapping if either the server or the agent got restarted: https://opendev.org/openstack/neutron/src/commit/90b5456b8c11011c41f2fcd53a8943cb45fb6479/neutron/services/segments/db.py#L304-L305 Unfortunately the 'reported_hosts' set (to remember from which host the server has seen agent reports already) is private to each worker process. But right after a server (re-)start when that set is still empty, each worker will unconditionally write the received physnet-segment information into the db. This means we multiply the load on the db and rpc workers by a factor of the total rpc worker count. Pushing a fix attempt soon. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1952730/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
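A sketch of the direction such a fix can take: make the write conditional on the reported mapping actually differing from what is already stored, so that N workers re-learning the same host after a restart issue one update instead of N. This is illustrative only, not neutron's code; a dict stands in for the segment-host mapping table:

    _db = {}  # stands in for the persisted segment-host mapping

    def update_segment_host_mapping(host, physnets):
        wanted = frozenset(physnets)
        if _db.get(host) == wanted:
            return False  # nothing changed: skip the expensive db write
        _db[host] = wanted
        return True

    assert update_segment_host_mapping('compute1', ['physnet0']) is True
    assert update_segment_host_mapping('compute1', ['physnet0']) is False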
[Yahoo-eng-team] [Bug 1951429] [NEW] Neutron API responses should not contain tracebacks
Public bug reported: Security folks found some corner cases in the neutron API where the response contains a traceback, for example: $ curl --request-target foo -k http://127.0.0.1:9696 Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/eventlet/wsgi.py", line 563, in handle_one_response result = self.application(self.environ, start_response) File "/usr/local/lib/python3.8/dist-packages/paste/urlmap.py", line 208, in __call__ path_info = self.normalize_url(path_info, False)[1] File "/usr/local/lib/python3.8/dist-packages/paste/urlmap.py", line 130, in normalize_url assert (not url or url.startswith('/') AssertionError: URL fragments must start with / or http:// (you gave 'foo') As a developer I don't mind such tracebacks, but I see their point that this may give away unwanted information to an attacker. On the other hand I would not consider this in itself a vulnerability. Pushing a trivial fix in a minute. ** Affects: neutron Importance: Low Assignee: Bence Romsics (bence-romsics) Status: In Progress ** Tags: api -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1951429 Title: Neutron API responses should not contain tracebacks Status in neutron: In Progress Bug description: Security folks found some corner cases in the neutron API where the response contains a traceback, for example: $ curl --request-target foo -k http://127.0.0.1:9696 Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/eventlet/wsgi.py", line 563, in handle_one_response result = self.application(self.environ, start_response) File "/usr/local/lib/python3.8/dist-packages/paste/urlmap.py", line 208, in __call__ path_info = self.normalize_url(path_info, False)[1] File "/usr/local/lib/python3.8/dist-packages/paste/urlmap.py", line 130, in normalize_url assert (not url or url.startswith('/') AssertionError: URL fragments must start with / or http:// (you gave 'foo') As a developer I don't mind such tracebacks, but I see their point that this may give away unwanted information to an attacker. On the other hand I would not consider this in itself a vulnerability. Pushing a trivial fix in a minute. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1951429/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
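Generically, this class of symptom can be avoided by a top-level WSGI guard that logs the traceback and returns a bare 500 to the client. A sketch of the idea only, not the proposed neutron patch:

    # minimal WSGI wrapper sketch: keep tracebacks in the logs,
    # never send them to clients
    import logging

    LOG = logging.getLogger(__name__)

    def hide_tracebacks(app):
        def wrapper(environ, start_response):
            try:
                return app(environ, start_response)
            except Exception:
                LOG.exception('unhandled error while processing request')
                start_response('500 Internal Server Error',
                               [('Content-Type', 'text/plain')])
                return [b'Internal Server Error']
        return wrapper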
[Yahoo-eng-team] [Bug 1945747] Re: GET security group rule is missing description attribute
*** This bug is a duplicate of bug 1904188 *** https://bugs.launchpad.net/bugs/1904188 I am marking this as a duplicate. Let me know if you think differently. Also don't hesitate to propose a backport to stable/ussuri. ** This bug has been marked a duplicate of bug 1904188 Include standard attributes ID in OVO dictionaries to improve the OVN revision numbers operation -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1945747 Title: GET security group rule is missing description attribute Status in neutron: New Bug description: The description attribute is missing from _make_security_group_rule_dict. Create a sec group rule with a description:

stack@bionic-template:~/devstack$ openstack security group rule create --description "test rule" --remote-ip 0.0.0.0/0 --ingress ff57f76f-93a0-4bf3-b538-c88df40fdc40
+-------------------+----------------------------------------------------------------------+
| Field             | Value                                                                |
+-------------------+----------------------------------------------------------------------+
| created_at        | 2021-10-01T06:35:50Z                                                 |
| description       | test rule                                                            |
| direction         | ingress                                                              |
| ether_type        | IPv4                                                                 |
| id                | 389eb45e-58ac-471c-b966-a3c8784009f7                                 |
| location          | cloud='', project.domain_id='default', project.domain_name=,         |
|                   | project.id='f2527eb734c745eca32b1dfbd9107563', project.name='admin', |
|                   | region_name='RegionOne', zone=                                       |
| name              | None                                                                 |
| port_range_max    | None                                                                 |
| port_range_min    | None                                                                 |
| project_id        | f2527eb734c745eca32b1dfbd9107563                                     |
| protocol          | None                                                                 |
| remote_group_id   | None                                                                 |
| remote_ip_prefix  | None                                                                 |
| revision_number   | 0                                                                    |
| security_group_id | ff57f76f-93a0-4bf3-b538-c88df40fdc40                                 |
| tags              | []                                                                   |
| updated_at        | 2021-10-01T06:35:50Z                                                 |
+-------------------+----------------------------------------------------------------------+

Example GET (no description): RESP BODY: {"security_group_rule": {"id":
[Yahoo-eng-team] [Bug 1936839] [NEW] Ingress bw-limit with DPDK does not work
Public bug reported: A colleague of mine working downstream found the following bug (his report follows with minor redactions of company-internal details). I'm going to push his proposed fix in a minute too.

In short, the inbound bandwidth limitation on vHost user ports doesn't seem to work: the value set with OpenStack QoS commands on the port isn't configured properly. The problem exists in the OVS backend. Creating a 1 Mbit/s limit rule:

openstack network qos rule create max_1_Mbps --type bandwidth-limit --max-kbps 1000 --max-burst-kbits 1000 --ingress

After applying it to the port, you can query it on the compute node:

compute-0-5:/home/ceeinfra # ovs-vsctl list qos
_uuid : c326ed8b-24ef-4f1f-a5b0-b20f3ca3297d
external_ids: {id=vhu84edf6c2-f0}
other_config: {cbs="125000.0", cir="125000.0"}
queues : {}
type: egress-policer

Note: the traffic is ingress from the VM point of view, and egress from OVS. The values are not integers; they have a .0 at the end. In /var/log/openvswitch/ovs-vswitchd.log you can see that they are not accepted:

2021-07-15T12:36:23.121Z|00208|netdev_dpdk|ERR|Could not create rte meter for egress policer
2021-07-15T12:36:23.121Z|00209|netdev_dpdk|ERR|Failed to set QoS type egress-policer on port vhu84edf6c2-f0: Invalid argument
2021-07-15T12:36:23.126Z|00210|netdev_dpdk|ERR|Could not create rte meter for egress policer
2021-07-15T12:36:23.126Z|00211|netdev_dpdk|ERR|Failed to set QoS type egress-policer on port vhu84edf6c2-f0: Invalid argument

If you generate traffic between two VMs, the downloading one (which has the limitation applied on its port) reports this:

root@bwtest2:~# nc 192.168.1.201 | dd of=/dev/null status=progress
816316928 bytes (816 MB, 779 MiB) copied, 5 s, 163 MB/s^C
1863705+71 records in
1863738+0 records out
954233856 bytes (954 MB, 910 MiB) copied, 8.23046 s, 116 MB/s

The bandwidth is much higher than the configured 1 Mbit/s. It is possible to modify the OVS agent so it applies the bandwidth limit correctly. You have to find out where the Python scripts of the neutron_openvswitch_agent container are stored on the compute host. In our environment the file to modify is:

/var/lib/docker/overlay2/68653008fca0a6434adb3985b021b2329680b71b49859c3a028f951deed59df3/merged/usr/lib/python3.6/site-packages/neutron/agent/common/ovs_lib.py

In the _update_ingress_bw_limit_for_dpdk_port function, the original code is:

# cir and cbs should be set in bytes instead of bits
qos_other_config = {
    'cir': str(max_bw_in_bits / 8),
    'cbs': str(max_burst_in_bits / 8)
}

If you modify the code to this:

# cir and cbs should be set in bytes instead of bits
qos_other_config = {
    'cir': str(int(max_bw_in_bits / 8)),
    'cbs': str(int(max_burst_in_bits / 8))
}

the values passed to OVS will be integers. You can see the difference querying the new values after applying the limit on the ports again:

compute-0-5:/home/ceeinfra # ovs-vsctl list qos
_uuid : b93b1165-e839-4378-a6b7-b75c13ad0d41
external_ids: {id=vhu84edf6c2-f0}
other_config: {cbs="125000", cir="125000"}
queues : {}
type: egress-policer

They don't have the .0 at the end anymore, and OVS doesn't complain in the logs about invalid arguments. The bandwidth limitation between the computes now works:

root@bwtest2:~# nc 192.168.1.201 | dd of=/dev/null status=progress
4095488 bytes (4.1 MB, 3.9 MiB) copied, 33 s, 123 kB/s^C
7274+1382 records in
8051+0 records out
4122112 bytes (4.1 MB, 3.9 MiB) copied, 33.4033 s, 123 kB/s

125 kB/s translates to the 1 Mbit/s we applied with the rule, so it works now.
My guess is that this problem comes from the different behaviour of division in Python 2 and Python 3:

user@debian:~$ python2
Python 2.7.16 (default, Oct 10 2019, 22:02:15) [GCC 8.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 4/3
1
>>>
user@debian:~$ python3
Python 3.7.3 (default, Jan 22 2021, 20:04:44) [GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 4/3
1.3333333333333333
>>>

In Python 3 the / operator performs true division and returns a float, and OVS doesn't seem to accept floating point numbers. I have seen this in multiple versions.

** Affects: neutron Importance: Undecided Assignee: Bence Romsics (bence-romsics) Status: In Progress ** Tags: ovs qos -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1936839 Title: Ingress bw-limit with DPDK does not work Status in neutron: In Progress
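To make the arithmetic concrete, both integer-preserving forms are equivalent here (plain Python 3, runnable as-is):

max_bw_in_bits = 1000 * 1000              # 1 Mbit/s from the QoS rule
assert max_bw_in_bits / 8 == 125000.0     # Python 3: true division, a float
assert max_bw_in_bits // 8 == 125000      # floor division keeps an int
assert str(int(max_bw_in_bits / 8)) == "125000"  # the form used in the fix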
[Yahoo-eng-team] [Bug 1934238] Re: instance failed network setup
As gibi said above, this is unlikely to be either a nova or a neutron problem, but more likely a deployment problem. I don't believe the various neutron log lines quoted have anything to do with the root cause. To help with the debugging: What deployment software did you use? Are you using devstack, since you said you deployed into VMs? How was the deployment software configured? Did the deployment complete successfully? Is neutron-server running? Is neutron-server actually available at the address nova tries to connect to? From both hosts? Since you mentioned that you used 2 VMs, did the error message come from the same host where the controller components are running? If not, then the http://localhost:9696/... url is definitely wrong. ** Changed in: neutron Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1934238 Title: instance failed network setup Status in neutron: Invalid Status in OpenStack Compute (nova): Invalid Bug description: I set up OpenStack on two Ubuntu VMs. When I want to create a new instance, it can't connect to neutron, and the nova-compute logs show the following: 2021-07-01 02:15:08.398 83631 INFO nova.compute.claims [req-830e0a7f-5a50-448c-a186-f082962c3c86 91704884e43f48fcbd156b8d7429fc3e 5e055db0a5464dc1997ab0f456792271 - default default] [instance: 3316f595-0e20-4914-90a2-c00da68c82ec] Claim successful on node compute 2021-07-01 02:15:12.006 83631 INFO nova.virt.libvirt.driver [req-830e0a7f-5a50-448c-a186-f082962c3c86 91704884e43f48fcbd156b8d7429fc3e 5e055db0a5464dc1997ab0f456792271 - default default] [instance: 3316f595-0e20-4914-90a2-c00da68c82ec] Creating image 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager [req-830e0a7f-5a50-448c-a186-f082962c3c86 91704884e43f48fcbd156b8d7429fc3e 5e055db0a5464dc1997ab0f456792271 - default default] Instance failed network setup after 1 attempt(s): keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://localhost:9696/v2.0/networks?id=163f0b54-e337-40ac-81af-958c24ceeb7f: HTTPConnectionPool(host='localhost', port=9696): Max retries exceeded with url: /v2.0/networks?id=163f0b54-e337-40ac-81af-958c24ceeb7f (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] ECONNREFUSED')) 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager Traceback (most recent call last): 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 159, in _new_conn 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager conn = connection.create_connection( 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 84, in create_connection 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager raise err 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 74, in create_connection 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager sock.connect(sa) 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 253, in connect 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager socket_checkerr(fd) 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 51, in socket_checkerr 2021-07-01 02:15:14.264 83631 ERROR
nova.compute.manager raise socket.error(err, errno.errorcode[err]) 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager ConnectionRefusedError: [Errno 111] ECONNREFUSED 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager During handling of the above exception, another exception occurred: 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager Traceback (most recent call last): 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 665, in urlopen 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager httplib_response = self._make_request( 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 387, in _make_request 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager conn.request(method, url, **httplib_request_kw) 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager File "/usr/lib/python3.8/http/client.py", line 1255, in request 2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager self._send_request(method, url, body, headers, encode_chunked) 2021-07-01 02:15:14.264 83631 ERROR
[Yahoo-eng-team] [Bug 1921126] [NEW] [RFE] Allow explicit management of default routes
Public bug reported: This RFE proposes to allow explicit management of the default route(s) of a Neutron router. This is mostly useful for a user to install multiple default routes for Equal Cost Multipath (ECMP) and treat all these routes uniformly. Since I have already written a spec proposal for this, please see the details there: https://review.opendev.org/c/openstack/neutron-specs/+/781475 ** Affects: neutron Importance: Wishlist Assignee: Bence Romsics (bence-romsics) Status: New ** Tags: rfe -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1921126 Title: [RFE] Allow explicit management of default routes Status in neutron: New
[Yahoo-eng-team] [Bug 1905295] [NEW] [RFE] Allow multiple external gateways on a router
Public bug reported: I'd like to bring the following idea to the drivers' meeting. If this still looks like a good idea after that discussion, I'll open a spec so this can be properly commented on in gerrit. Until then feel free to comment here, of course.

# Problem Description

A general router can be configured to connect and route to multiple external networks for higher availability and/or to balance the load. However, the current Neutron API syntax allows exactly one external gateway for a router. https://docs.openstack.org/api-ref/network/v2/?expanded=create-router-detail#create-router

{ "router": { "name": "router1", "external_gateway_info": { "network_id": "ae34051f-aa6c-4c75-abf5-50dc9ac99ef3", "enable_snat": true, "external_fixed_ips": [ { "ip_address": "172.24.4.6", "subnet_id": "b930d7f6-ceb7-40a0-8b81-a425dd994ccf" } ] }, "admin_state_up": true } }

However, consider the following (simplified) network architecture as an example:

R3 R4
|X|
R1 R2
|X|
C1 C2 ...

(Sorry, my original, nice ascii art was eaten by launchpad. I hope this still conveys what I mean.) Where C1, C2, ... are compute nodes, R1 and R2 are OpenStack-managed routers, while R3 and R4 are provider edge routers. Between R1-R2 and R3-R4 Equal Cost Multipath (ECMP) routing is used to utilize all links in an active-active manner. In such an architecture it makes sense to represent R1 and R2 as two logical routers with two external gateways each, or in some cases (depending on other architectural choices) even as one logical router with four external gateways. But with the current API that is not possible.

# Proposed Change

Extend the router API object with a new attribute 'additional_external_gateways', for example:

{ "router" : { "name" : "router1", "admin_state_up" : true, "external_gateway_info" : { "enable_snat" : false, "external_fixed_ips" : [ { "ip_address" : "172.24.4.6", "subnet_id" : "b930d7f6-ceb7-40a0-8b81-a425dd994ccf" } ], "network_id" : "ae34051f-aa6c-4c75-abf5-50dc9ac99ef3" }, "additional_external_gateways" : [ { "enable_snat" : false, "external_fixed_ips" : [ { "ip_address" : "172.24.5.6", "subnet_id" : "62da64b0-29ab-11eb-9ed9-3b1175418487" } ], "network_id" : "592d4716-29ab-11eb-a7dd-4f4b5e319915" }, ... ] } }

Edited via the following HTTP PUT methods with diff semantics:

PUT /v2.0/routers/{router_id}/add_additional_external_gateways
PUT /v2.0/routers/{router_id}/remove_additional_external_gateways

We keep 'external_gateway_info' for backwards compatibility. When additional_external_gateways is an empty list, everything behaves as before. When additional_external_gateways are given, then the actual list of external gateways is (in Python-like pseudo-code): [external_gateway_info] + additional_external_gateways. Unless otherwise specified, all non-directly-connected external IPs are routed towards the original external_gateway_info. However, this behavior may be overridden either by using (static) extraroutes, or by running (dynamic) routing protocols and routing towards the external gateway a particular route was learned from.

# Alternatives

1) Using 4 logical routers with 1 external gateway each. However, in this case the API misses the information about which (2 or 4) logical routers represent the same backend router. 2) Using a VRRP HA router. However, this provides a different level of High Availability, plus it is active-passive instead of active-active. 3) Adding router interfaces (since their number is not limited in the API) instead of external gateways.
However, this creates confusion by blurring the line between what is internal and what is external to the cloud deployment. ** Affects: neutron Importance: Wishlist Assignee: Bence Romsics (bence-romsics) Status: New ** Tags: rfe
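To make the diff semantics concrete, an addition under the proposed API could look like the following; note that these member actions are only proposed in this RFE, they do not exist yet, and the endpoint, token and ids are placeholders:

import requests

body = {
    "router": {
        "additional_external_gateways": [{
            "network_id": "592d4716-29ab-11eb-a7dd-4f4b5e319915",
            "enable_snat": False,
        }]
    }
}
resp = requests.put(
    "http://controller:9696/v2.0/routers/ROUTER_ID"
    "/add_additional_external_gateways",
    json=body, headers={"X-Auth-Token": "TOKEN"})
print(resp.status_code)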
[Yahoo-eng-team] [Bug 1878031] Re: Unable to delete an instance | Conflict: Port [port-id] is currently a parent port for trunk [trunk-id]
While I agree that it would be far more user-friendly to give a warning/error in the problematic API workflow, that would entail some cross-project changes, because today: * nova does not know when an already bound port is added to a trunk * neutron does not know if nova is supposed to auto-delete a port That means neither nova nor neutron can detect the error condition by itself. Again, I believe changing the workflow to pre-create the parent port for the server stops the problem described in this bug report completely. So I'm setting this bug as Invalid. But let me know if you see other alternatives. ** Changed in: neutron Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1878031 Title: Unable to delete an instance | Conflict: Port [port-id] is currently a parent port for trunk [trunk-id] Status in neutron: Invalid Bug description: When you create a trunk in Neutron you create a parent port for the trunk and attach the trunk to the parent. Then subports can be created on the trunk. When instances are created on the trunk, first a port is created and then an instance is associated with a free port. It looks to me that this is the oversight in the logic. From the perspective of the code, the parent port looks like any other port attached to the trunk bridge. It doesn't have an instance attached to it, so it looks like it's not being used for anything (which is technically correct). So it becomes an eligible port for an instance to bind to. That is all fine and dandy until you go to delete the instance and you get the "Port [port-id] is currently a parent port for trunk [trunk-id]" exception, just as happened here. Anecdotally, it seems rare that an instance will actually bind to it, but that is what happened for the user in this case and I have had several pings over the past year from people in a similar state. I propose that when a port is made the parent port of a trunk, the trunk be established as the owner of the port. That way it will be ineligible for instances seeking to bind to it. See also the old bug: https://bugs.launchpad.net/neutron/+bug/1700428 Description of problem: Attempting to delete an instance failed with this error in nova-compute: ~~~ 2020-03-04 09:52:46.257 1 WARNING nova.network.neutronv2.api [req-0dd45fe4-861c-46d3-a5ec-7db36352da58 02c6d1bc10fe4ffaa289c786cd09b146 695c417810ac460480055b074bc41817 - default default] [instance: 2f9e3740-b425-4f00-a949-e1aacf2239c4] Failed to delete port 991e4e50-481a-4ca6-9ea6-69f848c4ca9f for instance.: Conflict: Port 991e4e50-481a-4ca6-9ea6-69f848c4ca9f is currently a parent port for trunk 5800ee0f-b558-46cb-bb0b-92799dbe02cf.
~~~

~~~
[stack@migration-host ~]$ openstack network trunk show 5800ee0f-b558-46cb-bb0b-92799dbe02cf
+-----------------+--------------------------------------+
| Field           | Value                                |
+-----------------+--------------------------------------+
| admin_state_up  | UP                                   |
| created_at      | 2020-03-04T09:01:23Z                 |
| description     |                                      |
| id              | 5800ee0f-b558-46cb-bb0b-92799dbe02cf |
| name            | WIN-TRUNK                            |
| port_id         | 991e4e50-481a-4ca6-9ea6-69f848c4ca9f |
| project_id      | 695c417810ac460480055b074bc41817     |
| revision_number | 3                                    |
| status          | ACTIVE                               |
| sub_ports       |                                      |
| tags            | []                                   |
| tenant_id       | 695c417810ac460480055b074bc41817     |
| updated_at      | 2020-03-04T10:20:46Z                 |
+-----------------+--------------------------------------+

[stack@migration-host ~]$ nova interface-list 2f9e3740-b425-4f00-a949-e1aacf2239c4
+------------+--------------------------------------+--------------------------------------+--------------+-------------------+
| Port State | Port ID                              | Net ID                               | IP addresses | MAC Addr          |
+------------+--------------------------------------+--------------------------------------+--------------+-------------------+
| DOWN       | 991e4e50-481a-4ca6-9ea6-69f848c4ca9f | 9be62c82-4274-48b4-bba0-39ccbdd5bb1b | 192.168.0.19 | fa:16:3e:0a:2b:9b |
+------------+--------------------------------------+--------------------------------------+--------------+-------------------+

[stack@migration-host ~]$ openstack port show 991e4e50-481a-4ca6-9ea6-69f848c4ca9f
+---------+-------+
| Field   | Value
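The workflow change suggested above, as a sketch using openstacksdk (cloud name and ids are illustrative): pre-create the parent port, make it the trunk's parent, and boot the server on exactly that port, so nova never picks it as a "free" port and never tries to auto-delete it:

import openstack

conn = openstack.connect(cloud="mycloud")

# 1) Pre-create the port that will be the trunk's parent.
parent = conn.network.create_port(network_id="NET_ID", name="trunk-parent")

# 2) Make it the parent of the trunk.
trunk = conn.network.create_trunk(name="trunk0", port_id=parent.id)

# 3) Boot the server on that exact port; being user-created, the port is
# not auto-deleted together with the server.
server = conn.compute.create_server(
    name="vm0", image_id="IMAGE_ID", flavor_id="FLAVOR_ID",
    networks=[{"port": parent.id}])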
[Yahoo-eng-team] [Bug 1878622] Re: Open vSwitch with DPDK datapath in neutron
Thank you for your bug report! I believe this typo was fixed in the change below: https://review.opendev.org/565289 So the command has been correct since the Rocky version of our docs, for example: https://docs.openstack.org/neutron/latest/admin/config-ovs-dpdk.html ** Changed in: neutron Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1878622 Title: Open vSwitch with DPDK datapath in neutron Status in neutron: Fix Released Bug description: - [ x ] I have a fix to the document that I can paste below including example: input and output. There is a typo in the following documentation page: https://docs.openstack.org/neutron/queens/admin/config-ovs-dpdk.html $ openstack image set --property hw_vif_mutliqueue_enabled=true IMAGE_NAME should read: $ openstack image set --property hw_vif_multiqueue_enabled=true IMAGE_NAME (i.e. multi not mutli) --- Release: 12.1.2.dev96 on 2020-05-11 17:10 SHA: ed413939fcd134ee616078c017272f229b09f1d9 Source: https://git.openstack.org/cgit/openstack/neutron/tree/doc/source/admin/config-ovs-dpdk.rst URL: https://docs.openstack.org/neutron/queens/admin/config-ovs-dpdk.html
[Yahoo-eng-team] [Bug 1878632] [NEW] Race condition in subnet and segment delete: The segment is still bound with port(s)
08]: ERROR heat.engine.resource Traceback (most recent call last): máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/resource.py", line 918, in _action_recorder máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource yield máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/resource.py", line 2051, in delete máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource *action_args) máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/scheduler.py", line 326, in wrapper máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource step = next(subtask) máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/resource.py", line 972, in action_handler_task máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource handler_data = handler(*args) máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/resources/openstack/neutron/segment.py", line 146, in handle_delete máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource self.client('openstack').network.delete_segment(self.resource_id) máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource File "/usr/local/lib/python3.6/dist-packages/openstack/network/v2/_proxy.py", line 3312, in delete_segment máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource self._delete(_segment.Segment, segment, ignore_missing=ignore_missing) máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource File "/usr/local/lib/python3.6/dist-packages/openstack/proxy.py", line 46, in check máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource return method(self, expected, actual, *args, **kwargs) máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource File "/usr/local/lib/python3.6/dist-packages/openstack/network/v2/_proxy.py", line 75, in _delete máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource rv = res.delete(self, if_revision=if_revision) máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource File "/usr/local/lib/python3.6/dist-packages/openstack/resource.py", line 1615, in delete máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource self._translate_response(response, has_body=False, **kwargs) máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource File "/usr/local/lib/python3.6/dist-packages/openstack/resource.py", line 1113, in _translate_response máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource exceptions.raise_from_response(response, error_message=error_message) máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource File "/usr/local/lib/python3.6/dist-packages/openstack/exceptions.py", line 236, in raise_from_response máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource http_status=http_status, request_id=request_id máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: http://192.168.122.246:9696/v2.0/segments/641c8c60-59c9-4972-bf82-3637f3e0f1cb, Segment '641c8c60-59c9-4972-bf82-3637f3e0f1cb' cannot be deleted: The segment is still bound with port(s) 8cf8f188-5ea4-41b0-aa3a-fb8a8802888d. 
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource # a few seconds later a second delete succeeds $ openstack stack delete s0 --yes --wait 2020-05-14 14:24:26Z [s0]: DELETE_IN_PROGRESS Stack DELETE started I have an idea what the root cause is. I'll describe that in a comment. ** Affects: neutron Importance: Medium Assignee: Bence Romsics (bence-romsics) Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1878632 Title: Race condition in subnet and segment delete: The segment is still bound with port(s) Status in neutron: New Bug description: The HOT template below may expose a race condition and by that make stack deletion fail. On the neutron API this means that a segment delete fails with "The segment is still bound with port(s)". The reproduction uses a HOT template but I don't think this problem is Heat specific. Rather I think it depends on quick succession of API calls, which Heat does rather well. Configuration: ml2_conf.ini [ml2] mechanism_drivers = openvswitch,linuxbridge,sriovnicswitch,l2population tenant_network_types = vxlan,vlan [ml2_ty
[Yahoo-eng-team] [Bug 1871340] [NEW] neutron.tests.functional.agent.ovn.metadata.test_metadata_agent.TestMetadataAgent.test_agent_registration_at_chassis_create_event fails randomly
Public bug reported: Seemingly starting from the 1st of April neutron.tests.functional.agent.ovn.metadata.test_metadata_agent.TestMetadataAgent.test_agent_registration_at_chassis_create_event fails randomly in the gate with the error message: 2020-04-06 08:55:57.302891 | controller | == 2020-04-06 08:55:57.302931 | controller | Failed 1 tests - output below: 2020-04-06 08:55:57.302953 | controller | == 2020-04-06 08:55:57.302972 | controller | 2020-04-06 08:55:57.302992 | controller | neutron.tests.functional.agent.ovn.metadata.test_metadata_agent.TestMetadataAgent.test_agent_registration_at_chassis_create_event 2020-04-06 08:55:57.303012 | controller | - 2020-04-06 08:55:57.303030 | controller | 2020-04-06 08:55:57.303050 | controller | Captured traceback: 2020-04-06 08:55:57.303069 | controller | ~~~ 2020-04-06 08:55:57.303088 | controller | Traceback (most recent call last): 2020-04-06 08:55:57.303107 | controller | 2020-04-06 08:55:57.303126 | controller | File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, in func 2020-04-06 08:55:57.303145 | controller | return f(self, *args, **kwargs) 2020-04-06 08:55:57.303164 | controller | 2020-04-06 08:55:57.303184 | controller | File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/ovn/metadata/test_metadata_agent.py", line 220, in test_agent_registration_at_chassis_create_event 2020-04-06 08:55:57.303203 | controller | chassis.external_ids) 2020-04-06 08:55:57.303223 | controller | 2020-04-06 08:55:57.303242 | controller | File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/testtools/testcase.py", line 421, in assertIn 2020-04-06 08:55:57.303261 | controller | self.assertThat(haystack, Contains(needle), message) 2020-04-06 08:55:57.303281 | controller | 2020-04-06 08:55:57.303300 | controller | File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/testtools/testcase.py", line 502, in assertThat 2020-04-06 08:55:57.303319 | controller | raise mismatch_error 2020-04-06 08:55:57.303338 | controller | 2020-04-06 08:55:57.303357 | controller | testtools.matchers._impl.MismatchError: 'neutron:ovn-metadata-id' not in {'ovn-bridge-mappings': ''} Example log: https://99f8d9af3210ff587b09-7ad1a719016265adf2ccc36ef6645b87.ssl.cf2.rackcdn.com/702247/7/gate /neutron-functional/f42992a/job-output.txt Logstash: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22neutron :ovn-metadata-id%5C%22%20AND%20message:%5C%22ovn-bridge- mappings%5C%22%20AND%20voting:1=864000s ** Affects: neutron Importance: Undecided Status: New ** Tags: gate-failure ovn ** Tags added: ovn -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. 
https://bugs.launchpad.net/bugs/1871340 Title: neutron.tests.functional.agent.ovn.metadata.test_metadata_agent.TestMetadataAgent.test_agent_registration_at_chassis_create_event fails randomly Status in neutron: New
[Yahoo-eng-team] [Bug 1870110] [NEW] neutron-rally-task fails in rally_openstack.task.scenarios.neutron.trunk.CreateAndListTrunks
Public bug reported: It seems we have a gate failure in neutron-rally-task. It fails in rally_openstack.task.scenarios.neutron.trunk.CreateAndListTrunks. For example: https://zuul.opendev.org/t/openstack/build/9c9970da456d4145a174f73c90529dd2/log/job-output.txt#41274 https://zuul.opendev.org/t/openstack/build/8319cc946cc9407a90467f68757c11e8/log/job-output.txt#41269 ** Affects: neutron Importance: Undecided Status: New ** Tags: gate-failure -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1870110 Title: neutron-rally-task fails in rally_openstack.task.scenarios.neutron.trunk.CreateAndListTrunks Status in neutron: New
[Yahoo-eng-team] [Bug 1866353] Re: Neutron API returning HTTP 201 for SG rule create when not fully created yet
** Changed in: neutron Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1866353 Title: Neutron API returning HTTP 201 for SG rule create when not fully created yet Status in neutron: Invalid Bug description: Neutron API returns HTTP 201 (Created) for security group rule create requests, although it takes longer to apply the configuration to the port. This means that for a period of time the firewall on the port is outdated, posing a security risk or causing applications to fail or misbehave. Though not tested, the q-agent might even completely miss the SG rule add event from the Neutron server and never apply it. The log below is of a security group rule create request from Octavia to Neutron. Neutron returns HTTP 201 but the q-agent has not yet applied the configuration. The Octavia tempest test expects the load balancer VIP to conform to the security group rules but fails as the q-agent still has not applied the new security group rule to the port. Mar 03 17:33:24.786466 ubuntu-bionic-airship-kna1-0014969351 octavia-worker[8605]: DEBUG octavia.controller.worker.v1.controller_worker [-] Task 'octavia.controller.worker.v1.tasks.network_tasks.UpdateVIP' (10c8bae1-19b1-4757-9530-12ac29384565) transitioned into state 'RUNNING' from state 'PENDING' {{(pid=8984) _task_receiver /usr/local/lib/python3.6/dist-packages/taskflow/listeners/logging.py:194}} Mar 03 17:33:24.787574 ubuntu-bionic-airship-kna1-0014969351 octavia-worker[8605]: DEBUG octavia.controller.worker.v1.tasks.network_tasks [None req-6bbb57f5-2a06-4e8e-9ddd-6da259333fd7 None None] Updating VIP of load_balancer 61145d72-04e1-49bd-bcb0-5c215ed217ea.
{{(pid=8984) execute /opt/stack/octavia/octavia/controller/worker/v1/tasks/network_tasks.py:472}} Mar 03 17:33:24.805139 ubuntu-bionic-airship-kna1-0014969351 octavia-worker[8605]: DEBUG octavia.network.drivers.neutron.base [None req-6bbb57f5-2a06-4e8e-9ddd-6da259333fd7 None None] Neutron extension security-group found enabled {{(pid=8984) _check_extension_enabled /opt/stack/octavia/octavia/network/drivers/neutron/base.py:66}} Mar 03 17:33:24.819184 ubuntu-bionic-airship-kna1-0014969351 octavia-worker[8605]: DEBUG octavia.network.drivers.neutron.base [None req-6bbb57f5-2a06-4e8e-9ddd-6da259333fd7 None None] Neutron extension dns-integration is not enabled {{(pid=8984) _check_extension_enabled /opt/stack/octavia/octavia/network/drivers/neutron/base.py:70}} Mar 03 17:33:24.832337 ubuntu-bionic-airship-kna1-0014969351 octavia-worker[8605]: DEBUG octavia.network.drivers.neutron.base [None req-6bbb57f5-2a06-4e8e-9ddd-6da259333fd7 None None] Neutron extension qos found enabled {{(pid=8984) _check_extension_enabled /opt/stack/octavia/octavia/network/drivers/neutron/base.py:66}} Mar 03 17:33:24.847909 ubuntu-bionic-airship-kna1-0014969351 octavia-worker[8605]: DEBUG octavia.network.drivers.neutron.base [None req-6bbb57f5-2a06-4e8e-9ddd-6da259333fd7 None None] Neutron extension allowed-address-pairs found enabled {{(pid=8984) _check_extension_enabled /opt/stack/octavia/octavia/network/drivers/neutron/base.py:66}} Mar 03 17:33:25.221590 ubuntu-bionic-airship-kna1-0014969351 neutron-server[7030]: INFO neutron.wsgi [None req-137e4288-fac0-490b-b828-8b43a94f675c admin admin] 10.0.1.16,10.0.1.16 "POST /v2.0/security-group-rules HTTP/1.1" status: 201 len: 725 time: 0.1413145 Mar 03 17:33:25.224900 ubuntu-bionic-airship-kna1-0014969351 octavia-worker[8605]: DEBUG octavia.controller.worker.v1.controller_worker [-] Task 'octavia.controller.worker.v1.tasks.network_tasks.UpdateVIP' (10c8bae1-19b1-4757-9530-12ac29384565) transitioned into state 'SUCCESS' from state 'RUNNING' with result 'None' {{(pid=8984) _task_receiver /usr/local/lib/python3.6/dist-packages/taskflow/listeners/logging.py:183}} Mar 03 17:33:25.224298 ubuntu-bionic-airship-kna1-0014969351 neutron-openvswitch-agent[7528]: DEBUG neutron.agent.resource_cache [None req-137e4288-fac0-490b-b828-8b43a94f675c admin admin] Received new resource SecurityGroupRule: SecurityGroupRule(created_at=2020-03-03T17:33:25Z,description='',direction='ingress',ethertype='IPv4',id=73e2e34d-a813-4846-8f85-2b8daae5d29c,port_range_max=8080,port_range_min=8080,project_id='e821f6bae64f4fa0bca1c230fbf4b364',protocol='tcp',remote_group_id=,remote_ip_prefix=192.0.1.0/32,revision_number=0,security_group_id=14216a23-b9c5-4cb3-b42d-c76b22c643ec,updated_at=2020-03-03T17:33:25Z) {{(pid=7528) record_resource_update /opt/stack/neutron/neutron/agent/resource_cache.py:192}} Mar 03 17:33:25.224767 ubuntu-bionic-airship-kna1-0014969351 neutron-openvswitch-agent[7528]: DEBUG neutron_lib.callbacks.manager [None req-137e4288-fac0-490b-b828-8b43a94f675c admin admin] Notify callbacks
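Since the 201 only acknowledges the database change, a client that needs the rule to be enforced has to poll for the data-plane effect. A generic, self-contained sketch of such a wait (a client-side workaround, not a neutron API):

import socket
import time

def wait_for_tcp(host, port, timeout=60.0, interval=2.0):
    # Return True once host:port accepts connections, False on timeout.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False

# e.g. after adding an ingress tcp/8080 rule for a load balancer VIP:
# assert wait_for_tcp("192.0.1.10", 8080)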
[Yahoo-eng-team] [Bug 1845575] Re: Networking Option 1: Provider networks in neutron
Please note that the following two lines are NOT the same: one config option ends in 'uri', the other in 'url'. In later versions the keystone folks renamed auth_uri to www_authenticate_uri so that it is easier to distinguish these config options, but in queens we have to live with this.

auth_uri = http://controller:5000
auth_url = http://controller:5000

** Changed in: neutron Status: In Progress => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1845575 Title: Networking Option 1: Provider networks in neutron Status in neutron: Invalid Bug description: In part: Configure the server component auth_uri = http://controller:5000 auth_url = http://controller:5000 The two sentences are the same. --- Release: 12.1.1.dev43 on 2019-09-21 05:59 SHA: b3d3d6d64358f6e8340bf0dbdff716968bf0d92c Source: https://git.openstack.org/cgit/openstack/neutron/tree/doc/source/install/controller-install-option1-ubuntu.rst URL: https://docs.openstack.org/neutron/queens/install/controller-install-option1-ubuntu.html
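To make the distinction concrete, this is how the two options sit in the [keystone_authtoken] section of neutron.conf; the values are identical here, but they configure two different things:

[keystone_authtoken]
# 'auth_uri' is the endpoint advertised to unauthenticated clients
# (renamed to 'www_authenticate_uri' in releases after queens);
# 'auth_url' is the endpoint the middleware itself uses to validate tokens.
auth_uri = http://controller:5000
auth_url = http://controller:5000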
[Yahoo-eng-team] [Bug 1836253] Re: Sometimes InstanceMetadata API returns 404 due to invalid InstanceID returned by _get_instance_and_tenant_id()
I don't know when William will read my previous comment, but overall what I found is this: The cache of the metadata-agent was designed to be invalidated by time-based expiry. That method has the reported kind of side effect if a client is too fast. This is not perfect, but it can usually be addressed by tweaking the cache TTL and/or waiting longer in the client. A more correct cache invalidation is theoretically possible, but I think it is not feasible, because it would introduce cross-dependencies between the metadata-agent and far-away parts of neutron. Therefore I'm inclined to mark this bug report as Invalid (not a bug). Please let me know if I missed something here. ** Changed in: neutron Status: Confirmed => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1836253 Title: Sometimes InstanceMetadata API returns 404 due to invalid InstanceID returned by _get_instance_and_tenant_id() Status in neutron: Invalid Bug description: Sometimes on instance initialization the metadata step fails. In metadata-agent.log there are lots of 404s: "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404 len: 297 time: 0.0771070 In nova-api.log we get 404s too: "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404 After some debugging we found that the problem occurs when a new instance gets the same IP that a deleted instance used. The problem is in the cache used by the method _get_ports_for_remote_address() in /neutron/agent/metadata/agent.py: it returns a port of the deleted instance (with the same IP), which yields the wrong InstanceID, which is then sent to nova-api, which fails because that InstanceID no longer exists. This problem only occurs with the cache enabled on the neutron metadata-agent. Version: Queens How to reproduce:

---
#!/bin/bash

computenodelist=(
'computenode00.test.openstack.net'
'computenode01.test.openstack.net'
'computenode02.test.openstack.net'
'computenode03.test.openstack.net'
)

validate_metadata(){
cat << EOF > /tmp/metadata
#!/bin/sh -x
if curl 192.168.10.2
then
echo "ControllerNode00 - OK"
else
echo "ControllerNode00 - ERROR"
fi
EOF
#SUBNAME=$(date +%s)
openstack server delete "${node}" 2>/dev/null
source /root/admin-openrc
openstack server create --image cirros --nic net-id=internal --flavor Cirros --security-group default --user-data /tmp/metadata --availability-zone nova:${node} --wait "${node}" &> /dev/null
i=0
until [ $i -gt 3 ] || openstack console log show "${node}" | grep -q "ControllerNode00"
do
i=$((i+1))
sleep 1
done
openstack console log show "${node}" | grep -q "ControllerNode00 - OK"
if [ $? == 0 ]; then
echo "Metadata Servers OK: ${node}"
else
echo "Metadata Servers ERROR: ${node}"
fi
rm /tmp/metadata
}

for node in ${computenodelist[@]}
do
export node
validate_metadata
done
echo -e "\n"
---
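Why a time-based cache misbehaves under fast IP reuse, in a minimal, self-contained sketch (illustrative only; the agent uses oslo.cache, not this code):

import time

class TTLCache(object):
    # Minimal time-based cache, just enough to show the failure mode.
    def __init__(self, ttl):
        self.ttl = ttl
        self._data = {}

    def get(self, key):
        value, stamp = self._data.get(key, (None, 0.0))
        if time.monotonic() - stamp < self.ttl:
            return value
        return None

    def put(self, key, value):
        self._data[key] = (value, time.monotonic())

cache = TTLCache(ttl=30)
cache.put("192.168.10.5", "port-id-of-deleted-instance")
# A new instance reusing 192.168.10.5 within 30 seconds resolves to the
# old port, hence to a no-longer-existing instance id, hence nova's 404.
print(cache.get("192.168.10.5"))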
[Yahoo-eng-team] [Bug 1833674] [NEW] [RFE] Improve profiling of port binding and vif plugging
Public bug reported: As discussed at the May 2019 PTG in Denver, we want to measure and then improve the performance of Neutron's most important operation: port binding. As we work with OSProfiler reports, we are realizing the reports are incomplete. We could turn on tracing in other components and subcomponents by further propagating trace information. We heavily build on some previous work:

* https://bugs.launchpad.net/neutron/+bug/1335640 [RFE] Neutron support for OSprofiler
* https://review.opendev.org/615350 Integrate rally with osprofiler

A few patches were already merged before opening this RFE:

* https://review.opendev.org/662804 Run nova's VM boot rally scenario in the neutron gate
* https://review.opendev.org/665614 Allow VM booting rally scenarios to time out

We already see the need for a few changes:

* New rally scenario to measure port binding
* Profiling coverage for vif plugging

This work is also driven by discoveries made while interpreting profiler reports, so I expect further changes here and there. ** Affects: neutron Importance: Wishlist Assignee: Bence Romsics (bence-romsics) Status: In Progress ** Tags: osprofiler rfe -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1833674 Title: [RFE] Improve profiling of port binding and vif plugging Status in neutron: In Progress
[Yahoo-eng-team] [Bug 1826396] [NEW] Atomic Extraroute API
Public bug reported: As discussed in an openstack-discuss thread [1], we could improve the extraroute API to better support Neutron API clients, especially Heat. The problem is that the current extraroute API does not allow atomic additions/deletions of particular routing table entries. In the current API the routes attribute of a router (containing all routing table entries) must be updated at once, so additions and deletions must be computed on the client side. Consequently, multiple clients race to update the routes attribute, and updates may get lost. A detailed spec is coming soon. [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005121.html ** Affects: neutron Importance: Undecided Assignee: Bence Romsics (bence-romsics) Status: New ** Tags: rfe ** Summary changed: - Add atomic extraroute API + Atomic Extraroute API -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1826396 Title: Atomic Extraroute API Status in neutron: New
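For a taste of the direction, an atomic addition could look like the following; the member action name and payload are illustrative guesses ahead of the spec, and the endpoint, token and id are placeholders:

import requests

route = {"destination": "10.0.3.0/24", "nexthop": "10.0.0.13"}
resp = requests.put(
    "http://controller:9696/v2.0/routers/ROUTER_ID/add_extraroutes",
    json={"router": {"routes": [route]}},
    headers={"X-Auth-Token": "TOKEN"})
# The server applies the diff atomically, so two clients adding different
# routes can no longer overwrite each other's full 'routes' list.
print(resp.status_code)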
[Yahoo-eng-team] [Bug 1821948] [NEW] Unstable unit test uses subnet broadcast address
Public bug reported: This is a low frequency gate failure in unit tests. Example log: http://logs.openstack.org/10/645210/4/check/openstack-tox-py37/688ffa8/job-output.txt.gz Logstash search: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22line%20171,%20in%20test_port_ip_update_revises%5C%22%20AND%20voting:1=864000s 2019-03-25 12:16:24.333688 | ubuntu-bionic | == 2019-03-25 12:16:24.333764 | ubuntu-bionic | Failed 1 tests - output below: 2019-03-25 12:16:24.333837 | ubuntu-bionic | == 2019-03-25 12:16:24.333863 | ubuntu-bionic | 2019-03-25 12:16:24.334052 | ubuntu-bionic | neutron.tests.unit.services.revisions.test_revision_plugin.TestRevisionPlugin.test_port_ip_update_revises 2019-03-25 12:16:24.334243 | ubuntu-bionic | - 2019-03-25 12:16:24.334271 | ubuntu-bionic | 2019-03-25 12:16:24.334326 | ubuntu-bionic | Captured traceback: 2019-03-25 12:16:24.334381 | ubuntu-bionic | ~~~ 2019-03-25 12:16:24.334471 | ubuntu-bionic | b'Traceback (most recent call last):' 2019-03-25 12:16:24.334662 | ubuntu-bionic | b' File "/home/zuul/src/git.openstack.org/openstack/neutron/neutron/tests/base.py", line 174, in func' 2019-03-25 12:16:24.334754 | ubuntu-bionic | b'return f(self, *args, **kwargs)' 2019-03-25 12:16:24.335103 | ubuntu-bionic | b' File "/home/zuul/src/git.openstack.org/openstack/neutron/neutron/tests/unit/services/revisions/test_revision_plugin.py", line 171, in test_port_ip_update_revises' 2019-03-25 12:16:24.335243 | ubuntu-bionic | b"response = self._update('ports', port['port']['id'], new)" 2019-03-25 12:16:24.335490 | ubuntu-bionic | b' File "/home/zuul/src/git.openstack.org/openstack/neutron/neutron/tests/unit/db/test_db_base_plugin_v2.py", line 601, in _update' 2019-03-25 12:16:24.335642 | ubuntu-bionic | b' self.assertEqual(expected_code, res.status_int)' 2019-03-25 12:16:24.335921 | ubuntu-bionic | b' File "/home/zuul/src/git.openstack.org/openstack/neutron/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 411, in assertEqual' 2019-03-25 12:16:24.336035 | ubuntu-bionic | b' self.assertThat(observed, matcher, message)' 2019-03-25 12:16:24.336297 | ubuntu-bionic | b' File "/home/zuul/src/git.openstack.org/openstack/neutron/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 498, in assertThat' 2019-03-25 12:16:24.336372 | ubuntu-bionic | b'raise mismatch_error' 2019-03-25 12:16:24.336486 | ubuntu-bionic | b'testtools.matchers._impl.MismatchError: 200 != 400' 2019-03-25 12:16:24.336523 | ubuntu-bionic | b'' 2019-03-25 12:16:24.336549 | ubuntu-bionic | 2019-03-25 12:16:24.336599 | ubuntu-bionic | Captured stderr: 2019-03-25 12:16:24.336650 | ubuntu-bionic | 2019-03-25 12:16:24.337086 | ubuntu-bionic | b'/home/zuul/src/git.openstack.org/openstack/neutron/.tox/py37/lib/python3.7/site-packages/neutron_lib/context.py:154: DeprecationWarning: context.session is used with and without new enginefacade. Please update the code to use new enginefacede consistently.' 2019-03-25 12:16:24.337157 | ubuntu-bionic | b' DeprecationWarning)' 2019-03-25 12:16:24.337594 | ubuntu-bionic | b'/home/zuul/src/git.openstack.org/openstack/neutron/.tox/py37/lib/python3.7/site-packages/neutron_lib/context.py:154: DeprecationWarning: context.session is used with and without new enginefacade. Please update the code to use new enginefacede consistently.' 
2019-03-25 12:16:24.337664 | ubuntu-bionic | b' DeprecationWarning)' 2019-03-25 12:16:24.337701 | ubuntu-bionic | b'' With some extra debug logging added I managed to obtain this error message: ERROR [neutron.tests.unit.db.test_db_base_plugin_v2] XXX b\'{"NeutronError": {"type": "InvalidIpForNetwork", "message": "IP address 10.0.0.255 is not a valid IP for any of the subnets on the specified network.", "detail": ""}}\' Reading the unit test source, it seems likely that a random IP+1 is occasionally the subnet broadcast address, which is invalid as a fixed_ip. https://opendev.org/openstack/neutron/src/commit/1ea9326fda303b48905d7f7748d320ba8e9322aa/neutron/tests/unit/services/revisions/test_revision_plugin.py#L169 I'm going to upload an attempted fix soon. ** Affects: neutron Importance: Medium Assignee: Bence Romsics (bence-romsics) Status: In Progress ** Tags: gate-failure -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1821948 Title: Unstable unit test uses subnet broadcast address Status in neutron: In Progress
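The likely shape of such a fix, as a sketch with netaddr (the merged patch may differ): skip the network and broadcast addresses when deriving the updated fixed IP.

import netaddr

def bump_fixed_ip(ip, cidr):
    # Return ip+1, skipping the network and broadcast addresses.
    net = netaddr.IPNetwork(cidr)
    candidate = netaddr.IPAddress(ip) + 1
    if candidate in (net.network, net.broadcast):
        candidate = netaddr.IPAddress(net.first + 1)
    return str(candidate)

print(bump_fixed_ip("10.0.0.254", "10.0.0.0/24"))  # 10.0.0.1, never .255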
[Yahoo-eng-team] [Bug 1821654] Re: Neutron Installation Prerequisites. The mysql command cannot execute without parameters
Since we have two contradictory bug reports about the preferred form, I'm marking this as Opinion. ** Changed in: neutron Status: New => Opinion ** Changed in: neutron Importance: Undecided => Wishlist -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1821654 Title: Neutron Installation Prerequisites. The mysql command cannot execute without parameters Status in neutron: Opinion Bug description: In the first step of the prerequisites (https://docs.openstack.org/neutron/rocky/install/controller-install-rdo.html#prerequisites) the first instruction is to connect to the DB server. The documentation instructs to use the command # mysql The command cannot be run as instructed; it should instruct the use of parameters, as in the installation guides of other services, e.g. identity, compute: # mysql -u root -p - [X] This doc is inaccurate in this way: The command will not execute as instructed --- Release: 13.0.3.dev77 on 2019-03-22 23:34 SHA: cfb6e0eb72bcb12cdca76c0baf14df86bd95c272 Source: https://git.openstack.org/cgit/openstack/neutron/tree/doc/source/install/controller-install-rdo.rst URL: https://docs.openstack.org/neutron/rocky/install/controller-install-rdo.html
[Yahoo-eng-team] [Bug 1819029] [NEW] QoS policies with minimum-bandwidth rule should be rejected on non-physnet ports/networks
Public bug reported: We seem to have forgotten to reject some API operations that are actually not supported (and weren't planned to be supported) by the Stein implementation of the Guaranteed Minimum Bandwidth feature. That is, QoS policies with a minimum-bandwidth rule should not be used on ports/networks that are not backed by a physnet. But currently we allow this:

$ openstack network show private | egrep provider
| provider:network_type | vxlan |
| provider:physical_network | None |
| provider:segmentation_id | 1062 |

$ openstack network qos policy create policy0
$ openstack network qos rule create policy0 --type minimum-bandwidth --min-kbps 1000 --egress
$ openstack port create port0 --network private --qos-policy policy0

The port create seems to work today, but on non-physnet networks there is no guarantee at all (as planned in the blueprint). Therefore I think API operations like these should be rejected now, otherwise we may set up false expectations in our users.

** Affects: neutron Importance: Undecided Assignee: Bence Romsics (bence-romsics) Status: New ** Tags: qos stein-rc-potential -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1819029 Title: QoS policies with minimum-bandwidth rule should be rejected on non-physnet ports/networks Status in neutron: New
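The kind of check the fix needs, as a self-contained sketch (not the actual neutron validation code):

def check_min_bw_policy(rules, network):
    # Reject minimum-bandwidth policies on non-physnet networks.
    has_min_bw = any(r.get("type") == "minimum_bandwidth" for r in rules)
    if has_min_bw and network.get("provider:physical_network") is None:
        raise ValueError("minimum-bandwidth rules require a physnet-backed "
                         "network, got type %s"
                         % network.get("provider:network_type"))

try:
    check_min_bw_policy(
        [{"type": "minimum_bandwidth", "min_kbps": 1000,
          "direction": "egress"}],
        {"provider:network_type": "vxlan",
         "provider:physical_network": None})
except ValueError as exc:
    print(exc)  # this rejection is what the report asks for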
[Yahoo-eng-team] [Bug 1818683] [NEW] Placement reporter service plugin sometimes creates orphaned resource providers
Public bug reported: As discovered by lajoskatona while working on a fullstack test (https://review.openstack.org/631793) the placement reporter plugin may create some of the neutron resource providers in the wrong resource provider tree. For example consider:

$ openstack --os-placement-api-version 1.17 resource provider list
+--------------------------------------+-------------------------------------------+------------+--------------------------------------+--------------------------------------+
| uuid                                 | name                                      | generation | root_provider_uuid                   | parent_provider_uuid                 |
+--------------------------------------+-------------------------------------------+------------+--------------------------------------+--------------------------------------+
| 89ca1421-5117-5348-acab-6d0e2054239c | devstack0:Open vSwitch agent              | 0          | 89ca1421-5117-5348-acab-6d0e2054239c | None                                 |
| 4a6f5f40-b7a1-5df4-9938-63983543f365 | devstack0:Open vSwitch agent:br-physnet0  | 2          | 89ca1421-5117-5348-acab-6d0e2054239c | 89ca1421-5117-5348-acab-6d0e2054239c |
| 193134fd-464c-5545-9d20-df7d58c0166f | devstack0:Open vSwitch agent:br-ex        | 2          | 89ca1421-5117-5348-acab-6d0e2054239c | 89ca1421-5117-5348-acab-6d0e2054239c |
| dbc498c7-8808-4f31-8abb-18560a4c3b53 | devstack0                                 | 2          | dbc498c7-8808-4f31-8abb-18560a4c3b53 | None                                 |
| 4a8a819d-61f9-5822-8c5c-3e9c7cb942d6 | devstack0:NIC Switch agent                | 0          | dbc498c7-8808-4f31-8abb-18560a4c3b53 | dbc498c7-8808-4f31-8abb-18560a4c3b53 |
| 1c7e83f0-108d-5c35-ada7-7ebebbe43aad | devstack0:NIC Switch agent:ens5           | 2          | dbc498c7-8808-4f31-8abb-18560a4c3b53 | 4a8a819d-61f9-5822-8c5c-3e9c7cb942d6 |
+--------------------------------------+-------------------------------------------+------------+--------------------------------------+--------------------------------------+

Please note that all RPs should have the root_provider_uuid set to the devstack0 RP's uuid, but the open vswitch RPs have a different (wrong) root. And 'devstack0:Open vSwitch agent' has no parent. This situation is dependent on service startup order. The ovs RPs were created before the compute host RP. That case should have been detected as an error, but it was not. I'll upload a proposed fix right away. ** Affects: neutron Importance: Undecided Assignee: Bence Romsics (bence-romsics) Status: New ** Tags: qos -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1818683 Title: Placement reporter service plugin sometimes creates orphaned resource providers Status in neutron: New Bug description: As discovered by lajoskatona while working on a fullstack test (https://review.openstack.org/631793) the placement reporter plugin may create some of the neutron resource providers in the wrong resource provider tree. For example consider:

$ openstack --os-placement-api-version 1.17 resource provider list
+--------------------------------------+-------------------------------------------+------------+--------------------------------------+--------------------------------------+
| uuid                                 | name                                      | generation | root_provider_uuid                   | parent_provider_uuid                 |
+--------------------------------------+-------------------------------------------+------------+--------------------------------------+--------------------------------------+
| 89ca1421-5117-5348-acab-6d0e2054239c | devstack0:Open vSwitch agent              | 0          | 89ca1421-5117-5348-acab-6d0e2054239c | None                                 |
| 4a6f5f40-b7a1-5df4-9938-63983543f365 | devstack0:Open vSwitch agent:br-physnet0  | 2          | 89ca1421-5117-5348-acab-6d0e2054239c | 89ca1421-5117-5348-acab-6d0e2054239c |
| 193134fd-464c-5545-9d20-df7d58c0166f | devstack0:Open vSwitch agent:br-ex        | 2          | 89ca1421-5117-5348-acab-6d0e2054239c | 89ca1421-5117-5348-acab-6d0e2054239c |
| dbc498c7-8808-4f31-8abb-18560a4c3b53 | devstack0                                 | 2          | dbc498c7-8808-4f31-8abb-18560a4c3b53 | None                                 |
| 4a8a819d-61f9-5822-8c5c-3e9c7cb942d6 | devstack0:NIC Switch agent                | 0          | dbc498c7-8808-4f31-8abb-18560a4c3b53 | dbc498c7-8808-4f31-8abb-18560a4c3b53 |
| 1c7e83f0-108d-5c35-ada7-7ebebbe43aad | devstack0:NIC Switch agent:ens5           | 2          | dbc498c7-8808-4f31-8abb-18560a4c3b53 | 4a8a819d-61f9-5822-8c5c-3e9c7cb942d6
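A sketch of how the broken tree can be spotted from the CLI, assuming the osc-placement plugin used in the report; in a healthy tree every row's root_provider_uuid equals the compute host RP's uuid:

$ openstack --os-placement-api-version 1.17 resource provider list \
    -c uuid -c root_provider_uuid -c parent_provider_uuid
# Any agent RP whose root differs from the devstack0 (compute host) RP
# uuid, or whose parent is unexpectedly None, was likely created before
# the compute host RP existed.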
[Yahoo-eng-team] [Bug 1818479] [NEW] RFE Decouple placement reporting service plugin from ML2
Public bug reported: This RFE tracks an improvement to the placement reporter service plugin that was suggested just a few days before the Stein feature freeze, so instead of working on it right there, this is delayed to the Train cycle. The original code review comment: https://review.openstack.org/#/c/580672/30/neutron/services/placement_report/plugin.py@187 The placement reporter service plugin as merged in Stein depends on ML2. The improvement idea is to decouple it via a driver pattern, as in the QoS service plugin. We need to investigate the costs and benefits of this refactoring and, if it's feasible, implement it in Train. ** Affects: neutron Importance: Undecided Assignee: Bence Romsics (bence-romsics) Status: New ** Tags: qos rfe -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1818479 Title: RFE Decouple placement reporting service plugin from ML2 Status in neutron: New Bug description: This RFE tracks an improvement to the placement reporter service plugin that was suggested just a few days before the Stein feature freeze, so instead of working on it right there, this is delayed to the Train cycle. The original code review comment: https://review.openstack.org/#/c/580672/30/neutron/services/placement_report/plugin.py@187 The placement reporter service plugin as merged in Stein depends on ML2. The improvement idea is to decouple it via a driver pattern, as in the QoS service plugin. We need to investigate the costs and benefits of this refactoring and, if it's feasible, implement it in Train. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1818479/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1815618] [NEW] cannot update qos rule
ck_rules_conflict(policy, rule)
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource File "/opt/stack/neutron/neutron/objects/qos/qos_policy_validator.py", line 63, in check_rules_conflict
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource if rule.duplicates(rule_obj):
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource File "/opt/stack/neutron/neutron/objects/qos/rule.py", line 83, in duplicates
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource if getattr(self, field) != getattr(other_rule, field):
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 68, in getter
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource return getattr(self, attrname)
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource AttributeError: 'QosMinimumBandwidthRule' object has no attribute '_obj_direction'
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource
The version used to reproduce the bug: neutron 2f3cc51784 neutron-lib aceb7c50ed devstack ee4b6a01 python-openstackclient dcff1012fd python-neutronclient d74b871f7fe openstacksdk==0.23.0 osc-lib==1.12.0 I'll work on fixing these problems. ** Affects: neutron Importance: Undecided Assignee: Bence Romsics (bence-romsics) Status: New ** Tags: low-hanging-fruit qos -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1815618 Title: cannot update qos rule Status in neutron: New Bug description: This bug seems to be a combination of problems on both client and server sides. So we may need to add python-neutronclient and/or python-openstackclient as an affected component. I'll do that as soon as I manage to locate which one contains the client-side bug. But this report will be good to track the overall problem. First the reproduction:

openstack network qos policy create policy0
openstack network qos rule create policy0 --type minimum-bandwidth --min-kbps 1000 --egress # 71a84995-cccd-4f09-9c3d-b1caa18ff363
openstack network qos rule set policy0 71a84995-cccd-4f09-9c3d-b1caa18ff363 --min-kbps 1001 --egress
-> works as expected
# make sure we only have one rule of the type
openstack network qos rule delete policy0 71a84995-cccd-4f09-9c3d-b1caa18ff363
openstack network qos rule create policy0 --type minimum-bandwidth --min-kbps 1000 --ingress # 1155c1c8-f9a7-4954-b195-9f58c8e18b4d
openstack network qos rule set policy0 1155c1c8-f9a7-4954-b195-9f58c8e18b4d --min-kbps 1001 --ingress
-> works as expected
openstack network qos rule delete policy0 1155c1c8-f9a7-4954-b195-9f58c8e18b4d
# create the ingress/egress pair at once
openstack network qos rule create policy0 --type minimum-bandwidth --min-kbps 1000 --egress # f392837a-09e2-4b5e-8c29-86670797679e
openstack network qos rule create policy0 --type minimum-bandwidth --min-kbps 1000 --ingress # 77dae223-b787-4943-bb45-c42424fd29ec
# This is the bug. As we'll see later the trigger is a client-side problem, but I don't think neutron-server should return 500 Internal Server Error. The malformed input should be caught earlier and a 4xx response should be given.
openstack network qos rule set policy0 f392837a-09e2-4b5e-8c29-86670797679e --min-kbps 1001 --egress
Failed to set Network QoS rule ID "f392837a-09e2-4b5e-8c29-86670797679e": HttpException: 500: Server Error for url: http://100.109.0.20:9696/v2.0/qos/policies/188a2f59-ab90-41a3-9e6f-58e641a34544/minimum_bandwidth_rules/f392837a-09e2-4b5e-8c29-86670797679e, Request Failed: internal server error while processing your request.
openstack network qos rule set policy0 77dae223-b787-4943-bb45-c42424fd29ec --min-kbps 1001 --ingress
Failed to set Network QoS rule ID "77dae223-b787-4943-bb45-c42424fd29ec": HttpException: 500: Server Error for url: http://100.109.0.20:9696/v2.0/qos/policies/188a2f59-ab90-41a3-9e6f-58e641a34544/minimum_bandwidth_rules/77dae223-b787-4943-bb45-c42424fd29ec, Request Failed: internal server error while processing your request.
# the same rule update can be done by neutronclient, but only for the egress direction
neutron qos-minimum-bandwidth-rule-update f392837a-09e2-4b5e-8c29-86670797679e policy0 --min-kbps 1001 --direction egress
-> works as expected
# this failure is expected, because neutronclient had already been deprecated for a long time when the ingress direction was introduced
neutron qos-minimum-bandwidth-rule-update 77dae223-b787-4943-bb45-c42424fd29ec policy0 --min-kbps 1001 --direction ingress
neutron qos-minimum-bandwidth-rule-update: error: argument --direction:
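One way to narrow down whether the 500 is triggered by the client-sent body is to bypass both clients and call the API directly; a sketch reusing the policy and rule UUIDs from the reproduction above and assuming the standard QoS rule update body format (TOKEN is a valid keystone token):

$ curl -s -X PUT \
    -H "Content-Type: application/json" \
    -H "X-Auth-Token: ${TOKEN:?}" \
    -d '{"minimum_bandwidth_rule": {"min_kbps": 1001}}' \
    http://100.109.0.20:9696/v2.0/qos/policies/188a2f59-ab90-41a3-9e6f-58e641a34544/minimum_bandwidth_rules/77dae223-b787-4943-bb45-c42424fd29ec
# If this succeeds while the openstackclient call returns 500, the trigger
# is on the client side; even so, the server should answer a malformed
# body with 4xx, not 500.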
[Yahoo-eng-team] [Bug 1749404] [NEW] nova-compute resource tracker ignores 'reserved' while reporting 'max_unit'
Public bug reported: The following inventory was reported after a fresh devstack build: curl --silent \ --header "Accept: application/json" \ --header "Content-Type: application/json" \ --header "OpenStack-API-Version: placement latest" \ --header "X-Auth-Token: ${TOKEN:?}" \ -X GET http://127.0.0.1/placement/resource_providers/8d4d7926-df76-42e5-b5da-67893468f5cb/inventories | json_pp { "resource_provider_generation" : 1, "inventories" : { "DISK_GB" : { "max_unit" : 19, "min_unit" : 1, "allocation_ratio" : 1, "step_size" : 1, "reserved" : 0, "total" : 19 }, "MEMORY_MB" : { "allocation_ratio" : 1.5, "max_unit" : 5967, "min_unit" : 1, "reserved" : 512, "step_size" : 1, "total" : 5967 }, "VCPU" : { "allocation_ratio" : 16, "min_unit" : 1, "max_unit" : 2, "reserved" : 0, "step_size" : 1, "total" : 2 } } } IMO the correct max_unit value of the MEMORY_MB resource would be (total - reserved). But today it equals the total value. nova commit: 9e9b3e1 devstack commit: fbdefac devstack config: ENABLED_SERVICES+=,placement-api,placement-client ** Affects: nova Importance: Undecided Status: New ** Tags: low-hanging-fruit placement ** Description changed: The following inventory was reported after a fresh devstack build: - curl --silent --header "Accept: application/json" --header "Content-Type: application/json" --header "OpenStack-API-Version: placement latest" --header "X-Auth-Token: ${TOKEN:?}" -X GET http://127.0.0.1/placement/resource_providers/8d4d7926-df76-42e5-b5da-67893468f5cb/inventories | json_pp + curl --silent \ + --header "Accept: application/json" \ + --header "Content-Type: application/json" \ + --header "OpenStack-API-Version: placement latest" \ + --header "X-Auth-Token: ${TOKEN:?}" \ + -X GET http://127.0.0.1/placement/resource_providers/8d4d7926-df76-42e5-b5da-67893468f5cb/inventories | json_pp { -"resource_provider_generation" : 1, -"inventories" : { - "DISK_GB" : { - "max_unit" : 19, - "min_unit" : 1, - "allocation_ratio" : 1, - "step_size" : 1, - "reserved" : 0, - "total" : 19 - }, - "MEMORY_MB" : { - "allocation_ratio" : 1.5, - "max_unit" : 5967, - "min_unit" : 1, - "reserved" : 512, - "step_size" : 1, - "total" : 5967 - }, - "VCPU" : { - "allocation_ratio" : 16, - "min_unit" : 1, - "max_unit" : 2, - "reserved" : 0, - "step_size" : 1, - "total" : 2 - } -} + "resource_provider_generation" : 1, + "inventories" : { + "DISK_GB" : { + "max_unit" : 19, + "min_unit" : 1, + "allocation_ratio" : 1, + "step_size" : 1, + "reserved" : 0, + "total" : 19 + }, + "MEMORY_MB" : { + "allocation_ratio" : 1.5, + "max_unit" : 5967, + "min_unit" : 1, + "reserved" : 512, + "step_size" : 1, + "total" : 5967 + }, + "VCPU" : { + "allocation_ratio" : 16, + "min_unit" : 1, + "max_unit" : 2, + "reserved" : 0, + "step_size" : 1, + "total" : 2 + } + } } IMO the correct max_unit value of the MEMORY_MB resource would be (total - reserved). But today it equals the total value. nova commit: 9e9b3e1 devstack commit: fbdefac devstack config: ENABLED_SERVICES+=,placement-api,placement-client -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1749404 Title: nova-compute resource tracker ignores 'reserved' while reporting 'max_unit' Status in OpenStack Compute (nova): New Bug description: The following inventory was reported after a fresh devstack build: curl --silent \ --header "Accept: application/json" \ --header "Content-Type: application/json" \ --header "OpenStack-API-Version: placement latest" \ --header "X-Auth-Token: ${TOKEN:?}" \ -X GET http://127.0.0.1/placement/resource_providers/8d4d7926-df76-42e5-b5da-67893468f5cb/inventories | json_pp { "resource_provider_generation" : 1, "inventories" : { "DISK_GB" : { "max_unit" : 19, "min_unit" : 1, "allocation_ratio" : 1, "step_size" : 1, "reserved" : 0, "total" : 19 }, "MEMORY_MB" : { "allocation_ratio" : 1.5, "max_unit" : 5967, "min_unit" : 1, "reserved" : 512, "step_size" : 1, "total" : 5967 }, "VCPU" : {
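To make the reporter's expectation concrete with the inventory above, for MEMORY_MB:

max_unit = total - reserved = 5967 - 512 = 5455

whereas the reported max_unit equals the total (5967). DISK_GB and VCPU are unaffected in this example because their reserved is 0.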
[Yahoo-eng-team] [Bug 1749410] [NEW] placement api-ref unclear if capacity is meant to be total or current
Public bug reported: While exploring the newer microversions (here 1.4) of the placement API I found this part of the API reference unclear to me (https://developer.openstack.org/api-ref/placement/#list-resource-providers, 'resources' parameter): "A comma-separated list of strings indicating an amount of resource of a specified class that a provider must have the capacity to serve:" Based on the reference I cannot tell if the capacity is meant to be total or current (i.e. total minus current allocations). Running a few queries, it seems to me the actual behavior is to filter on total capacity. If that was the intended behavior then this report is just a tiny documentation bug I guess. https://github.com/openstack/nova/blob/17.0.0.0rc1/placement-api-ref/source/parameters.yaml#L105 ** Affects: nova Importance: Undecided Status: New ** Tags: doc placement -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1749410 Title: placement api-ref unclear if capacity is meant to be total or current Status in OpenStack Compute (nova): New Bug description: While exploring the newer microversions (here 1.4) of the placement API I found this part of the API reference unclear to me (https://developer.openstack.org/api-ref/placement/#list-resource-providers, 'resources' parameter): "A comma-separated list of strings indicating an amount of resource of a specified class that a provider must have the capacity to serve:" Based on the reference I cannot tell if the capacity is meant to be total or current (i.e. total minus current allocations). Running a few queries, it seems to me the actual behavior is to filter on total capacity. If that was the intended behavior then this report is just a tiny documentation bug I guess. https://github.com/openstack/nova/blob/17.0.0.0rc1/placement-api-ref/source/parameters.yaml#L105 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1749410/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
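An illustrative query of the parameter in question, following the curl style of bug 1749404; the resource amount is hypothetical, and whether a provider with a large total but mostly-allocated inventory matches is exactly what the api-ref leaves unclear:

$ curl --silent \
    --header "OpenStack-API-Version: placement 1.4" \
    --header "X-Auth-Token: ${TOKEN:?}" \
    "http://127.0.0.1/placement/resource_providers?resources=MEMORY_MB:5500" | json_pp
# The reporter's observation was that the filter appeared to act on the
# total capacity rather than on (total - current allocations).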
[Yahoo-eng-team] [Bug 1708444] [NEW] Angular role table stays stale after editing a role
Public bug reported: In the angularized role panel, if I edit a role (e.g. change its name) the actual update happens in Keystone, but the role table is not refreshed and shows the old state until I reload the page. devstack b79531a horizon 53dd2db ANGULAR_FEATURES={ 'roles_panel': True, ... } A proposed fix is on the way. ** Affects: horizon Importance: Undecided Assignee: Bence Romsics (bence-romsics) Status: In Progress ** Tags: angularjs -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon). https://bugs.launchpad.net/bugs/1708444 Title: Angular role table stays stale after editing a role Status in OpenStack Dashboard (Horizon): In Progress Bug description: In the angularized role panel, if I edit a role (e.g. change its name) the actual update happens in Keystone, but the role table is not refreshed and shows the old state until I reload the page. devstack b79531a horizon 53dd2db ANGULAR_FEATURES={ 'roles_panel': True, ... } A proposed fix is on the way. To manage notifications about this bug go to: https://bugs.launchpad.net/horizon/+bug/1708444/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1704118] [NEW] Spinner is stuck when deleting image in angularized panel
Public bug reported: Reproduction: local_settings: ANGULAR_FEATURES={ 'images_panel': True, ... } devstack commit b79531a9f96736225a8991052a0be5767c217377 horizon commit d5779eae0ad267533001cb7dae6ca7dbc5becb27 Go to detail page of an image e.g.: /ngdetails/OS::Glance::Image/90ccb1bf-1feb-4f49-8234-c6812c952131 Click delete image. After that the image is deleted, though multiple UI errors can be seen: 1) The 'Please wait' spinner is stuck forever 2) A red toast is displayed: Error: Unable to retrieve the image 3) In the javascript console this error appears: GET http://127.0.0.1:9000/api/glance/images/90ccb1bf-1feb-4f49-8234-c6812c952131/ 404 (Not Found) ** Affects: horizon Importance: Undecided Status: New ** Tags: angularjs glance low-hanging-fruit -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon). https://bugs.launchpad.net/bugs/1704118 Title: Spinner is stuck when deleting image in angularized panel Status in OpenStack Dashboard (Horizon): New Bug description: Reproduction: local_settings: ANGULAR_FEATURES={ 'images_panel': True, ... } devstack commit b79531a9f96736225a8991052a0be5767c217377 horizon commit d5779eae0ad267533001cb7dae6ca7dbc5becb27 Go to detail page of an image e.g.: /ngdetails/OS::Glance::Image/90ccb1bf-1feb-4f49-8234-c6812c952131 Click delete image. After that the image is deleted, though multiple UI errors can be seen: 1) The 'Please wait' spinner is stuck forever 2) A red toast is displayed: Error: Unable to retrieve the image 3) In the javascript console this error appears: GET http://127.0.0.1:9000/api/glance/images/90ccb1bf-1feb-4f49-8234-c6812c952131/ 404 (Not Found) To manage notifications about this bug go to: https://bugs.launchpad.net/horizon/+bug/1704118/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1699516] [NEW] Trunk create fails due to case typo
Public bug reported: When you boot a VM with a trunk using the ovs trunk driver, the boot fails while allocating the network, and you get this ovs-agent error log: neutron-openvswitch-agent[12170]: CallbackFailure: Callback neutron.services.trunk.drivers.openvswitch.agent.driver.OVSTrunkSkeleton.check_trunk_dependencies-1030432 failed with "no such option securitygroup in group [DEFAULT]" The cause looks to be a case typo in the fix of bug #1669074. neutron/services/trunk/drivers/openvswitch/agent/driver.py: wrong: cfg.CONF.securitygroup.firewall_driver right: cfg.CONF.SECURITYGROUP.firewall_driver ** Affects: neutron Importance: Undecided Status: New ** Tags: trunk -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1699516 Title: Trunk create fails due to case typo Status in neutron: New Bug description: When you boot a VM with a trunk using the ovs trunk driver, the boot fails while allocating the network, and you get this ovs-agent error log: neutron-openvswitch-agent[12170]: CallbackFailure: Callback neutron.services.trunk.drivers.openvswitch.agent.driver.OVSTrunkSkeleton.check_trunk_dependencies-1030432 failed with "no such option securitygroup in group [DEFAULT]" The cause looks to be a case typo in the fix of bug #1669074. neutron/services/trunk/drivers/openvswitch/agent/driver.py: wrong: cfg.CONF.securitygroup.firewall_driver right: cfg.CONF.SECURITYGROUP.firewall_driver To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1699516/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
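A quick, hypothetical way to catch this class of typo in a source tree; oslo.config group names are case-sensitive, and the security group options are registered under the uppercase group:

$ grep -rn 'CONF\.securitygroup\.' neutron/
# Any hit references a non-existent lowercase group and fails at runtime
# with "no such option securitygroup in group [DEFAULT]"; the correct
# spelling is cfg.CONF.SECURITYGROUP.firewall_driver.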
[Yahoo-eng-team] [Bug 1631371] [NEW] [RFE] Expose trunk details over metadata API
Public bug reported: Enable bringup of subports via exposing trunk/subport details over the metadata API With the completion of the trunk port feature in Newton (Neutron bp/vlan-aware-vms [1]), trunk and subports are now available. But the bringup of the subports' VLAN interfaces inside an instance is not automatic. In Newton there's no easy way to pass information about the subports to the guest operating system. But using the metadata API we can change this. Problem Description --- To bring up (and/or tear down) a subport, the guest OS (a) must know the segmentation-type and segmentation-id of a subport as set in 'openstack network trunk create/set --subport' (b) must know the MAC address of a subport as set in 'openstack port create' (c) must know which vNIC the subport belongs to (d) may need to know when subports were added or removed (if they are added or removed during the lifetime of an instance) Since subports do not have a corresponding vNIC, the approach used for regular ports (with a vNIC) cannot work. This write-up addresses problems (a), (b) and (c), but not (d). Proposed Change --- Here we propose a change involving both Nova and Neutron to expose the information needed via the metadata API. Information covering (a) and (b) is already available (read-only) in the 'trunk_details' attribute of the trunk parent port (i.e. the port which the instance was booted with). [2] We propose to use the MAC address of the trunk parent port to cover (c). We recognize this may occasionally be problematic, because MAC addresses (of ports belonging to different neutron networks) are not guaranteed to be unique, therefore collisions may happen. But this seems to be a small price for avoiding the complexity of other solutions. The mechanism would be the following. Let's suppose we have port0 which is a trunk parent port and instance0 was booted with '--nic port-id=port0'. On every update of port0's trunk_details Neutron constructs the following JSON structure: PORT0-DETAILS = { "mac_address": PORT0-MAC-ADDRESS, "trunk_details": PORT0-TRUNK-DETAILS } Then Neutron sets a metadata key-value pair of instance0, equivalent to the following nova command: nova meta set instance0 trunk_details::PORT0-MAC-ADDRESS=PORT0-DETAILS Nova in Newton limits meta values to <= 255 characters; this limit must be raised. Assuming the current format of trunk_details, roughly 150 characters per subport are needed. Alternatively meta values could have unlimited length - at least for the service tenant used by Neutron. (Though tenant-specific API validators may not be a good idea.) The 'values' column of the 'instance_metadata' table should be altered from VARCHAR(255) to TEXT() in a Nova DB migration. (A slightly related bug report: [3]) A program could read http://169.254.169.254/openstack/2016-06-30/meta_data.json and bring up the subport VLAN interfaces accordingly. This program is not covered here; however, it is worth pointing out that it could be called by cloud-init. Alternatives (1) The MAC address of a parent port can be reused for all its child ports (when creating the child ports). Then VLAN subinterfaces of a network interface will have the correct MAC address by default. Segmentation type and ID can be shared in other ways, for example as a VLAN plan embedded into a golden image. This approach could even partially solve problem (d); however, it cannot solve problem (a) in the dynamic case. Use of this approach is currently blocked by an openvswitch firewall driver bug.
[4][5] (2) Generate and inject a subport bringup script into the instance as user data. Cannot handle subports added or removed after VM boot. (3) An alternative solution to problem (c) could rely on the preservation of ordering between NICs passed to nova boot and NICs inside an instance. However this would turn the update of trunk_details into an instance-level operation instead of the port-level operation proposed here. Plus it would fail if this ordering is ever lost. References -- [1] https://blueprints.launchpad.net/neutron/+spec/vlan-aware-vms [2] https://review.openstack.org/#q,Id23ce8fc16c6ea6a405cb8febf8470a5bf3bcb43,n,z [3] https://bugs.launchpad.net/nova/+bug/1117923 [4] https://bugs.launchpad.net/neutron/+bug/1626010 [5] https://bugs.launchpad.net/neutron/+bug/1593760 ** Affects: neutron Importance: Undecided Status: New ** Tags: rfe trunk -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1631371 Title: [RFE] Expose trunk details over metadata API Status in neutron: New Bug description: Enable bringup of subports via exposing trunk/subport details over the metadata API With the completion of the trunk port feature in Newton (Neutron bp/vlan-aware-vms [1]), trunk and subports are now available. But the bringup of the
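Purely illustrative of the proposal (not an implemented interface): inside the guest, the subport details would become visible roughly like this, with all values hypothetical:

$ curl -s http://169.254.169.254/openstack/2016-06-30/meta_data.json | json_pp
# Under the proposal, meta would contain an entry such as:
#   "trunk_details::fa:16:3e:aa:bb:cc": "{\"mac_address\": \"fa:16:3e:aa:bb:cc\",
#     \"trunk_details\": {\"sub_ports\": [{\"segmentation_type\": \"vlan\",
#     \"segmentation_id\": 101, \"mac_address\": \"fa:16:3e:dd:ee:ff\"}]}}"
# which a bringup script (possibly invoked by cloud-init) could parse to
# create the matching VLAN subinterfaces on the vNIC with that MAC.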
[Yahoo-eng-team] [Bug 1630920] [NEW] native/idl ovsdb driver loses some ovsdb transactions
Public bug reported: It seems the 'native' and the 'vsctl' ovsdb drivers behave differently. The native/idl driver seems to lose some ovsdb transactions, at least the transactions setting the 'other_config' ovs port attribute. I have written about this in a comment of an earlier bug report (https://bugs.launchpad.net/neutron/+bug/1626010). But I opened this new bug report because the two problems seem to be independent and that other comment may have gone unnoticed. It is not completely clear to me what difference this causes in user-observable behavior. I think it at least leads to losing information about which conntrack zone to use in the openvswitch firewall driver. See here: https://github.com/openstack/neutron/blob/3ade301/neutron/agent/linux/openvswitch_firewall/firewall.py#L257 The details: If I use the vsctl ovsdb driver: ml2_conf.ini: [ovs] ovsdb_interface = vsctl then I see this: $ > /opt/stack/logs/q-agt.log $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 1 $ openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec --nic net-id=net0 --wait vm0 $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 2 $ openstack server delete vm0 $ sleep 3 $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 1 $ egrep -c 'Transaction caused no change' /opt/stack/logs/q-agt.log 0 But if I use the (default) native driver: ml2_conf.ini: [ovs] ovsdb_interface = native Then this happens: $ > /opt/stack/logs/q-agt.log $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 1 $ openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec --nic net-id=net0 --wait vm0 $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 1 $ openstack server delete vm0 $ sleep 3 $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 1 $ egrep -c 'Transaction caused no change' /opt/stack/logs/q-agt.log 22 A sample log message from q-agt.log: 2016-10-06 09:23:05.447 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbSetCommand(table=Port, col_values=(('other_config', {'tag': 1}),), record=tap8e2a390d-63) from (pid=6068) do_commit /opt/stack/neutron/neutron/agent/ovsdb/impl_idl.py:99 2016-10-06 09:23:05.448 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change from (pid=6068) do_commit /opt/stack/neutron/neutron/agent/ovsdb/impl_idl.py:126 devstack version: 563d377 neutron version: 3ade301 ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1630920 Title: native/idl ovsdb driver loses some ovsdb transactions Status in neutron: New Bug description: It seems the 'native' and the 'vsctl' ovsdb drivers behave differently. The native/idl driver seems to lose some ovsdb transactions, at least the transactions setting the 'other_config' ovs port attribute. I have written about this in a comment of an earlier bug report (https://bugs.launchpad.net/neutron/+bug/1626010). But I opened this new bug report because the two problems seem to be independent and that other comment may have gone unnoticed. It is not completely clear to me what difference this causes in user-observable behavior. I think it at least leads to losing information about which conntrack zone to use in the openvswitch firewall driver. 
See here: https://github.com/openstack/neutron/blob/3ade301/neutron/agent/linux/openvswitch_firewall/firewall.py#L257 The details: If I use the vsctl ovsdb driver: ml2_conf.ini: [ovs] ovsdb_interface = vsctl then I see this: $ > /opt/stack/logs/q-agt.log $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 1 $ openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec --nic net-id=net0 --wait vm0 $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 2 $ openstack server delete vm0 $ sleep 3 $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 1 $ egrep -c 'Transaction caused no change' /opt/stack/logs/q-agt.log 0 But if I use the (default) native driver: ml2_conf.ini: [ovs] ovsdb_interface = native Then this happens: $ > /opt/stack/logs/q-agt.log $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 1 $ openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec --nic net-id=net0 --wait vm0 $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 1 $ openstack server delete vm0 $ sleep 3 $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid 1 $ egrep -c 'Transaction caused no change' /opt/stack/logs/q-agt.log 22 A sample log message from q-agt.log: 2016-10-06 09:23:05.447 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbSetCommand(table=Port, col_values=(('other_config',
[Yahoo-eng-team] [Bug 1626010] [NEW] Connectivity problem on trunk parent with MAC reuse and openvswitch firewall driver
Public bug reported: It seems we have a case where the openvswitch firewall driver and a use of trunks interfere with each other. I tried using the parent's MAC address for a subport. Like this: openstack network create net0 openstack network create net1 openstack subnet create --network net0 --subnet-range 10.0.4.0/24 subnet0 openstack subnet create --network net1 --subnet-range 10.0.5.0/24 subnet1 openstack port create --network net0 port0 parent_mac="$( openstack port show port0 | awk '/ mac_address / { print $4 }' )" openstack port create --network net1 --mac-address "$parent_mac" port1 openstack network trunk create --parent-port port0 --subport port=port1,segmentation-type=vlan,segmentation-id=101 trunk0 openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec --nic port-id=port0 --key-name key0 --wait vm0 Then all packets are lost on the trunk's parent port: $ openstack server show vm0 | egrep addresses.*net0 | addresses| net0=10.0.4.6 | $ sudo ip netns exec "qdhcp-$( openstack network show net0 | awk '/ id / { print $4 }' )" ping -c3 10.0.4.6 WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils PING 10.0.4.6 (10.0.4.6) 56(84) bytes of data. --- 10.0.4.6 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 2016ms If I change the firewall_driver to noop and redo the same I have connectivity. If I still have the openvswitch firewall_driver but I don't explicitly set the subport MAC, but let neutron automatically assign one, then again I have connectivity. devstack version: 81d89cf neutron version: 60010a8 relevant parts of local.conf: [[local|localrc]] enable_service neutron-api enable_service neutron-l3 enable_service neutron-agent enable_service neutron-dhcp enable_service neutron-metadata-agent [[post-config|$NEUTRON_CONF]] [DEFAULT] service_plugins = router,trunk [[post-config|$NEUTRON_PLUGIN_CONF]] [securitygroup] firewall_driver = openvswitch ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1626010 Title: Connectivity problem on trunk parent with MAC reuse and openvswitch firewall driver Status in neutron: New Bug description: It seems we have a case where the openvswitch firewall driver and a use of trunks interfere with each other. I tried using the parent's MAC address for a subport. Like this: openstack network create net0 openstack network create net1 openstack subnet create --network net0 --subnet-range 10.0.4.0/24 subnet0 openstack subnet create --network net1 --subnet-range 10.0.5.0/24 subnet1 openstack port create --network net0 port0 parent_mac="$( openstack port show port0 | awk '/ mac_address / { print $4 }' )" openstack port create --network net1 --mac-address "$parent_mac" port1 openstack network trunk create --parent-port port0 --subport port=port1,segmentation-type=vlan,segmentation-id=101 trunk0 openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec --nic port-id=port0 --key-name key0 --wait vm0 Then all packets are lost on the trunk's parent port: $ openstack server show vm0 | egrep addresses.*net0 | addresses| net0=10.0.4.6 | $ sudo ip netns exec "qdhcp-$( openstack network show net0 | awk '/ id / { print $4 }' )" ping -c3 10.0.4.6 WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. 
Please use osc_lib.utils PING 10.0.4.6 (10.0.4.6) 56(84) bytes of data. --- 10.0.4.6 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 2016ms If I change the firewall_driver to noop and redo the same I have connectivity. If I still have the openvswitch firewall_driver but I don't explicitly set the subport MAC, but let neutron automatically assign one, then again I have connectivity. devstack version: 81d89cf neutron version: 60010a8 relevant parts of local.conf: [[local|localrc]] enable_service neutron-api enable_service neutron-l3 enable_service neutron-agent enable_service neutron-dhcp enable_service neutron-metadata-agent [[post-config|$NEUTRON_CONF]] [DEFAULT] service_plugins = router,trunk [[post-config|$NEUTRON_PLUGIN_CONF]] [securitygroup] firewall_driver = openvswitch To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1626010/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1587296] [NEW] ovs-agent: use_veth_interconnection is not needed anymore
Public bug reported: Config option 'use_veth_interconnection' should be deprecated. Instead we can always use Open vSwitch patch ports. The discussion started in a review here: https://review.openstack.org/#/c/318317/2 openstack/neutron/doc/source/devref/openvswitch_agent.rst line 471 AFAICT the use of veth pairs was always a fallback when sufficiently new ovs was not available from distro packages. Since veth pairs always have worse packet forwarding performance than ovs patch ports, it makes no sense to use them when patch ports are available. If we no longer support veth pairs, the agent code can be simplified. We think providing the veth fallback is no longer relevant. Open vSwitch release notes state this (http://openvswitch.org/releases/NEWS-2.5.0): v1.10.0 - 01 May 2013 - ... - Patch ports no longer require kernel support, so they now work with FreeBSD and the kernel module built into Linux 3.3 and later. For example, for Ubuntu this means veth is not needed in 14.04+. I opened this bug to separate this conversation from the above review, and to get feedback on whether anybody still uses veth pairs. Shall we deprecate 'use_veth_interconnection'? If yes, what should be the deprecation timeline? ** Affects: neutron Importance: Undecided Status: New ** Tags: ovs rfe ** Project changed: tempest => neutron -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1587296 Title: ovs-agent: use_veth_interconnection is not needed anymore Status in neutron: New Bug description: Config option 'use_veth_interconnection' should be deprecated. Instead we can always use Open vSwitch patch ports. The discussion started in a review here: https://review.openstack.org/#/c/318317/2 openstack/neutron/doc/source/devref/openvswitch_agent.rst line 471 AFAICT the use of veth pairs was always a fallback when sufficiently new ovs was not available from distro packages. Since veth pairs always have worse packet forwarding performance than ovs patch ports, it makes no sense to use them when patch ports are available. If we no longer support veth pairs, the agent code can be simplified. We think providing the veth fallback is no longer relevant. Open vSwitch release notes state this (http://openvswitch.org/releases/NEWS-2.5.0): v1.10.0 - 01 May 2013 - ... - Patch ports no longer require kernel support, so they now work with FreeBSD and the kernel module built into Linux 3.3 and later. For example, for Ubuntu this means veth is not needed in 14.04+. I opened this bug to separate this conversation from the above review, and to get feedback on whether anybody still uses veth pairs. Shall we deprecate 'use_veth_interconnection'? If yes, what should be the deprecation timeline? To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1587296/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
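A minimal sketch of the check that makes the fallback unnecessary, assuming Open vSwitch >= 1.10 and a Linux kernel >= 3.3 as quoted in the release notes above:

$ ovs-vsctl --version
# With ovs 1.10+ patch ports need no extra kernel support, so there is no
# reason to flip this ml2_conf.ini option away from its default:
[ovs]
use_veth_interconnection = False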