[Yahoo-eng-team] [Bug 2052937] Re: Policy: binding operations are prohibited for service role

2024-02-13 Thread Bence Romsics
** Changed in: neutron
   Status: Invalid => Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2052937

Title:
  Policy: binding operations are prohibited for service role

Status in neutron:
  Triaged

Bug description:
  Create/update port binding:* policies are admin only, which prevents, for
  example, the ironic service user with the service role from managing
  baremetal ports:

  
  "http://192.0.2.10:9292;, "region": "RegionOne"}], "id": 
"e6e42ef4fc984e71b575150e59a92704", "type": "image", "name": "glance"}]}} 
get_auth_ref 
/var/lib/kolla/venv/lib64/python3.9/site-packages/keystoneauth1/identity/v3/base.py:189
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron [None 
req-6737aef3-c823-4f7c-95ec-1c9f38b14faa a4dbb0dc59024c199843cea86603308b 
9fd64a4cbd774756869cb3968de2e9b6 - - default default] Unable to clear binding 
profile for neutron port 291dbb7b-5cc8-480d-b39d-eb849bcb4a64. Error: 
ForbiddenException: 403: Client Error for url: 
http://192.0.2.10:9696/v2.0/ports/291dbb7b-5cc8-480d-b39d-eb849bcb4a64, 
((rule:update_port and rule:update_port:binding:host_id) and 
rule:update_port:binding:profile) is disallowed by policy: 
openstack.exceptions.ForbiddenException: ForbiddenException: 403: Client Error 
for url: 
http://192.0.2.10:9696/v2.0/ports/291dbb7b-5cc8-480d-b39d-eb849bcb4a64, 
((rule:update_port and rule:update_port:binding:host_id) and 
rule:update_port:binding:profile) is disallowed by policy
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron Traceback (most recent 
call last):
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/ironic/common/neutron.py", 
line 130, in unbind_neutron_port
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron 
update_neutron_port(context, port_id, attrs_unbind, client)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/ironic/common/neutron.py", 
line 109, in update_neutron_port
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron return 
client.update_port(port_id, **attrs)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/network/v2/_proxy.py",
 line 2992, in update_port
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron return 
self._update(_port.Port, port, if_revision=if_revision, **attrs)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/proxy.py", line 
61, in check
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron return method(self, 
expected, actual, *args, **kwargs)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/network/v2/_proxy.py",
 line 202, in _update
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron return 
res.commit(self, base_path=base_path, if_revision=if_revision)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/resource.py", line 
1803, in commit
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron return self._commit(
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/resource.py", line 
1848, in _commit
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron 
self._translate_response(response, has_body=has_body)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/resource.py", line 
1287, in _translate_response
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron 
exceptions.raise_from_response(response, error_message=error_message)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/openstack/exceptions.py", 
line 250, in raise_from_response
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron raise cls(
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron 
openstack.exceptions.ForbiddenException: ForbiddenException: 403: Client Error 
for url: 
http://192.0.2.10:9696/v2.0/ports/291dbb7b-5cc8-480d-b39d-eb849bcb4a64, 
((rule:update_port and rule:update_port:binding:host_id) and 
rule:update_port:binding:profile) is disallowed by policy

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2052937/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2052937] Re: Policy: binding operations are prohibited for service role

2024-02-13 Thread Bence Romsics
Hi Bartosz,

Yes, by default this is prohibited. However, oslo.policy-based policies are
configurable.

For example, I don't have ironic deployed in my devstack, but I reproduced the
problem using the unprivileged 'demo' user:

$ source openrc demo demo
$ openstack network create net0
$ openstack subnet create --network net0 --subnet-range 10.0.0.0/24 subnet0
$ openstack port create --network net0 port0
$ openstack port set --host devstack0 port0
ForbiddenException: 403: Client Error for url: 
http://192.168.122.225:9696/networking/v2.0/ports/4d6fa1c1-bbb0-4298-a901-c3dec7f1b1f1,
 (rule:update_port and rule:update_port:binding:host_id) is disallowed by policy

While in q-svc logs I had this:

febr 13 14:03:42 devstack0 neutron-server[5814]: DEBUG neutron.policy [None 
req-9fa226e6-2ae5-4abe-9b70-efc749ef4913 None demo] Enforcing rules: 
['update_port', 'update_port:binding:host_id'] {{(pid=5814) log_rule_list 
/opt/stack/neutron/neutron/policy.py:457}}
febr 13 14:03:42 devstack0 neutron-server[5814]: DEBUG neutron.policy [None 
req-9fa226e6-2ae5-4abe-9b70-efc749ef4913 None demo] Failed policy enforce for 
'update_port' {{(pid=5814) enforce /opt/stack/neutron/neutron/policy.py:530}}

Non-default policy configuration is looked up by oslo.policy in
/etc/neutron/policy.{json,yaml}. Today the yaml format is preferred, but for
some reason devstack still created the old json format for me. So first I
migrated the one-line json file to yaml:

$ cat /etc/neutron/policy.json
{"context_is_admin":  "role:admin or user_name:neutron"}

$ cat /etc/neutron/policy.yaml 
"context_is_admin": "role:admin or user_name:neutron"

I believe this all was deployment (here devstack) specific.

I also told oslo.policy running in neutron-server to use the yaml formatted 
file:
/etc/neutron/neutron.conf:
[oslo_policy]
policy_file = /etc/neutron/policy.yaml

Then I changed the policy for port binding from the default:
"update_port:binding:host_id": "rule:admin_only" to
"update_port:binding:host_id": "rule:admin_or_owner"

After this change the above "openstack port set --host" starts working, even
without restarting neutron-server.

In your environment you will of course want to use a different rule, maybe
something like this:
"update_port:binding:host_id": "(rule:admin_only) or (rule:service_api)"

Since I don't have ironic in this environment, I could not test this rule. But
please have a look at the documentation; I'm virtually sure there's a way to
set what you need.

https://docs.openstack.org/neutron/latest/configuration/policy.html
https://docs.openstack.org/neutron/latest/configuration/policy-sample.html
https://docs.openstack.org/oslo.policy/latest/

Regarding the default, I believe for most environments it is good that
only the admin can change port bindings. If you believe differently,
please share your reasons. Until then I'm marking this as not a bug.

Regards,
Bence

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2052937

Title:
  Policy: binding operations are prohibited for service role

Status in neutron:
  Invalid

Bug description:
  Create/update port binding:* policies are admin only, which prevents, for
  example, the ironic service user with the service role from managing
  baremetal ports:

  
  "http://192.0.2.10:9292;, "region": "RegionOne"}], "id": 
"e6e42ef4fc984e71b575150e59a92704", "type": "image", "name": "glance"}]}} 
get_auth_ref 
/var/lib/kolla/venv/lib64/python3.9/site-packages/keystoneauth1/identity/v3/base.py:189
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron [None 
req-6737aef3-c823-4f7c-95ec-1c9f38b14faa a4dbb0dc59024c199843cea86603308b 
9fd64a4cbd774756869cb3968de2e9b6 - - default default] Unable to clear binding 
profile for neutron port 291dbb7b-5cc8-480d-b39d-eb849bcb4a64. Error: 
ForbiddenException: 403: Client Error for url: 
http://192.0.2.10:9696/v2.0/ports/291dbb7b-5cc8-480d-b39d-eb849bcb4a64, 
((rule:update_port and rule:update_port:binding:host_id) and 
rule:update_port:binding:profile) is disallowed by policy: 
openstack.exceptions.ForbiddenException: ForbiddenException: 403: Client Error 
for url: 
http://192.0.2.10:9696/v2.0/ports/291dbb7b-5cc8-480d-b39d-eb849bcb4a64, 
((rule:update_port and rule:update_port:binding:host_id) and 
rule:update_port:binding:profile) is disallowed by policy
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron Traceback (most recent 
call last):
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/ironic/common/neutron.py", 
line 130, in unbind_neutron_port
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron 
update_neutron_port(context, port_id, attrs_unbind, client)
  2024-02-12 11:44:57.848 7 ERROR ironic.common.neutron   File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/ironic/common/neutron.py", 
line 109, in 

[Yahoo-eng-team] [Bug 2051685] [NEW] After repeat of incomplete migration nova applies wrong (status=error) migration context in update_available_resource periodic job

2024-01-30 Thread Bence Romsics
Public bug reported:

The original problem, observed in a downstream deployment, was overcommit on
dedicated PCPUs and a CPUPinningInvalid exception breaking the
update_available_resource periodic job.

The following reproduction is not an end-to-end reproduction, but I hope
I can demonstrate where things go wrong.

The environment is a multi-node devstack:
devstack0 - all-in-one
devstack0a - compute

Nova is backed by libvirt/qemu/kvm.

devstack 6b0f055b
nova on devstack0 39f560d673
nova on devstack0a a72f7eaac7
libvirt 8.0.0-1ubuntu7.8
qemu 1:6.2+dfsg-2ubuntu6.16
linux 5.15.0-91-generic

# Clean up if not the first run.
openstack server list -f value -c ID | xargs -r openstack server delete --wait
openstack volume list --status available -f value -c ID | xargs -r openstack 
volume delete

# Create a server on devstack0.
openstack flavor create cirros256-pinned --public --vcpus 1 --ram 256 --disk 1 
--property hw_rng:allowed=True --property hw:cpu_policy=dedicated
openstack server create --flavor cirros256-pinned --image 
cirros-0.6.2-x86_64-disk --boot-from-volume 1 --nic net-id=private 
--availability-zone :devstack0 vm0 --wait

# Start a live migration to devstack0a, but simulate a failure. In my 
environment a complete live migration takes around 20 seconds. Using 'sleep 3' 
it usually breaks in the 'preparing' status.
# As far as I understand other kinds of migration (like cold migration) are 
also affected.
openstack server migrate --live-migration vm0 --wait & sleep 2 ; ssh devstack0a 
sudo systemctl stop devstack@n-cpu

$ openstack server migration list --server vm0 --sort-column 'Created At'
Id:             33
UUID:           c7a42f9e-dfee-4a2c-b42a-a73b1a19c0c9
Source Node:    devstack0
Dest Node:      devstack0a
Source Compute: devstack0
Dest Compute:   devstack0a
Dest Host:      192.168.122.79
Status:         preparing
Server UUID:    a2b43180-8ad9-4c12-ad47-12b8dd7a7384
Old Flavor:     11
New Flavor:     11
Type:           live-migration
Created At:     2024-01-29T12:41:40.00
Updated At:     2024-01-29T12:41:42.00

# After some timeout (around 60 s) the migration goes to 'error' status.
$ openstack server migration list --server vm0 --sort-column 'Created At'
Id:             33
UUID:           c7a42f9e-dfee-4a2c-b42a-a73b1a19c0c9
Source Node:    devstack0
Dest Node:      devstack0a
Source Compute: devstack0
Dest Compute:   devstack0a
Dest Host:      192.168.122.79
Status:         error
Server UUID:    a2b43180-8ad9-4c12-ad47-12b8dd7a7384
Old Flavor:     11
New Flavor:     11
Type:           live-migration
Created At:     2024-01-29T12:41:40.00
Updated At:     2024-01-29T12:42:42.00

# Wait before restarting n-cpu on devstack0a. I don't think I fully understand
what decides whether the migration finally ends up in 'failed' or in 'error'
status. Currently it seems to me that if I restart n-cpu too quickly, the
migration goes to the 'failed' state right after the restart.
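
For reference, the restart I use later in this reproduction is simply the
inverse of the stop above:

ssh devstack0a sudo systemctl start devstack@n-cpu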

[Yahoo-eng-team] [Bug 2051351] [NEW] explicity_egress_direct prevents learning of local MACs and causes flooding of ingress packets, firewall_driver = openvswitch

2024-01-26 Thread Bence Romsics
Public bug reported:

I believe this issue was already reported earlier:

https://bugs.launchpad.net/neutron/+bug/1884708

That bug has a fix committed:

https://review.opendev.org/c/openstack/neutron/+/738551

However, I believe the above change fixed only part of the issue (the
firewall_driver=noop case). The same problem is still not fixed with
firewall_driver=openvswitch.

First I re-opened bug #1884708, but then I realized that nobody would notice a
several-year-old bug's status change, so I opened this new bug report instead.

Reproduction:

# config
ml2_conf.ini:
[securitygroup]
firewall_driver = openvswitch
[agent]
explicitly_egress_direct = True
[ovs]
bridge_mappings = physnet0:br-physnet0,...

# a random IP on net0 we can ping
sudo ip link set up dev br-physnet0
sudo ip link add link br-physnet0 name br-physnet0.100 type vlan id 100
sudo ip link set up dev br-physnet0.100
sudo ip address add dev br-physnet0.100 10.0.100.1/24

# code
devstack 6b0f055b
neutron $ git log --oneline -n2
27601f8eea (HEAD, origin/bug/2048785, origin/HEAD) Set trunk parent port as 
access port in ovs to avoid loop
3ef02cc2fb (origin/master) Consume code from neutron-lib
openvswitch 2.17.8-0ubuntu0.22.04.1
linux 5.15.0-91-generic

# clean up first
openstack server delete vm0 --wait
openstack port delete port0
openstack network delete net1 net0

# build the environment
openstack network create net0 --provider-network-type vlan 
--provider-physical-network physnet0 --provider-segment 100
openstack subnet create --network net0 --subnet-range 10.0.100.0/24 subnet0
openstack port create --no-security-group --disable-port-security --network 
net0 --fixed-ip ip-address=10.0.100.10 port0
openstack server create --flavor cirros256 --image cirros-0.6.2-x86_64-disk 
--nic port-id=port0 --availability-zone :devstack0a --wait vm0

# mac addresses for reference
$ openstack port show port0 -f value -c mac_address
fa:16:3e:96:58:ab
$ ifdata -ph br-physnet0
82:E8:18:67:7E:40

# generate traffic that will keep fdb entries fresh
sudo virsh console "$( openstack server show vm0 -f value -c 
OS-EXT-SRV-ATTR:instance_name )"
ping 10.0.100.1

# clear all past junk
for br in br-physnet0 br-int ; do sudo ovs-appctl fdb/flush "$br" ; done

# br-int does not learn port0's mac despite the ongoing ping
for br in br-physnet0 br-int ; do echo ">>> $br <<<" ; sudo ovs-appctl fdb/show 
"$br" | egrep -i "$( openstack port show port0 -f value -c mac_address )|$( 
ifdata -ph br-physnet0 )" ; done
>>> br-physnet0 <<<
1   100  fa:16:3e:96:58:ab0
LOCAL   100  82:e8:18:67:7e:400
>>> br-int <<<
1 4  82:e8:18:67:7e:400

# port and physnet bridge mac in all fdbs, egress == vnic -> physnet bridge
# in br-int we have a direct output action
$ sudo ovs-appctl ofproto/trace br-int in_port="$( sudo ovs-vsctl -- 
--columns=ofport find Interface name=$( echo "tap$( openstack port show port0 
-f value -c id )" | cut -b1-14 ) | awk '{ print $3 }' )",dl_vlan=0,dl_dst=$( 
ifdata -ph br-physnet0 ),dl_src=$( openstack port show port0 -f value -c 
mac_address )
Flow: 
in_port=45,dl_vlan=0,dl_vlan_pcp=0,vlan_tci1=0x,dl_src=fa:16:3e:96:58:ab,dl_dst=82:e8:18:67:7e:40,dl_type=0x

bridge("br-int")

 0. priority 0, cookie 0x2b36d6b4a42fe7b5
goto_table:58
58. priority 0, cookie 0x2b36d6b4a42fe7b5
goto_table:60
60. in_port=45, priority 100, cookie 0x2b36d6b4a42fe7b5
set_field:0x2d->reg5
set_field:0x4->reg6
resubmit(,73)
73. reg5=0x2d, priority 80, cookie 0x2b36d6b4a42fe7b5
resubmit(,94)
94. 
reg6=0x4,dl_src=fa:16:3e:96:58:ab,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, 
priority 10, cookie 0x2b36d6b4a42fe7b5
push_vlan:0x8100
set_field:4100->vlan_vid
output:1

bridge("br-physnet0")
-
 0. in_port=1,dl_vlan=4, priority 4, cookie 0x85bc1a5077d54d3f
set_field:4196->vlan_vid
NORMAL
 -> forwarding to learned port

Final flow: 
reg5=0x2d,reg6=0x4,in_port=45,dl_vlan=4,dl_vlan_pcp=0,dl_vlan1=0,dl_vlan_pcp1=0,dl_src=fa:16:3e:96:58:ab,dl_dst=82:e8:18:67:7e:40,dl_type=0x
Megaflow: 
recirc_id=0,eth,in_port=45,dl_vlan=0,dl_vlan_pcp=0,dl_src=fa:16:3e:96:58:ab,dl_dst=82:e8:18:67:7e:40,dl_type=0x
Datapath actions: pop_vlan,push_vlan(vid=100,pcp=0),1

# port and physnet bridge mac in all fdbs, ingress == physnet bridge -> vnic
# in br-int we have the normal action flooding, despite the ongoing ping
$ sudo ovs-appctl ofproto/trace br-physnet0 in_port=LOCAL,dl_vlan=100,dl_src=$( 
ifdata -ph br-physnet0 ),dl_dst=$( openstack port show port0 -f value -c 
mac_address )
Flow: 
in_port=LOCAL,dl_vlan=100,dl_vlan_pcp=0,vlan_tci1=0x,dl_src=82:e8:18:67:7e:40,dl_dst=fa:16:3e:96:58:ab,dl_type=0x

bridge("br-physnet0")
-
 0. priority 0, cookie 0x85bc1a5077d54d3f
NORMAL
 -> forwarding to learned port

bridge("br-int")

 0. in_port=1,dl_vlan=100, priority 3, cookie 0x2b36d6b4a42fe7b5
set_field:4100->vlan_vid
goto_table:58
58. 

[Yahoo-eng-team] [Bug 1884708] Re: explicity_egress_direct prevents learning of local MACs and causes flooding of ingress packets

2024-01-26 Thread Bence Romsics
** Changed in: neutron
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1884708

Title:
  explicity_egress_direct prevents learning of local MACs and causes
  flooding of ingress packets

Status in neutron:
  Fix Released

Bug description:
  We took this bug fix: https://bugs.launchpad.net/neutron/+bug/1732067
  and then also backported ourselves
  https://bugs.launchpad.net/neutron/+bug/1866445

  The latter is for iptables based firewall.

  We have VLAN based networks, and seeing ingress packets destined to
  local MACs being flooded. We are not seeing any local MACs present
  under ovs-appctl fdb/show br-int.

  Consider following example:

  HOST 1:
  MAC A = fa:16:3e:c1:01:43
  MAC B = fa:16:3e:de:0b:8a

  HOST 2:
  MAC C = fa:16:3e:d6:3f:31

  A is talking to C. Snooping on the qvo interface of B, we are seeing all the
  traffic destined to MAC A (along with other unicast traffic neither destined
  to nor sourced from MAC B). Neither MAC A nor MAC B is present in the br-int
  FDB, despite heavy traffic being sent.
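
  A hedged way to observe this directly is to snoop on the qvo interface of the
  unrelated port B and filter on MAC A (the interface name below is
  hypothetical); unicast frames destined to A showing up there confirm the
  flooding:

  sudo tcpdump -e -n -i qvoXXXXXXXX-XX ether dst fa:16:3e:c1:01:43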

  
  Here is ofproto trace for such packet. in_port 8313 is qvo of MAC A:

  sudo ovs-appctl ofproto/trace br-int 
in_port=8313,tcp,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31
  Flow: 
tcp,in_port=8313,vlan_tci=0x,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0

  bridge("br-int")
  
   0. in_port=8313, priority 9, cookie 0x9a67096130ac45c2
  goto_table:25
  25. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 2, cookie 
0x9a67096130ac45c2
  goto_table:60
  60. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 9, cookie 
0x9a67096130ac45c2
  resubmit(,61)
  61. 
in_port=8313,dl_src=fa:16:3e:c1:01:43,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00,
 priority 10, cookie 0x9a67096130ac45c2
  push_vlan:0x8100
  set_field:4098->vlan_vid
  output:1

  bridge("br-ext")
  
   0. in_port=2, priority 2, cookie 0xab09adf2af892674
  goto_table:1
   1. priority 0, cookie 0xab09adf2af892674
  goto_table:2
   2. in_port=2,dl_vlan=2, priority 4, cookie 0xab09adf2af892674
  set_field:4240->vlan_vid
  NORMAL
   -> forwarding to learned port

  bridge("br-vlan")
  -
   0. priority 1, cookie 0x651552fc69601a2d
  goto_table:3
   3. priority 1, cookie 0x651552fc69601a2d
  NORMAL
   -> forwarding to learned port

  Final flow: 
tcp,in_port=8313,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
  Megaflow: 
recirc_id=0,eth,ip,in_port=8313,vlan_tci=0x/0x1fff,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_frag=no
  Datapath actions: push_vlan(vid=144,pcp=0),51

  
  Because the packet took the output: action from table=61, added by the
explicitly_egress_direct fix, the local MAC is not learned. But on ingress, the
packet hits table=60's NORMAL action and gets flooded, because the bridge never
learned where to send the local MAC.

  sudo ovs-appctl ofproto/trace br-int 
in_port=1,dl_vlan=144,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43
  Flow: 
in_port=1,dl_vlan=144,dl_vlan_pcp=0,vlan_tci1=0x,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x

  bridge("br-int")
  
   0. in_port=1,dl_vlan=144, priority 3, cookie 0x9a67096130ac45c2
  set_field:4098->vlan_vid
  goto_table:60
  60. priority 3, cookie 0x9a67096130ac45c2
  NORMAL
   -> no learned MAC for destination, flooding

  bridge("br-vlan")
  -
   0. in_port=4, priority 2, cookie 0x651552fc69601a2d
  goto_table:1
   1. priority 0, cookie 0x651552fc69601a2d
  goto_table:2
   2. in_port=4, priority 2, cookie 0x651552fc69601a2d
  drop

  bridge("br-tun")
  
   0. in_port=1, priority 1, cookie 0xf1baf24d000c6f7c
  goto_table:1
   1. priority 0, cookie 0xf1baf24d000c6f7c
  goto_table:2
   2. dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 0, cookie 
0xf1baf24d000c6f7c
  goto_table:20
  20. priority 0, cookie 0xf1baf24d000c6f7c
  goto_table:22
  22. priority 0, cookie 0xf1baf24d000c6f7c
  drop

  Final flow: 
in_port=1,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x
  Megaflow: 
recirc_id=0,eth,in_port=1,dl_vlan=144,dl_vlan_pcp=0,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x
  Datapath actions: 

[Yahoo-eng-team] [Bug 1884708] Re: explicity_egress_direct prevents learning of local MACs and causes flooding of ingress packets

2024-01-24 Thread Bence Romsics
I'm reopening this because I believe the committed fix solves only part of the
problem. With firewall_driver=noop the unnecessary ingress flooding on br-int
is gone. However, we still have the same unnecessary flooding with
firewall_driver=openvswitch. For details and a full reproduction please see the
comments on bug #2048785:

https://bugs.launchpad.net/neutron/+bug/2048785/comments/2
https://bugs.launchpad.net/neutron/+bug/2048785/comments/6


** Changed in: neutron
   Status: Fix Released => New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1884708

Title:
  explicity_egress_direct prevents learning of local MACs and causes
  flooding of ingress packets

Status in neutron:
  New

Bug description:
  We took this bug fix: https://bugs.launchpad.net/neutron/+bug/1732067
  and then also backported ourselves
  https://bugs.launchpad.net/neutron/+bug/1866445

  The latter is for iptables based firewall.

  We have VLAN based networks, and seeing ingress packets destined to
  local MACs being flooded. We are not seeing any local MACs present
  under ovs-appctl fdb/show br-int.

  Consider following example:

  HOST 1:
  MAC A = fa:16:3e:c1:01:43
  MAC B = fa:16:3e:de:0b:8a

  HOST 2:
  MAC C = fa:16:3e:d6:3f:31

  A is talking to C. Snooping on the qvo interface of B, we are seeing all the
  traffic destined to MAC A (along with other unicast traffic neither destined
  to nor sourced from MAC B). Neither MAC A nor MAC B is present in the br-int
  FDB, despite heavy traffic being sent.

  
  Here is ofproto trace for such packet. in_port 8313 is qvo of MAC A:

  sudo ovs-appctl ofproto/trace br-int 
in_port=8313,tcp,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31
  Flow: 
tcp,in_port=8313,vlan_tci=0x,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0

  bridge("br-int")
  
   0. in_port=8313, priority 9, cookie 0x9a67096130ac45c2
  goto_table:25
  25. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 2, cookie 
0x9a67096130ac45c2
  goto_table:60
  60. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 9, cookie 
0x9a67096130ac45c2
  resubmit(,61)
  61. 
in_port=8313,dl_src=fa:16:3e:c1:01:43,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00,
 priority 10, cookie 0x9a67096130ac45c2
  push_vlan:0x8100
  set_field:4098->vlan_vid
  output:1

  bridge("br-ext")
  
   0. in_port=2, priority 2, cookie 0xab09adf2af892674
  goto_table:1
   1. priority 0, cookie 0xab09adf2af892674
  goto_table:2
   2. in_port=2,dl_vlan=2, priority 4, cookie 0xab09adf2af892674
  set_field:4240->vlan_vid
  NORMAL
   -> forwarding to learned port

  bridge("br-vlan")
  -
   0. priority 1, cookie 0x651552fc69601a2d
  goto_table:3
   3. priority 1, cookie 0x651552fc69601a2d
  NORMAL
   -> forwarding to learned port

  Final flow: 
tcp,in_port=8313,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
  Megaflow: 
recirc_id=0,eth,ip,in_port=8313,vlan_tci=0x/0x1fff,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_frag=no
  Datapath actions: push_vlan(vid=144,pcp=0),51

  
  Because the packet took the output: action from table=61, added by the
explicitly_egress_direct fix, the local MAC is not learned. But on ingress, the
packet hits table=60's NORMAL action and gets flooded, because the bridge never
learned where to send the local MAC.

  sudo ovs-appctl ofproto/trace br-int 
in_port=1,dl_vlan=144,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43
  Flow: 
in_port=1,dl_vlan=144,dl_vlan_pcp=0,vlan_tci1=0x,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x

  bridge("br-int")
  
   0. in_port=1,dl_vlan=144, priority 3, cookie 0x9a67096130ac45c2
  set_field:4098->vlan_vid
  goto_table:60
  60. priority 3, cookie 0x9a67096130ac45c2
  NORMAL
   -> no learned MAC for destination, flooding

  bridge("br-vlan")
  -
   0. in_port=4, priority 2, cookie 0x651552fc69601a2d
  goto_table:1
   1. priority 0, cookie 0x651552fc69601a2d
  goto_table:2
   2. in_port=4, priority 2, cookie 0x651552fc69601a2d
  drop

  bridge("br-tun")
  
   0. in_port=1, priority 1, cookie 0xf1baf24d000c6f7c
  goto_table:1
   1. priority 0, cookie 0xf1baf24d000c6f7c
  goto_table:2
   2. dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 0, cookie 
0xf1baf24d000c6f7c
  goto_table:20
  20. priority 0, cookie 0xf1baf24d000c6f7c
  goto_table:22
  22. priority 0, cookie 0xf1baf24d000c6f7c
  drop

  Final flow: 
in_port=1,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x
  

[Yahoo-eng-team] [Bug 2048785] [NEW] Trunk parent port (tpt port) vlan_mode is wrong in ovs

2024-01-09 Thread Bence Romsics
port0b_mac="$( openstack port show port0b -f value -c mac_address )"
openstack port create --no-security-group --disable-port-security --mac-address 
"$port0b_mac" --network net1 --fixed-ip ip-address=10.0.101.11 port1b

openstack network trunk create --parent-port port0a trunka
openstack network trunk set --subport 
port=port1a,segmentation-type=vlan,segmentation-id=101 trunka

openstack network trunk create --parent-port port0b trunkb
openstack network trunk set --subport 
port=port1b,segmentation-type=vlan,segmentation-id=101 trunkb

openstack server create --flavor ds1G --image u1804 --nic port-id=port0a --wait 
vma
openstack server create --flavor ds1G --image u1804 --nic port-id=port0b --wait 
vmb # booted on the same compute as vma

At the moment I don't have a reproduction independent of that environment that
re-creates the same state of the bridges' FDBs and the same kind of traffic.

Anyway, in this environment colleagues observed:
* Lost frames.
* Duplicated frames arriving at the vNIC of one of the VMs.
* Unexpectedly double-tagged frames on the physical bridge leaving the compute
host.

Local analysis showed that the traffic arrived at br-int, which did not have
the dst MAC in its FDB, so it had to flood to all ports.
This way the frame ended up on both trunk bridges.
One of these trunk bridges was on the proper path to the destination address.
But the other trunk bridge, also not having the dst MAC in its FDB, had to
flood to all ports too.
And this trunk bridge also flooded the frame out its tpt port, back to br-int.
But the tpt port is conceptually in a different VLAN and the frame should never
have been flooded to that port.
However, the tpt port has the wrong configuration and forwards traffic from the
wrong VLANs.

After the looped frame got back to br-int, it reached the intended VM's vNIC
via the trunk parent (sic!) port. This means that the latter trunk bridge now
learned the traffic generator's source MAC on the wrong port. I suspect this
may have led to the unexpectedly double-tagged packets in the other direction.

** Affects: neutron
 Importance: Undecided
 Assignee: Bence Romsics (bence-romsics)
 Status: In Progress


** Tags: trunk

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2048785

Title:
  Trunk parent port (tpt port) vlan_mode is wrong in ovs

Status in neutron:
  In Progress

Bug description:
  ... therefore a forwarding loop, packet duplication, packet loss and double
  tagging are possible.

  Today a trunk bridge with one parent and one subport looks like this:

  # ovs-vsctl show
  ...
  Bridge tbr-b2781877-3
  datapath_type: system
  Port spt-28c9689e-9e
  tag: 101
  Interface spt-28c9689e-9e
  type: patch
  options: {peer=spi-28c9689e-9e}
  Port tap3709f1a1-a5
  Interface tap3709f1a1-a5
  Port tpt-3709f1a1-a5
  Interface tpt-3709f1a1-a5
  type: patch
  options: {peer=tpi-3709f1a1-a5}
  Port tbr-b2781877-3
  Interface tbr-b2781877-3
  type: internal
  ...

  # ovs-vsctl find Port name=tpt-3709f1a1-a5 | egrep 'tag|vlan_mode|trunks'
  tag : []
  trunks  : []
  vlan_mode   : []

  # ovs-vsctl find Port name=spt-28c9689e-9e | egrep 'tag|vlan_mode|trunks'
  tag : 101
  trunks  : []
  vlan_mode   : []

  I believe the vlan_mode of the tpt port is wrong (at least when the port is 
not "vlan_transparent") and it should have the value "access".
  Even when the port is "vlan_transparent", forwarding loops between br-int and 
a trunk bridge should be prevented.

  According to: http://www.openvswitch.org/support/dist-docs/ovs-
  vswitchd.conf.db.5.txt

  """
 vlan_mode: optional string, one of access, dot1q-tunnel, native-tagged,
 native-untagged, or trunk
The VLAN mode of the port, as described above. When this  column
is empty, a default mode is selected as follows:

•  If  tag contains a value, the port is an access port. The
   trunks column should be empty.

•  Otherwise, the port is a trunk port.  The  trunks  column
   value is honored if it is present.
  """

  """
 trunks: set of up to 4,096 integers, in range 0 to 4,095
For  a trunk, native-tagged, or native-untagged port, the 802.1Q
VLAN or VLANs that this port trunks; if it is  empty,  then  the
port trunks all VLANs. Must be empty if this is an access port.

A native-tagged or native-untagged port always trunks its native
VLAN, regardless of whether trunks i

[Yahoo-eng-team] [Bug 2042598] Re: neutron_server container suspended in health:starting state

2023-11-03 Thread Bence Romsics
Hi,

Thanks for the report!

At first glance this looks like a deployment problem, not a neutron bug. From
a neutron perspective there's no clear error symptom described (other than
"networking does not work") and there is no neutron log (the attached "log from
neutron_server" stops right when neutron-server is started). Even if there is a
neutron bug, this is not enough to identify and/or debug it.

I'm no kolla expert (not even a kolla user), but I would recommend that you
turn to the kolla folks with your questions, for example on their irc channel
(#kolla on irc.oftc.net, archives: https://meetings.opendev.org/) or on the
mailing list
(https://lists.openstack.org/mailman3/lists/openstack-discuss.lists.openstack.org/).
It would also help in debugging if you collected actual neutron-server logs to
see why it did not start properly.
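
For example something like this (these are the usual kolla defaults, so adjust
the container name and log path to your deployment):

docker logs neutron_server
tail -n 200 /var/log/kolla/neutron/neutron-server.log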

Hope this helps,
Bence

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2042598

Title:
  neutron_server container suspended in health:starting state

Status in neutron:
  Invalid

Bug description:
  I installed OpenStack (zed) on a Raspberry Pi cluster with kolla-ansible
  (version tagged for zed). All containers are healthy except neutron_server,
  which is suspended in the 'health: starting' state. The network-related part
  of OpenStack does not work. Some other commands work as expected (e.g., I can
  create an image which is reported by openstack image list as 'active').

  There are four Raspberry Pi 4B in the cluster (2 x 4GB RAM and 2 x 8GB RAM).
They run Debian 11 (bullseye) and kolla-ansible has been used for the
installation.
  Notably, I'm using a specific configuration of networking on my Pis to mimic
the two network interfaces on each host that kolla-ansible expects. These are
provided as interfaces of veth pairs (more details on that below, too).

  Below, one can find:

  1. configuration commands I used to configure my Pi hosts (this panel)
  2. environment details related to the Pis (the one serving as controller in 
OpenStack) and kolla-ansible install information (this panel)
  3. ml2_conf.ini and nova-compute.conf configuration used in kolla-ansible
  4. kolla-ansible files: globals.yml (4.1) and inventory multinode (4.2)
     - changed parts - this panel
     - complete versions - attachments
  5. HttpException: 503 message from running init-runonce (kolla-ansible test 
script for new installation) (this panel)
  6. status of containers on the control node as reported by 'docker ps -a' 
(this panel)
  7. output from the docker neutron_server inspect command (attachment)
  8. log from the neutron_server container (attachment)

  *
  1. Debian configuration on the Pis
  *

  Selected details of the configuration are given in the following.
  Basically, most of them are needed to configure the Pis' host networking
  using netplan. Another one relates to qemu-kvm.

  (Note: initial configs to enable ssh access should be done locally (keyboard, 
monitor) on each Pi, in particular:
  PermitRootLogin yes
  PasswordAuthentication yes
  I skip the details of enabling ssh access, though. Below, I assume ssh
access as a regular (non-root) user.
  )

  === Preparation for host networking setup ===

  $ sudo apt-get remove unattended-upgrades -y
  $ sudo apt-get update -y && sudo apt-get upgrade -y

  - updating $PATH for a user
  $ sudo tee -a ~/.bashrc << EOT
  export PATH=$PATH:/usr/local/sbin:/usr/sbin:/sbin
  EOT
  $ source ~/.bashrc

  - enable systemd-networkd and configure eth0 for ssh access (needed to use
ssh; not needed if one does stuff locally, attaching keyboard and monitor to
each Pi)
    - enabling systemd-networkd
  $ sudo mv /etc/network/interfaces /etc/network/interfaces.save
  $ sudo mv /etc/network/interfaces.d /etc/network/interfaces.d.save
  $ sudo systemctl enable systemd-networkd && sudo systemctl start 
systemd-networkd
  $ sudo systemctl status systemd-networkd

  - configure eth0 (in my case, I've configured static DHCP for each Pi on my
DHCP server)
  $ sudo tee /etc/systemd/network/20-wired.network << EOT
  [Match]
  Name=eth0

  [Network]
  DHCP=yes
  EOT

  - install netplan
  $ sudo apt update && sudo apt -y install netplan.io
  $ sudo reboot

  - enable ip forwarding
  $ sudo nano /etc/sysctl.conf
   ===> uncomment the line: net.ipv4.ip_forward=1
  $ sudo sysctl -p

  === Host networking setup ===
  - network setup on each Pi host - drawing:

  192.168.1.xy/24   no IP address
    +-+   +-+
    |  veth0  |   |  veth1  |< network-interface and 
network-external-interface for kolla-ansible
    +-+   +-+
     |   veth  pairs   |
    +-+   +-+
    | veth0br |   | veth1br |

[Yahoo-eng-team] [Bug 2042089] Re: neutron : going to shared network is working, going back not

2023-11-02 Thread Bence Romsics
Hi,

Thanks for the report!

I'm not sure if the behavior you describe is a bug. If multiple projects
are actually using a shared network, why would you expect it to be
unshared without an error? How should such a network work when it's
shared=False but it has multiple tenants on it?

Maybe I'm missing what you mean. In that case, can you please give me a series
of commands, including which one should behave differently and how?

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2042089

Title:
  neutron : going to shared network is working, going back not

Status in neutron:
  Invalid

Bug description:
  We have admin-generated provider networks. Projects are allowed to create
  ports and instances on these networks. When we now set the "shared" property
  on these networks, we are no longer allowed to unset this property. We get
  the error: "Unable to reconfigure sharing settings for network
  net.vlan10.provider. Multiple tenants are using it.". Once all ports and
  instances created by non-admin projects are removed, we can unset the
  "shared" property again. So we are allowed to set a parameter that afterwards
  can no longer be unset. We now have a network that is visible to all and we
  would prefer not to have this situation. Removing the corresponding RBAC
  policy is also not allowed.

  This is a OpenStack-Ansible installation with version Yoga.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2042089/+subscriptions




[Yahoo-eng-team] [Bug 1838760] Re: Security groups don't work for trunk ports with iptables_hybrid fw driver

2023-09-06 Thread Bence Romsics
I believe that regarding this bug report, what could be done has been done.
Other fixes are not going to happen, therefore I'm setting this to Won't Fix to
clean up the open bug list.

** Changed in: neutron
   Status: Confirmed => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1838760

Title:
  Security groups don't work for trunk ports with iptables_hybrid fw
  driver

Status in neutron:
  Won't Fix

Bug description:
  When the iptables_hybrid firewall driver is used, security groups don't work
  for trunk ports, as VLAN-tagged packets on the qbr bridge aren't filtered by
  default at all.

  I found it when I was trying to add new CI job
  https://review.opendev.org/#/c/670738/ and I noticed that this job is
  failing constantly on Queens release.

  On Rocky and newer this new job is fine and the difference between those
  jobs is the firewall_driver - since Rocky we are using the openvswitch fw
  driver instead of iptables_hybrid. I also confirmed locally that when I
  switched the firewall driver to openvswitch, the same test worked fine for
  me.

  I did some debugging on the Queens release locally and it looks like the
  flag /proc/sys/net/bridge/bridge-nf-filter-vlan-tagged should be set to 1 to
  make it possible to filter VLAN-tagged traffic in iptables; see
  https://ebtables.netfilter.org/documentation/bridge-nf.html for details.
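
  For experimentation only (as noted below, this alone is likely not
  sufficient), the knob can be flipped like this, assuming the br_netfilter
  module is loaded:

  sysctl net.bridge.bridge-nf-filter-vlan-tagged=1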

  But even if this knob is switched to "1", there are probably bigger
  changes required as vlan header which belongs to those packets should
  be included in iptables rules to match on proper packets.

  My test was done on the stable/queens branch of neutron but I'm pretty sure
  that the same issue still exists in master. We simply don't see it as we are
  testing with the openvswitch fw driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1838760/+subscriptions




[Yahoo-eng-team] [Bug 2028544] Re: dhcp agent binding count greather than dhcp_agents_per_network

2023-07-26 Thread Bence Romsics
** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2028544

Title:
  dhcp agent binding count greather than dhcp_agents_per_network

Status in neutron:
  Invalid

Bug description:
  neutron version: train
  dhcp_agents_per_network = 2
  Executing the command "neutron dhcp-agent-network-add" binds a network to a
  DHCP agent, but does not check the dhcp_agents_per_network configuration.
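
  For reference, the OSC equivalent of that legacy command is roughly (a
  sketch):

  openstack network agent add network --dhcp <dhcp-agent-id> <network>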

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2028544/+subscriptions




[Yahoo-eng-team] [Bug 2025480] [NEW] overlapping pinned CPUs after unshelve

2023-06-30 Thread Bence Romsics
Public bug reported:

It seems that after unshelve, occasionally the request for a dedicated CPU is
ignored. More precisely, the first pinned CPU does not seem to be marked as
consumed, so the second may end up on the same CPU. This was first observed on
victoria (6 times out of 46 tries), but then I was able to reproduce it on
master too (6 times out of 20 tries). The attached logs are from the victoria
environment, which was a single-host all-in-one devstack running only the VMs
used for this reproduction.

stable/victoria
devstack 3eb6e2d7
nova 1aca09b966

master
devstack b10c0602
nova 2aea80c0af

config:
[[post-config|/etc/nova/nova.conf]]
[DEFAULT]
scheduler_default_filters = NUMATopologyFilter, ...
[compute]
cpu_dedicated_set = 0,1

Confirming this config in placement:
$ openstack --os-placement-api-version 1.17 resource provider inventory show
46b3d4de-bb45-4607-8860-040eb2dcd0d7 PCPU
+--+---+
| Field| Value |
+--+---+
| allocation_ratio | 1.0   |
| min_unit | 1 |
| max_unit | 2 |
| reserved | 0 |
| step_size| 1 |
| total| 2 |
+--+---+

Reproduction steps:
openstack flavor create cirros256-pinned --public --vcpus 1 --ram 256 --disk 1 
--property hw_rng:allowed=True --property hw:cpu_policy=dedicated

openstack server list -f value -c ID | xargs -r openstack server delete
--wait

openstack server create --flavor cirros256-pinned --image 
cirros-0.5.1-x86_64-disk --nic net-id=private vm0 --wait
openstack server shelve vm0
sleep 10 # make sure shelve finished
openstack server create --flavor cirros256-pinned --image 
cirros-0.5.1-x86_64-disk --nic net-id=private vm1 --wait
openstack server shelve vm1
sleep 10

openstack server unshelve vm0 ; sleep 15 ; openstack server unshelve vm1 # the 
amount of sleep could easily be relevant
watch openstack server list # wait until both go ACTIVE

# both vms ended up on the same cpu
$ for vm in $( sudo virsh list --name ) ; do sudo virsh dumpxml $vm | 
xmlstarlet sel -t -v '//vcpupin/@cpuset' ; echo ; done
0
0

Data collected from the environment where the above reproduction
triggered the bug:

$ openstack server list
ID:       4734b8a5-a6dd-432a-86c9-ba0367bb86cc
Name:     vm1
Status:   ACTIVE
Networks: private=10.0.0.27, fdfb:ab27:b2b2:0:f816:3eff:fe80:2fd
Image:    cirros-0.5.1-x86_64-disk
Flavor:   cirros256-pinned

ID:       e30de509-6988-4535-a6f5-520c52fba087
Name:     vm0
Status:   ACTIVE
Networks: private=10.0.0.6, fdfb:ab27:b2b2:0:f816:3eff:fe78:d368
Image:    cirros-0.5.1-x86_64-disk
Flavor:   cirros256-pinned

$ openstack server show vm0
+-+-+
| Field   | Value   
|
+-+-+
| OS-DCF:diskConfig   | MANUAL  
|
| OS-EXT-AZ:availability_zone | nova
|
| OS-EXT-SRV-ATTR:host| devstack1v  
|
| OS-EXT-SRV-ATTR:hypervisor_hostname | devstack1v  
|
| OS-EXT-SRV-ATTR:instance_name   | instance-001f   
|
| OS-EXT-STS:power_state  | Running 
|
| OS-EXT-STS:task_state   | None
|
| OS-EXT-STS:vm_state | active  
|
| OS-SRV-USG:launched_at  | 2023-06-29T10:45:25.00  
|
| OS-SRV-USG:terminated_at| None
|
| accessIPv4  | 
|
| accessIPv6  | 
|
| addresses   | private=10.0.0.6, 
fdfb:ab27:b2b2:0:f816:3eff:fe78:d368  |
| config_drive|   

[Yahoo-eng-team] [Bug 2025341] [NEW] flows lost with noop firewall driver at ovs-agent restart while the db is down

2023-06-29 Thread Bence Romsics
Public bug reported:

If we restart ovs-agent while neutron-server is up but the neutron DB is down,
and we also use the noop firewall driver, then the agent deletes and cannot
recover the per-port flows. Because the affected flows include the mod_vlan_vid
flows, this means traffic loss until another agent restart (with the db up) or
a full successful resync happens.

For example:

[securitygroup]
firewall_driver = noop

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image 
cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait

sudo ovs-ofctl dump-flows br-int > ~/noop-db-stop.1

# execute these by hand and make sure that each command took effect before 
moving on to the next
sudo systemctl stop mysql
sudo systemctl restart devstack@q-agt

sudo ovs-ofctl dump-flows br-int > ~/noop-db-stop.2

# diff the flows (for the sake of simplicity this devstack environment has a 
single vm with a single port, started above)
a=1 ; b=2 ; base=noop-db-stop. ; colordiff -u <( cat ~/$base$a | egrep -v 
^NXST_FLOW | sed -r -e 
's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ 
*//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 
's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ 
*//' -e 's/, +/ /g' | sort )

--- /dev/fd/63  2023-06-29 08:10:00.142623814 +
+++ /dev/fd/62  2023-06-29 08:10:00.142623814 +
@@ -1,19 +1,10 @@
 table=0 priority=0 actions=resubmit(,58)
-table=0 priority=10,arp,in_port=12 actions=resubmit(,24)
-table=0 priority=10,icmp6,in_port=12,icmp_type=136 actions=resubmit(,24)
 table=0 priority=200,reg3=0 
actions=set_queue:0,load:0x1->NXM_NX_REG3[0],resubmit(,0)
 table=0 priority=2,in_port=1 actions=drop
 table=0 priority=2,in_port=2 actions=drop
-table=0 priority=3,in_port=1,vlan_tci=0x/0x1fff 
actions=mod_vlan_vid:2,resubmit(,58)
-table=0 priority=3,in_port=2,dl_vlan=100 actions=mod_vlan_vid:3,resubmit(,58)
 table=0 priority=65535,dl_vlan=4095 actions=drop
-table=0 priority=9,in_port=12 actions=resubmit(,25)
 table=23 priority=0 actions=drop
 table=24 priority=0 actions=drop
-table=24 priority=2,arp,in_port=12,arp_spa=10.0.0.19 actions=resubmit(,25)
-table=24 
priority=2,icmp6,in_port=12,icmp_type=136,nd_target=fd17:d094:5207:0:f816:3eff:fe8e:b23f
 actions=resubmit(,58)
-table=24 
priority=2,icmp6,in_port=12,icmp_type=136,nd_target=fe80::f816:3eff:fe8e:b23f 
actions=resubmit(,58)
-table=25 priority=2,in_port=12,dl_src=fa:16:3e:8e:b2:3f actions=resubmit(,30)
 table=30 priority=0 actions=resubmit(,58)
 table=31 priority=0 actions=resubmit(,58)
 table=58 priority=0 actions=resubmit(,60)

The same loss of flows does not happen with the openvswitch firewall
driver:

[securitygroup]
firewall_driver = openvswitch

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image 
cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait

sudo ovs-ofctl dump-flows br-int > ~/openvswitch-db-stop.1

sudo systemctl stop mysql
sudo systemctl restart devstack@q-agt

sudo ovs-ofctl dump-flows br-int > ~/openvswitch-db-stop.2

a=1 ; b=2 ; base=openvswitch-db-stop. ; colordiff -u <( cat ~/$base$a |
egrep -v ^NXST_FLOW | sed -r -e
's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e
's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW
| sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^
]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )

[no diff]

The same loss of flows does not happen either if neutron-server is down
while ovs-agent restarts:

[securitygroup]
firewall_driver = noop

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image 
cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait

sudo ovs-ofctl dump-flows br-int > ~/noop-server-stop.1

sudo systemctl stop devstack@q-svc
sudo systemctl restart devstack@q-agt

sudo ovs-ofctl dump-flows br-int > ~/noop-server-stop.2

a=1 ; b=2 ; base=noop-server-stop. ; colordiff -u <( cat ~/$base$a |
egrep -v ^NXST_FLOW | sed -r -e
's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e
's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW
| sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^
]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )

[no diff]

devstack b10c0602
neutron 0c5d4b8728

I'll push a proposed fix soon.

** Affects: neutron
     Importance: Undecided
 Assignee: Bence Romsics (bence-romsics)
 Status: New


** Tags: ovs

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2025341

Title:
  flows lost with noop firewall driver at ovs-agent restart while the db
  is down

Status in neutron:
  New

Bug description:
  If we restart ovs-agent while neutron-server is up but neutron DB is
  down, then the agent deletes and can

[Yahoo-eng-team] [Bug 2008712] [NEW] Security group rule deleted by cascade (because its remote group had been deleted) is not deleted in the backend

2023-02-27 Thread Bence Romsics
Public bug reported:

devstack 7533276c
neutron aa40aef70f

This reproduction uses the openvswitch ml2 mechanism_driver and
firewall_driver, but I believe this bug affects all mechanism_drivers.

# Choose a port number no other rule uses on the test host.
$ sudo ovs-ofctl dump-flows br-int | egrep 1234
[nothing]

# Create two security groups.
$ openstack security group create sg1
$ openstack security group create sg2

# Create a rule in sg1 that references sg2 (as remote group).
$ openstack security group rule create sg1 --ingress --ethertype IPv4 
--dst-port 1234:1234 --protocol tcp --remote-group sg2

# The API returns the new rule.
$ openstack security group rule list sg1
+--+-+---+---++---+--+--+
| ID   | IP Protocol | Ethertype | IP Range  | 
Port Range | Direction | Remote Security Group| Remote Address 
Group |
+--+-+---+---++---+--+--+
| 77db9548-b3ab-46ea-94a5-f00f6a4062da | None| IPv4  | 0.0.0.0/0 |  
  | egress| None | None 
|
| 9b569a88-177a-4422-a0f3-6ed039e0217a | tcp | IPv4  | 0.0.0.0/0 | 
1234:1234  | ingress   | 7df90218-3d52-4156-9630-43563a3d5ba6 | None
 |
| f40d258b-4d13-4dc8-a0c4-82ccce9922e0 | None| IPv6  | ::/0  |  
  | egress| None | None 
|
+--+-+---+---++---+--+--+

# Make sure sg1 is used on the test host.
$ openstack server create --flavor cirros256 --image cirros-0.5.2-x86_64-disk 
--availability-zone :devstack0 --nic net-id=private --security-group sg1 vm1 
--wait

# See if the rule is implemented in the backend.
$ sudo ovs-ofctl dump-flows br-int | egrep 1234
 cookie=0x33704a39bf5031d7, duration=55.263s, table=82, n_packets=0, n_bytes=0, 
idle_age=57, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x20,tp_dst=1234 
actions=conjunction(22,2/2)
 cookie=0x33704a39bf5031d7, duration=55.263s, table=82, n_packets=0, n_bytes=0, 
idle_age=57, priority=73,ct_state=+new-est,tcp,reg5=0x20,tp_dst=1234 
actions=conjunction(23,2/2)

# Delete sg2...
$ openstack security group delete sg2

# ...which by cascade also deletes the rule in sg1 referencing sg2. At least in
the API.
$ openstack security group rule list sg1
+--+-+---+---++---+---+--+
| ID   | IP Protocol | Ethertype | IP Range  | 
Port Range | Direction | Remote Security Group | Remote Address Group |
+--+-+---+---++---+---+--+
| 77db9548-b3ab-46ea-94a5-f00f6a4062da | None| IPv4  | 0.0.0.0/0 |  
  | egress| None  | None |
| f40d258b-4d13-4dc8-a0c4-82ccce9922e0 | None| IPv6  | ::/0  |  
  | egress| None  | None |
+--+-+---+---++---+---+--+

# But the delete is not propagated to the backend.
$ sudo ovs-ofctl dump-flows br-int | egrep 1234
 cookie=0x33704a39bf5031d7, duration=112.917s, table=82, n_packets=0, 
n_bytes=0, idle_age=115, 
priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x20,tp_dst=1234 
actions=conjunction(22,2/2)
 cookie=0x33704a39bf5031d7, duration=112.917s, table=82, n_packets=0, 
n_bytes=0, idle_age=115, 
priority=73,ct_state=+new-est,tcp,reg5=0x20,tp_dst=1234 
actions=conjunction(23,2/2)

# Clean up - even the left over backend flows.
$ openstack server delete vm1 --wait
$ sudo ovs-ofctl dump-flows br-int | egrep 1234
[nothing]
$ openstack security group delete sg2
$ openstack security group delete sg1

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2008712

Title:
  Security group rule deleted by cascade (because its remote group had
  been deleted) is not deleted in the backend

Status in neutron:
  New

Bug description:
  devstack 7533276c
  neutron aa40aef70f

  This reproduction uses the openvswitch ml2 mechanism_driver and
 

[Yahoo-eng-team] [Bug 2003553] [NEW] Some port attributes are ignored in bulk port create: allowed_address_pairs, extra_dhcp_opts

2023-01-20 Thread Bence Romsics
Public bug reported:

It seems the bulk port create API ignores some of the port attributes it
receives:

export TOKEN="$( openstack token issue -f value -c id )"

# bulk equivalent of
# openstack --debug port create port0 --network private --allowed-address 
ip-address=10.0.0.1,mac-address=01:23:45:67:89:ab

curl -s -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d 
"{\"ports\":[{\"name\":\"port0\",\"network_id\":\"$( openstack net show private 
-f value -c id 
)\",\"allowed_address_pairs\":[{\"ip_address\":\"10.0.0.1\",\"mac_address\":\"01:23:45:67:89:ab\"}]}]}"
 -X POST http://127.0.0.1:9696/networking/v2.0/ports | json_pp
...
 "allowed_address_pairs" : [],
...

# bulk equivalent of
# openstack --debug port create port0 --network private --extra-dhcp-option 
name=domain-name-servers,value=10.0.0.1,ip-version=4

curl -s -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d 
"{\"ports\":[{\"name\":\"port0\",\"network_id\":\"$( openstack net show private 
-f value -c id 
)\",\"extra_dhcp_opts\":[{\"opt_name\":\"domain-name-servers\",\"opt_value\":\"10.0.0.1\",\"ip_version\":\"4\"}]}]}"
 -X POST http://127.0.0.1:9696/networking/v2.0/ports | json_pp
...
 "extra_dhcp_opts" : [],
...

neutron b71b25820be6d61ed9f249eddf32bfa49ac76524

** Affects: neutron
 Importance: Undecided
 Assignee: Bence Romsics (bence-romsics)
 Status: New


** Tags: api

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2003553

Title:
  Some port attributes are ignored in bulk port create:
  allowed_address_pairs, extra_dhcp_opts

Status in neutron:
  New

Bug description:
  It seems the bulk port create API ignores some of the port attributes
  it receives:

  export TOKEN="$( openstack token issue -f value -c id )"

  # bulk equivalent of
  # openstack --debug port create port0 --network private --allowed-address 
ip-address=10.0.0.1,mac-address=01:23:45:67:89:ab

  curl -s -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d 
"{\"ports\":[{\"name\":\"port0\",\"network_id\":\"$( openstack net show private 
-f value -c id 
)\",\"allowed_address_pairs\":[{\"ip_address\":\"10.0.0.1\",\"mac_address\":\"01:23:45:67:89:ab\"}]}]}"
 -X POST http://127.0.0.1:9696/networking/v2.0/ports | json_pp
  ...
   "allowed_address_pairs" : [],
  ...

  # bulk equivalent of
  # openstack --debug port create port0 --network private --extra-dhcp-option 
name=domain-name-servers,value=10.0.0.1,ip-version=4

  curl -s -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d 
"{\"ports\":[{\"name\":\"port0\",\"network_id\":\"$( openstack net show private 
-f value -c id 
)\",\"extra_dhcp_opts\":[{\"opt_name\":\"domain-name-servers\",\"opt_value\":\"10.0.0.1\",\"ip_version\":\"4\"}]}]}"
 -X POST http://127.0.0.1:9696/networking/v2.0/ports | json_pp
  ...
   "extra_dhcp_opts" : [],
  ...

  neutron b71b25820be6d61ed9f249eddf32bfa49ac76524

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2003553/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2002629] Re: devstack build in the gate fails with: ovnnb_db.sock: database connection failed

2023-01-12 Thread Bence Romsics
Removing neutron from the affected projects, since Yatin found the cause
in devstack.

** No longer affects: neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2002629

Title:
  devstack build in the gate fails with: ovnnb_db.sock: database
  connection failed

Status in devstack:
  In Progress

Bug description:
  Recently we seem to have seen the same devstack build failure in many
  different gate jobs. The usual error message is:

  + lib/neutron_plugins/ovn_agent:start_ovn:714 :   wait_for_db_file 
/var/lib/ovn/ovnsb_db.db
  + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 :   local count=0
  + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 :   '[' '!' -f 
/var/lib/ovn/ovnsb_db.db ']'
  + lib/neutron_plugins/ovn_agent:start_ovn:716 :   is_service_enabled tls-proxy
  + functions-common:is_service_enabled:2089 :   return 0
  + lib/neutron_plugins/ovn_agent:start_ovn:717 :   sudo ovn-nbctl 
--db=unix:/var/run/ovn/ovnnb_db.sock set-ssl 
/opt/stack/data/CA/int-ca/private/devstack-cert.key 
/opt/stack/data/CA/int-ca/devstack-cert.crt 
/opt/stack/data/CA/int-ca/ca-chain.pem
  ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No 
such file or directory)
  + lib/neutron_plugins/ovn_agent:start_ovn:1 :   exit_trap

  A few example logs:

  https://zuul.opendev.org/t/openstack/build/ec852d75c8094afcb4140871bc9ffa36
  https://zuul.opendev.org/t/openstack/build/eae988aa8cd24c78894a3d3438392357

  The search expression 'message:"ovnnb_db.sock: database connection
  failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org
  for the last 2 weeks.

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/2002629/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2002629] [NEW] devstack build in the gate fails with: ovnnb_db.sock: database connection failed

2023-01-12 Thread Bence Romsics
Public bug reported:

Recently we seem to have seen the same devstack build failure in many
different gate jobs. The usual error message is:

+ lib/neutron_plugins/ovn_agent:start_ovn:714 :   wait_for_db_file 
/var/lib/ovn/ovnsb_db.db
+ lib/neutron_plugins/ovn_agent:wait_for_db_file:175 :   local count=0
+ lib/neutron_plugins/ovn_agent:wait_for_db_file:176 :   '[' '!' -f 
/var/lib/ovn/ovnsb_db.db ']'
+ lib/neutron_plugins/ovn_agent:start_ovn:716 :   is_service_enabled tls-proxy
+ functions-common:is_service_enabled:2089 :   return 0
+ lib/neutron_plugins/ovn_agent:start_ovn:717 :   sudo ovn-nbctl 
--db=unix:/var/run/ovn/ovnnb_db.sock set-ssl 
/opt/stack/data/CA/int-ca/private/devstack-cert.key 
/opt/stack/data/CA/int-ca/devstack-cert.crt 
/opt/stack/data/CA/int-ca/ca-chain.pem
ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such 
file or directory)
+ lib/neutron_plugins/ovn_agent:start_ovn:1 :   exit_trap

A few example logs:

https://zuul.opendev.org/t/openstack/build/ec852d75c8094afcb4140871bc9ffa36
https://zuul.opendev.org/t/openstack/build/eae988aa8cd24c78894a3d3438392357

The search expression 'message:"ovnnb_db.sock: database connection
failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org
for the last 2 weeks.

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: gate-failure ovn

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2002629

Title:
  devstack build in the gate fails with: ovnnb_db.sock: database
  connection failed

Status in neutron:
  New

Bug description:
  Recently we seem to have seen the same devstack build failure in many
  different gate jobs. The usual error message is:

  + lib/neutron_plugins/ovn_agent:start_ovn:714 :   wait_for_db_file 
/var/lib/ovn/ovnsb_db.db
  + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 :   local count=0
  + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 :   '[' '!' -f 
/var/lib/ovn/ovnsb_db.db ']'
  + lib/neutron_plugins/ovn_agent:start_ovn:716 :   is_service_enabled tls-proxy
  + functions-common:is_service_enabled:2089 :   return 0
  + lib/neutron_plugins/ovn_agent:start_ovn:717 :   sudo ovn-nbctl 
--db=unix:/var/run/ovn/ovnnb_db.sock set-ssl 
/opt/stack/data/CA/int-ca/private/devstack-cert.key 
/opt/stack/data/CA/int-ca/devstack-cert.crt 
/opt/stack/data/CA/int-ca/ca-chain.pem
  ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No 
such file or directory)
  + lib/neutron_plugins/ovn_agent:start_ovn:1 :   exit_trap

  A few example logs:

  https://zuul.opendev.org/t/openstack/build/ec852d75c8094afcb4140871bc9ffa36
  https://zuul.opendev.org/t/openstack/build/eae988aa8cd24c78894a3d3438392357

  The search expression 'message:"ovnnb_db.sock: database connection
  failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org
  for the last 2 weeks.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2002629/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1998820] [NEW] Floor division in size usage calculation leads to surprising quota limits

2022-12-05 Thread Bence Romsics
Public bug reported:

Colleagues working downstream found a slight discrepancy in quota
enforcement while working with the new unified quota system.

If we set the image_size_total quota to 1 MiB, the actual limit where
quota enforcement turns on is 2 MiB - 1 byte:

openstack --os-cloud devstack-system-admin registered limit create
--service glance --default-limit 1 --region RegionOne image_size_total

openstack image list -f value -c ID | xargs -r openstack image delete
openstack image create --file <( dd if=/dev/zero bs=1 count=$(( 2 * 1024 ** 2 - 
1 )) ) img1  ## succeeds
openstack image create --file <( dd if=/dev/zero bs=1 count=1 ) img2  ## 
succeeds

openstack image list -f value -c ID | xargs -r openstack image delete
openstack image create --file <( dd if=/dev/zero bs=1 count=$(( 2 * 1024 ** 2 
)) ) img1  ## succeeds
openstack image create --file <( dd if=/dev/zero bs=1 count=1 ) img2  ## 
HttpException: 413: ... Request Entity Too Large

This bug report is not about the size of img1 - we know that the limit
is soft and img1 can go over the quota - but the success/failure of
'image create img2'.

I believe the root cause is an integer/floor division when calculating
the usage in megabytes. My colleagues also proposed a fix, which I am
going to upload right after opening this ticket.
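
A worked example of the arithmetic (assuming, as described above, that usage
is counted in whole MiB and an upload is rejected only once that usage
exceeds the limit; the exact enforcement call is an assumption here):

MIB = 1024 ** 2
limit_mib = 1

for stored_bytes in (2 * MIB - 1, 2 * MIB):
    floor_usage_mib = stored_bytes // MIB     # current floor division
    ceil_usage_mib = -(-stored_bytes // MIB)  # ceiling division instead
    print(stored_bytes,
          'floor rejects next upload:', floor_usage_mib > limit_mib,
          'ceil rejects next upload:', ceil_usage_mib > limit_mib)

With floor division, 2 MiB - 1 stored bytes count as only 1 MiB of usage, so
the next 1-byte image is still accepted; with ceiling (or byte-exact)
accounting it would already be rejected at the expected 1 MiB limit.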

Environment details:

glance 199722a65
devstack 0d5c8d66

Quota setup as described in:
https://docs.openstack.org/glance/latest/admin/quotas.html

$ for opt in image_stage_total image_count_total image_count_uploading ; do 
openstack --os-cloud devstack-system-admin registered limit create --service 
glance --default-limit 99 --region RegionOne $opt ; done
$ openstack --os-cloud devstack-system-admin registered limit create --service 
glance --default-limit 1 --region RegionOne image_size_total
+---+--+
| Field | Value|
+---+--+
| default_limit | 1|
| description   | None |
| id| 828fe62d931449d08d96f725226891d4 |
| region_id | RegionOne|
| resource_name | image_size_total |
| service_id| 3400473cffa047edb79c67383e86072d |
+---+--+

$ source openrc admin admin

$ openstack user create --password devstack glance-service
+-+--+
| Field   | Value|
+-+--+
| domain_id   | default  |
| enabled | True |
| id  | 43268355b8f64d399a7a35535ffee399 |
| name| glance-service   |
| options | {}   |
| password_expires_at | None |
+-+--+
$ openstack role add --user glance-service --user-domain Default --system all 
reader

$ echo $OS_AUTH_URL
http://192.168.122.218/identity
$ openstack endpoint list --service glance
+--+---+--+--+-+---+--+
| ID   | Region| Service Name | Service Type | 
Enabled | Interface | URL  |
+--+---+--+--+-+---+--+
| 92995b7a76444502acbbecfb421d0bc1 | RegionOne | glance   | image| 
True| public| http://192.168.122.218/image |
+--+---+--+--+-+---+--+

$ vi /etc/glance/glance-api
[DEFAULT]
use_keystone_limits = True
[oslo_limit]
auth_url = http://192.168.122.218/identity
auth_type = password
user_domain_id = default
username = glance-service
system_scope = all
password = devstack
endpoint_id = 92995b7a76444502acbbecfb421d0bc1
region_name = RegionOne

$ sudo systemctl restart devstack@g-api.service

** Affects: glance
 Importance: Undecided
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1998820

Title:
  Floor division in size usage calculation leads to surprising quota
  limits

Status in Glance:
  In Progress

Bug description:
  Colleagues working downstream found a slight discrepancy in quota
  enforcement while working with the new unified quota system.

  If we set the image_size_total quota to 1 MiB, the actual limit where
  quota enforcement turns on is 2 MiB - 1 byte:

  openstack --os-cloud devstack-system-admin registered limit create
  --service glance --default-limit 1 --region RegionOne image_size_total

  

[Yahoo-eng-team] [Bug 1998337] [NEW] test_dvr_router_lifecycle_ha_with_snat_with_fips fails occasionally in the gate

2022-11-30 Thread Bence Romsics
Public bug reported:

Opening this report to track the following test that fails occasionally
in the gate:

job neutron-functional-with-uwsgi
test 
neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_router_lifecycle_ha_with_snat_with_fipstesttools

Sample traceback:

ft1.31: 
neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_router_lifecycle_ha_with_snat_with_fipstesttools.testresult.real._StringException:
 Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", 
line 182, in func
return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", 
line 182, in func
return f(self, *args, **kwargs)
  File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py",
 line 208, in test_dvr_router_lifecycle_ha_with_snat_with_fips
self._dvr_router_lifecycle(enable_ha=True, enable_snat=True)
  File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py",
 line 626, in _dvr_router_lifecycle
self._assert_dvr_floating_ips(router, snat_bound_fip=snat_bound_fip,
  File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py",
 line 791, in _assert_dvr_floating_ips
self.assertTrue(fg_port_created_successfully)
  File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue
raise self.failureException(msg)
AssertionError: False is not true

It seems to recur occasionally, for example:

https://675daf3418638bf15806-f7e1f8eddcfdd9404f4b72ab9bb1f324.ssl.cf1.rackcdn.com/865575/1/check/neutron-functional-with-uwsgi/bd983b3/testr_results.html
https://488eb2b76bde124417ee-80e67ec01f194d5b25d665df26ee3378.ssl.cf2.rackcdn.com/839066/18/check/neutron-functional-with-uwsgi/66c7fcc/testr_results.html

There may be more that's similar:

$ logsearch log --project openstack/neutron --result FAILURE --pipeline check 
--job neutron-functional-with-uwsgi --limit 30 'line 208, in 
test_dvr_router_lifecycle_ha_with_snat_with_fips'
Builds with matching logs 5/30:
+--+-+---++
| uuid | finished| review   
 | branch |
+--+-+---++
| 1d265722d23548d6930486699202347d | 2022-11-30T13:42:28 | 
https://review.opendev.org/863881 | master |
| cb2a2d7161764d5f823a09528eedc44c | 2022-11-28T16:47:20 | 
https://review.opendev.org/865018 | master |
| 66c7fcc56a5347648732bfcb90341ef5 | 2022-11-27T00:55:10 | 
https://review.opendev.org/839066 | master |
| 85b3b709e9d54718a4f0847da5b4b2df | 2022-11-25T10:00:01 | 
https://review.opendev.org/865018 | master |
| bd983b367ac441c190e38dcf1fadc87f | 2022-11-24T16:17:06 | 
https://review.opendev.org/865575 | master |
+--+-+---++

** Affects: neutron
 Importance: Medium
 Status: New


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1998337

Title:
  test_dvr_router_lifecycle_ha_with_snat_with_fips fails occasionally in
  the gate

Status in neutron:
  New

Bug description:
  Opening this report to track the following test that fails
  occasionally in the gate:

  job neutron-functional-with-uwsgi
  test 
neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_router_lifecycle_ha_with_snat_with_fipstesttools

  Sample traceback:

  ft1.31: 
neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_router_lifecycle_ha_with_snat_with_fipstesttools.testresult.real._StringException:
 Traceback (most recent call last):
File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", 
line 182, in func
  return f(self, *args, **kwargs)
File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", 
line 182, in func
  return f(self, *args, **kwargs)
File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py",
 line 208, in test_dvr_router_lifecycle_ha_with_snat_with_fips
  self._dvr_router_lifecycle(enable_ha=True, enable_snat=True)
File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py",
 line 626, in _dvr_router_lifecycle
  self._assert_dvr_floating_ips(router, snat_bound_fip=snat_bound_fip,
File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py",
 line 791, in 

[Yahoo-eng-team] [Bug 1995732] [NEW] bulk port create: TypeError: Bad prefix type for generating IPv6 address by EUI-64

2022-11-04 Thread Bence Romsics
Public bug reported:

source openrc admin admin
export TOKEN="$( openstack token issue -f value -c id )"

A single port create succeeds:
curl -s -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d 
"{\"port\":{\"name\":\"port0\",\"network_id\":\"$( openstack net show private 
-f value -c id )\"}}" -X POST http://127.0.0.1:9696/networking/v2.0/ports | 
json_pp
...

But the same request via the bulk api fails:
curl -s -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d 
"{\"ports\":[{\"name\":\"port0-via-bulk\",\"network_id\":\"$( openstack net 
show private -f value -c id )\"}]}" -X POST 
http://127.0.0.1:9696/networking/v2.0/ports | json_pp
{
   "NeutronError" : {
  "detail" : "",
  "message" : "Request Failed: internal server error while processing your 
request.",
  "type" : "HTTPInternalServerError"
   }
}

While in q-svc logs we have:
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation [None 
req-f5c79830-013a-4ae2-8c47-2102b20299e1 admin admin] POST failed.: TypeError: 
Bad prefix type for generating IPv6 address by EUI-64: fdd6:813:349::/64
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation Traceback (most recent call last):
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation   File 
"/usr/local/lib/python3.10/dist-packages/oslo_utils/netutils.py", line 210, in 
get_ipv6_addr_by_EUI64
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation eui64 = int(netaddr.EUI(mac).eui64())
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation   File 
"/usr/local/lib/python3.10/dist-packages/netaddr/eui/__init__.py", line 389, in 
__init__
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation self.value = addr
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation   File 
"/usr/local/lib/python3.10/dist-packages/netaddr/eui/__init__.py", line 425, in 
_set_value
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation self._value = module.str_to_int(value)
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation   File 
"/usr/local/lib/python3.10/dist-packages/netaddr/strategy/eui48.py", line 178, 
in str_to_int
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation raise TypeError('%r is not str() or 
unicode()!' % (addr,))
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation TypeError:  is not str() or unicode()!
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation During handling of the above exception, 
another exception occurred:
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation Traceback (most recent call last):
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation   File 
"/usr/local/lib/python3.10/dist-packages/pecan/core.py", line 693, in __call__
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation self.invoke_controller(controller, 
args, kwargs, state)
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation   File 
"/usr/local/lib/python3.10/dist-packages/pecan/core.py", line 584, in 
invoke_controller
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation result = controller(*args, **kwargs)
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation   File 
"/opt/stack/neutron-lib/neutron_lib/db/api.py", line 140, in wrapped
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation with 
excutils.save_and_reraise_exception():
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation   File 
"/usr/local/lib/python3.10/dist-packages/oslo_utils/excutils.py", line 227, in 
__exit__
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation self.force_reraise()
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation   File 
"/usr/local/lib/python3.10/dist-packages/oslo_utils/excutils.py", line 200, in 
force_reraise
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation raise self.value
nov 04 15:56:52 devstack0 neutron-server[101377]: ERROR 
neutron.pecan_wsgi.hooks.translation   File 
"/opt/stack/neutron-lib/neutron_lib/db/api.py", line 138, in wrapped
nov 04 

[Yahoo-eng-team] [Bug 1992328] [NEW] volume timeouts in nova gate

2022-10-10 Thread Bence Romsics
Public bug reported:

I'm trying to track here a bug I have seen in the nova gate appearing
randomly through rechecks.

Typical stack traces:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 90, in 
wrapper
return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/admin/test_volume_swap.py", line 
110, in test_volume_swap
volume1['id'], 'available')
  File "/opt/stack/tempest/tempest/common/waiters.py", line 288, in 
wait_for_volume_resource_status
raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: volume a19743a3-4651-4c7f-a9a1-823735ea84a0 failed to reach available 
status (current in-use) within the required time (196 s).

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 90, in 
wrapper
return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/admin/test_live_migration.py", 
line 190, in test_live_block_migration_with_attached_volume
self.attach_volume(server, volume, device='/dev/xvdb')
  File "/opt/stack/tempest/tempest/api/compute/base.py", line 581, in 
attach_volume
volume['id'], 'in-use')
  File "/opt/stack/tempest/tempest/common/waiters.py", line 288, in 
wait_for_volume_resource_status
raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: volume 92685b8f-4db0-4110-a1ac-016ea7c51d1f failed to reach in-use 
status (current available) within the required time (196 s).

Typical jobs and tests:

nova-multi-cell
test_volume_swap[id-1769f00d-a693-4d67-a631-6a3496773813]

nova-live-migration
test_live_block_migration_with_attached_volume[id-e19c0cc6-6720-4ed8-be83-b6603ed5c812]

Example hits with (affecting multiple branches):

$ logsearch log --project openstack/nova --job nova-live-migration --result 
FAILURE --limit 50 "test_live_block_migration_with_attached_volume .* ... 
FAILED"
...
Builds with matching logs 10/50:
+--+-+--+---+-+
| uuid | finished| pipeline | review
| branch  |
+--+-+--+---+-+
| 36b367b0d0bb46d2a7fc6af4eb7739ca | 2022-10-07T19:39:47 | check| 
https://review.opendev.org/860736 | stable/victoria |
| d02ed047fcfd4180902dc0bec0334c38 | 2022-10-03T10:37:00 | check| 
https://review.opendev.org/854980 | stable/victoria |
| 0df9b00df16c4bbc9e49baf853fe0cf5 | 2022-09-19T09:47:02 | check| 
https://review.opendev.org/854980 | stable/victoria |
| 0db0e8d510d04443a172cc43e537f973 | 2022-09-16T14:14:31 | check| 
https://review.opendev.org/857877 | stable/train|
| 6ca30836a1b34be58728dc5d69c44c21 | 2022-09-16T10:33:55 | check| 
https://review.opendev.org/858051 | stable/victoria |
| 684e7c37c61745829908495ba249afb7 | 2022-09-16T10:14:07 | check| 
https://review.opendev.org/854980 | stable/victoria |
| 6bcf4105d0fc476faf9ee56e7f0ed41f | 2022-09-15T14:22:01 | check| 
https://review.opendev.org/857877 | stable/train|
| 0ea47624757c48a8bcfa9fd5c35b6465 | 2022-09-13T10:33:52 | check| 
https://review.opendev.org/854980 | stable/victoria |
| ca0d5f750b3040ed99c1e6ec3414d154 | 2022-09-06T17:28:41 | check| 
https://review.opendev.org/836830 | master  |
| 2ce6d7aa67404587b050a6b56f4d15e6 | 2022-08-29T11:58:59 | check| 
https://review.opendev.org/833090 | master  |
+--+-+--+---+-+

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1992328

Title:
  volume timeouts in nova gate

Status in OpenStack Compute (nova):
  New

Bug description:
  I'm trying to track here a bug I have seen in the nova gate appearing
  randomly through rechecks.

  Typical stack traces:

  Traceback (most recent call last):
File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 90, in 
wrapper
  return f(*func_args, **func_kwargs)
File "/opt/stack/tempest/tempest/api/compute/admin/test_volume_swap.py", 
line 110, in test_volume_swap
  volume1['id'], 'available')
File "/opt/stack/tempest/tempest/common/waiters.py", line 288, in 
wait_for_volume_resource_status
  raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: volume a19743a3-4651-4c7f-a9a1-823735ea84a0 failed to reach 
available status (current in-use) within the required time (196 s).

  Traceback (most recent call last):
File 

[Yahoo-eng-team] [Bug 1990842] [NEW] RFE Expose Open vSwitch other_config column in the API

2022-09-26 Thread Bence Romsics
Public bug reported:

Some of our performance sensitive users would like to tweak Open
vSwitch's Tx packet steering option under OpenStack:

https://docs.openvswitch.org/en/latest/topics/userspace-tx-steering/
available since Open vSwitch v2.17.0:

https://github.com/openvswitch/ovs/blob/7af5c33c1629b309cbcbe3b6c9c3bd6d3b4c0abf/NEWS#L103

https://github.com/openvswitch/ovs/commit/c18e707b2f259438633af5b23df53e1409472871

To enable that, we would like to expose some OVS interface configuration in a 
Neutron port's binding_profile.
Consider for example:

openstack port create port0 --binding-profile 
ovs_other_config=tx-steering:hash ...
more generally: --binding-profile ovs_other_config=foo:bar,bar:baz
or an alternative syntax: --binding-profile ovs:other_config='{"foo": 
"bar", "bar": "baz"}'

Given this information, ovs-agent can set the corresponding OVS
interface's other_config (using the python native interface of course,
not ovs-vsctl):

sudo ovs-vsctl set Interface ovs-interface-of-port0 
other_config:tx-steering=hash
sudo ovs-vsctl set Interface ovs-interface-of-port0 other_config:foo=bar 
other_config:bar=baz
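
A rough sketch of how the agent could do that with the python native
interface (ovsdbapp); the socket path and interface name below are
assumptions for illustration only:

from ovsdbapp.backend.ovs_idl import connection
from ovsdbapp.schema.open_vswitch import impl_idl

# assumed local OVSDB socket, adjust to the deployment
conn = connection.Connection(
    idl=connection.OvsdbIdl.from_server(
        'unix:/var/run/openvswitch/db.sock', 'Open_vSwitch'),
    timeout=10)
api = impl_idl.OvsdbIdl(conn)

# equivalent of:
# ovs-vsctl set Interface ovs-interface-of-port0 other_config:tx-steering=hash
api.db_set(
    'Interface', 'ovs-interface-of-port0',
    ('other_config', {'tx-steering': 'hash'})).execute(check_error=True)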

** Affects: neutron
 Importance: Wishlist
     Assignee: Bence Romsics (bence-romsics)
 Status: New


** Tags: rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1990842

Title:
  RFE Expose Open vSwitch other_config column in the API

Status in neutron:
  New

Bug description:
  Some of our performance sensitive users would like to tweak Open
  vSwitch's Tx packet steering option under OpenStack:

  https://docs.openvswitch.org/en/latest/topics/userspace-tx-steering/
  available since Open vSwitch v2.17.0:
  
https://github.com/openvswitch/ovs/blob/7af5c33c1629b309cbcbe3b6c9c3bd6d3b4c0abf/NEWS#L103
  
https://github.com/openvswitch/ovs/commit/c18e707b2f259438633af5b23df53e1409472871

  To enable that, we would like to expose some OVS interface configuration in a 
Neutron port's binding_profile.
  Consider for example:

  openstack port create port0 --binding-profile 
ovs_other_config=tx-steering:hash ...
  more generally: --binding-profile ovs_other_config=foo:bar,bar:baz
  or an alternative syntax: --binding-profile ovs:other_config='{"foo": 
"bar", "bar": "baz"}'

  Given this information, ovs-agent can set the corresponding OVS
  interface's other_config (using the python native interface of course,
  not ovs-vsctl):

  sudo ovs-vsctl set Interface ovs-interface-of-port0 
other_config:tx-steering=hash
  sudo ovs-vsctl set Interface ovs-interface-of-port0 other_config:foo=bar 
other_config:bar=baz

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1990842/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1988986] [NEW] gate: keystone-protection-functional: keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials: Could not find credent

2022-09-07 Thread Bence Romsics
Public bug reported:

Tracking a bug seen in the gate:

zuul report: 
https://50aa58668700125588f9-69e8ab9908c85e150921aaa267a6677d.ssl.cf1.rackcdn.com/855198/1/gate/keystone-protection-functional/edeae8a/testr_results.html
zuul log: 
https://50aa58668700125588f9-69e8ab9908c85e150921aaa267a6677d.ssl.cf1.rackcdn.com/855198/1/gate/keystone-protection-functional/edeae8a/job-output.txt

pipeline: gate
job: keystone-protection-functional
test: 
keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials

stack trace:
2022-09-06 16:19:59.894748 | controller | {3} 
keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials
 [0.842160s] ... FAILED
2022-09-06 16:19:59.894814 | controller |
2022-09-06 16:19:59.894840 | controller | Captured traceback:
2022-09-06 16:19:59.894859 | controller | ~~~
2022-09-06 16:19:59.894877 | controller | Traceback (most recent call last):
2022-09-06 16:19:59.894903 | controller |
2022-09-06 16:19:59.894922 | controller |   File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/keystone_tempest_plugin/tests/rbac/v3/test_credential.py",
 line 220, in test_identity_list_credentials
2022-09-06 16:19:59.894941 | controller | resp = 
self.do_request('list_credentials')['credentials']
2022-09-06 16:19:59.894959 | controller |
2022-09-06 16:19:59.894977 | controller |   File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/keystone_tempest_plugin/tests/rbac/v3/base.py",
 line 39, in do_request
2022-09-06 16:19:59.894994 | controller | response = getattr(client, 
method)(**payload)
2022-09-06 16:19:59.895012 | controller |
2022-09-06 16:19:59.895029 | controller |   File 
"/opt/stack/tempest/tempest/lib/services/identity/v3/credentials_client.py", 
line 78, in list_credentials
2022-09-06 16:19:59.895047 | controller | resp, body = self.get(url)
2022-09-06 16:19:59.895064 | controller |
2022-09-06 16:19:59.895093 | controller |   File 
"/opt/stack/tempest/tempest/lib/common/rest_client.py", line 314, in get
2022-09-06 16:19:59.895111 | controller | return self.request('GET', url, 
extra_headers, headers)
2022-09-06 16:19:59.895129 | controller |
2022-09-06 16:19:59.895146 | controller |   File 
"/opt/stack/tempest/tempest/lib/common/rest_client.py", line 720, in request
2022-09-06 16:19:59.895164 | controller | self._error_checker(resp, 
resp_body)
2022-09-06 16:19:59.895181 | controller |
2022-09-06 16:19:59.895203 | controller |   File 
"/opt/stack/tempest/tempest/lib/common/rest_client.py", line 826, in 
_error_checker
2022-09-06 16:19:59.895221 | controller | raise 
exceptions.NotFound(resp_body, resp=resp)
2022-09-06 16:19:59.895239 | controller |
2022-09-06 16:19:59.895256 | controller | tempest.lib.exceptions.NotFound: 
Object not found
2022-09-06 16:19:59.895274 | controller | Details: {'code': 404, 'message': 
'Could not find credential: f5b242ff18564f548caa1072929fdac2.', 'title': 'Not 
Found'}

** Affects: keystone
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1988986

Title:
  gate: keystone-protection-functional:
  
keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials:
  Could not find credential

Status in OpenStack Identity (keystone):
  New

Bug description:
  Tracking a bug seen in the gate:

  zuul report: 
https://50aa58668700125588f9-69e8ab9908c85e150921aaa267a6677d.ssl.cf1.rackcdn.com/855198/1/gate/keystone-protection-functional/edeae8a/testr_results.html
  zuul log: 
https://50aa58668700125588f9-69e8ab9908c85e150921aaa267a6677d.ssl.cf1.rackcdn.com/855198/1/gate/keystone-protection-functional/edeae8a/job-output.txt

  pipeline: gate
  job: keystone-protection-functional
  test: 
keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials

  stack trace:
  2022-09-06 16:19:59.894748 | controller | {3} 
keystone_tempest_plugin.tests.rbac.v3.test_credential.SystemAdminTests.test_identity_list_credentials
 [0.842160s] ... FAILED
  2022-09-06 16:19:59.894814 | controller |
  2022-09-06 16:19:59.894840 | controller | Captured traceback:
  2022-09-06 16:19:59.894859 | controller | ~~~
  2022-09-06 16:19:59.894877 | controller | Traceback (most recent call 
last):
  2022-09-06 16:19:59.894903 | controller |
  2022-09-06 16:19:59.894922 | controller |   File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/keystone_tempest_plugin/tests/rbac/v3/test_credential.py",
 line 220, in test_identity_list_credentials
  2022-09-06 16:19:59.894941 | controller | resp = 
self.do_request('list_credentials')['credentials']
  2022-09-06 16:19:59.894959 | controller |
  2022-09-06 16:19:59.894977 | controller 

[Yahoo-eng-team] [Bug 1988311] [NEW] Concurrent evacuation of vms with pinned cpus to the same host fail randomly

2022-08-31 Thread Bence Romsics
Public bug reported:

Reproduction:

Boot two vms (each with one pinned cpu) on devstack0.
Then evacuate them to devtack0a.
devstack0a has two dedicated cpus, so both vms should fit.
However sometimes (for example 6 out of 10 times) the evacuation of one vm 
fails with this error message: 'CPU set to pin [0] must be a subset of free CPU 
set [1]'.

devstack0 - all-in-one host
devstack0a - compute-only host

# have two dedicated cpus for pinning on the evacuation target host
devstack0a:/etc/nova/nova-cpu.conf:
[compute]
cpu_dedicated_set = 0,1

# the dedicated cpus are properly tracked in placement
$ openstack resource provider list
+--+++--+--+
| uuid | name   | generation | 
root_provider_uuid   | parent_provider_uuid |
+--+++--+--+
| a0574d87-42ee-4e13-b05a-639dc62c1196 | devstack0a |  2 | 
a0574d87-42ee-4e13-b05a-639dc62c1196 | None |
| 2e6fac42-d6e3-4366-a864-d5eb2bdc2241 | devstack0  |  2 | 
2e6fac42-d6e3-4366-a864-d5eb2bdc2241 | None |
+--+++--+--+
$ openstack resource provider inventory list 
a0574d87-42ee-4e13-b05a-639dc62c1196
++--+--+--+--+---+---+--+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | 
step_size | total | used |
++--+--+--+--+---+---+--+
| MEMORY_MB  |  1.5 |1 | 3923 |  512 | 
1 |  3923 |0 |
| DISK_GB|  1.0 |1 |   28 |0 | 
1 |28 |0 |
| PCPU   |  1.0 |1 |2 |0 | 
1 | 2 |0 |
++--+--+--+--+---+---+--+

# use vms with one pinned cpu
openstack flavor create cirros256-pinned --public --ram 256 --disk 1 --vcpus 1 
--property hw_rng:allowed=True --property hw:cpu_policy=dedicated

# boot two vms (each with one pinned cpu) on devstack0
n=2 ; for i in $( seq $n ) ; do openstack server create --flavor 
cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private 
--availability-zone :devstack0 --wait vm$i ; done

# kill n-cpu on devstack0
devstack0 $ sudo systemctl stop devstack@n-cpu
# and force it down, so we can start evacuating
openstack compute service set devstack0 nova-compute --down

# evacuate both vms to devstack0a concurrently
for vm in $( openstack server list --host devstack0 -f value -c ID ) ; do 
openstack --os-compute-api-version 2.29 server evacuate --host devstack0a $vm & 
done

# follow up on how the evacuation is going, check if the bug occured, see 
details a bit below
for i in $( seq $n ) ; do openstack server show vm$i -f value -c 
OS-EXT-SRV-ATTR:host -c status ; done

# clean up
devstack0 $ sudo systemctl start devstack@n-cpu
openstack compute service set devstack0 nova-compute --up
for i in $( seq $n ) ; do openstack server delete vm$i --wait ; done

This bug is not deterministic. For example out of 10 tries (like above)
I have seen 4 successes - when both vms successfully evacuated to (went
to ACTIVE on) devstack0a.

But in the other 6 cases only one vm evacuated successfully. The other
vm went to ERROR state, with the error message: "CPU set to pin [0] must
be a subset of free CPU set [1]". For example:

$ openstack server show vm2
...
| fault   | {'code': 400, 'created': 
'2022-08-24T13:50:33Z', 'message': 'CPU set to pin [0] must be a subset of free 
CPU set [1]'} |
...

In n-cpu logs we see the following:

aug 24 13:50:33 devstack0a nova-compute[246038]: ERROR nova.compute.manager 
[None req-278f5b67-a765-4231-b2b9-db3f8c7fe092 admin admin] [instance: 
dc3acde3-f1c6-41a9-9a12-0c278ad4b348] Setting instance vm_state to ERROR: 
nova.exception.CPUPinningInvalid: CPU set to pin [0] must be a subset of free 
CPU set [1]
aug 24 13:50:33 devstack0a nova-compute[246038]: ERROR nova.compute.manager 
[instance: dc3acde3-f1c6-41a9-9a12-0c278ad4b348] Traceback (most recent call 
last):
aug 24 13:50:33 devstack0a nova-compute[246038]: ERROR nova.compute.manager 
[instance: dc3acde3-f1c6-41a9-9a12-0c278ad4b348]   File 
"/opt/stack/nova/nova/compute/manager.py", line 10375, in 
_error_out_instance_on_exception
aug 24 13:50:33 devstack0a nova-compute[246038]: ERROR nova.compute.manager 
[instance: dc3acde3-f1c6-41a9-9a12-0c278ad4b348] yield
aug 24 13:50:33 devstack0a nova-compute[246038]: ERROR nova.compute.manager 
[instance: dc3acde3-f1c6-41a9-9a12-0c278ad4b348]   File 

[Yahoo-eng-team] [Bug 1988168] [NEW] Broken host:port splitting

2022-08-30 Thread Bence Romsics
Public bug reported:

Our users found a bug while POSTing to /v3/ec2tokens. I could simplify
the reproduction to this script:

$ cat keystone-post-ec2tokens.sh 
#! /bin/sh

# source openrc admin admin
# keystone-post-ec2tokens.sh http://127.0.0.1/identity/v3

keystone_base_url="${1:?}"

cleanup () {
openstack ec2 credential delete "$access"
}
trap cleanup EXIT

#host="localhost"
host="localhost:123"
#host="1.2.3.4:123"
#host="[fc00::]:123"
access="$( openstack ec2 credential create -f value -c access )"
secret="$( openstack ec2 credential show "$access" -f value -c secret )"
signature="intentionally-invalid"

cat 
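
A minimal, purely illustrative sketch (not keystone's code) of why naive
host:port splitting breaks on the bracketed IPv6 host above, while a
URL-aware parser handles all three:

from urllib.parse import urlsplit

for host in ('localhost:123', '1.2.3.4:123', '[fc00::]:123'):
    naive = host.split(':')         # the IPv6 literal yields 4 pieces, not 2
    parsed = urlsplit('//' + host)  # handles bracketed IPv6 correctly
    print(host, naive, parsed.hostname, parsed.port)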

[Yahoo-eng-team] [Bug 1983570] [NEW] cannot schedule ovs sriov offload port to tunneled segment

2022-08-04 Thread Bence Romsics
Public bug reported:

We observed a scheduling failure when using ovs sriov offload 
(https://docs.openstack.org/neutron/latest/admin/config-ovs-offload.html
) in combination with multisegment networks. The problem seems to affect the 
case when the port should be bound to a tunneled network segment (a segment 
that does not have a physnet).

I read that nova scheduler works the same way with pci sriov
passthrough, therefore I believe the same bug affects pci sriov
passthrough, though I did not test that.

Due to the special hardware needs for this environment I could not
reproduce this in devstack. But I hope we have collected enough
information that shows the error regardless. We believe we also
identified the relevant lines of code.

The overall setup includes l2gw - connecting the segments in the
multisegment network. But I will ignore that here, since l2gw cannot be
part of the root cause here. Neutron was configured with
mechanism_drivers=sriovnicswitch,opendaylight_v2. However since the
error happens before we bind the port, I believe the mechanism_driver is
irrelevant as long as it allows the creation of ports with "--vnic-type
direct --binding-profile '{"capabilities": ["switchdev"]}'". For the
sake of simplicity I will call these "ovs sriov offload ports".

As I understand the problem:

1) ovs sriov offload port on single segment neutron network, the segment is 
vxlan: works
2) normal port on no offload capable ovs (--vnic-type normal) on multisegment 
neutron network, one vlan, one vxlan segment, the port should be bound to the 
vxlan segment: works
3) ovs sriov offload port on multisegment neutron network, one vlan, one vxlan 
segment, the port should be bound to the vxlan segment: does not work

To reproduce:
* create a multisegment network with one vlan and one vxlan segment
* create a port on that network with "--vnic-type direct --binding-profile 
'{"capabilities": ["switchdev"]}' --disable-port-security --no-security-group".
* boot a vm with that port

On the compute host on which we expect the scheduling and boot to succeed we 
have configuration like:
[pci]
passthrough_whitelist = [{"devname": "data2", "physical_network": null}, 
{"devname": "data3", "physical_network": null}]

According to https://docs.openstack.org/nova/latest/admin/pci-
passthrough.html this marks the tunneled segments on this host to be
passthrough (and ovs offload) capable.

The vm boot fails with:

$ openstack server show c3_ms_1
...
| fault   | {'code': 500, 'created': 
'2022-07-16T08:12:31Z', 'message': 'Insufficient compute resources: Requested 
instance NUMA topology together with requested PCI devices cannot fit the given 
host NUMA topology; Claim pci failed.', 'details': 'Traceback (most recent call 
last):\n  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 
2418, in _build_and_run_instance\nlimits):\n  File 
"/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 360, in 
inner\nreturn f(*args, **kwargs)\n  File 
"/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 172, 
in instance_claim\npci_requests, limits=limits)\n  File 
"/usr/lib/python3.6/site-packages/nova/compute/claims.py", line 72, in 
__init__\nself._claim_test(compute_node, limits)\n  File 
"/usr/lib/python3.6/site-packages/nova/compute/claims.py", line 114, in 
_claim_test\n"; 
".join(reasons))\nnova.exception.ComputeResourcesUnavailable: Insufficient 
compute resources: Requested instance NUMA topology together with requested PCI 
devices cannot fit the given host NUMA topology; Claim pci failed.\n\nDuring 
handling of the above exception, another exception occurred:\n\nTraceback (most 
recent call last):\n  File 
"/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2271, in 
_do_build_and_run_instance\nfilter_properties, request_spec, accel_uuids)\n 
 File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2469, in 
_build_and_run_instance\ninstance_uuid=instance.uuid, 
reason=e.format_message())\nnova.exception.RescheduledException: Build of 
instance 09f3f8bb-b4c0-4395-8167-c10609d32d08 was re-scheduled: Insufficient 
compute resources: Requested instance NUMA topology together with requested PCI 
devices cannot fit the given host NUMA topology; Claim pci failed.\n'} |
...

In the scheduler logs we see that the scheduler uses a spec with a
physnet. But the pci passthrough capability is on a device without a
physnet.

controlhost3:/home/ceeinfra # grep DC259-CEE3- /var/log/nova/nova-scheduler.log
<180>2022-07-16T10:12:29.680009+02:00 
controlhost3.dc259cee3.cloud.k2.ericsson.se nova-scheduler[67299]: 2022-07-16 
10:12:29.679 76 WARNING nova.scheduler.host_manager 
[req-4dd7c37e-eb18-48da-9914-44a6a2a18b1d fcd3b2713191485d95befe1941f20e20 
cf7024f0f2bd46a8b17fd42055a20323 - default default] Selected host: 
compute3.dc259cee3.cloud.k2.ericsson.se failed to consume from instance. Error: 
PCI device 

[Yahoo-eng-team] [Bug 1966403] Re: Cres_Ubuntu 20.04, CI, Checkbox TPM test failed

2022-03-31 Thread Bence Romsics
Hi,

Are you sure you wanted to post this bug report to the neutron project's
bug tracker?

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1966403

Title:
  Cres_Ubuntu 20.04,CI,Checkbox TPM test failed

Status in neutron:
  Invalid

Bug description:
  Install the Ubuntu 20.04 OS, install checkbox, and run the TPM test of
checkbox. 4 items failed.
  Failed Item:
  1.tpm2.0_4.1.1/context_gap_max_check
  2.tpm2.0_4.1.1/tpm2_getcap
  3.tpm2.0_4.1.1/tpm2_nv
  4.tpm2.0_4.1.1/tpm2_quote

   
  [Reproduce Steps]
  1. Install Ubuntu 20.04.
  2. Install checkbox.
  3. Run the TPM test; the issue occurred.

  [Result]
  Expected Result: the test should pass.
  Actual Result: Test failed

  [Additional information]
  Test Vault ID:159637
  Checkbox Test Case ID:100554
  BIOS Version:0.9.39
  Image/Manifest:dell-bto-focal-fossa-corsola-X212-20220302-1.iso
  CPU:XEON(R) PROCESSOR SAPPHIRE RAPIDS WS D-0 56c 105MB 350 W QYQU ES2 -112L 
SSKU, DPN:99AMTK
  MEM:Samsung, DIMM,16GB,4800,1RX8,16G,DDR5,R, DPN:1V1N1
  GPU:GV100
  Failure rate:100%

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1966403/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1955775] Re: Error when l3-agent get filter id for ip

2021-12-27 Thread Bence Romsics
** Changed in: neutron
   Status: In Progress => Won't Fix

** Changed in: neutron
   Status: Won't Fix => Triaged

** Changed in: neutron
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1955775

Title:
  Error when l3-agent get filter id for ip

Status in neutron:
  Triaged

Bug description:
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent Traceback (most 
recent call last):
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 555, in 
_process_router_update
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent 
self._process_router_if_compatible(router)
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 477, in 
_process_router_if_compatible
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent 
self._process_updated_router(router)
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 501, in 
_process_updated_router
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent 
self.l3_ext_manager.update_router(self.context, router)
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python3.6/site-packages/neutron/agent/l3/l3_agent_extensions_manager.py",
 line 54, in update_router
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent 
extension.obj.update_router(context, data)
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 359, in 
inner
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent return f(*args, 
**kwargs)
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python3.6/site-packages/neutron/agent/l3/extensions/qos/fip.py", line 
236, in update_router
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent 
self.process_floating_ip_addresses(context, router_info)
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python3.6/site-packages/neutron/agent/l3/extensions/qos/fip.py", line 
218, in process_floating_ip_addresses
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent 
self.process_ip_rates(fip_addr, device, rates)
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python3.6/site-packages/neutron/agent/l3/extensions/qos/fip.py", line 
183, in process_ip_rates
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent rate['rate'], 
rate['burst'])
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python3.6/site-packages/neutron/agent/l3/extensions/qos/fip.py", line 
123, in process_ip_rate_limit
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent 
tc_wrapper.set_ip_rate_limit(direction, ip, rate, burst)
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python3.6/site-packages/neutron/agent/linux/l3_tc_lib.py", line 169, 
in set_ip_rate_limit
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent filter_id = 
self._get_filterid_for_ip(qdisc_id, ip)
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python3.6/site-packages/neutron/agent/linux/l3_tc_lib.py", line 82, 
in _get_filterid_for_ip
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent 
filterids_for_ip.append(filter_id)
  2021-12-22 23:37:34.366 3403 ERROR neutron.agent.l3.agent UnboundLocalError: 
local variable 'filter_id' referenced before assignment

  If some tc rules are accidentally added to the interface not through
  neutron - for example, the interface has two tc rules, where the first
  rule is "filter protocol all ..." and the second rule is "match ..." -
  then the first rule does not match FILTER_ID_REGEX while the second
  rule starts with "match", so the code will execute this statement:

  filterids_for_ip.append(filter_id)

  But filter_id has not been assigned at this point.
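
  A minimal sketch (not the actual l3_tc_lib code; the regex and the tc
  output below are illustrative assumptions) of the failure mode described
  above:

  import re

  FILTER_ID_REGEX = re.compile(r'filter .* fh (\w+::\w+)')  # assumed pattern

  def get_filterids_for_ip(tc_filters_output):
      filterids_for_ip = []
      for line in tc_filters_output.splitlines():
          m = FILTER_ID_REGEX.match(line)
          if m:
              filter_id = m.group(1)  # only assigned when the regex matches
              continue
          if line.strip().startswith('match'):
              # a foreign "filter protocol all ..." rule never matched the
              # regex above, so filter_id was never assigned here:
              filterids_for_ip.append(filter_id)  # UnboundLocalError
      return filterids_for_ip

  # two rules added outside neutron reproduce the reported traceback:
  get_filterids_for_ip('filter protocol all pref 1 ...\n match 0a000001')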

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1955775/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1955765] Re: Devstack - Can no longer enable qos with neutron-qos

2021-12-27 Thread Bence Romsics
Hi,

There's a long history here, but I would actually recommend that you
switch back to using the legacy devstack plugin.

The new neutron devstack plugin AFAICT worked quite well in a simple dev
environment. Despite the legacy one being deprecated for a long time,
the work on the new one stalled and it never completely replaced the
legacy plugin (mostly for use cases in the gate). For a time both were
maintained. And at some point we acknowledged that the new devstack
plugin will never be completed and un-deprecated the legacy plugin:

https://review.opendev.org/c/openstack/devstack/+/704829

Some of these changes were clearly unexpected and probably we could have
done a better job communicating which plugin is preferred. And now
maybe we should deprecate the new plugin. I think I'll ask the team
about that on our next meeting.

But until then the best I can recommend is that you switch back to using
the legacy devstack plugin.

Regards,
Bence

** Changed in: neutron
   Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1955765

Title:
  Devstack - Can no longer enable qos with neutron-qos

Status in neutron:
  Opinion

Bug description:
  The neutron-qos functions were moved away from neutron devstack plugin
  with [1] and added to devstack directly with [2] and [3]. However,
  while one could previously enable qos in devstack with `neutron-qos`,
  this no longer works, because the functions were added to the
  neutron-legacy file, which is only sourced when legacy (quantum era)
  neutron services are enabled.

  
  [1] https://review.opendev.org/#/q/I7b70d6281d551a88080c6e727e2485079ba5c061
  [2] https://review.opendev.org/#/q/I48f65d530db53fe2c94cad57a8072e1158d738b0
  [3] https://review.opendev.org/#/q/Icf459a2f8c6ae3c3cb29b16ba0b92766af41af30

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1955765/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1955491] Re: [DHCP] Neutron DHCP agent failing when disabling the Linux DHCP service

2021-12-22 Thread Bence Romsics
Rodolfo, based on your analysis I moved this report to tripleo. Of
course if it also has a neutron part, just add that back please.

** Project changed: neutron => tripleo

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1955491

Title:
  [DHCP] Neutron DHCP agent failing when disabling the Linux DHCP
  service

Status in tripleo:
  New

Bug description:
  This issue has been detected running Neutron Train (Red Hat OSP 16.2),
  using TripleO as deployment tool. The services run on containers,
  using podman.

  The DHCP agent tries to disable the DHCP helper. That calls the
  driver "disable" method [1]. In Linux that will call [2], which will
  try to stop the running process. In devstack, this process is a
  "dnsmasq" instance running on the DHCP namespace. In TripleO, the DHCP
  agent container will spawn a sidecar container to execute the
  "dnsmasq" instance. That requires a specific kill script [3].

  In this deployment, the DHCP agent is returning exit code 125 when trying to 
disable the "dnsmasq" process (running in a container):
neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; 
Stdout: ; Stderr:

  
  This error code comes from "podman" and could be caused by the container
not being present on the system. That will raise an exception [4] that will
reschedule a resync. The DHCP agent will enter an endless loop unless
restarted; only a restart removes the affected network that is triggering the
exception from "self.cache = NetworkCache()".

  Logs DHCP agent (snippet): [4]

  Bugzilla reference:
  https://bugzilla.redhat.com/show_bug.cgi?id=2032010

  
  
[1]https://github.com/openstack/neutron/blob/df9435a9a6fab9492c4f23d9ab0f1507841430c7/neutron/agent/dhcp/agent.py#L413-L426
  
[2]https://github.com/openstack/neutron/blob/df9435a9a6fab9492c4f23d9ab0f1507841430c7/neutron/agent/linux/dhcp.py#L305-L313
  
[3]https://github.com/openstack/tripleo-heat-templates/blob/25db32d4e5ed7ed4687bbb6d07a8a87ad65b71e6/deployment/neutron/kill-script
  [4]https://paste.opendev.org/show/811802/

To manage notifications about this bug go to:
https://bugs.launchpad.net/tripleo/+bug/1955491/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1952730] [NEW] Segment updates may cause unnecessary overload

2021-11-30 Thread Bence Romsics
Public bug reported:

When:

* the segments service plugin is enabled and
* we have many rpc worker processes (as in the sum of rpc_workers and 
rpc_state_report_workers, since both kinds of workers process agent state_reports) and
* many ovs-agents report physnets and
* neutron-server is restarted,

then rpc workers may get overloaded by state_report messages. That is:
they may run at 100% CPU utilization for tens of minutes and during that
they are not able to process ovs-agent's state_reports in a timely manner,
which in turn causes the agent state to go down and back, maybe multiple
times. Eventually, as the workers get through the initial processing,
the load lessens, and the system stabilizes. The same rate of incoming
state_report messages is not a problem at that point.

(Colleagues working downstream observed this on a stable/victoria base
with cc 150 ovs-agents and 3 neutron-servers each having maybe
rpc_workers=6 and rpc_state_report_workers=6. The relevant code did not
change at all since victoria, so I believe the same would happen on
master.)

I think the root cause is the following:

rabbitmq dispatches the state_report messages between the workers in a
round robin fashion, therefore eventually the state_reports of the same
agent will hit all rpc workers. Each worker has logic to update the host
segment mapping if either the server or the agent got restarted:

https://opendev.org/openstack/neutron/src/commit/90b5456b8c11011c41f2fcd53a8943cb45fb6479/neutron/services/segments/db.py#L304-L305

Unfortunately the 'reported_hosts' set (to remember from which host the server 
has seen agent reports already) is private to each worker process. But right 
after a server (re-)start when that set is still empty, each worker will 
unconditionally write the received physnet-segment information into the db. 
This means we multiply the load on the db and rpc workers by a factor of the 
total rpc worker count.

Pushing a fix attempt soon.
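
A minimal sketch (not neutron code; names and numbers are illustrative) of
why a per-process, in-memory "already reported" set cannot deduplicate these
writes across rpc workers right after a restart:

import multiprocessing

def rpc_worker(hosts, db_write_count):
    # private to this worker process and empty right after a (re-)start,
    # so nothing is deduplicated across workers
    reported_hosts = set()
    for host in hosts:  # round robin dispatch: every worker sees every host
        if host not in reported_hosts:
            reported_hosts.add(host)
            with db_write_count.get_lock():
                db_write_count.value += 1  # stands in for the segment db write

if __name__ == '__main__':
    hosts = ['compute-%03d' % i for i in range(150)]
    db_write_count = multiprocessing.Value('i', 0)
    # e.g. 3 servers, each with 6 rpc + 6 state_report workers
    workers = [
        multiprocessing.Process(target=rpc_worker, args=(hosts, db_write_count))
        for _ in range(12)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(db_write_count.value)  # 150 hosts * 12 workers = 1800 writes, not 150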

** Affects: neutron
 Importance: High
 Assignee: Bence Romsics (bence-romsics)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1952730

Title:
  Segment updates may cause unnecessary overload

Status in neutron:
  In Progress

Bug description:
  When:

  * the segments service plugin is enabled and
  * we have many rpc worker processes (as in the sum of rpc_workers and 
rpc_state_report_workers, since both kinds of workers process agent state_reports) and
  * many ovs-agents report physnets and
  * neutron-server is restarted,

  then rpc workers may get overloaded by state_report messages. That is:
  they may run at 100% CPU utilization for tens of minutes and during
  that they are not able to process the ovs-agents' state_reports in a timely
  manner. Which in turn causes the agent state to go down and back,
  maybe multiple times. Eventually, as the workers get through the
  initial processing, the load lessens, and the system stabilizes. The
  same rate of incoming state_report messages is not a problem at that
  point.

  (Colleagues working downstream observed this on a stable/victoria base
  with circa 150 ovs-agents and 3 neutron-servers each having maybe
  rpc_workers=6 and rpc_state_report_workers=6. The relevant code did
  not change at all since victoria, so I believe the same would happen
  on master.)

  I think the root cause is the following:

  rabbitmq dispatches the state_report messages between the workers in a
  round-robin fashion, therefore eventually the state_reports of the
  same agent will hit all rpc workers. Each worker has logic to update
  the host segment mapping if either the server or the agent got
  restarted:

  
https://opendev.org/openstack/neutron/src/commit/90b5456b8c11011c41f2fcd53a8943cb45fb6479/neutron/services/segments/db.py#L304-L305
  
  Unfortunately the 'reported_hosts' set (to remember from which host the 
server has seen agent reports already) is private to each worker process. But 
right after a server (re-)start when that set is still empty, each worker will 
unconditionally write the received physnet-segment information into the db. 
This means we multiply the load on the db and rpc workers by a factor of the 
total rpc worker count.

  Pushing a fix attempt soon.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1952730/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1951429] [NEW] Neutron API responses should not contain tracebacks

2021-11-18 Thread Bence Romsics
Public bug reported:

Security folks found some corner cases in the neutron API where the
response contains a traceback, for example:

$ curl --request-target foo -k http://127.0.0.1:9696
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/eventlet/wsgi.py", line 563, in 
handle_one_response
result = self.application(self.environ, start_response)
  File "/usr/local/lib/python3.8/dist-packages/paste/urlmap.py", line 208, in 
__call__
path_info = self.normalize_url(path_info, False)[1]
  File "/usr/local/lib/python3.8/dist-packages/paste/urlmap.py", line 130, in 
normalize_url
assert (not url or url.startswith('/')
AssertionError: URL fragments must start with / or http:// (you gave 'foo')

As a developer I don't mind such tracebacks, but I see their point that
this may give away unwanted information to an attacker. On the other
hand I would not consider this in itself a vulnerability.

Pushing a trivial fix in a minute.
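
The general idea, as a minimal hedged sketch (hypothetical middleware, not the
actual patch): wrap the WSGI application so an unexpected exception becomes a
plain error response while the traceback stays in the server log only:

    # Hypothetical WSGI middleware, for illustration only.
    import logging
    import sys

    LOG = logging.getLogger(__name__)

    class HideTracebacks(object):
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            try:
                return self.app(environ, start_response)
            except Exception:
                LOG.exception("Unhandled error")  # details go to the log only
                start_response('500 Internal Server Error',
                               [('Content-Type', 'text/plain')],
                               sys.exc_info())
                return [b'Internal Server Error']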

** Affects: neutron
 Importance: Low
 Assignee: Bence Romsics (bence-romsics)
 Status: In Progress


** Tags: api

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1951429

Title:
  Neutron API responses should not contain tracebacks

Status in neutron:
  In Progress

Bug description:
  Security folks found some corner cases in the neutron API where the
  response contains a traceback, for example:

  $ curl --request-target foo -k http://127.0.0.1:9696
  Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/eventlet/wsgi.py", line 563, 
in handle_one_response
  result = self.application(self.environ, start_response)
File "/usr/local/lib/python3.8/dist-packages/paste/urlmap.py", line 208, in 
__call__
  path_info = self.normalize_url(path_info, False)[1]
File "/usr/local/lib/python3.8/dist-packages/paste/urlmap.py", line 130, in 
normalize_url
  assert (not url or url.startswith('/')
  AssertionError: URL fragments must start with / or http:// (you gave 'foo')

  As a developer I don't mind such tracebacks, but I see their point
  that this may give away unwanted information to an attacker. On the
  other hand I would not consider this in itself a vulnerability.

  Pushing a trivial fix in a minute.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1951429/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1945747] Re: GET security group rule is missing description attribute

2021-10-04 Thread Bence Romsics
*** This bug is a duplicate of bug 1904188 ***
https://bugs.launchpad.net/bugs/1904188

I am marking this as duplicate. Let me know if you think differently.
Also don't hesitate to propose a backport to stable/ussuri.

** This bug has been marked a duplicate of bug 1904188
   Include standard attributes ID in OVO dictionaries to improve the OVN 
revision numbers operation

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1945747

Title:
  GET security group rule is missing description attribute

Status in neutron:
  New

Bug description:
  The description attribute is missing from
  _make_security_group_rule_dict

  Create a security group rule with a description:

  stack@bionic-template:~/devstack$ openstack security group rule create 
--description "test rule" --remote-ip 0.0.0.0/0 --ingress 
ff57f76f-93a0-4bf3-b538-c88df40fdc40
  
  +-------------------+------------------------------------------------------+
  | Field             | Value                                                |
  +-------------------+------------------------------------------------------+
  | created_at        | 2021-10-01T06:35:50Z                                 |
  | description       | test rule                                            |
  | direction         | ingress                                              |
  | ether_type        | IPv4                                                 |
  | id                | 389eb45e-58ac-471c-b966-a3c8784009f7                 |
  | location          | cloud='', project.domain_id='default',               |
  |                   | project.domain_name=,                                |
  |                   | project.id='f2527eb734c745eca32b1dfbd9107563',       |
  |                   | project.name='admin', region_name='RegionOne', zone= |
  | name              | None                                                 |
  | port_range_max    | None                                                 |
  | port_range_min    | None                                                 |
  | project_id        | f2527eb734c745eca32b1dfbd9107563                     |
  | protocol          | None                                                 |
  | remote_group_id   | None                                                 |
  | remote_ip_prefix  | None                                                 |
  | revision_number   | 0                                                    |
  | security_group_id | ff57f76f-93a0-4bf3-b538-c88df40fdc40                 |
  | tags              | []                                                   |
  | updated_at        | 2021-10-01T06:35:50Z                                 |
  +-------------------+------------------------------------------------------+

  
  Example get (no description)

  RESP BODY: {"security_group_rule": {"id":
  

[Yahoo-eng-team] [Bug 1936839] [NEW] Ingress bw-limit with DPDK does not work

2021-07-19 Thread Bence Romsics
Public bug reported:

A colleague of mine working downstream found the following bug (his
report follows with minor redactions of company-internal details). I'm
going to push his proposed fix in a minute too.

In short, the inbound bandwidth limitation on vHost user ports doesn't
seem to work. The value set with OpenStack QoS commands on the port
isn't configured properly. The problem exists in the OVS backend.

Creating a 1 Mbit/s limit rule:

openstack network qos rule create max_1_Mbps --type bandwidth-limit
--max-kbps 1000 --max-burst-kbits 1000 --ingress

After applying to the port, you can query it on the compute:

compute-0-5:/home/ceeinfra # ovs-vsctl list qos
_uuid   : c326ed8b-24ef-4f1f-a5b0-b20f3ca3297d
external_ids: {id=vhu84edf6c2-f0}
other_config: {cbs="125000.0", cir="125000.0"}
queues  : {}
type: egress-policer

Note: the traffic is ingress from the VM point of view, and egress from
OVS.

The values are not integers, they have a .0 at the end. In
/var/log/openvswitch/ovs-vswitchd.log you can see that it is not
accepted:

2021-07-15T12:36:23.121Z|00208|netdev_dpdk|ERR|Could not create rte meter for 
egress policer
2021-07-15T12:36:23.121Z|00209|netdev_dpdk|ERR|Failed to set QoS type 
egress-policer on port vhu84edf6c2-f0: Invalid argument
2021-07-15T12:36:23.126Z|00210|netdev_dpdk|ERR|Could not create rte meter for 
egress policer
2021-07-15T12:36:23.126Z|00211|netdev_dpdk|ERR|Failed to set QoS type 
egress-policer on port vhu84edf6c2-f0: Invalid argument

If you create a traffic between two VMs, the downloading one having the
limitation applied on its port reports this:

root@bwtest2:~# nc 192.168.1.201  | dd of=/dev/null status=progress
816316928 bytes (816 MB, 779 MiB) copied, 5 s, 163 MB/s^C
1863705+71 records in
1863738+0 records out
954233856 bytes (954 MB, 910 MiB) copied, 8.23046 s, 116 MB/s

The bandwidth is higher than the set 1 Mb/s.

It is possible to modify the OVS agent so it applies the bandwidth limit
correctly. You have to find out where the Python scripts of the
neutron_openvswitch_agent container are stored on the compute host. In
our environment the file to modify is:

/var/lib/docker/overlay2/68653008fca0a6434adb3985b021b2329680b71b49859c3a028f951deed59df3/merged/usr/lib/python3.6/site-
packages/neutron/agent/common/ovs_lib.py

In the _update_ingress_bw_limit_for_dpdk_port function, the original
code is:

# cir and cbs should be set in bytes instead of bits
qos_other_config = {
'cir': str(max_bw_in_bits / 8),
'cbs': str(max_burst_in_bits / 8)
}

If you modify the code to this:

# cir and cbs should be set in bytes instead of bits
qos_other_config = {
'cir': str(int(max_bw_in_bits / 8)),
'cbs': str(int(max_burst_in_bits / 8))
}

the values passed to OVS will be integers. You can see the difference
querying the new values after applying the limit on the ports again:

compute-0-5:/home/ceeinfra # ovs-vsctl list qos
_uuid   : b93b1165-e839-4378-a6b7-b75c13ad0d41
external_ids: {id=vhu84edf6c2-f0}
other_config: {cbs="125000", cir="125000"}
queues  : {}
type: egress-policer

They don't have the .0 at the end anymore, and OVS doesn't complain in
the logs about invalid arguments. The bandwidth limitation between the
computes now works:

root@bwtest2:~# nc 192.168.1.201  | dd of=/dev/null status=progress
4095488 bytes (4.1 MB, 3.9 MiB) copied, 33 s, 123 kB/s^C
7274+1382 records in
8051+0 records out
4122112 bytes (4.1 MB, 3.9 MiB) copied, 33.4033 s, 123 kB/s

125 kB/s translates to 1 Mb/s that we have applied with the rule, so it
works now.

My guess is that this problem comes from the different division behaviour
between Python 2 and Python 3:

user@debian:~$ python2
Python 2.7.16 (default, Oct 10 2019, 22:02:15)
[GCC 8.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 4/3
1
>>> 
user@debian:~$ python3
Python 3.7.3 (default, Jan 22 2021, 20:04:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 4/3
1.3333333333333333
>>> 

Python 3's division doesn't truncate to an integer, and OVS doesn't seem to
accept floating point numbers.

I have seen this in multiple versions.

** Affects: neutron
 Importance: Undecided
 Assignee: Bence Romsics (bence-romsics)
 Status: In Progress


** Tags: ovs qos

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1936839

Title:
  Ingress bw-limit with DPDK does not work

Status in neutron:
  In Progress

Bug description:
  A colleague of mine working downstream found the following bug (his
 

[Yahoo-eng-team] [Bug 1934238] Re: instance failed network setup

2021-07-06 Thread Bence Romsics
As gibi said above, this is unlikely to be either a nova or a neutron
problem, but more likely a deployment problem. I don't believe the
various neutron log lines quoted have anything to do with the root
cause.

To help with the debugging:

What deployment software did you use?
Are you using devstack - since you said you deployed into vms?
How was the deployment software configured?
Did the deployment complete successfully?
Is neutron-server running?
Is neutron-server actually available at the address nova tries to connect to?
From both hosts?
Since you mentioned that you used 2 vms, did the error message come from the 
same host where the controller components are running? If not then the 
http://localhost:9696/... url is definitely wrong.

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1934238

Title:
  instance failed network setup

Status in neutron:
  Invalid
Status in OpenStack Compute (nova):
  Invalid

Bug description:
  I set up OpenStack on 2 Ubuntu VMs. When I want to create a new
  instance, it can't connect to neutron, and the nova-compute logs show
  the logs below:



  2021-07-01 02:15:08.398 83631 INFO nova.compute.claims 
[req-830e0a7f-5a50-448c-a186-f082962c3c86 91704884e43f48fcbd156b8d7429fc3e 
5e055db0a5464dc1997ab0f456792271 - default default] [instance: 
3316f595-0e20-4914-90a2-c00da68c82ec] Claim successful on node compute
  2021-07-01 02:15:12.006 83631 INFO nova.virt.libvirt.driver 
[req-830e0a7f-5a50-448c-a186-f082962c3c86 91704884e43f48fcbd156b8d7429fc3e 
5e055db0a5464dc1997ab0f456792271 - default default] [instance: 
3316f595-0e20-4914-90a2-c00da68c82ec] Creating image
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager 
[req-830e0a7f-5a50-448c-a186-f082962c3c86 91704884e43f48fcbd156b8d7429fc3e 
5e055db0a5464dc1997ab0f456792271 - default default] Instance failed network 
setup after 1 attempt(s): keystoneauth1.exceptions.connection.ConnectFailure: 
Unable to establish connection to 
http://localhost:9696/v2.0/networks?id=163f0b54-e337-40ac-81af-958c24ceeb7f: 
HTTPConnectionPool(host='localhost', port=9696): Max retries exceeded with url: 
/v2.0/networks?id=163f0b54-e337-40ac-81af-958c24ceeb7f (Caused by 
NewConnectionError(': Failed to establish a new connection: [Errno 111] 
ECONNREFUSED'))
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager Traceback (most 
recent call last):
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager   File 
"/usr/lib/python3/dist-packages/urllib3/connection.py", line 159, in _new_conn
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager conn = 
connection.create_connection(
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager   File 
"/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 84, in 
create_connection
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager raise err
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager   File 
"/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 74, in 
create_connection
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager sock.connect(sa)
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager   File 
"/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 253, in connect
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager 
socket_checkerr(fd)
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager   File 
"/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 51, in 
socket_checkerr
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager raise 
socket.error(err, errno.errorcode[err])
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager 
ConnectionRefusedError: [Errno 111] ECONNREFUSED
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager 
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager During handling of 
the above exception, another exception occurred:
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager 
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager Traceback (most 
recent call last):
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager   File 
"/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 665, in urlopen
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager httplib_response 
= self._make_request(
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager   File 
"/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 387, in 
_make_request
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager 
conn.request(method, url, **httplib_request_kw)
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager   File 
"/usr/lib/python3.8/http/client.py", line 1255, in request
  2021-07-01 02:15:14.264 83631 ERROR nova.compute.manager 
self._send_request(method, url, body, headers, encode_chunked)
  2021-07-01 02:15:14.264 83631 ERROR 

[Yahoo-eng-team] [Bug 1921126] [NEW] [RFE] Allow explicit management of default routes

2021-03-24 Thread Bence Romsics
Public bug reported:

This RFE proposes to allow explicit management of the default route(s)
of a Neutron router.  This is mostly useful for a user to install
multiple default routes for Equal Cost Multipath (ECMP) and treat all
these routes uniformly.

Since I have already written a spec proposal for this, please see the details
there:

https://review.opendev.org/c/openstack/neutron-specs/+/781475

** Affects: neutron
 Importance: Wishlist
 Assignee: Bence Romsics (bence-romsics)
 Status: New


** Tags: rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1921126

Title:
  [RFE] Allow explicit management of default routes

Status in neutron:
  New

Bug description:
  This RFE proposes to allow explicit management of the default route(s)
  of a Neutron router.  This is mostly useful for a user to install
  multiple default routes for Equal Cost Multipath (ECMP) and treat all
  these routes uniformly.

  Since I have already written a spec proposal for this, please see the
  details there:

  https://review.opendev.org/c/openstack/neutron-specs/+/781475

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1921126/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1905295] [NEW] [RFE] Allow multiple external gateways on a router

2020-11-23 Thread Bence Romsics
Public bug reported:

I'd like to bring the following idea to the drivers' meeting. If this
still looks like a good idea after that discussion, I'll open a spec so
this can be properly commented on in gerrit. Until then feel free to
comment here of course.

# Problem Description

A general router can be configured to connect and route to multiple
external networks for higher availability and/or to balance the load.
However the current Neutron API syntax allows exactly one external
gateway for a router.

https://docs.openstack.org/api-ref/network/v2/?expanded=create-router-
detail#create-router

{
"router": {
"name": "router1",
"external_gateway_info": {
"network_id": "ae34051f-aa6c-4c75-abf5-50dc9ac99ef3",
"enable_snat": true,
"external_fixed_ips": [
{
"ip_address": "172.24.4.6",
"subnet_id": "b930d7f6-ceb7-40a0-8b81-a425dd994ccf"
}
]
},
"admin_state_up": true
}
}

However consider the following (simplified) network architecture as an
example:

R3 R4
 |X|
R1 R2
 |X|
C1 C2 ...

(Sorry, my original, nice ascii art was eaten by launchpad. I hope this
still conveys what I mean.)

Where C1, C2, ... are compute nodes, R1 and R2 are OpenStack-managed
routers, while R3 and R4 are provider edge routers. Between R1-R2 and
R3-R4 Equal Cost Multipath (ECMP) routing is used to utilize all links
in an active-active manner. In such an architecture it makes sense to
represent R1 and R2 as 2 logical routers with 2-2 external gateways, or
in some cases (depending on other architectural choices) even as 1
logical router with 4 external gateways. But with the current API that
is not possible.

# Proposed Change

Extend the router API object with a new attribute
'additional_external_gateways', for example:

{
   "router" : {
  "name" : "router1",
  "admin_state_up" : true,
  "external_gateway_info" : {
 "enable_snat" : false,
 "external_fixed_ips" : [
{
   "ip_address" : "172.24.4.6",
   "subnet_id" : "b930d7f6-ceb7-40a0-8b81-a425dd994ccf"
}
 ],
 "network_id" : "ae34051f-aa6c-4c75-abf5-50dc9ac99ef3"
  },
  "additional_external_gateways" : [
 {
"enable_snat" : false,
"external_fixed_ips" : [
   {
  "ip_address" : "172.24.5.6",
  "subnet_id" : "62da64b0-29ab-11eb-9ed9-3b1175418487"
   }
],
"network_id" : "592d4716-29ab-11eb-a7dd-4f4b5e319915"
 },
 ...
  ]
   }
}

Edited via the following HTTP PUT methods with diff semantics:

PUT /v2.0/routers/{router_id}/add_additional_external_gateways
PUT /v2.0/routers/{router_id}/remove_additional_external_gateways

We keep 'external_gateway_info' for backwards compatibility. When
additional_external_gateways is an empty list, everything behaves as
before. When additional_external_gateways are given, then the actual
list of external gateways is (in Python-like pseudo-code):
[external_gateway_info] + additional_external_gateways.

Unless otherwise specified, all non-directly connected external IPs are
routed towards the original external_gateway_info. However this behavior
may be overridden either by using (static) extraroutes, or by running
(dynamic) routing protocols and routing towards the external gateway where a
particular route was learned from.

# Alternatives

1) Using 4 logical routers with 1 external gateway each. However in this
case the API lacks the information about which (2 or 4) logical routers
represent the same backend router.

2) Using a VRRP HA router. However this provides a different level of
High Availability plus it is active-passive instead of active-active.

3) Adding router interfaces (since their number is not limited in the
API) instead of external gateways. However this creates confusion by
blurring the line of what is internal and what is external to the cloud
deployment.

** Affects: neutron
 Importance: Wishlist
 Assignee: Bence Romsics (bence-romsics)
 Status: New


** Tags: rfe

** Description changed:

  I'd like to bring the following idea to the drivers' meeting. If this
  still looks like a good idea after that discussion, I'll open a spec so
  this can be properly commented on in gerrit. Until then feel free to
  comment here of course.
  
  # Problem Description
  
  A general router can be configured to connect and route to multiple
  external networks for higher availability and/or to balance the load

[Yahoo-eng-team] [Bug 1878031] Re: Unable to delete an instance | Conflict: Port [port-id] is currently a parent port for trunk [trunk-id]

2020-05-15 Thread Bence Romsics
While I agree that it would be way more user friendly to give a
warning/error in the problematic API workflow, that would entail some
cross-project changes, because today:

* nova does not know when an already bound port is added to a trunk
* neutron does not know if nova is supposed to auto-delete a port

That means neither nova nor neutron can detect the error condition in
itself.

Again, I believe changing the workflow to pre-create the parent port for
the server stops the problem described in this bug report completely.
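
For reference, a workflow along these lines (illustrative commands; the names
and the elided server options are placeholders) avoids binding a server to an
auto-created port in the first place:

  $ openstack port create --network private parent-port-0
  $ openstack network trunk create --parent-port parent-port-0 trunk0
  $ openstack server create --nic port-id=<id of parent-port-0> ... vm0

Since nova does not auto-delete a pre-created port, deleting the server no
longer attempts to delete the trunk's parent port.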

So I'm setting this bug as Invalid. But let me know if you see other
alternatives.

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1878031

Title:
   Unable to delete an instance | Conflict: Port [port-id] is currently
  a parent port for trunk [trunk-id]

Status in neutron:
  Invalid

Bug description:
  When you create a trunk in Neutron you create a parent port for the
  trunk and attach the trunk to the parent.  Then subports can be
  created on the trunk.  When instances are created on the trunk, first
  a port is created and then an instance is associated with a free port.
  It looks to me that this is the oversight in the logic.

  From the perspective of the code, the parent port looks like any other
  port attached to the trunk bridge.  It doesn't have an instance
  attached to it so it looks like it's not being used for anything
  (which is technically correct).  So it becomes an eligible port for an
  instance to bind to.  That is all fine and dandy until you go to
  delete the instance and you get the "Port [port-id] is currently a
  parent port for trunk [trunk-id]" exception just as happened here.
  Anecdotally, it seems rare that an instance will actually bind to
  it, but that is what happened for the user in this case and I have had
  several pings over the past year about people in a similar state.

  I propose that when a port is made the parent port for a trunk, the
  trunk be established as the owner of the port.  That way it will be
  ineligible for instances seeking to bind to the port.

  See also old bug: https://bugs.launchpad.net/neutron/+bug/1700428

  Description of problem:

  Attempting to delete instance failed with error in nova-compute

  ~~~
  2020-03-04 09:52:46.257 1 WARNING nova.network.neutronv2.api 
[req-0dd45fe4-861c-46d3-a5ec-7db36352da58 02c6d1bc10fe4ffaa289c786cd09b146 
695c417810ac460480055b074bc41817 - default default] [instance: 
2f9e3740-b425-4f00-a949-e1aacf2239c4] Failed to delete port 
991e4e50-481a-4ca6-9ea6-69f848c4ca9f for instance.: Conflict: Port 
991e4e50-481a-4ca6-9ea6-69f848c4ca9f is currently a parent port for trunk 
5800ee0f-b558-46cb-bb0b-92799dbe02cf.
  ~~~

  ~~~
  [stack@migration-host ~]$ openstack network trunk show 
5800ee0f-b558-46cb-bb0b-92799dbe02cf
  +-+--+
  | Field   | Value|
  +-+--+
  | admin_state_up  | UP   |
  | created_at  | 2020-03-04T09:01:23Z |
  | description |  |
  | id  | 5800ee0f-b558-46cb-bb0b-92799dbe02cf |
  | name| WIN-TRUNK|
  | port_id | 991e4e50-481a-4ca6-9ea6-69f848c4ca9f |
  | project_id  | 695c417810ac460480055b074bc41817 |
  | revision_number | 3|
  | status  | ACTIVE   |
  | sub_ports   |  |
  | tags| []   |
  | tenant_id   | 695c417810ac460480055b074bc41817 |
  | updated_at  | 2020-03-04T10:20:46Z |
  +-+--+

  
  [stack@migration-host ~]$ nova interface-list 2f9e3740-b425-4f00-a949-e1aacf2239c4
  +------------+--------------------------------------+--------------------------------------+--------------+-------------------+
  | Port State | Port ID                              | Net ID                               | IP addresses | MAC Addr          |
  +------------+--------------------------------------+--------------------------------------+--------------+-------------------+
  | DOWN       | 991e4e50-481a-4ca6-9ea6-69f848c4ca9f | 9be62c82-4274-48b4-bba0-39ccbdd5bb1b | 192.168.0.19 | fa:16:3e:0a:2b:9b |
  +------------+--------------------------------------+--------------------------------------+--------------+-------------------+
  [stack@migration-host ~]$ openstack port show 
991e4e50-481a-4ca6-9ea6-69f848c4ca9f
  
+---+---+
  | Field | Value  

[Yahoo-eng-team] [Bug 1878622] Re: Open vSwitch with DPDK datapath in neutron

2020-05-15 Thread Bence Romsics
Thank you for your bug report!

I believe this typo was fixed in the change below:
https://review.opendev.org/565289

So the command is correct since the rocky version of our docs, for example:
https://docs.openstack.org/neutron/latest/admin/config-ovs-dpdk.html

** Changed in: neutron
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1878622

Title:
  Open vSwitch with DPDK datapath in neutron

Status in neutron:
  Fix Released

Bug description:
  - [ x ] I have a fix to the document that I can paste below including
  example: input and output.

  There is a typo in the following documentation page:

https://docs.openstack.org/neutron/queens/admin/config-ovs-dpdk.html

  $ openstack image set --property hw_vif_mutliqueue_enabled=true
  IMAGE_NAME

  should read:

  $ openstack image set --property hw_vif_multiqueue_enabled=true
  IMAGE_NAME

  (i.e. multi not mutli)

  ---
  Release: 12.1.2.dev96 on 2020-05-11 17:10
  SHA: ed413939fcd134ee616078c017272f229b09f1d9
  Source: 
https://git.openstack.org/cgit/openstack/neutron/tree/doc/source/admin/config-ovs-dpdk.rst
  URL: https://docs.openstack.org/neutron/queens/admin/config-ovs-dpdk.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1878622/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1878632] [NEW] Race condition in subnet and segment delete: The segment is still bound with port(s)

2020-05-14 Thread Bence Romsics
08]: ERROR heat.engine.resource 
Traceback (most recent call last):
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource   File 
"/opt/stack/heat/heat/engine/resource.py", line 918, in _action_recorder
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource 
yield
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource   File 
"/opt/stack/heat/heat/engine/resource.py", line 2051, in delete
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource 
*action_args)
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource   File 
"/opt/stack/heat/heat/engine/scheduler.py", line 326, in wrapper
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource 
step = next(subtask)
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource   File 
"/opt/stack/heat/heat/engine/resource.py", line 972, in action_handler_task
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource 
handler_data = handler(*args)
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource   File 
"/opt/stack/heat/heat/engine/resources/openstack/neutron/segment.py", line 146, 
in handle_delete
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource 
self.client('openstack').network.delete_segment(self.resource_id)
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource   File 
"/usr/local/lib/python3.6/dist-packages/openstack/network/v2/_proxy.py", line 
3312, in delete_segment
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource 
self._delete(_segment.Segment, segment, ignore_missing=ignore_missing)
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource   File 
"/usr/local/lib/python3.6/dist-packages/openstack/proxy.py", line 46, in check
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource 
return method(self, expected, actual, *args, **kwargs)
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource   File 
"/usr/local/lib/python3.6/dist-packages/openstack/network/v2/_proxy.py", line 
75, in _delete
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource rv 
= res.delete(self, if_revision=if_revision)
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource   File 
"/usr/local/lib/python3.6/dist-packages/openstack/resource.py", line 1615, in 
delete
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource 
self._translate_response(response, has_body=False, **kwargs)
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource   File 
"/usr/local/lib/python3.6/dist-packages/openstack/resource.py", line 1113, in 
_translate_response
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource 
exceptions.raise_from_response(response, error_message=error_message)
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource   File 
"/usr/local/lib/python3.6/dist-packages/openstack/exceptions.py", line 236, in 
raise_from_response
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource 
http_status=http_status, request_id=request_id
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource 
openstack.exceptions.ConflictException: ConflictException: 409: Client Error 
for url: 
http://192.168.122.246:9696/v2.0/segments/641c8c60-59c9-4972-bf82-3637f3e0f1cb, 
Segment '641c8c60-59c9-4972-bf82-3637f3e0f1cb' cannot be deleted: The segment 
is still bound with port(s) 8cf8f188-5ea4-41b0-aa3a-fb8a8802888d.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.resource

# a few seconds later a second delete succeeds
$ openstack stack delete s0 --yes --wait
2020-05-14 14:24:26Z [s0]: DELETE_IN_PROGRESS  Stack DELETE started

I have an idea what the root cause is. I'll describe that in a comment.

** Affects: neutron
 Importance: Medium
 Assignee: Bence Romsics (bence-romsics)
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1878632

Title:
  Race condition in subnet and segment delete: The segment is still
  bound with port(s)

Status in neutron:
  New

Bug description:
  The HOT template below may expose a race condition and by that make
  stack deletion fail. On the neutron API this means that a segment
  delete fails with "The segment is still bound with port(s)". The
  reproduction uses a HOT template but I don't think this problem is
  Heat specific. Rather I think it depends on quick succession of API
  calls, which Heat does rather well.

  Configuration:

  ml2_conf.ini
  [ml2]
  mechanism_drivers = openvswitch,linuxbridge,sriovnicswitch,l2population
  tenant_network_types = vxlan,vlan
  [ml2_ty

[Yahoo-eng-team] [Bug 1871340] [NEW] neutron.tests.functional.agent.ovn.metadata.test_metadata_agent.TestMetadataAgent.test_agent_registration_at_chassis_create_event fails randomly

2020-04-07 Thread Bence Romsics
Public bug reported:

Seemingly starting from the 1st of April
neutron.tests.functional.agent.ovn.metadata.test_metadata_agent.TestMetadataAgent.test_agent_registration_at_chassis_create_event
fails randomly in the gate with the error message:

2020-04-06 08:55:57.302891 | controller | ==
2020-04-06 08:55:57.302931 | controller | Failed 1 tests - output below:
2020-04-06 08:55:57.302953 | controller | ==
2020-04-06 08:55:57.302972 | controller |
2020-04-06 08:55:57.302992 | controller | 
neutron.tests.functional.agent.ovn.metadata.test_metadata_agent.TestMetadataAgent.test_agent_registration_at_chassis_create_event
2020-04-06 08:55:57.303012 | controller | 
-
2020-04-06 08:55:57.303030 | controller |
2020-04-06 08:55:57.303050 | controller | Captured traceback:
2020-04-06 08:55:57.303069 | controller | ~~~
2020-04-06 08:55:57.303088 | controller | Traceback (most recent call last):
2020-04-06 08:55:57.303107 | controller |
2020-04-06 08:55:57.303126 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, 
in func
2020-04-06 08:55:57.303145 | controller | return f(self, *args, **kwargs)
2020-04-06 08:55:57.303164 | controller |
2020-04-06 08:55:57.303184 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/ovn/metadata/test_metadata_agent.py",
 line 220, in test_agent_registration_at_chassis_create_event
2020-04-06 08:55:57.303203 | controller | chassis.external_ids)
2020-04-06 08:55:57.303223 | controller |
2020-04-06 08:55:57.303242 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/testtools/testcase.py",
 line 421, in assertIn
2020-04-06 08:55:57.303261 | controller | self.assertThat(haystack, 
Contains(needle), message)
2020-04-06 08:55:57.303281 | controller |
2020-04-06 08:55:57.303300 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/testtools/testcase.py",
 line 502, in assertThat
2020-04-06 08:55:57.303319 | controller | raise mismatch_error
2020-04-06 08:55:57.303338 | controller |
2020-04-06 08:55:57.303357 | controller | 
testtools.matchers._impl.MismatchError: 'neutron:ovn-metadata-id' not in 
{'ovn-bridge-mappings': ''}

Example log:
https://99f8d9af3210ff587b09-7ad1a719016265adf2ccc36ef6645b87.ssl.cf2.rackcdn.com/702247/7/gate
/neutron-functional/f42992a/job-output.txt

Logstash:
http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22neutron
:ovn-metadata-id%5C%22%20AND%20message:%5C%22ovn-bridge-
mappings%5C%22%20AND%20voting:1=864000s

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: gate-failure ovn

** Tags added: ovn

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1871340

Title:
  
neutron.tests.functional.agent.ovn.metadata.test_metadata_agent.TestMetadataAgent.test_agent_registration_at_chassis_create_event
  fails randomly

Status in neutron:
  New

Bug description:
  Seemingly starting from the 1st of April
  
neutron.tests.functional.agent.ovn.metadata.test_metadata_agent.TestMetadataAgent.test_agent_registration_at_chassis_create_event
  fails randomly in the gate with the error message:

  2020-04-06 08:55:57.302891 | controller | ==
  2020-04-06 08:55:57.302931 | controller | Failed 1 tests - output below:
  2020-04-06 08:55:57.302953 | controller | ==
  2020-04-06 08:55:57.302972 | controller |
  2020-04-06 08:55:57.302992 | controller | 
neutron.tests.functional.agent.ovn.metadata.test_metadata_agent.TestMetadataAgent.test_agent_registration_at_chassis_create_event
  2020-04-06 08:55:57.303012 | controller | 
-
  2020-04-06 08:55:57.303030 | controller |
  2020-04-06 08:55:57.303050 | controller | Captured traceback:
  2020-04-06 08:55:57.303069 | controller | ~~~
  2020-04-06 08:55:57.303088 | controller | Traceback (most recent call 
last):
  2020-04-06 08:55:57.303107 | controller |
  2020-04-06 08:55:57.303126 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, 
in func
  2020-04-06 08:55:57.303145 | controller | return f(self, *args, **kwargs)
  2020-04-06 08:55:57.303164 | controller |
  2020-04-06 08:55:57.303184 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/ovn/metadata/test_metadata_agent.py",
 line 220, in 

[Yahoo-eng-team] [Bug 1870110] [NEW] neutron-rally-task fails in rally_openstack.task.scenarios.neutron.trunk.CreateAndListTrunks

2020-04-01 Thread Bence Romsics
Public bug reported:

It seems we have a gate failure in neutron-rally-task. It fails in
rally_openstack.task.scenarios.neutron.trunk.CreateAndListTrunks. For
example:

https://zuul.opendev.org/t/openstack/build/9c9970da456d4145a174f73c90529dd2/log/job-output.txt#41274
https://zuul.opendev.org/t/openstack/build/8319cc946cc9407a90467f68757c11e8/log/job-output.txt#41269

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1870110

Title:
  neutron-rally-task fails in
  rally_openstack.task.scenarios.neutron.trunk.CreateAndListTrunks

Status in neutron:
  New

Bug description:
  It seems we have a gate failure in neutron-rally-task. It fails in
  rally_openstack.task.scenarios.neutron.trunk.CreateAndListTrunks. For
  example:

  
https://zuul.opendev.org/t/openstack/build/9c9970da456d4145a174f73c90529dd2/log/job-output.txt#41274
  
https://zuul.opendev.org/t/openstack/build/8319cc946cc9407a90467f68757c11e8/log/job-output.txt#41269

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1870110/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1866353] Re: Neutron API returning HTTP 201 for SG rule create when not fully created yet

2020-03-09 Thread Bence Romsics
** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1866353

Title:
  Neutron API returning HTTP 201 for SG rule create  when not fully
  created yet

Status in neutron:
  Invalid

Bug description:
  Neutron API returns HTTP 201 (Created) for security group rule create
  requests, although it takes longer to apply the configuration to the
  port. This means that for a period of time the firewall on the port is
  outdated, which can pose a security risk or cause applications to
  fail/misbehave. Even though not tested, it might even be that the
  q-agent could completely miss the SG rule add event from the Neutron
  server and never apply it.

  The log below is of a security group rule create request from Octavia
  to Neutron. Neutron returns HTTP 201 but the q-agent has not yet
  applied the configuration. The Octavia tempest test expects the load
  balancer VIP to conform to the security group rules but fails as the
  q-agent still have not applied the new security group rule to the port
  yet.

  Mar 03 17:33:24.786466 ubuntu-bionic-airship-kna1-0014969351 
octavia-worker[8605]: DEBUG octavia.controller.worker.v1.controller_worker [-] 
Task 'octavia.controller.worker.v1.tasks.network_tasks.UpdateVIP' 
(10c8bae1-19b1-4757-9530-12ac29384565) transitioned into state 'RUNNING' from 
state 'PENDING' {{(pid=8984) _task_receiver 
/usr/local/lib/python3.6/dist-packages/taskflow/listeners/logging.py:194}}
  Mar 03 17:33:24.787574 ubuntu-bionic-airship-kna1-0014969351 
octavia-worker[8605]: DEBUG octavia.controller.worker.v1.tasks.network_tasks 
[None req-6bbb57f5-2a06-4e8e-9ddd-6da259333fd7 None None] Updating VIP of 
load_balancer 61145d72-04e1-49bd-bcb0-5c215ed217ea. {{(pid=8984) execute 
/opt/stack/octavia/octavia/controller/worker/v1/tasks/network_tasks.py:472}}
  Mar 03 17:33:24.805139 ubuntu-bionic-airship-kna1-0014969351 
octavia-worker[8605]: DEBUG octavia.network.drivers.neutron.base [None 
req-6bbb57f5-2a06-4e8e-9ddd-6da259333fd7 None None] Neutron extension 
security-group found enabled {{(pid=8984) _check_extension_enabled 
/opt/stack/octavia/octavia/network/drivers/neutron/base.py:66}}
  Mar 03 17:33:24.819184 ubuntu-bionic-airship-kna1-0014969351 
octavia-worker[8605]: DEBUG octavia.network.drivers.neutron.base [None 
req-6bbb57f5-2a06-4e8e-9ddd-6da259333fd7 None None] Neutron extension 
dns-integration is not enabled {{(pid=8984) _check_extension_enabled 
/opt/stack/octavia/octavia/network/drivers/neutron/base.py:70}}
  Mar 03 17:33:24.832337 ubuntu-bionic-airship-kna1-0014969351 
octavia-worker[8605]: DEBUG octavia.network.drivers.neutron.base [None 
req-6bbb57f5-2a06-4e8e-9ddd-6da259333fd7 None None] Neutron extension qos found 
enabled {{(pid=8984) _check_extension_enabled 
/opt/stack/octavia/octavia/network/drivers/neutron/base.py:66}}
  Mar 03 17:33:24.847909 ubuntu-bionic-airship-kna1-0014969351 
octavia-worker[8605]: DEBUG octavia.network.drivers.neutron.base [None 
req-6bbb57f5-2a06-4e8e-9ddd-6da259333fd7 None None] Neutron extension 
allowed-address-pairs found enabled {{(pid=8984) _check_extension_enabled 
/opt/stack/octavia/octavia/network/drivers/neutron/base.py:66}}
  Mar 03 17:33:25.221590 ubuntu-bionic-airship-kna1-0014969351 
neutron-server[7030]: INFO neutron.wsgi [None 
req-137e4288-fac0-490b-b828-8b43a94f675c admin admin] 10.0.1.16,10.0.1.16 "POST 
/v2.0/security-group-rules HTTP/1.1" status: 201  len: 725 time: 0.1413145
  Mar 03 17:33:25.224900 ubuntu-bionic-airship-kna1-0014969351 
octavia-worker[8605]: DEBUG octavia.controller.worker.v1.controller_worker [-] 
Task 'octavia.controller.worker.v1.tasks.network_tasks.UpdateVIP' 
(10c8bae1-19b1-4757-9530-12ac29384565) transitioned into state 'SUCCESS' from 
state 'RUNNING' with result 'None' {{(pid=8984) _task_receiver 
/usr/local/lib/python3.6/dist-packages/taskflow/listeners/logging.py:183}}
  Mar 03 17:33:25.224298 ubuntu-bionic-airship-kna1-0014969351 
neutron-openvswitch-agent[7528]: DEBUG neutron.agent.resource_cache [None 
req-137e4288-fac0-490b-b828-8b43a94f675c admin admin] Received new resource 
SecurityGroupRule: 
SecurityGroupRule(created_at=2020-03-03T17:33:25Z,description='',direction='ingress',ethertype='IPv4',id=73e2e34d-a813-4846-8f85-2b8daae5d29c,port_range_max=8080,port_range_min=8080,project_id='e821f6bae64f4fa0bca1c230fbf4b364',protocol='tcp',remote_group_id=,remote_ip_prefix=192.0.1.0/32,revision_number=0,security_group_id=14216a23-b9c5-4cb3-b42d-c76b22c643ec,updated_at=2020-03-03T17:33:25Z)
 {{(pid=7528) record_resource_update 
/opt/stack/neutron/neutron/agent/resource_cache.py:192}}
  Mar 03 17:33:25.224767 ubuntu-bionic-airship-kna1-0014969351 
neutron-openvswitch-agent[7528]: DEBUG neutron_lib.callbacks.manager [None 
req-137e4288-fac0-490b-b828-8b43a94f675c admin admin] Notify callbacks 

[Yahoo-eng-team] [Bug 1845575] Re: Networking Option 1: Provider networks in neutron

2019-10-07 Thread Bence Romsics
Please note that the following two lines are NOT the same: one config
option ends in uri, the other ends in url. In later versions the keystone
folks renamed auth_uri to www_authenticate_uri so it's easier to
distinguish these config options. But in queens we have to live with
this.

auth_uri = http://controller:5000
auth_url = http://controller:5000

** Changed in: neutron
   Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1845575

Title:
  Networking Option 1: Provider networks in neutron

Status in neutron:
  Invalid

Bug description:
  In part: Configure the server component

  auth_uri = http://controller:5000
  auth_url = http://controller:5000

  The two lines are the same.

  This bug tracker is for errors with the documentation, use the
  following as a template and remove or add fields as you see fit.
  Convert [ ] into [x] to check boxes:

  - [ ] This doc is inaccurate in this way: __
  - [ ] This is a doc addition request.
  - [ ] I have a fix to the document that I can paste below including example: 
input and output. 

  If you have a troubleshooting or support issue, use the following
  resources:

   - Ask OpenStack: http://ask.openstack.org
   - The mailing list: http://lists.openstack.org
   - IRC: 'openstack' channel on Freenode

  ---
  Release: 12.1.1.dev43 on 2019-09-21 05:59
  SHA: b3d3d6d64358f6e8340bf0dbdff716968bf0d92c
  Source: 
https://git.openstack.org/cgit/openstack/neutron/tree/doc/source/install/controller-install-option1-ubuntu.rst
  URL: 
https://docs.openstack.org/neutron/queens/install/controller-install-option1-ubuntu.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1845575/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1836253] Re: Sometimes InstanceMetada API returns 404 due to invalid InstaceID returned by _get_instance_and_tenant_id()

2019-07-19 Thread Bence Romsics
I don't know when William will read my previous comment, but overall
what I found is this:

The cache of the metadata-agent was designed to be invalidated by time-based
expiry. That method has the reported kind of side effect if a client is
too fast. That is not perfect, but it can usually be addressed by tweaking
the cache TTL and/or waiting longer in the client.
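
A toy illustration of that trade-off (hypothetical cache code, not the
metadata-agent implementation): within the TTL window a lookup can keep
returning the port of an already-deleted instance whose IP was re-used:

    # Hypothetical TTL cache, for illustration only.
    import time

    class TtlCache(object):
        def __init__(self, ttl_seconds):
            self.ttl = ttl_seconds
            self.entries = {}  # key -> (value, timestamp)

        def get(self, key, loader):
            hit = self.entries.get(key)
            if hit and time.time() - hit[1] < self.ttl:
                return hit[0]  # possibly stale until the TTL expires
            value = loader(key)
            self.entries[key] = (value, time.time())
            return value

    cache = TtlCache(ttl_seconds=5)
    cache.get('192.168.10.5', lambda ip: 'port-of-deleted-instance')
    # A new instance immediately re-uses the IP:
    print(cache.get('192.168.10.5', lambda ip: 'port-of-new-instance'))
    # -> 'port-of-deleted-instance' until the 5 second TTL expires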

A more correct cache invalidation is theoretically possible, but I think
it is not feasible, because it would introduce cross-dependencies
between metadata-agent and far-away parts of neutron.

Therefore I'm inclined to mark this bug report as Invalid (not a bug).
Let me know please if I missed something here.

** Changed in: neutron
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1836253

Title:
  Sometimes InstanceMetada API returns 404 due to invalid InstaceID
  returned by _get_instance_and_tenant_id()

Status in neutron:
  Invalid

Bug description:
  Sometimes on instance initialization, the metadata step fails.

  On metadata-agent.log there are lots  of 404:
  "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404  len: 297 time: 
0.0771070

  On nova-api.log we get 404 too:
  "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404

  After some debugging we found that the problem occurs when a new instance gets
the same IP that was used by a deleted instance.
  The problem is related to the cache implementation in the method
"_get_ports_for_remote_address()" in "/neutron/agent/metadata/agent.py", which
returns a port of the deleted instance (with the same IP) and therefore a wrong
InstanceID that is sent to nova-api, which fails because this instanceId does
not exist.
  This problem only occurs with the cache enabled on the neutron metadata-agent.

  Version: Queens

  How to reproduce:
  ---
  #!/bin/bash

  computenodelist=(
    'computenode00.test.openstack.net'
    'computenode01.test.openstack.net'
    'computenode02.test.openstack.net'
    'computenode03.test.openstack.net'
  )

  validate_metadata(){
  cat << EOF > /tmp/metadata
  #!/bin/sh -x
  if curl 192.168.10.2
  then
   echo "ControllerNode00 - OK"
  else
   echo "ControllerNode00 - ERROR"
  fi
  EOF

    #SUBNAME=$(date +%s)
    openstack server delete "${node}" 2>/dev/null
    source /root/admin-openrc
    openstack server create --image cirros --nic net-id=internal --flavor 
Cirros --security-group default --user-data /tmp/metadata --availability-zone 
nova:${node} --wait "${node}" &> /dev/null

    i=0
    until [ $i -gt 3 ] || openstack console log show "${node}" | grep -q 
"ControllerNode00"
    do
  i=$((i+1))
  sleep 1
    done
    openstack console log show "${node}" | grep -q "ControllerNode00 - OK"
    if [ $? == 0 ]; then
  echo "Metadata Servers OK: ${node}"
    else
  echo "Metadata Servers ERROR: ${node}"
    fi

    rm /tmp/metadata
  }

  for node in ${computenodelist[@]}
  do
    export node
    validate_metadata
  done
  echo -e "\n"
  ---

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1836253/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1833674] [NEW] [RFE] Improve profiling of port binding and vif plugging

2019-06-21 Thread Bence Romsics
Public bug reported:

As discussed on the 2019-May PTG in Denver we want to measure and then
improve the performance of Neutron's most important operation, which is
port binding.

As we're working with OSProfiler reports we are realizing the report is
incomplete. We could turn on tracing in other components and
subcomponents by further propagating trace information.

We heavily build on some previous work:

* https://bugs.launchpad.net/neutron/+bug/1335640 [RFE] Neutron support for 
OSprofiler
* https://review.opendev.org/615350 Integrate rally with osprofiler

A few patches were already merged before opening this RFE:

* https://review.opendev.org/662804 Run nova's VM boot rally scenario in the 
neutron gate
* https://review.opendev.org/665614 Allow VM booting rally scenarios to time out

We already see the need for a few changes:

* New rally scenario to measure port binding
* Profiling coverage for vif plugging

This work is also driven by the discoveries made while interpreting
profiler reports so I expect further changes here and there.

** Affects: neutron
 Importance: Wishlist
 Assignee: Bence Romsics (bence-romsics)
 Status: In Progress


** Tags: osprofiler rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1833674

Title:
  [RFE] Improve profiling of port binding and vif plugging

Status in neutron:
  In Progress

Bug description:
  As discussed on the 2019-May PTG in Denver we want to measure and then
  improve the performance of Neutron's most important operation, which is
  port binding.

  As we're working with OSProfiler reports we are realizing the report
  is incomplete. We could turn on tracing in other components and
  subcomponents by further propagating trace information.

  We heavily build on some previous work:

  * https://bugs.launchpad.net/neutron/+bug/1335640 [RFE] Neutron support for 
OSprofiler
  * https://review.opendev.org/615350 Integrate rally with osprofiler

  A few patches were already merged before opening this RFE:

  * https://review.opendev.org/662804 Run nova's VM boot rally scenario in the 
neutron gate
  * https://review.opendev.org/665614 Allow VM booting rally scenarios to time 
out

  We already see the need for a few changes:

  * New rally scenario to measure port binding
  * Profiling coverage for vif plugging

  This work is also driven by the discoveries made while interpreting
  profiler reports so I expect further changes here and there.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1833674/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1826396] [NEW] Atomic Extraroute API

2019-04-25 Thread Bence Romsics
Public bug reported:

As discussed in an openstack-discuss thread [1] we could improve the
extraroute API to better support Neutron API clients, especially Heat.

The problem is that the current extraroute API does not allow atomic
additions/deletions of particular routing table entries. In the current
API the routes attribute of a router (containing all routing table
entries) must be updated at once. Therefore additions and deletions
must be performed on the client side. Therefore multiple clients race
to update the routes attribute and updates may get lost.
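
A sketch of the lost update (plain Python standing in for two API clients;
get_routes/put_routes are hypothetical stand-ins for the GET/PUT router calls):

    # The server-side 'routes' attribute, modelled as a plain list.
    routes = []

    def get_routes():
        return list(routes)

    def put_routes(new_routes):
        global routes
        routes = list(new_routes)

    # Client A and client B both read the current value...
    seen_by_a = get_routes()
    seen_by_b = get_routes()

    # ...each appends its own entry locally...
    seen_by_a.append({'destination': '10.0.0.0/24', 'nexthop': '192.0.2.1'})
    seen_by_b.append({'destination': '10.0.1.0/24', 'nexthop': '192.0.2.2'})

    # ...and each PUTs the whole list back. The later PUT drops A's route.
    put_routes(seen_by_a)
    put_routes(seen_by_b)
    print(routes)  # only B's route survives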

A detailed spec is coming soon.

[1] http://lists.openstack.org/pipermail/openstack-
discuss/2019-April/005121.html1

** Affects: neutron
 Importance: Undecided
 Assignee: Bence Romsics (bence-romsics)
 Status: New


** Tags: rfe

** Summary changed:

- Add atomic extraroute API
+ Atomic Extraroute API

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1826396

Title:
  Atomic Extraroute API

Status in neutron:
  New

Bug description:
  As discussed in an openstack-discuss thread [1] we could improve the
  extraroute API to better support Neutron API clients, especially Heat.

  The problem is that the current extraroute API does not allow atomic
  additions/deletions of particular routing table entries. In the current
  API the routes attribute of a router (containing all routing table
  entries) must be updated at once. Therefore additions and deletions
  must be performed on the client side. Therefore multiple clients race
  to update the routes attribute and updates may get lost.

  A detailed spec is coming soon.

  [1] http://lists.openstack.org/pipermail/openstack-
  discuss/2019-April/005121.html1

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1826396/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1821948] [NEW] Unstable unit test uses subnet broadcast address

2019-03-27 Thread Bence Romsics
Public bug reported:

This is a low frequency gate failure in unit tests.

Example log:
http://logs.openstack.org/10/645210/4/check/openstack-tox-py37/688ffa8/job-output.txt.gz

Logstash search:
http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22line%20171,%20in%20test_port_ip_update_revises%5C%22%20AND%20voting:1=864000s

2019-03-25 12:16:24.333688 | ubuntu-bionic | ==
2019-03-25 12:16:24.333764 | ubuntu-bionic | Failed 1 tests - output below:
2019-03-25 12:16:24.333837 | ubuntu-bionic | ==
2019-03-25 12:16:24.333863 | ubuntu-bionic |
2019-03-25 12:16:24.334052 | ubuntu-bionic | 
neutron.tests.unit.services.revisions.test_revision_plugin.TestRevisionPlugin.test_port_ip_update_revises
2019-03-25 12:16:24.334243 | ubuntu-bionic | 
-
2019-03-25 12:16:24.334271 | ubuntu-bionic |
2019-03-25 12:16:24.334326 | ubuntu-bionic | Captured traceback:
2019-03-25 12:16:24.334381 | ubuntu-bionic | ~~~
2019-03-25 12:16:24.334471 | ubuntu-bionic | b'Traceback (most recent call 
last):'
2019-03-25 12:16:24.334662 | ubuntu-bionic | b'  File 
"/home/zuul/src/git.openstack.org/openstack/neutron/neutron/tests/base.py", 
line 174, in func'
2019-03-25 12:16:24.334754 | ubuntu-bionic | b'return f(self, *args, 
**kwargs)'
2019-03-25 12:16:24.335103 | ubuntu-bionic | b'  File 
"/home/zuul/src/git.openstack.org/openstack/neutron/neutron/tests/unit/services/revisions/test_revision_plugin.py",
 line 171, in test_port_ip_update_revises'
2019-03-25 12:16:24.335243 | ubuntu-bionic | b"response = 
self._update('ports', port['port']['id'], new)"
2019-03-25 12:16:24.335490 | ubuntu-bionic | b'  File 
"/home/zuul/src/git.openstack.org/openstack/neutron/neutron/tests/unit/db/test_db_base_plugin_v2.py",
 line 601, in _update'
2019-03-25 12:16:24.335642 | ubuntu-bionic | b'
self.assertEqual(expected_code, res.status_int)'
2019-03-25 12:16:24.335921 | ubuntu-bionic | b'  File 
"/home/zuul/src/git.openstack.org/openstack/neutron/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py",
 line 411, in assertEqual'
2019-03-25 12:16:24.336035 | ubuntu-bionic | b'
self.assertThat(observed, matcher, message)'
2019-03-25 12:16:24.336297 | ubuntu-bionic | b'  File 
"/home/zuul/src/git.openstack.org/openstack/neutron/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py",
 line 498, in assertThat'
2019-03-25 12:16:24.336372 | ubuntu-bionic | b'raise mismatch_error'
2019-03-25 12:16:24.336486 | ubuntu-bionic | 
b'testtools.matchers._impl.MismatchError: 200 != 400'
2019-03-25 12:16:24.336523 | ubuntu-bionic | b''
2019-03-25 12:16:24.336549 | ubuntu-bionic |
2019-03-25 12:16:24.336599 | ubuntu-bionic | Captured stderr:
2019-03-25 12:16:24.336650 | ubuntu-bionic | 
2019-03-25 12:16:24.337086 | ubuntu-bionic | 
b'/home/zuul/src/git.openstack.org/openstack/neutron/.tox/py37/lib/python3.7/site-packages/neutron_lib/context.py:154:
 DeprecationWarning: context.session is used with and without new enginefacade. 
Please update the code to use new enginefacede consistently.'
2019-03-25 12:16:24.337157 | ubuntu-bionic | b'  DeprecationWarning)'
2019-03-25 12:16:24.337594 | ubuntu-bionic | 
b'/home/zuul/src/git.openstack.org/openstack/neutron/.tox/py37/lib/python3.7/site-packages/neutron_lib/context.py:154:
 DeprecationWarning: context.session is used with and without new enginefacade. 
Please update the code to use new enginefacede consistently.'
2019-03-25 12:16:24.337664 | ubuntu-bionic | b'  DeprecationWarning)'
2019-03-25 12:16:24.337701 | ubuntu-bionic | b''

With some extra debug logging added I managed to obtain this error
message:

ERROR [neutron.tests.unit.db.test_db_base_plugin_v2] XXX
b\'{"NeutronError": {"type": "InvalidIpForNetwork", "message": "IP
address 10.0.0.255 is not a valid IP for any of the subnets on the
specified network.", "detail": ""}}\'

Reading the unit test source, it seems likely that the randomly picked
IP address plus one is occasionally the subnet broadcast address, which
is invalid as a fixed_ip.

https://opendev.org/openstack/neutron/src/commit/1ea9326fda303b48905d7f7748d320ba8e9322aa/neutron/tests/unit/services/revisions/test_revision_plugin.py#L169
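
A minimal sketch of that failure mode (hypothetical values, standard
library only), for illustration:

  import ipaddress

  subnet = ipaddress.ip_network('10.0.0.0/24')
  picked = subnet[-2]     # 10.0.0.254, the last valid host address
  candidate = picked + 1  # 10.0.0.255, the broadcast address

  # Using 'candidate' as a fixed_ip makes Neutron return 400
  # InvalidIpForNetwork instead of the expected 200.
  print(candidate == subnet.broadcast_address)  # True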

I'm going to upload an attempted fix soon.

** Affects: neutron
 Importance: Medium
 Assignee: Bence Romsics (bence-romsics)
 Status: In Progress


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1821948

Title:
  Unstable unit test uses subnet broadcast address

Status in neutron:
  In Progress

Bug description:
  Thi

[Yahoo-eng-team] [Bug 1821654] Re: Neutron Installation Prerequisites. The mysql command cannot execute without parameters

2019-03-26 Thread Bence Romsics
Since we have two contradictory bug reports about the preferred form,
I'm marking this as Opinion.

** Changed in: neutron
   Status: New => Opinion

** Changed in: neutron
   Importance: Undecided => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1821654

Title:
  Neutron Installation Prerequisites. The mysql command cannot execute
  without parameters

Status in neutron:
  Opinion

Bug description:
  In the first step of the prerequisites 
(https://docs.openstack.org/neutron/rocky/install/controller-install-rdo.html#prerequisites)
 the first instruction is to connect to the DB Server. The documentation 
instructs to use command 
  # mysql

  The command cannot be run as written; the documentation should
  instruct the use of parameters, as in the installation guides of other
  services, e.g. identity and compute:

  # mysql -u root -p


  This bug tracker is for errors with the documentation, use the
  following as a template and remove or add fields as you see fit.
  Convert [ ] into [x] to check boxes:

  - [X] This doc is inaccurate in this way: The command will not execute as 
instructed
  - [ ] This is a doc addition request.
  - [ ] I have a fix to the document that I can paste below including example: 
input and output. 

  If you have a troubleshooting or support issue, use the following
  resources:

   - Ask OpenStack: http://ask.openstack.org
   - The mailing list: http://lists.openstack.org
   - IRC: 'openstack' channel on Freenode

  ---
  Release: 13.0.3.dev77 on 2019-03-22 23:34
  SHA: cfb6e0eb72bcb12cdca76c0baf14df86bd95c272
  Source: 
https://git.openstack.org/cgit/openstack/neutron/tree/doc/source/install/controller-install-rdo.rst
  URL: 
https://docs.openstack.org/neutron/rocky/install/controller-install-rdo.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1821654/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1819029] [NEW] QoS policies with minimum-bandwidth rule should be rejected on non-physnet ports/networks

2019-03-07 Thread Bence Romsics
Public bug reported:

We seem to have forgotten to reject some API operations that are actually
not supported (and weren't planned to be supported) by the Stein
implementation of the Guaranteed Minimum Bandwidth feature.

That is, QoS policies with a minimum-bandwidth rule should not be used on
ports/networks that are not backed by a physnet. But currently we allow
this:

$ openstack network show private | egrep provider
| provider:network_type | vxlan
| provider:physical_network | None
| provider:segmentation_id  | 1062

$ openstack network qos policy create policy0
$ openstack network qos rule create policy0 --type minimum-bandwidth --min-kbps 
1000 --egress
$ openstack port create port0 --network private --qos-policy policy0

The port-create seems to work today, but on non-physnet networks there's
no guarantee at all (as planned in the blueprint). Therefore I think API
operations like these should be rejected now, otherwise we may set up
false expectations in our users.
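
For illustration, a minimal sketch (not the actual Neutron code; the
dict-shaped inputs and names are assumptions) of the kind of validation
this report asks for:

  def validate_min_bw_policy(network, qos_policy):
      """Reject minimum-bandwidth policies on networks without a physnet."""
      has_min_bw = any(rule['type'] == 'minimum_bandwidth'
                       for rule in qos_policy['rules'])
      no_physnet = network.get('provider:physical_network') is None
      if has_min_bw and no_physnet:
          raise ValueError(
              'QoS policies with a minimum-bandwidth rule are only '
              'supported on networks backed by a physnet')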

** Affects: neutron
 Importance: Undecided
 Assignee: Bence Romsics (bence-romsics)
 Status: New


** Tags: qos stein-rc-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1819029

Title:
  QoS policies with minimum-bandwidth rule should be rejected on non-
  physnet ports/networks

Status in neutron:
  New

Bug description:
  We seem to have forgotten to reject some API operations that are actually
  not supported (and weren't planned to be supported) by the Stein
  implementation of the Guaranteed Minimum Bandwidth feature.

  That is, QoS policies with a minimum-bandwidth rule should not be used
  on ports/networks that are not backed by a physnet. But currently we
  allow this:

  $ openstack network show private | egrep provider
  | provider:network_type | vxlan
  | provider:physical_network | None
  | provider:segmentation_id  | 1062

  $ openstack network qos policy create policy0
  $ openstack network qos rule create policy0 --type minimum-bandwidth 
--min-kbps 1000 --egress
  $ openstack port create port0 --network private --qos-policy policy0

  The port-create seems to work today, but on non-physnet networks
  there's no guarantee at all (as planned in the blueprint). Therefore I
  think API operations like these should be rejected now, otherwise we
  may set up false expectations in our users.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1819029/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1818683] [NEW] Placement reporter service plugin sometimes creates orphaned resource providers

2019-03-05 Thread Bence Romsics
Public bug reported:

As discovered by lajoskatona while working on a fullstack test
(https://review.openstack.org/631793) the placement reporter plugin may
create some of the neutron resource providers in the wrong resource
provider tree. For example consider:

$ openstack --os-placement-api-version 1.17 resource provider list
uuid | name | generation | root_provider_uuid | parent_provider_uuid
89ca1421-5117-5348-acab-6d0e2054239c | devstack0:Open vSwitch agent | 0 | 89ca1421-5117-5348-acab-6d0e2054239c | None
4a6f5f40-b7a1-5df4-9938-63983543f365 | devstack0:Open vSwitch agent:br-physnet0 | 2 | 89ca1421-5117-5348-acab-6d0e2054239c | 89ca1421-5117-5348-acab-6d0e2054239c
193134fd-464c-5545-9d20-df7d58c0166f | devstack0:Open vSwitch agent:br-ex | 2 | 89ca1421-5117-5348-acab-6d0e2054239c | 89ca1421-5117-5348-acab-6d0e2054239c
dbc498c7-8808-4f31-8abb-18560a4c3b53 | devstack0 | 2 | dbc498c7-8808-4f31-8abb-18560a4c3b53 | None
4a8a819d-61f9-5822-8c5c-3e9c7cb942d6 | devstack0:NIC Switch agent | 0 | dbc498c7-8808-4f31-8abb-18560a4c3b53 | dbc498c7-8808-4f31-8abb-18560a4c3b53
1c7e83f0-108d-5c35-ada7-7ebebbe43aad | devstack0:NIC Switch agent:ens5 | 2 | dbc498c7-8808-4f31-8abb-18560a4c3b53 | 4a8a819d-61f9-5822-8c5c-3e9c7cb942d6

Please note that all RPs should have the root_provider_uuid set to the
devstack0 RP's uuid, but the open vswitch RPs have a different (wrong)
root. And 'devstack0:Open vSwitch agent' has no parent.

This situation is dependent on service startup order. The ovs RPs were
created before the compute host RP. That case should have been detected
as an error, but it was not.
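
A rough sketch (helper names are hypothetical, not the actual plugin
code) of the ordering guarantee the reporter plugin should enforce:

  def ensure_agent_rp(placement_client, host_rp_name, agent_rp_name):
      """Create the agent RP only under an already existing host RP."""
      host_rp = placement_client.get_rp_by_name(host_rp_name)
      if host_rp is None:
          # The compute host RP has not been reported yet (e.g. neutron
          # started before nova-compute): fail loudly instead of silently
          # creating the agent RP as a new, orphaned root.
          raise RuntimeError('parent RP %s does not exist yet' % host_rp_name)
      return placement_client.ensure_rp(
          agent_rp_name, parent_provider_uuid=host_rp['uuid'])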

I'll upload a proposed fix right away.

** Affects: neutron
 Importance: Undecided
 Assignee: Bence Romsics (bence-romsics)
 Status: New


** Tags: qos

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1818683

Title:
  Placement reporter service plugin sometimes creates orphaned resource
  providers

Status in neutron:
  New

Bug description:
  As discovered by lajoskatona while working on a fullstack test
  (https://review.openstack.org/631793) the placement reporter plugin
  may create some of the neutron resource providers in the wrong
  resource provider tree. For example consider:

  $ openstack --os-placement-api-version 1.17 resource provider list
  uuid | name | generation | root_provider_uuid | parent_provider_uuid
  89ca1421-5117-5348-acab-6d0e2054239c | devstack0:Open vSwitch agent | 0 | 89ca1421-5117-5348-acab-6d0e2054239c | None
  4a6f5f40-b7a1-5df4-9938-63983543f365 | devstack0:Open vSwitch agent:br-physnet0 | 2 | 89ca1421-5117-5348-acab-6d0e2054239c | 89ca1421-5117-5348-acab-6d0e2054239c
  193134fd-464c-5545-9d20-df7d58c0166f | devstack0:Open vSwitch agent:br-ex | 2 | 89ca1421-5117-5348-acab-6d0e2054239c | 89ca1421-5117-5348-acab-6d0e2054239c
  dbc498c7-8808-4f31-8abb-18560a4c3b53 | devstack0 | 2 | dbc498c7-8808-4f31-8abb-18560a4c3b53 | None
  4a8a819d-61f9-5822-8c5c-3e9c7cb942d6 | devstack0:NIC Switch agent | 0 | dbc498c7-8808-4f31-8abb-18560a4c3b53 | dbc498c7-8808-4f31-8abb-18560a4c3b53
  1c7e83f0-108d-5c35-ada7-7ebebbe43aad | devstack0:NIC Switch agent:ens5 | 2 | dbc498c7-8808-4f31-8abb-18560a4c3b53 | 4a8a819d-61f9-5822-8c5c-3e9c7cb942d6

[Yahoo-eng-team] [Bug 1818479] [NEW] RFE Decouple placement reporting service plugin from ML2

2019-03-03 Thread Bence Romsics
Public bug reported:

This RFE tracks an improvement to the placement reporter service plugin
that was suggested just a few days before the Stein feature freeze, so
instead of working on it right there, this is delayed to the Train
cycle. The original code review comment:

https://review.openstack.org/#/c/580672/30/neutron/services/placement_report/plugin.py@187

The placement reporter service plugin as merged in Stein depends on ML2.
The improvement idea is to decouple it by a driver pattern, as in the
qos service plugin. We need to investigate the costs and benefits of
this refactoring and, if it's feasible, implement it in Train.
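
A rough sketch (hypothetical class and function names) of what a
driver-based structure, similar in spirit to the qos service plugin,
could look like:

  class PlacementReportDriverBase:
      """Interface each mechanism-specific driver would implement."""

      def agent_to_resource_providers(self, agent):
          raise NotImplementedError

  _drivers = []

  def register_driver(driver):
      _drivers.append(driver)

  def resource_providers_for(agent):
      # The service plugin iterates over registered drivers instead of
      # importing ML2 directly.
      for driver in _drivers:
          yield from driver.agent_to_resource_providers(agent)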

** Affects: neutron
 Importance: Undecided
 Assignee: Bence Romsics (bence-romsics)
 Status: New


** Tags: qos rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1818479

Title:
  RFE Decouple placement reporting service plugin from ML2

Status in neutron:
  New

Bug description:
  This RFE tracks an improvement to the placement reporter service
  plugin that was suggested just a few days before the Stein feature
  freeze, so instead of working on it right there, this is delayed to
  the Train cycle. The original code review comment:

  
https://review.openstack.org/#/c/580672/30/neutron/services/placement_report/plugin.py@187

  The placement reporter service plugin as merged in Stein depends on
  ML2. The improvement idea is to decouple it by a driver pattern, as in
  the qos service plugin. We need to investigate the costs and benefits
  of this refactoring and, if it's feasible, implement it in Train.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1818479/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1815618] [NEW] cannot update qos rule

2019-02-12 Thread Bence Romsics
ck_rules_conflict(policy, rule)
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource 
  File "/opt/stack/neutron/neutron/objects/qos/qos_policy_validator.py", line 
63, in check_rules_conflict
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource 
if rule.duplicates(rule_obj):
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource 
  File "/opt/stack/neutron/neutron/objects/qos/rule.py", line 83, in duplicates
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource 
if getattr(self, field) != getattr(other_rule, field):
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource 
  File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", 
line 68, in getter
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource 
return getattr(self, attrname)
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource 
AttributeError: 'QosMinimumBandwidthRule' object has no attribute 
'_obj_direction'
febr 12 12:20:19 devstack0 neutron-server[31565]: ERROR neutron.api.v2.resource·

The version used to reproduce the bug:

neutron 2f3cc51784
neutron-lib aceb7c50ed
devstack ee4b6a01
python-openstackclient dcff1012fd
python-neutronclient d74b871f7fe
openstacksdk==0.23.0
osc-lib==1.12.0

I'll work on fixing these problems.

** Affects: neutron
 Importance: Undecided
 Assignee: Bence Romsics (bence-romsics)
 Status: New


** Tags: low-hanging-fruit qos

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1815618

Title:
  cannot update qos rule

Status in neutron:
  New

Bug description:
  This bug seems to be a combination of problems on both the client and
  server sides. So we may need to add python-neutronclient and/or python-
  openstackclient as an affected component. I'll do that as soon as I
  manage to locate which one contains the client-side bug. But this
  report will be good to track the overall problem.

  First the reproduction:

  openstack network qos policy create policy0

  openstack network qos rule create policy0 --type minimum-bandwidth --min-kbps 
1000 --egress # 71a84995-cccd-4f09-9c3d-b1caa18ff363
  openstack network qos rule set policy0 71a84995-cccd-4f09-9c3d-b1caa18ff363 
--min-kbps 1001 --egress
  -> works as expected

  # make sure we only have one rule of the type
  openstack network qos rule delete policy0 71a84995-cccd-4f09-9c3d-b1caa18ff363

  openstack network qos rule create policy0 --type minimum-bandwidth --min-kbps 
1000 --ingress # 1155c1c8-f9a7-4954-b195-9f58c8e18b4d
  openstack network qos rule set policy0 1155c1c8-f9a7-4954-b195-9f58c8e18b4d 
--min-kbps 1001 --ingress
  -> works as expected

  openstack network qos rule delete policy0
  1155c1c8-f9a7-4954-b195-9f58c8e18b4d

  # create the ingress/egress pair at once
  openstack network qos rule create policy0 --type minimum-bandwidth --min-kbps 
1000 --egress # f392837a-09e2-4b5e-8c29-86670797679e
  openstack network qos rule create policy0 --type minimum-bandwidth --min-kbps 
1000 --ingress # 77dae223-b787-4943-bb45-c42424fd29ec

  # This is the bug. As we'll see later the trigger is a client-side problem, 
but I don't think neutron-server should return 500 Internal Server Error. The 
malformed input should be caught earlier and a 4xx response should be given.
  openstack network qos rule set policy0 f392837a-09e2-4b5e-8c29-86670797679e 
--min-kbps 1001 --egress
  Failed to set Network QoS rule ID "f392837a-09e2-4b5e-8c29-86670797679e": 
HttpException: 500: Server Error for url: 
http://100.109.0.20:9696/v2.0/qos/policies/188a2f59-ab90-41a3-9e6f-58e641a34544/minimum_bandwidth_rules/f392837a-09e2-4b5e-8c29-86670797679e,
 Request Failed: internal server error while processing your request.

  openstack network qos rule set policy0 77dae223-b787-4943-bb45-c42424fd29ec 
--min-kbps 1001 --ingress
  Failed to set Network QoS rule ID "77dae223-b787-4943-bb45-c42424fd29ec": 
HttpException: 500: Server Error for url: 
http://100.109.0.20:9696/v2.0/qos/policies/188a2f59-ab90-41a3-9e6f-58e641a34544/minimum_bandwidth_rules/77dae223-b787-4943-bb45-c42424fd29ec,
 Request Failed: internal server error while processing your request.

  # the same rule update can be done by neutronclient, but only for the egress 
direction
  neutron qos-minimum-bandwidth-rule-update 
f392837a-09e2-4b5e-8c29-86670797679e policy0 --min-kbps 1001 --direction egress
  -> works as expected

  # this failure is expected because neutronclient had long been deprecated
  # by the time the ingress direction was introduced
  neutron qos-minimum-bandwidth-rule-update 
77dae223-b787-4943-bb45-c42424fd29ec policy0 --min-kbps 1001 --direction ingress
  neutron qos-minimum-bandwidth-rule-update: error: argument --direction: 

[Yahoo-eng-team] [Bug 1749404] [NEW] nova-compute resource tracker ignores 'reserved' while reporting 'max_unit'

2018-02-14 Thread Bence Romsics
Public bug reported:

The following inventory was reported after a fresh devstack build:

curl --silent \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--header "OpenStack-API-Version: placement latest" \
--header "X-Auth-Token: ${TOKEN:?}" \
-X GET 
http://127.0.0.1/placement/resource_providers/8d4d7926-df76-42e5-b5da-67893468f5cb/inventories
 | json_pp
{
   "resource_provider_generation" : 1,
   "inventories" : {
  "DISK_GB" : {
 "max_unit" : 19,
 "min_unit" : 1,
 "allocation_ratio" : 1,
 "step_size" : 1,
 "reserved" : 0,
 "total" : 19
  },
  "MEMORY_MB" : {
 "allocation_ratio" : 1.5,
 "max_unit" : 5967,
 "min_unit" : 1,
 "reserved" : 512,
 "step_size" : 1,
 "total" : 5967
  },
  "VCPU" : {
 "allocation_ratio" : 16,
 "min_unit" : 1,
 "max_unit" : 2,
 "reserved" : 0,
 "step_size" : 1,
 "total" : 2
  }
   }
}

IMO the correct max_unit value of the MEMORY_MB resource would be (total
- reserved). But today it equals the total value.
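
A worked example using the MEMORY_MB inventory above:

  total = 5967
  reserved = 512
  expected_max_unit = total - reserved  # 5455, what this report argues for
  reported_max_unit = 5967              # what the resource tracker reports
  print(expected_max_unit, reported_max_unit)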

nova commit: 9e9b3e1
devstack commit: fbdefac
devstack config: ENABLED_SERVICES+=,placement-api,placement-client

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: low-hanging-fruit placement

** Description changed:

  The following inventory was reported after a fresh devstack build:
  
- curl --silent --header "Accept: application/json" --header "Content-Type: 
application/json" --header "OpenStack-API-Version: placement latest" --header 
"X-Auth-Token: ${TOKEN:?}" -X GET 
http://127.0.0.1/placement/resource_providers/8d4d7926-df76-42e5-b5da-67893468f5cb/inventories
 | json_pp
+ curl --silent \
+ --header "Accept: application/json" \
+ --header "Content-Type: application/json" \
+ --header "OpenStack-API-Version: placement latest" \
+ --header "X-Auth-Token: ${TOKEN:?}" \
+ -X GET 
http://127.0.0.1/placement/resource_providers/8d4d7926-df76-42e5-b5da-67893468f5cb/inventories
 | json_pp
  {
-"resource_provider_generation" : 1,
-"inventories" : {
-   "DISK_GB" : {
-  "max_unit" : 19,
-  "min_unit" : 1,
-  "allocation_ratio" : 1,
-  "step_size" : 1,
-  "reserved" : 0,
-  "total" : 19
-   },
-   "MEMORY_MB" : {
-  "allocation_ratio" : 1.5,
-  "max_unit" : 5967,
-  "min_unit" : 1,
-  "reserved" : 512,
-  "step_size" : 1,
-  "total" : 5967
-   },
-   "VCPU" : {
-  "allocation_ratio" : 16,
-  "min_unit" : 1,
-  "max_unit" : 2,
-  "reserved" : 0,
-  "step_size" : 1,
-  "total" : 2
-   }
-}
+    "resource_provider_generation" : 1,
+    "inventories" : {
+   "DISK_GB" : {
+  "max_unit" : 19,
+  "min_unit" : 1,
+  "allocation_ratio" : 1,
+  "step_size" : 1,
+  "reserved" : 0,
+  "total" : 19
+   },
+   "MEMORY_MB" : {
+  "allocation_ratio" : 1.5,
+  "max_unit" : 5967,
+  "min_unit" : 1,
+  "reserved" : 512,
+  "step_size" : 1,
+  "total" : 5967
+   },
+   "VCPU" : {
+  "allocation_ratio" : 16,
+  "min_unit" : 1,
+  "max_unit" : 2,
+  "reserved" : 0,
+  "step_size" : 1,
+  "total" : 2
+   }
+    }
  }
  
  IMO the correct max_unit value of the MEMORY_MB resource would be (total
  - reserved). But today it equals the total value.
  
  nova commit: 9e9b3e1
  devstack commit: fbdefac
  devstack config: ENABLED_SERVICES+=,placement-api,placement-client

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1749404

Title:
  nova-compute resource tracker ignores 'reserved' while reporting
  'max_unit'

Status in OpenStack Compute (nova):
  New

Bug description:
  The following inventory was reported after a fresh devstack build:

  curl --silent \
  --header "Accept: application/json" \
  --header "Content-Type: application/json" \
  --header "OpenStack-API-Version: placement latest" \
  --header "X-Auth-Token: ${TOKEN:?}" \
  -X GET 
http://127.0.0.1/placement/resource_providers/8d4d7926-df76-42e5-b5da-67893468f5cb/inventories
 | json_pp
  {
     "resource_provider_generation" : 1,
     "inventories" : {
    "DISK_GB" : {
   "max_unit" : 19,
   "min_unit" : 1,
   "allocation_ratio" : 1,
   "step_size" : 1,
   "reserved" : 0,
   "total" : 19
    },
    "MEMORY_MB" : {
   "allocation_ratio" : 1.5,
   "max_unit" : 5967,
   "min_unit" : 1,
   "reserved" : 512,
   "step_size" : 1,
   "total" : 5967
    },
    "VCPU" : {
   

[Yahoo-eng-team] [Bug 1749410] [NEW] placement api-ref unclear if capacity is meant to be total or current

2018-02-14 Thread Bence Romsics
Public bug reported:

While exploring the newer microversions (here 1.4) of the placement API
I found this part of the API reference unclear to me
(https://developer.openstack.org/api-ref/placement/#list-resource-
providers, 'resources' parameter):

"A comma-separated list of strings indicating an amount of resource of a
specified class that a provider must have the capacity to serve:"

Based on the reference I cannot tell if the capacity is meant to be
total or current (i.e. total minus current allocations).

Running a few queries, it seems to me the actual behavior is to filter
on total capacity. If that was the intended behavior, then this report
is just a tiny documentation bug, I guess.

https://github.com/openstack/nova/blob/17.0.0.0rc1/placement-api-
ref/source/parameters.yaml#L105
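
For illustration, a minimal sketch (endpoint and token are placeholders)
of the query whose semantics the api-ref leaves ambiguous:

  import requests

  resp = requests.get(
      'http://127.0.0.1/placement/resource_providers',
      params={'resources': 'MEMORY_MB:1024,VCPU:1'},
      headers={
          'X-Auth-Token': '<token>',
          'OpenStack-API-Version': 'placement 1.4',
          'Accept': 'application/json',
      },
  )
  # Unclear from the api-ref: must the providers listed here have a
  # *total* of 1024 MB / 1 VCPU, or that much capacity still unallocated?
  print([rp['name'] for rp in resp.json()['resource_providers']])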

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: doc placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1749410

Title:
  placement api-ref unclear if capacity is meant to be total or current

Status in OpenStack Compute (nova):
  New

Bug description:
  While exploring the newer microversions (here 1.4) of the placement
  API I found this part of the API reference unclear to me
  (https://developer.openstack.org/api-ref/placement/#list-resource-
  providers, 'resources' parameter):

  "A comma-separated list of strings indicating an amount of resource of
  a specified class that a provider must have the capacity to serve:"

  Based on the reference I cannot tell if the capacity is meant to be
  total or current (i.e. total minus current allocations).

  Running a few queries, it seems to me the actual behavior is to filter
  on total capacity. If that was the intended behavior, then this report
  is just a tiny documentation bug, I guess.

  https://github.com/openstack/nova/blob/17.0.0.0rc1/placement-api-
  ref/source/parameters.yaml#L105

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1749410/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1708444] [NEW] Angular role table stays stale after editing a role

2017-08-03 Thread Bence Romsics
Public bug reported:

In the angularized role panel, if I edit a role (e.g. change its name),
the actual update happens in Keystone, but the role table is not
refreshed and shows the old state until I reload the page.

devstack b79531a
horizon 53dd2db

ANGULAR_FEATURES={
'roles_panel': True,
...
}

A proposed fix is on the way.

** Affects: horizon
 Importance: Undecided
 Assignee: Bence Romsics (bence-romsics)
 Status: In Progress


** Tags: angularjs

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1708444

Title:
  Angular role table stays stale after editing a role

Status in OpenStack Dashboard (Horizon):
  In Progress

Bug description:
  In the angularized role panel, if I edit a role (e.g. change its
  name), the actual update happens in Keystone, but the role table is
  not refreshed and shows the old state until I reload the page.

  devstack b79531a
  horizon 53dd2db

  ANGULAR_FEATURES={
  'roles_panel': True,
  ...
  }

  A proposed fix is on the way.

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1708444/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1704118] [NEW] Spinner is stuck when deleting image in angularized panel

2017-07-13 Thread Bence Romsics
Public bug reported:

Reproduction:

 local_settings:
 ANGULAR_FEATURES={
 'images_panel': True,
 ...
 }

devstack commit b79531a9f96736225a8991052a0be5767c217377
horizon commit d5779eae0ad267533001cb7dae6ca7dbc5becb27

Go to detail page of an image eg: /ngdetails/OS::Glance::Image/90ccb1bf-
1feb-4f49-8234-c6812c952131

Click delete image. After that the image is deleted, though multiple UI
errors can be seen:

1) The 'Please wait' spinner is stuck forever
2) A red toast is displayed: Error: Unable to retrieve the image
3) In the javascript console this error appears:
GET 
http://127.0.0.1:9000/api/glance/images/90ccb1bf-1feb-4f49-8234-c6812c952131/ 
404 (Not Found)

** Affects: horizon
 Importance: Undecided
 Status: New


** Tags: angularjs glance low-hanging-fruit

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1704118

Title:
  Spinner is stuck when deleting image in angularized panel

Status in OpenStack Dashboard (Horizon):
  New

Bug description:
  Reproduction:

   local_settings:
   ANGULAR_FEATURES={
   'images_panel': True,
   ...
   }

  devstack commit b79531a9f96736225a8991052a0be5767c217377
  horizon commit d5779eae0ad267533001cb7dae6ca7dbc5becb27

  Go to detail page of an image eg: /ngdetails/OS::Glance::Image
  /90ccb1bf-1feb-4f49-8234-c6812c952131

  Click delete image. After that the image is deleted, though multiple
  UI errors can be seen:

  1) The 'Please wait' spinner is stuck forever
  2) A red toast is displayed: Error: Unable to retrieve the image
  3) In the javascript console this error appears:
  GET 
http://127.0.0.1:9000/api/glance/images/90ccb1bf-1feb-4f49-8234-c6812c952131/ 
404 (Not Found)

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1704118/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1699516] [NEW] Trunk create fails due to case typo

2017-06-21 Thread Bence Romsics
Public bug reported:

When you boot a VM with a trunk using the ovs trunk driver, the boot
fails while allocating the network, and you get this ovs-agent error
log:
neutron-openvswitch-agent[12170]: CallbackFailure: Callback
neutron.services.trunk.drivers.openvswitch.agent.driver
.OVSTrunkSkeleton.check_trunk_dependencies-1030432 failed with "no such
option securitygroup in group [DEFAULT]"

The cause looks to be a case typo in the fix of bug #1669074.

neutron/services/trunk/drivers/openvswitch/agent/driver.py:
wrong: cfg.CONF.securitygroup.firewall_driver
right: cfg.CONF.SECURITYGROUP.firewall_driver
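
A minimal standalone sketch (using oslo.config directly and assuming the
option is registered under the 'SECURITYGROUP' group, as stated above)
that reproduces the same error:

  from oslo_config import cfg

  CONF = cfg.CONF
  CONF.register_opts([cfg.StrOpt('firewall_driver')], group='SECURITYGROUP')

  print(CONF.SECURITYGROUP.firewall_driver)  # works (None by default)
  print(CONF.securitygroup.firewall_driver)  # raises NoSuchOptError:
  # "no such option securitygroup in group [DEFAULT]"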

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: trunk

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1699516

Title:
  Trunk create fails due to case typo

Status in neutron:
  New

Bug description:
  When you boot a VM with a trunk using the ovs trunk driver, the boot
  fails while allocating the network, and you get this ovs-agent error
  log:

  neutron-openvswitch-agent[12170]: CallbackFailure: Callback
  neutron.services.trunk.drivers.openvswitch.agent.driver
  .OVSTrunkSkeleton.check_trunk_dependencies-1030432 failed with "no
  such option securitygroup in group [DEFAULT]"

  The cause looks to be a case typo in the fix of bug #1669074.

  neutron/services/trunk/drivers/openvswitch/agent/driver.py:
  wrong: cfg.CONF.securitygroup.firewall_driver
  right: cfg.CONF.SECURITYGROUP.firewall_driver

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1699516/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1631371] [NEW] [RFE] Expose trunk details over metadata API

2016-10-07 Thread Bence Romsics
Public bug reported:

Enable bringup of subports via exposing trunk/subport details over
the metadata API

With the completion of the trunk port feature in Newton (Neutron
bp/vlan-aware-vms [1]), trunk and subports are now available. But the
bringup of the subports' VLAN interfaces inside an instance is not
automatic. In Newton there's no easy way to pass information about
the subports to the guest operating system. But using the metadata
API we can change this.

Problem Description
---

To bring up (and/or tear down) a subport the guest OS

(a) must know the segmentation-type and segmentation-id of a subport
as set in 'openstack network trunk create/set --subport'

(b) must know the MAC address of a subport
as set in 'openstack port create'

(c) must know which vNIC the subport belongs to

(d) may need to know when were subports added or removed
(if they are added or removed during the lifetime of an instance)

Since subports do not have a corresponding vNIC, the approach used
for regular ports (with a vNIC) cannot work.

This write-up addresses problems (a), (b) and (c), but not (d).

Proposed Change
---

Here we propose a change involving both Nova and Neutron to expose
the information needed via the metadata API.

Information covering (a) and (b) is already available (read-only)
in the 'trunk_details' attribute of the trunk parent port (ie. the
port which the instance was booted with). [2]

We propose to use the MAC address of the trunk parent port to cover
(c). We recognize this may occasionally be problematic, because MAC
addresses (of ports belonging to different neutron networks) are not
guaranteed to be unique, so collisions may happen. But this seems a
small price to pay for avoiding the complexity of other solutions.

The mechanism would be the following. Let's suppose we have port0
which is a trunk parent port and instance0 was booted with '--nic
port-id=port0'. On every update of port0's trunk_details Neutron
constructs the following JSON structure:

PORT0-DETAILS = {
    "mac_address": PORT0-MAC-ADDRESS,
    "trunk_details": PORT0-TRUNK-DETAILS
}

Then Neutron sets a metadata key-value pair of instance0, equivalent
to the following nova command:

nova meta set instance0 trunk_details::PORT0-MAC-ADDRESS=PORT0-DETAILS

Nova in Newton limits meta values to <= 255 characters; this limit
must be raised. Assuming the current format of trunk_details, roughly
150 characters per subport are needed. Alternatively, meta values could
have unlimited length - at least for the service tenant used by
Neutron. (Though tenant-specific API validators may not be a good
idea.) The 'values' column of the 'instance_metadata' table should
be altered from VARCHAR(255) to TEXT() in a Nova DB migration.
(A slightly related bug report: [3])

A program could read
http://169.254.169.254/openstack/2016-06-30/meta_data.json and
bring up the subport VLAN interfaces accordingly. This program is
not covered here, however it is worth pointing out that it could be
called by cloud-init.
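
For illustration, a rough guest-side sketch (the 'trunk_details::<MAC>'
key naming follows the proposal above, it is not an existing API) of
what such a program could do:

  import json
  import urllib.request

  URL = 'http://169.254.169.254/openstack/2016-06-30/meta_data.json'
  meta_data = json.load(urllib.request.urlopen(URL))

  for key, value in meta_data.get('meta', {}).items():
      if not key.startswith('trunk_details::'):
          continue
      parent_mac = key.split('::', 1)[1]
      details = json.loads(value)
      for subport in details['trunk_details']['sub_ports']:
          # Here the guest would create e.g. a VLAN subinterface with
          # subport['segmentation_id'] and subport['mac_address'] on the
          # NIC whose MAC address equals parent_mac.
          print(parent_mac, subport['segmentation_id'], subport['mac_address'])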

Alternatives


(1) The MAC address of a parent port can be reused for all its child
ports (when creating the child ports). Then VLAN subinterfaces
of a network interface will have the correct MAC address by
default. Segmentation type and ID can be shared in other ways, for
example as a VLAN plan embedded into a golden image. This approach
could even partially solve problem (d), however it cannot solve problem
(a) in the dynamic case. Use of this approach is currently blocked
by an openvswitch firewall driver bug. [4][5]

(2) Generate and inject a subport bringup script into the instance
as user data. Cannot handle subports added or removed after VM boot.

(3) An alternative solution to problem (c) could rely on the
preservation of ordering between NICs passed to nova boot and NICs
inside an instance. However this would turn the update of trunk_details
into an instance-level operation instead of the port-level operation
proposed here. Plus it would fail if this ordering is ever lost.

References
--

[1] https://blueprints.launchpad.net/neutron/+spec/vlan-aware-vms
[2] 
https://review.openstack.org/#q,Id23ce8fc16c6ea6a405cb8febf8470a5bf3bcb43,n,z
[3] https://bugs.launchpad.net/nova/+bug/1117923
[4] https://bugs.launchpad.net/neutron/+bug/1626010
[5] https://bugs.launchpad.net/neutron/+bug/1593760

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: rfe trunk

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1631371

Title:
  [RFE] Expose trunk details over metadata API

Status in neutron:
  New

Bug description:
  Enable bringup of subports via exposing trunk/subport details over
  the metadata API

  With the completion of the trunk port feature in Newton (Neutron
  bp/vlan-aware-vms [1]), trunk and subports are now available. But the
  bringup of the 

[Yahoo-eng-team] [Bug 1630920] [NEW] native/idl ovsdb driver loses some ovsdb transactions

2016-10-06 Thread Bence Romsics
Public bug reported:

It seems the 'native' and the 'vsctl' ovsdb drivers behave differently.
The native/idl driver seems to lose some ovsdb transactions, at least
the transactions setting the 'other_config' ovs port attribute.

I have written about this in a comment of an earlier bug report
(https://bugs.launchpad.net/neutron/+bug/1626010). But I opened this new
bug report because the two problems seem to be independent and that
other comment may have gone unnoticed.

It is not completely clear to me what difference this causes in user-
observable behavior. I think it at least leads to losing information
about which conntrack zone to use in the openvswitch firewall driver.
See here:

https://github.com/openstack/neutron/blob/3ade301/neutron/agent/linux/openvswitch_firewall/firewall.py#L257

The details:

If I use the vsctl ovsdb driver:

ml2_conf.ini:
[ovs]
ovsdb_interface = vsctl

then I see this:

$ > /opt/stack/logs/q-agt.log
$ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
1
$ openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec 
--nic net-id=net0 --wait vm0
$ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
2
$ openstack server delete vm0
$ sleep 3
$ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
1
$ egrep -c 'Transaction caused no change' /opt/stack/logs/q-agt.log 
0

But if I use the (default) native driver:

ml2_conf.ini:
[ovs]
ovsdb_interface = native

Then this happens:

$ > /opt/stack/logs/q-agt.log
$ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
1
$ openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec 
--nic net-id=net0 --wait vm0
$ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
1
$ openstack server delete vm0
$ sleep 3
$ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
1
$ egrep -c 'Transaction caused no change' /opt/stack/logs/q-agt.log
22

A sample log message from q-agt.log:

2016-10-06 09:23:05.447 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn 
command(idx=0): DbSetCommand(table=Port, col_values=(('other_config', {'tag': 
1}),), record=tap8e2a390d-63) from (pid=6068) do_commit 
/opt/stack/neutron/neutron/agent/ovsdb/impl_idl.py:99
2016-10-06 09:23:05.448 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction 
caused no change from (pid=6068) do_commit 
/opt/stack/neutron/neutron/agent/ovsdb/impl_idl.py:126

devstack version: 563d377
neutron version: 3ade301

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1630920

Title:
  native/idl ovsdb driver loses some ovsdb transactions

Status in neutron:
  New

Bug description:
  It seems the 'native' and the 'vsctl' ovsdb drivers behave
  differently. The native/idl driver seems to lose some ovsdb
  transactions, at least the transactions setting the 'other_config' ovs
  port attribute.

  I have written about this in a comment of an earlier bug report
  (https://bugs.launchpad.net/neutron/+bug/1626010). But I opened this
  new bug report because the two problems seem to be independent and
  that other comment may have gone unnoticed.

  It is not completely clear to me what difference this causes in user-
  observable behavior. I think it at least leads to losing information
  about which conntrack zone to use in the openvswitch firewall driver.
  See here:

  
https://github.com/openstack/neutron/blob/3ade301/neutron/agent/linux/openvswitch_firewall/firewall.py#L257

  The details:

  If I use the vsctl ovsdb driver:

  ml2_conf.ini:
  [ovs]
  ovsdb_interface = vsctl

  then I see this:

  $ > /opt/stack/logs/q-agt.log
  $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
  1
  $ openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec 
--nic net-id=net0 --wait vm0
  $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
  2
  $ openstack server delete vm0
  $ sleep 3
  $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
  1
  $ egrep -c 'Transaction caused no change' /opt/stack/logs/q-agt.log 
  0

  But if I use the (default) native driver:

  ml2_conf.ini:
  [ovs]
  ovsdb_interface = native

  Then this happens:

  $ > /opt/stack/logs/q-agt.log
  $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
  1
  $ openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec 
--nic net-id=net0 --wait vm0
  $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
  1
  $ openstack server delete vm0
  $ sleep 3
  $ sudo ovs-vsctl list Port | grep other_config | grep -c net_uuid
  1
  $ egrep -c 'Transaction caused no change' /opt/stack/logs/q-agt.log
  22

  A sample log message from q-agt.log:

  2016-10-06 09:23:05.447 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn 
command(idx=0): DbSetCommand(table=Port, col_values=(('other_config', 

[Yahoo-eng-team] [Bug 1626010] [NEW] Connectivity problem on trunk parent with MAC reuse and openvswitch firewall driver

2016-09-21 Thread Bence Romsics
Public bug reported:

It seems we have a case where the openvswitch firewall driver and a
particular use of trunks interfere with each other. I tried using the
parent's MAC address for a subport, like this:

 openstack network create net0
 openstack network create net1
 openstack subnet create --network net0 --subnet-range 10.0.4.0/24 subnet0
 openstack subnet create --network net1 --subnet-range 10.0.5.0/24 subnet1
 openstack port create --network net0 port0
 parent_mac="$( openstack port show port0 | awk '/ mac_address / { print $4 }' 
)"
 openstack port create --network net1 --mac-address "$parent_mac" port1
 openstack network trunk create --parent-port port0 --subport 
port=port1,segmentation-type=vlan,segmentation-id=101 trunk0
 openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec 
--nic port-id=port0 --key-name key0 --wait vm0

Then all packets are lost on the trunk's parent port:

 $ openstack server show vm0 | egrep addresses.*net0
 | addresses| net0=10.0.4.6 
 |
 $ sudo ip netns exec "qdhcp-$( openstack network show net0 | awk '/ id / { 
print $4 }' )" ping -c3 10.0.4.6
 WARNING: openstackclient.common.utils is deprecated and will be removed after 
Jun 2017. Please use osc_lib.utils
 PING 10.0.4.6 (10.0.4.6) 56(84) bytes of data.
 
 --- 10.0.4.6 ping statistics ---
 3 packets transmitted, 0 received, 100% packet loss, time 2016ms

If I change the firewall_driver to noop and redo the same I have
connectivity.

If I still have the openvswitch firewall_driver but I don't explicitly
set the subport MAC, but let neutron automatically assign one, then
again I have connectivity.

devstack version: 81d89cf
neutron version: 60010a8

relevant parts of local.conf:

 [[local|localrc]]
 enable_service neutron-api
 enable_service neutron-l3
 enable_service neutron-agent
 enable_service neutron-dhcp
 enable_service neutron-metadata-agent
 
 [[post-config|$NEUTRON_CONF]]
 [DEFAULT]
 service_plugins = router,trunk
 
 [[post-config|$NEUTRON_PLUGIN_CONF]]
 [securitygroup]
 firewall_driver = openvswitch

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1626010

Title:
  Connectivity problem on trunk parent with MAC reuse and openvswitch
  firewall driver

Status in neutron:
  New

Bug description:
  It seems we have a case where the openvswitch firewall driver and a
  particular use of trunks interfere with each other. I tried using the
  parent's MAC address for a subport, like this:

   openstack network create net0
   openstack network create net1
   openstack subnet create --network net0 --subnet-range 10.0.4.0/24 subnet0
   openstack subnet create --network net1 --subnet-range 10.0.5.0/24 subnet1
   openstack port create --network net0 port0
   parent_mac="$( openstack port show port0 | awk '/ mac_address / { print $4 
}' )"
   openstack port create --network net1 --mac-address "$parent_mac" port1
   openstack network trunk create --parent-port port0 --subport 
port=port1,segmentation-type=vlan,segmentation-id=101 trunk0
   openstack server create --flavor cirros256 --image cirros-0.3.4-x86_64-uec 
--nic port-id=port0 --key-name key0 --wait vm0

  Then all packets are lost on the trunk's parent port:

   $ openstack server show vm0 | egrep addresses.*net0
   | addresses| net0=10.0.4.6   
   |
   $ sudo ip netns exec "qdhcp-$( openstack network show net0 | awk '/ id / { 
print $4 }' )" ping -c3 10.0.4.6
   WARNING: openstackclient.common.utils is deprecated and will be removed 
after Jun 2017. Please use osc_lib.utils
   PING 10.0.4.6 (10.0.4.6) 56(84) bytes of data.
   
   --- 10.0.4.6 ping statistics ---
   3 packets transmitted, 0 received, 100% packet loss, time 2016ms

  If I change the firewall_driver to noop and redo the same I have
  connectivity.

  If I still have the openvswitch firewall_driver but I don't explicitly
  set the subport MAC, but let neutron automatically assign one, then
  again I have connectivity.

  devstack version: 81d89cf
  neutron version: 60010a8

  relevant parts of local.conf:

   [[local|localrc]]
   enable_service neutron-api
   enable_service neutron-l3
   enable_service neutron-agent
   enable_service neutron-dhcp
   enable_service neutron-metadata-agent
   
   [[post-config|$NEUTRON_CONF]]
   [DEFAULT]
   service_plugins = router,trunk
   
   [[post-config|$NEUTRON_PLUGIN_CONF]]
   [securitygroup]
   firewall_driver = openvswitch

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1626010/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1587296] [NEW] ovs-agent: use_veth_interconnection is not needed anymore

2016-05-31 Thread Bence Romsics
Public bug reported:

Config option 'use_veth_interconnection' should be deprecated. Instead
we can always use Open vSwitch patch ports.

The discussion started in a review here:

https://review.openstack.org/#/c/318317/2
openstack/neutron/doc/source/devref/openvswitch_agent.rst
line 471

AFAICT the use of veth pairs was always a fallback for when a
sufficiently new ovs was not available from distro packages. Since veth
pairs always have worse packet forwarding performance than ovs patch
ports, it makes no sense to use them if patch ports are available.

If we no longer support veth pairs, the agent code can be simplified.

We think providing the veth fallback is no longer relevant. Open vSwitch
release notes state this (http://openvswitch.org/releases/NEWS-2.5.0):

v1.10.0 - 01 May 2013
-
...
- Patch ports no longer require kernel support, so they now work
  with FreeBSD and the kernel module built into Linux 3.3 and later.

For example for Ubuntu this means veth is not needed in 14.04+.

I opened this bug to separate this conversation from the above review
and to get feedback on whether anybody still uses veth pairs. Shall we
deprecate 'use_veth_interconnection'? If yes, what should be the
deprecation timeline?

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: ovs rfe

** Project changed: tempest => neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1587296

Title:
  ovs-agent: use_veth_interconnection is not needed anymore

Status in neutron:
  New

Bug description:
  Config option 'use_veth_interconnection' should be deprecated. Instead
  we can always use Open vSwitch patch ports.

  The discussion started in a review here:

  https://review.openstack.org/#/c/318317/2
  openstack/neutron/doc/source/devref/openvswitch_agent.rst
  line 471

  AFAICT the use of veth pairs was always a fallback for when a
  sufficiently new ovs was not available from distro packages. Since
  veth pairs always have worse packet forwarding performance than ovs
  patch ports, it makes no sense to use them if patch ports are
  available.

  If we no longer support veth pairs, the agent code can be simplified.

  We think providing the veth fallback is no longer relevant. Open
  vSwitch release notes state this
  (http://openvswitch.org/releases/NEWS-2.5.0):

  v1.10.0 - 01 May 2013
  -
  ...
  - Patch ports no longer require kernel support, so they now work
with FreeBSD and the kernel module built into Linux 3.3 and later.

  For example for Ubuntu this means veth is not needed in 14.04+.

  I opened this bug to separate this conversation from the above review
  and to get feedback on whether anybody still uses veth pairs. Shall we
  deprecate 'use_veth_interconnection'? If yes, what should be the
  deprecation timeline?

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1587296/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp