[Yahoo-eng-team] [Bug 1978039] [NEW] [ovn]Floating IP adds distributed attributes
Public bug reported:

By setting the config option ovn.enable_distributed_floating_ip, we can control whether the floating IPs of the entire cluster are distributed or centralized. There is no way to configure a single floating IP separately, so floating IP traffic cannot be finely controlled. If the backend is OVN, setting the external_mac field of a dnat_and_snat NAT entry determines whether that floating IP is distributed or centralized, so each floating IP could easily be configured individually.

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1978039

Title: [ovn]Floating IP adds distributed attributes

Status in neutron: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1978039/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to: yahoo-eng-team@lists.launchpad.net
Unsubscribe: https://launchpad.net/~yahoo-eng-team
More help: https://help.launchpad.net/ListHelp
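As a sketch of the OVN mechanism described above (the router, port, MAC, and IP names are hypothetical placeholders, not taken from a real deployment), a dnat_and_snat NAT entry created with a logical port and external MAC is handled in a distributed fashion, while one created without them is centralized:

```shell
# Hypothetical names; these commands modify the OVN northbound DB directly.
# Distributed floating IP: bind the NAT entry to a logical port and an
# external MAC, so the chassis hosting the port answers for the FIP locally.
ovn-nbctl lr-nat-add router1 dnat_and_snat 172.24.4.10 10.0.0.5 \
    vm1-port fa:16:3e:aa:bb:cc

# Centralized floating IP: omit the logical port and external MAC, so
# traffic for this FIP hairpins through the gateway chassis instead.
ovn-nbctl lr-nat-add router1 dnat_and_snat 172.24.4.11 10.0.0.6
```

The proposal in this report is to expose that per-NAT-entry choice as a per-floating-IP attribute in neutron, instead of only the single cluster-wide config option.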
[Yahoo-eng-team] [Bug 1978035] [NEW] remove unused updated_at parameter for AgentCache.update
Public bug reported:

The AgentCache method "update" [1] has a parameter "updated_at". I did not find this parameter passed in anywhere except in some unit tests; currently we use nb_cfg_timestamp [2] as the agent updated time. There are no other scenarios for this parameter. Can we remove it?

[1] https://opendev.org/openstack/neutron/src/commit/e44dbe98e82fddac72723caa9357daae0f0ab76f/neutron/plugins/ml2/drivers/ovn/agent/neutron_agent.py#L241
[2] https://review.opendev.org/c/openstack/neutron/+/802834

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1978035

Title: remove unused updated_at parameter for AgentCache.update

Status in neutron: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1978035/+subscriptions
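To illustrate the proposed cleanup, here is a deliberately simplified, hypothetical sketch (the class and field names are illustrative only and do not mirror the real AgentCache in neutron): the unused "updated_at" keyword is dropped and the updated time is derived from the chassis nb_cfg_timestamp instead:

```python
# Hypothetical sketch, not the actual neutron code. It assumes
# nb_cfg_timestamp is a millisecond epoch value, as in OVN's
# Chassis_Private table.
import datetime

class AgentCache:
    def __init__(self):
        self.agents = {}

    # Before: update(self, agent_id, chassis, updated_at=None) with an
    # "updated_at" that callers never pass.
    # After: derive the updated time from the chassis record itself.
    def update(self, agent_id, chassis):
        updated_at = datetime.datetime.fromtimestamp(
            chassis['nb_cfg_timestamp'] / 1000, datetime.timezone.utc)
        self.agents[agent_id] = {'chassis': chassis, 'updated_at': updated_at}
        return self.agents[agent_id]

cache = AgentCache()
agent = cache.update('agent-1', {'nb_cfg_timestamp': 1654560000000})
print(agent['updated_at'].year)  # 2022
```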
[Yahoo-eng-team] [Bug 1968555] Re: evacuate after network issue will cause vm running on two host
The evacuate API documentation [1] states:

  Preconditions: The failed host must be fenced and no longer running the original server. The failed host must be reported as down or marked as forced down using Update Forced Down.

So when you detect the control network failure, you have to make sure that the host is fenced before you evacuate the instance. This is exactly what prevents the duplication of the VM via evacuation. The most common fencing method is power fencing, i.e. when the issue is detected, the problematic compute host is powered off via out-of-band management. Then the VMs can be safely evacuated.

[1] https://docs.openstack.org/api-ref/compute/?expanded=evacuate-server-evacuate-action-detail#evacuate-server-evacuate-action

** Changed in: nova
   Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1968555

Title: evacuate after network issue will cause vm running on two host

Status in OpenStack Compute (nova): Invalid

Bug description:

Environment
===========
OpenStack Queens + libvirt 4.5.0 + qemu 2.12 running on CentOS 7, with Ceph RBD storage

Description
===========
If the management network of the compute host is abnormal, nova-compute may be reported as down while openstack-nova-compute.service is still running on that host. If you now evacuate a VM from that host, the evacuation will succeed, but the VM will be running on both the old host and the new host, even after the management network of the old host recovers. This may cause VM errors.

Steps to reproduce
==================
1. Manually turn down the management network port of the compute host, e.g. "ifconfig eth0 down".
2. After the nova-compute service of that host shows as down in "openstack compute service list", evacuate one VM from that host: "nova evacuate".
3. After the evacuation succeeds, you can find the VM running on two hosts.
4.
Manually turn the management network port of the old compute host back up, e.g. "ifconfig eth0 up". You can find the VM still running on this host; it cannot be destroyed automatically unless you restart openstack-nova-compute.service on that host.

Expected result
===============
Maybe we can add a periodic task to automatically destroy this VM?

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1968555/+subscriptions
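The safe evacuation sequence described in the comment above can be sketched as follows (host and instance names are placeholders, and the exact fencing command depends on your out-of-band management hardware):

```shell
# 1. Fence the failed host out of band so the old VM cannot keep running,
#    e.g. power it off via IPMI (BMC address and credentials are placeholders):
ipmitool -H compute1-bmc -U admin -P secret chassis power off

# 2. Mark the compute service as forced down so the API accepts evacuation:
openstack compute service set --down compute1 nova-compute

# 3. Only then evacuate the instance to another host:
nova evacuate my-instance compute2
```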
[Yahoo-eng-team] [Bug 1977524] Re: Wrong redirect after deleting zone from Zone Overview pane
** Also affects: designate-dashboard
   Importance: Undecided
   Status: New

** No longer affects: designate

** Changed in: designate-dashboard
   Status: New => Confirmed

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1977524

Title: Wrong redirect after deleting zone from Zone Overview pane

Status in Designate Dashboard: Confirmed
Status in OpenStack Dashboard (Horizon): New

Bug description:

When deleting a zone from Zones -> (specific zone) -> Overview pane, I get a "page does not exist" error. After the notification that the zone is being removed, the website redirects to /dashboard/dashboard/project/dnszones, which contains a duplicated "dashboard" path segment. When deleting from the zones list view, everything works fine. Tested on a Ussuri environment, but the code seems to be unchanged in newer releases. I have tried applying the bugfixes for reloading the zones/floating-ip panes, but with no effect in this case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/designate-dashboard/+bug/1977524/+subscriptions
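As a hypothetical illustration of the likely cause (this is not the actual designate-dashboard code; the WEBROOT value and URLs below are assumptions): if a success URL that already contains the dashboard WEBROOT prefix is joined to the WEBROOT again, the path segment gets duplicated:

```python
# Hypothetical reconstruction of the duplicated-prefix redirect bug.
from posixpath import join

WEBROOT = '/dashboard/'                          # assumed Horizon WEBROOT
bad_success_url = 'dashboard/project/dnszones'   # already carries the prefix
good_success_url = 'project/dnszones'

print(join(WEBROOT, bad_success_url))   # /dashboard/dashboard/project/dnszones
print(join(WEBROOT, good_success_url))  # /dashboard/project/dnszones
```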
[Yahoo-eng-team] [Bug 1977485] Re: Neutron deletes port in use and nova errors out when cleaning the VM XML
Thanks for the info, I'll re-assign this to the Nova component based on that.

** Project changed: neutron => nova

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1977485

Title: Neutron deletes port in use and nova errors out when cleaning the VM XML

Status in OpenStack Compute (nova): Incomplete

Bug description:

We recently upgraded to OpenStack Xena on a Kolla deployment and we hit this issue: Neutron is able to delete a port that is in use, and Nova errors out when trying to clean up the VM XML (we are talking about Windows VMs on a KVM hypervisor).

We need to move the ports of some VMs to a different subnet. We tried the following procedure:
- remove the port of a VM
- create a new port on the new subnet
- attach the new port to the VM

Result: we have a VM with no network connectivity. The port gets deleted from OpenStack, and from OVS as well, but in the VM XML we still see both interfaces: the bridge part represents the old interface (the interface of the old, deleted port) and the ethernet part is the new port.

We also tried to use virsh detach-interface to remove the stale interface from the XML; the command reported that it completed successfully, but the interface is still there.
We noticed that rebooting the VM cleans the XML file and the connectivity comes back (this is not the desired solution).

In the logs we see, when we delete the old port:

2022-05-31 12:13:29.935 7 INFO nova.compute.manager [req-1bf9e960-fc99-453f-8bf9-cd6a76c12feb 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Neutron deleted interface fb15ad83-bf28-455d-a1b1-14158203b4bf; detaching it from the instance and deleting it from the info cache

2022-05-31 12:13:30.076 7 WARNING nova.virt.libvirt.driver [req-1bf9e960-fc99-453f-8bf9-cd6a76c12feb 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Detaching interface 00:16:3c:7b:2c:1c failed because the device is no longer found on the guest.: nova.exception.DeviceNotFound: Device 'tapfb15ad83-bf' not found.

2022-05-31 12:13:30.740 7 INFO os_vif [req-1bf9e960-fc99-453f-8bf9-cd6a76c12feb 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] Successfully unplugged vif VIFOpenVSwitch(active=True,address=00:16:3c:7b:2c:1c,bridge_name='br-int',has_traffic_filtering=True,id=fb15ad83-bf28-455d-a1b1-14158203b4bf,network=Network(b03631f6-6fa7-4ff3-97e6-0a3bd077fac3),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='tapfb15ad83-bf')

When we attach the new port:

2022-05-31 12:20:16.427 7 WARNING nova.compute.manager [req-f42820d6-1c70-428a-9c2d-305737838bfc 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Received unexpected event network-vif-plugged-ba91ea48-3676-4934-87ba-1ad4cf80b1bc for instance with vm_state active and task_state None.
2022-05-31 12:20:19.188 7 WARNING nova.compute.manager [req-305324c7-c25b-44a7-96cd-a8cc84284727 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Received unexpected event network-vif-plugged-ba91ea48-3676-4934-87ba-1ad4cf80b1bc for instance with vm_state active and task_state None.

We found that the following workaround works:
- use virsh detach-interface while the old port of the VM still exists (before we delete it)
- then delete the old port
- after that, attach the new port

This works as expected and the VM has network connectivity.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1977485/+subscriptions
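The workaround described above can be sketched as follows (the libvirt domain name, server name, and port/network/subnet names are placeholders; the MAC address is the one from the logs):

```shell
# 1. Detach the stale interface while the old port still exists:
virsh detach-interface instance-00000001 bridge \
    --mac 00:16:3c:7b:2c:1c --persistent

# 2. Delete the old port:
openstack port delete old-port

# 3. Create a port on the new subnet and attach it to the VM:
openstack port create --network net1 --fixed-ip subnet=new-subnet new-port
openstack server add port my-server new-port
```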
[Yahoo-eng-team] [Bug 1977468] Re: Nova sanitize_hostname problematic with fqdn display_name
I agree that deducing the hostname from the display name is problematic. Therefore I don't think we should further complicate the logic currently in place, but instead move away from it. I suggest providing the hostname explicitly via the hostname field in the POST /servers REST API request. I suggest reaching out to the openstack-ansible [1] developers and asking them to expose this field via Ansible. If you are using FQDNs, then please also read and provide feedback on the currently discussed specification about FQDN handling [2].

I'm marking this as Invalid. Please put it back to New if you disagree.

[1] https://storyboard.openstack.org/#!/project/openstack/ansible-collections-openstack
[2] https://review.opendev.org/c/openstack/nova-specs/+/840974

** Changed in: nova
   Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1977468

Title: Nova sanitize_hostname problematic with fqdn display_name

Status in OpenStack Compute (nova): Invalid

Bug description:

The current implementation of sanitize_hostname in nova/utils.py does a simple replace of the dots in the hostname. This causes an issue in nova/compute/api.py, where _populate_instance_names sets the hostname from display_name. We have multiple cases where we want the display name to reflect the FQDN of the instance. In such cases server1.example.com becomes the hostname server1-example-com. This tends to create a cascading array of problems when the hostname is not the actual hostname but a variation of the FQDN. Cloud-init picks up the generated name, and the issue can carry over into /etc/hosts (depending on configuration). It is desirable to have the FQDN as the display name, as there may be instances that have the same hostname but a different domain listed in various views that list instances. Tools like Ansible's openstack.cloud.server module do not have the ability to specify display_name and hostname individually.
It would be preferable to have an option to select the way the name is sanitized, i.e. either cut off everything after the first dot (possibly with more logic to check for a valid FQDN) or keep the current way of just replacing dots '.' with dashes '-'. I don't see either as the specifically correct way of doing things; trying to deduce a hostname from a display name is an opinionated thing.

I did a dirty fix for my specific problem by splitting the hostname at the first dot and picking the first part as the hostname:

--- /tmp/utils.py.orig	2022-06-02 22:02:48.152040276 +0300
+++ /tmp/utils.py	2022-06-02 22:22:00.319168645 +0300
@@ -365,6 +365,8 @@
     # Remove characters outside the Unicode range U+0000-U+00FF
     hostname = hostname.encode('latin-1', 'ignore').decode('latin-1')
+    if hostname.find('.') >= 0:
+        hostname = hostname.split('.')[0]
     hostname = truncate_hostname(hostname)
     hostname = re.sub(r'[ _\.]', '-', hostname)
     hostname = re.sub(r'[^\w.-]+', '', hostname)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1977468/+subscriptions
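A minimal standalone sketch of the two behaviors discussed above (this is not the actual nova code, just an approximation of the regex steps quoted in the diff):

```python
import re

def sanitize_replace_dots(name: str) -> str:
    # Approximates the current behavior: dots and underscores become dashes,
    # remaining disallowed characters are stripped.
    name = name.encode('latin-1', 'ignore').decode('latin-1')
    name = re.sub(r'[ _\.]', '-', name)
    return re.sub(r'[^\w.-]+', '', name)

def sanitize_strip_domain(name: str) -> str:
    # The alternative proposed in the report: keep only the part of the
    # display name before the first dot, then sanitize that.
    return sanitize_replace_dots(name.split('.', 1)[0])

print(sanitize_replace_dots('server1.example.com'))  # server1-example-com
print(sanitize_strip_domain('server1.example.com'))  # server1
```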
[Yahoo-eng-team] [Bug 1972666] Fix included in openstack/glance 24.1.0
This issue was fixed in the openstack/glance 24.1.0 release.

** Changed in: glance/yoga
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1972666

Title: Configgen does not pick up all groups from wsgi.py

Status in Glance: Fix Released
Status in Glance wallaby series: New
Status in Glance xena series: New
Status in Glance yoga series: Fix Released
Status in Glance zed series: Fix Released

Bug description:

'cli_opts' and 'cache_opts' from glance/common/wsgi.py are not picked up by the configgen, nor listed through the functions in glance/opts.py.

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1972666/+subscriptions
[Yahoo-eng-team] [Bug 1977969] [NEW] [OVN] DB sync tool: local variable 'members_to_verify' referenced before assignment"
Public bug reported:

Error: https://paste.openstack.org/show/814826/
Method: OVNClient._handle_lb_fip_cmds

It seems that the section handling the load balancer members is incorrectly indented.

** Affects: neutron
   Importance: Undecided
   Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
   Status: In Progress

** Changed in: neutron
   Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1977969

Title: [OVN] DB sync tool: local variable 'members_to_verify' referenced before assignment"

Status in neutron: In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1977969/+subscriptions
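A minimal illustration of the failure mode named in the title (hypothetical code, not the actual OVNClient method): when an assignment is mis-indented into a loop body, the variable is only bound on iterations that actually run, so code after the loop can hit the variable unassigned.

```python
# Hypothetical sketch of the mis-indentation bug; the lb/member structure
# below is invented for illustration only.
def handle_lbs_buggy(lbs):
    for lb in lbs:
        members_to_verify = []  # only assigned when the loop body runs
    return members_to_verify   # UnboundLocalError when lbs is empty

def handle_lbs_fixed(lbs):
    members_to_verify = []     # assigned before the loop, on every path
    for lb in lbs:
        members_to_verify.extend(lb.get('members', []))
    return members_to_verify

print(handle_lbs_fixed([]))  # []
try:
    handle_lbs_buggy([])
except UnboundLocalError:
    print('UnboundLocalError raised')
```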
[Yahoo-eng-team] [Bug 1975674] Re: Neutron agent blocks during VM deletion when a remote security group is involved
Reviewed: https://review.opendev.org/c/openstack/neutron/+/843253
Committed: https://opendev.org/openstack/neutron/commit/e09b128f416a809cd7734aba8ab52220ea01b2e2
Submitter: "Zuul (22348)"
Branch: master

commit e09b128f416a809cd7734aba8ab52220ea01b2e2
Author: Henning Eggers
Date: Wed May 25 11:17:43 2022 +0200

    Defer flow deletion in openvswitch firewall

    Reduces the deletion time of conjunction flows on hypervisors hosting
    virtual machines that are part of a security group which has remote
    security groups as a target containing thousands of ports. Without
    deferred deletion the agent calls ovs-ofctl several hundred times in
    succession; during this time the agent blocks any new VM creation or
    neutron port modification on this hypervisor.

    This patch has been tested using a single network with a single VM
    with a security group that points to a remote security group with
    2000 ports. During testing without the patch, the iteration time for
    deletion was around 500 seconds. After adding the patch to the L2
    agent on the test environment, the same deletion went down to 4
    seconds.

    Closes-Bug: #1975674
    Change-Id: I46b1fe94b2e358f7f4b2cd4943a74ebaf84f51b8

** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1975674

Title: Neutron agent blocks during VM deletion when a remote security group is involved

Status in neutron: Fix Released

Bug description:

When deleting a VM that has a security group referring to a remote security group, the neutron agent will block for as long as it takes to remove the respective flows. This happens when the remote security group contains many (thousands of) ports referring to other VMs.

Steps to reproduce:
- Create a VM with security group A
- Add a rule to security group A allowing access from a remote security group B
- Add a large number of ports to security group B (e.g.
2000)
- The respective OVS flows will be added
- Delete the VM
- The OVS flows will be removed

Expected:
- VM and flows deleted within seconds
- No impact on other VMs on the same hypervisor

Actual:
- Flow deletion takes a long time, sometimes up to 10 minutes
- While flows are being deleted, no VMs can be created on the same hypervisor

The reason for this behavior is that, under the hood, the agent calls ovs-ofctl (via execve()) once for each port in the remote security group. These calls quickly add up to minutes if there are many ports.

The proposed solution is to use deferred execution for the flow deletion. It then becomes a bulk operation and around 400 flows are deleted in one call. In addition, it runs in the background and does not block the agent for other operations.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1975674/+subscriptions
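A hedged sketch of the difference the patch makes (this is not the actual neutron agent code; run_ofctl below merely stands in for spawning the ovs-ofctl binary via execve()): per-flow deletion issues one external call per flow, while deferred deletion collapses them into a single bulk call.

```python
# Count stand-in "ovs-ofctl" invocations for the two strategies.
calls = []

def run_ofctl(args):
    # Stand-in for executing the ovs-ofctl binary once.
    calls.append(args)

def delete_flows_immediate(flows):
    for f in flows:
        run_ofctl(['del-flows', f])        # one process spawn per flow

def delete_flows_deferred(flows):
    run_ofctl(['del-flows'] + list(flows))  # one bulk invocation

flows = [f'cookie={i}' for i in range(2000)]
delete_flows_immediate(flows)
immediate_calls = len(calls)
calls.clear()
delete_flows_deferred(flows)
deferred_calls = len(calls)
print(immediate_calls, deferred_calls)  # 2000 1
```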
[Yahoo-eng-team] [Bug 1977952] [NEW] In 22.2 cloud-init fails when phone-home module does not have "tries" parameter
Public bug reported:

Hi! We have some user-data files where we use the phone-home module of cloud-init. So far we did not use its "tries" parameter and everything worked. However, in version 22.2 there was a change which causes cloud-init to fail:

https://github.com/canonical/cloud-init/compare/22.1...22.2#diff-a4aa83fbb946ba1ea7cf6c8dd5965cd62631dc9cb48d4baa50adddbfef06b82cL108

In our case this change in the exception handling throws a TypeError instead of the ValueError that is expected:

  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_phone_home.py", line 132, in handle
    tries = int(tries)  # type: ignore
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

While we can add the "tries" parameter (and after that everything works just like before), this exception should be handled properly.

Also, according to the guidelines:
1. Tell us your cloud provider: None
2. Any appropriate cloud-init configuration you can provide us: phone-home module
3. Perform the following on the system and attach it to this bug: logs are attached

Best regards:
Zsolt

** Affects: cloud-init
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1977952

Title: In 22.2 cloud-init fails when phone-home module does not have "tries" parameter

Status in cloud-init: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1977952/+subscriptions
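The failure mode can be reproduced in isolation: int(None) raises TypeError, not ValueError, so a handler that catches only ValueError misses the unset-parameter case. The handling below is a hypothetical sketch of a fix, not the actual cloud-init code.

```python
# Hypothetical parsers illustrating the bug; "10" is an assumed default,
# not necessarily the one cloud-init uses.
def parse_tries_fragile(tries):
    try:
        return int(tries)
    except ValueError:       # int(None) raises TypeError -> not caught here
        return 10

def parse_tries_robust(tries, default=10):
    try:
        return int(tries)
    except (TypeError, ValueError):  # handles None as well as bad strings
        return default

print(parse_tries_robust(None))  # 10
print(parse_tries_robust('3'))   # 3
try:
    parse_tries_fragile(None)
except TypeError:
    print('TypeError escaped the fragile handler')
```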