[Yahoo-eng-team] [Bug 1978039] [NEW] [ovn]Floating IP adds distributed attributes
Public bug reported:

By setting the config option ovn.enable_distributed_floating_ip, we can control whether the floating IPs of the entire cluster are distributed or centralized. There is no way to configure a single floating IP separately, so floating IP traffic cannot be finely controlled. If the backend is OVN, setting the external_mac field of a dnat_and_snat NAT entry determines whether that floating IP is distributed or centralized, so each floating IP could easily be configured individually.

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1978039

Title: [ovn]Floating IP adds distributed attributes

Status in neutron: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1978039/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to: yahoo-eng-team@lists.launchpad.net
Unsubscribe: https://launchpad.net/~yahoo-eng-team
More help: https://help.launchpad.net/ListHelp
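As a sketch of the OVN mechanism described above (the router, port, MAC, and IP names are hypothetical placeholders, not taken from a real deployment), a dnat_and_snat NAT entry created with a logical port and external MAC is handled in a distributed fashion, while one created without them is centralized:

```shell
# Hypothetical names; these commands modify the OVN northbound DB directly.
# Distributed floating IP: bind the NAT entry to a logical port and an
# external MAC, so the chassis hosting the port answers for the FIP locally.
ovn-nbctl lr-nat-add router1 dnat_and_snat 172.24.4.10 10.0.0.5 \
    vm1-port fa:16:3e:aa:bb:cc

# Centralized floating IP: omit the logical port and external MAC, so
# traffic for this FIP hairpins through the gateway chassis instead.
ovn-nbctl lr-nat-add router1 dnat_and_snat 172.24.4.11 10.0.0.6
```

The proposal in this report is to expose that per-NAT-entry choice as a per-floating-IP attribute in neutron, instead of only the single cluster-wide config option.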
[Yahoo-eng-team] [Bug 1978035] [NEW] remove unused updated_at parameter for AgentCache.update
Public bug reported:

The AgentCache method "update" [1] has a parameter "updated_at". I did not find this parameter passed in anywhere except in some unit tests; currently we use nb_cfg_timestamp [2] as the agent updated time. There are no other scenarios for this parameter. Can we remove it?

[1] https://opendev.org/openstack/neutron/src/commit/e44dbe98e82fddac72723caa9357daae0f0ab76f/neutron/plugins/ml2/drivers/ovn/agent/neutron_agent.py#L241
[2] https://review.opendev.org/c/openstack/neutron/+/802834

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1978035

Title: remove unused updated_at parameter for AgentCache.update

Status in neutron: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1978035/+subscriptions
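To illustrate the proposed cleanup, here is a deliberately simplified, hypothetical sketch (the class and field names are illustrative only and do not mirror the real AgentCache in neutron): the unused "updated_at" keyword is dropped and the updated time is derived from the chassis nb_cfg_timestamp instead:

```python
# Hypothetical sketch, not the actual neutron code. It assumes
# nb_cfg_timestamp is a millisecond epoch value, as in OVN's
# Chassis_Private table.
import datetime

class AgentCache:
    def __init__(self):
        self.agents = {}

    # Before: update(self, agent_id, chassis, updated_at=None) with an
    # "updated_at" that callers never pass.
    # After: derive the updated time from the chassis record itself.
    def update(self, agent_id, chassis):
        updated_at = datetime.datetime.fromtimestamp(
            chassis['nb_cfg_timestamp'] / 1000, datetime.timezone.utc)
        self.agents[agent_id] = {'chassis': chassis, 'updated_at': updated_at}
        return self.agents[agent_id]

cache = AgentCache()
agent = cache.update('agent-1', {'nb_cfg_timestamp': 1654560000000})
print(agent['updated_at'].year)  # 2022
```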
[Yahoo-eng-team] [Bug 1968555] Re: evacuate after network issue will cause vm running on two host
The evacuate API documentation [1] states:

  Preconditions: The failed host must be fenced and no longer running the original server. The failed host must be reported as down or marked as forced down using Update Forced Down.

So when you detect the control network failure, you have to make sure that the host is fenced before you evacuate the instance. This is exactly what prevents the duplication of the VM via evacuation. The most common fencing method is power fencing, i.e. when the issue is detected, the problematic compute host is powered off via out-of-band management. Then the VMs can be safely evacuated.

[1] https://docs.openstack.org/api-ref/compute/?expanded=evacuate-server-evacuate-action-detail#evacuate-server-evacuate-action

** Changed in: nova
   Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1968555

Title: evacuate after network issue will cause vm running on two host

Status in OpenStack Compute (nova): Invalid

Bug description:

Environment
===========
OpenStack Queens + libvirt 4.5.0 + qemu 2.12 running on CentOS 7, with Ceph RBD storage

Description
===========
If the management network of the compute host is abnormal, nova-compute may be reported as down while openstack-nova-compute.service is still running on that host. If you now evacuate a VM from that host, the evacuation will succeed, but the VM will be running on both the old host and the new host, even after the management network of the old host recovers. This may cause VM errors.

Steps to reproduce
==================
1. Manually turn down the management network port of the compute host, e.g. "ifconfig eth0 down".
2. After the nova-compute service of that host shows as down in "openstack compute service list", evacuate one VM from that host: "nova evacuate".
3. After the evacuation succeeds, you can find the VM running on two hosts.
4.
Manually turn the management network port of the old compute host back up, e.g. "ifconfig eth0 up". You can find the VM still running on this host; it cannot be destroyed automatically unless you restart openstack-nova-compute.service on that host.

Expected result
===============
Maybe we can add a periodic task to automatically destroy this VM?

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1968555/+subscriptions
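The safe evacuation sequence described in the comment above can be sketched as follows (host and instance names are placeholders, and the exact fencing command depends on your out-of-band management hardware):

```shell
# 1. Fence the failed host out of band so the old VM cannot keep running,
#    e.g. power it off via IPMI (BMC address and credentials are placeholders):
ipmitool -H compute1-bmc -U admin -P secret chassis power off

# 2. Mark the compute service as forced down so the API accepts evacuation:
openstack compute service set --down compute1 nova-compute

# 3. Only then evacuate the instance to another host:
nova evacuate my-instance compute2
```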
[Yahoo-eng-team] [Bug 1977524] Re: Wrong redirect after deleting zone from Zone Overview pane
** Also affects: designate-dashboard
   Importance: Undecided
   Status: New

** No longer affects: designate

** Changed in: designate-dashboard
   Status: New => Confirmed

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1977524

Title: Wrong redirect after deleting zone from Zone Overview pane

Status in Designate Dashboard: Confirmed
Status in OpenStack Dashboard (Horizon): New

Bug description:

When deleting a zone from Zones -> (specific zone) -> Overview pane, I get a "page does not exist" error. After the notification that the zone is being removed, the website redirects to /dashboard/dashboard/project/dnszones, which contains a duplicated "dashboard" path segment. When deleting from the zones list view, everything works fine. Tested on a Ussuri environment, but the code seems to be unchanged in newer releases. I have tried applying the bugfixes for reloading the zones/floating-ip panes, but with no effect in this case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/designate-dashboard/+bug/1977524/+subscriptions
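As a hypothetical illustration of the likely cause (this is not the actual designate-dashboard code; the WEBROOT value and URLs below are assumptions): if a success URL that already contains the dashboard WEBROOT prefix is joined to the WEBROOT again, the path segment gets duplicated:

```python
# Hypothetical reconstruction of the duplicated-prefix redirect bug.
from posixpath import join

WEBROOT = '/dashboard/'                          # assumed Horizon WEBROOT
bad_success_url = 'dashboard/project/dnszones'   # already carries the prefix
good_success_url = 'project/dnszones'

print(join(WEBROOT, bad_success_url))   # /dashboard/dashboard/project/dnszones
print(join(WEBROOT, good_success_url))  # /dashboard/project/dnszones
```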
[Yahoo-eng-team] [Bug 1977485] Re: Neutron deletes port in use and nova errors out when cleaning the VM XML
Thanks for the info, I'll re-assign this to the Nova component based on that.

** Project changed: neutron => nova

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1977485

Title: Neutron deletes port in use and nova errors out when cleaning the VM XML

Status in OpenStack Compute (nova): Incomplete

Bug description:

We recently upgraded to OpenStack Xena on a Kolla deployment and we hit this issue: Neutron is able to delete a port that is in use, and Nova errors out when trying to clean up the VM XML (we are talking about Windows VMs on a KVM hypervisor).

We need to move the ports of some VMs to a different subnet. We tried the following procedure:
- remove the port of a VM
- create a new port on the new subnet
- attach the new port to the VM

Result: we have a VM with no network connectivity. The port gets deleted from OpenStack, and from OVS as well, but in the VM XML we still see both interfaces: the bridge part represents the old interface (the interface of the old, deleted port) and the ethernet part is the new port.

We also tried to use virsh detach-interface to remove the stale interface from the XML; the command reported that it completed successfully, but the interface is still there.
We noticed that rebooting the VM cleans the XML file and the connectivity comes back (this is not the desired solution).

In the logs we see, when we delete the old port:

2022-05-31 12:13:29.935 7 INFO nova.compute.manager [req-1bf9e960-fc99-453f-8bf9-cd6a76c12feb 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Neutron deleted interface fb15ad83-bf28-455d-a1b1-14158203b4bf; detaching it from the instance and deleting it from the info cache

2022-05-31 12:13:30.076 7 WARNING nova.virt.libvirt.driver [req-1bf9e960-fc99-453f-8bf9-cd6a76c12feb 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Detaching interface 00:16:3c:7b:2c:1c failed because the device is no longer found on the guest.: nova.exception.DeviceNotFound: Device 'tapfb15ad83-bf' not found.

2022-05-31 12:13:30.740 7 INFO os_vif [req-1bf9e960-fc99-453f-8bf9-cd6a76c12feb 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] Successfully unplugged vif VIFOpenVSwitch(active=True,address=00:16:3c:7b:2c:1c,bridge_name='br-int',has_traffic_filtering=True,id=fb15ad83-bf28-455d-a1b1-14158203b4bf,network=Network(b03631f6-6fa7-4ff3-97e6-0a3bd077fac3),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='tapfb15ad83-bf')

When we attach the new port:

2022-05-31 12:20:16.427 7 WARNING nova.compute.manager [req-f42820d6-1c70-428a-9c2d-305737838bfc 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Received unexpected event network-vif-plugged-ba91ea48-3676-4934-87ba-1ad4cf80b1bc for instance with vm_state active and task_state None.
2022-05-31 12:20:19.188 7 WARNING nova.compute.manager [req-305324c7-c25b-44a7-96cd-a8cc84284727 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Received unexpected event network-vif-plugged-ba91ea48-3676-4934-87ba-1ad4cf80b1bc for instance with vm_state active and task_state None.

We found that the following workaround works:
- use virsh detach-interface while the old port of the VM still exists (before we delete it)
- then delete the old port
- after that, attach the new port

This works as expected and the VM has network connectivity.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1977485/+subscriptions
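The workaround described above can be sketched as follows (the libvirt domain name, server name, and port/network/subnet names are placeholders; the MAC address is the one from the logs):

```shell
# 1. Detach the stale interface while the old port still exists:
virsh detach-interface instance-00000001 bridge \
    --mac 00:16:3c:7b:2c:1c --persistent

# 2. Delete the old port:
openstack port delete old-port

# 3. Create a port on the new subnet and attach it to the VM:
openstack port create --network net1 --fixed-ip subnet=new-subnet new-port
openstack server add port my-server new-port
```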
[Yahoo-eng-team] [Bug 1977468] Re: Nova sanitize_hostname problematic with fqdn display_name
I agree that deducing the hostname from the display name is problematic. Therefore I don't think we should further complicate the logic currently in place, but instead move away from it. I suggest providing the hostname explicitly via the hostname field in the POST /servers REST API request. I suggest reaching out to the openstack-ansible [1] developers and asking them to expose this field via Ansible. If you are using FQDNs, then please also read and provide feedback on the currently discussed specification about FQDN handling [2].

I'm marking this as Invalid. Please put it back to New if you disagree.

[1] https://storyboard.openstack.org/#!/project/openstack/ansible-collections-openstack
[2] https://review.opendev.org/c/openstack/nova-specs/+/840974

** Changed in: nova
   Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1977468

Title: Nova sanitize_hostname problematic with fqdn display_name

Status in OpenStack Compute (nova): Invalid

Bug description:

The current implementation of sanitize_hostname in nova/utils.py does a simple replace of the dots in the hostname. This causes an issue in nova/compute/api.py, where _populate_instance_names sets the hostname from display_name. We have multiple cases where we want the display name to reflect the FQDN of the instance. In such cases server1.example.com becomes the hostname server1-example-com. This tends to create a cascading array of problems when the hostname is not the actual hostname but a variation of the FQDN. Cloud-init picks up the generated name, and the issue can carry over into /etc/hosts (depending on configuration). It is desirable to have the FQDN as the display name, as there may be instances that have the same hostname but a different domain listed in various views that list instances. Tools like Ansible's openstack.cloud.server module do not have the ability to specify display_name and hostname individually.
It would be preferable to have an option to select the way the name is sanitized, i.e. either cut off everything after the first dot (possibly with more logic to check for a valid FQDN) or keep the current way of just replacing dots '.' with dashes '-'. I don't see either as the specifically correct way of doing things; trying to deduce a hostname from a display name is an opinionated thing.

I did a dirty fix for my specific problem by splitting the hostname at the first dot and picking the first part as the hostname:

--- /tmp/utils.py.orig	2022-06-02 22:02:48.152040276 +0300
+++ /tmp/utils.py	2022-06-02 22:22:00.319168645 +0300
@@ -365,6 +365,8 @@
     # Remove characters outside the Unicode range U+0000-U+00FF
     hostname = hostname.encode('latin-1', 'ignore').decode('latin-1')
+    if hostname.find('.') >= 0:
+        hostname = hostname.split('.')[0]
     hostname = truncate_hostname(hostname)
     hostname = re.sub(r'[ _\.]', '-', hostname)
     hostname = re.sub(r'[^\w.-]+', '', hostname)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1977468/+subscriptions
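A minimal standalone sketch of the two behaviors discussed above (this is not the actual nova code, just an approximation of the regex steps quoted in the diff):

```python
import re

def sanitize_replace_dots(name: str) -> str:
    # Approximates the current behavior: dots and underscores become dashes,
    # remaining disallowed characters are stripped.
    name = name.encode('latin-1', 'ignore').decode('latin-1')
    name = re.sub(r'[ _\.]', '-', name)
    return re.sub(r'[^\w.-]+', '', name)

def sanitize_strip_domain(name: str) -> str:
    # The alternative proposed in the report: keep only the part of the
    # display name before the first dot, then sanitize that.
    return sanitize_replace_dots(name.split('.', 1)[0])

print(sanitize_replace_dots('server1.example.com'))  # server1-example-com
print(sanitize_strip_domain('server1.example.com'))  # server1
```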
[Yahoo-eng-team] [Bug 1972666] Fix included in openstack/glance 24.1.0
This issue was fixed in the openstack/glance 24.1.0 release.

** Changed in: glance/yoga
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1972666

Title: Configgen does not pick up all groups from wsgi.py

Status in Glance: Fix Released
Status in Glance wallaby series: New
Status in Glance xena series: New
Status in Glance yoga series: Fix Released
Status in Glance zed series: Fix Released

Bug description:

'cli_opts' and 'cache_opts' from glance/common/wsgi.py are not picked up by the configgen, nor listed through the functions in glance/opts.py.

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1972666/+subscriptions
[Yahoo-eng-team] [Bug 1977969] [NEW] [OVN] DB sync tool: local variable 'members_to_verify' referenced before assignment"
Public bug reported:

Error: https://paste.openstack.org/show/814826/
Method: OVNClient._handle_lb_fip_cmds

It seems that the section handling the load balancer members is incorrectly indented.

** Affects: neutron
   Importance: Undecided
   Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
   Status: In Progress

** Changed in: neutron
   Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1977969

Title: [OVN] DB sync tool: local variable 'members_to_verify' referenced before assignment"

Status in neutron: In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1977969/+subscriptions
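A minimal illustration of the failure mode named in the title (hypothetical code, not the actual OVNClient method): when an assignment is mis-indented into a loop body, the variable is only bound on iterations that actually run, so code after the loop can hit the variable unassigned.

```python
# Hypothetical sketch of the mis-indentation bug; the lb/member structure
# below is invented for illustration only.
def handle_lbs_buggy(lbs):
    for lb in lbs:
        members_to_verify = []  # only assigned when the loop body runs
    return members_to_verify   # UnboundLocalError when lbs is empty

def handle_lbs_fixed(lbs):
    members_to_verify = []     # assigned before the loop, on every path
    for lb in lbs:
        members_to_verify.extend(lb.get('members', []))
    return members_to_verify

print(handle_lbs_fixed([]))  # []
try:
    handle_lbs_buggy([])
except UnboundLocalError:
    print('UnboundLocalError raised')
```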
[Yahoo-eng-team] [Bug 1975674] Re: Neutron agent blocks during VM deletion when a remote security group is involved
Reviewed: https://review.opendev.org/c/openstack/neutron/+/843253
Committed: https://opendev.org/openstack/neutron/commit/e09b128f416a809cd7734aba8ab52220ea01b2e2
Submitter: "Zuul (22348)"
Branch: master

commit e09b128f416a809cd7734aba8ab52220ea01b2e2
Author: Henning Eggers
Date: Wed May 25 11:17:43 2022 +0200

    Defer flow deletion in openvswitch firewall

    Reduces the deletion time of conjunction flows on hypervisors hosting
    virtual machines that are part of a security group which has remote
    security groups as a target containing thousands of ports. Without
    deferred deletion the agent calls ovs-ofctl several hundred times in
    succession; during this time the agent blocks any new VM creation or
    neutron port modification on this hypervisor.

    This patch has been tested using a single network with a single VM
    with a security group that points to a remote security group with
    2000 ports. During testing without the patch, the iteration time for
    deletion was around 500 seconds. After adding the patch to the L2
    agent on the test environment, the same deletion went down to 4
    seconds.

    Closes-Bug: #1975674
    Change-Id: I46b1fe94b2e358f7f4b2cd4943a74ebaf84f51b8

** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1975674

Title: Neutron agent blocks during VM deletion when a remote security group is involved

Status in neutron: Fix Released

Bug description:

When deleting a VM that has a security group referring to a remote security group, the neutron agent will block for as long as it takes to remove the respective flows. This happens when the remote security group contains many (thousands of) ports referring to other VMs.

Steps to reproduce:
- Create a VM with security group A
- Add a rule to security group A allowing access from a remote security group B
- Add a large number of ports to security group B (e.g.
2000)
- The respective OVS flows will be added
- Delete the VM
- The OVS flows will be removed

Expected:
- VM and flows deleted within seconds
- No impact on other VMs on the same hypervisor

Actual:
- Flow deletion takes a long time, sometimes up to 10 minutes
- While flows are being deleted, no VMs can be created on the same hypervisor

The reason for this behavior is that, under the hood, the agent calls ovs-ofctl (via execve()) once for each port in the remote security group. These calls quickly add up to minutes if there are many ports.

The proposed solution is to use deferred execution for the flow deletion. It then becomes a bulk operation and around 400 flows are deleted in one call. In addition, it runs in the background and does not block the agent for other operations.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1975674/+subscriptions
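A hedged sketch of the difference the patch makes (this is not the actual neutron agent code; run_ofctl below merely stands in for spawning the ovs-ofctl binary via execve()): per-flow deletion issues one external call per flow, while deferred deletion collapses them into a single bulk call.

```python
# Count stand-in "ovs-ofctl" invocations for the two strategies.
calls = []

def run_ofctl(args):
    # Stand-in for executing the ovs-ofctl binary once.
    calls.append(args)

def delete_flows_immediate(flows):
    for f in flows:
        run_ofctl(['del-flows', f])        # one process spawn per flow

def delete_flows_deferred(flows):
    run_ofctl(['del-flows'] + list(flows))  # one bulk invocation

flows = [f'cookie={i}' for i in range(2000)]
delete_flows_immediate(flows)
immediate_calls = len(calls)
calls.clear()
delete_flows_deferred(flows)
deferred_calls = len(calls)
print(immediate_calls, deferred_calls)  # 2000 1
```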
[Yahoo-eng-team] [Bug 1977952] [NEW] In 22.2 cloud-init fails when phone-home module does not have "tries" parameter
Public bug reported:

Hi! We have some user-data files where we use the phone-home module of cloud-init. So far we did not use its "tries" parameter and everything worked. However, in version 22.2 there was a change which causes cloud-init to fail:

https://github.com/canonical/cloud-init/compare/22.1...22.2#diff-a4aa83fbb946ba1ea7cf6c8dd5965cd62631dc9cb48d4baa50adddbfef06b82cL108

In our case this change in the exception handling throws a TypeError instead of the ValueError that is expected:

  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_phone_home.py", line 132, in handle
    tries = int(tries)  # type: ignore
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

While we can add the "tries" parameter (and after that everything works just like before), this exception should be handled properly.

Also, according to the guidelines:
1. Tell us your cloud provider: None
2. Any appropriate cloud-init configuration you can provide us: phone-home module
3. Perform the following on the system and attach it to this bug: logs are attached

Best regards:
Zsolt

** Affects: cloud-init
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1977952

Title: In 22.2 cloud-init fails when phone-home module does not have "tries" parameter

Status in cloud-init: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1977952/+subscriptions
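The failure mode can be reproduced in isolation: int(None) raises TypeError, not ValueError, so a handler that catches only ValueError misses the unset-parameter case. The handling below is a hypothetical sketch of a fix, not the actual cloud-init code.

```python
# Hypothetical parsers illustrating the bug; "10" is an assumed default,
# not necessarily the one cloud-init uses.
def parse_tries_fragile(tries):
    try:
        return int(tries)
    except ValueError:       # int(None) raises TypeError -> not caught here
        return 10

def parse_tries_robust(tries, default=10):
    try:
        return int(tries)
    except (TypeError, ValueError):  # handles None as well as bad strings
        return default

print(parse_tries_robust(None))  # 10
print(parse_tries_robust('3'))   # 3
try:
    parse_tries_fragile(None)
except TypeError:
    print('TypeError escaped the fragile handler')
```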