[Yahoo-eng-team] [Bug 2063463] [NEW] [ovn-octavia-provider] hairpin_snat_ip not set
Public bug reported:

At the moment, the OVN Octavia provider does not set `hairpin_snat_ip` out of the box. This means that if a backend server sends a request to a load balancer which it is also a member of, it will receive that request with the floating IP of the service as the source IP. The issue here is that there are two possible source IPs, one floating and one fixed, and the behaviour is non-deterministic when `hairpin_snat_ip` is not set. We should ideally set `hairpin_snat_ip` to the internal IP so that traffic always hairpins from that IP rather than from several possible IPs, which also makes security groups easier to manage.

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/2063463
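For illustration, a minimal sketch of what setting this could look like from the provider via ovsdbapp (the `nb_idl` handle, the load-balancer identifier and the chosen internal address are placeholders, not the provider's actual code):

```python
# Hypothetical sketch: pin hairpinned traffic to the load balancer's internal
# address by setting options:hairpin_snat_ip on the OVN NB Load_Balancer row.
def set_hairpin_snat_ip(nb_idl, lb_id, internal_ip):
    # lb_id may be the row UUID or the LB name, depending on the lookup used.
    nb_idl.db_set(
        'Load_Balancer', lb_id,
        ('options', {'hairpin_snat_ip': internal_ip}),
    ).execute(check_error=True)
```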
[Yahoo-eng-team] [Bug 2062385] [NEW] [ovn-octavia-provider] Member with FIP not reachable
Public bug reported:

We've noticed the following issue with the OVN Octavia provider and we've narrowed it down to the following:

- A member with a floating IP is not reachable through the load balancer.

At first, the VM loses all connectivity. Once the floating IP is removed and re-added, the VM regains direct connectivity. However, that member remains unreachable via the load balancer (while other members without floating IPs keep working). DVR is enabled in this case.

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/2062385
[Yahoo-eng-team] [Bug 2060163] [NEW] [ovn] race condition with add/remove router interface
Public bug reported: We're running into an issue in our CI with Atmosphere where we frequently see failures when a router port is removed from an interface, the traceback is the following: == 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource [None req-e5d08cdd-28e6-4231-a50c-7eafc1b8f942 70fc3b55af8c4386b80207dad11db5da dcec54844db44eedbd9667951a5ceb6b - - - -] remove_router_interface failed: No details.: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Router_Port with name=lrp-7e0debbb-893c-420a-8569-d8fb98e6a16e 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource Traceback (most recent call last): 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/neutron/api/v2/resource.py", line 98, in resource 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource result = method(request=request, **args) 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/neutron_lib/db/api.py", line 140, in wrapped 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource with excutils.save_and_reraise_exception(): 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 227, in __exit__ 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource self.force_reraise() 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 200, in force_reraise 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource raise self.value 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/neutron_lib/db/api.py", line 138, in wrapped 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource return f(*args, **kwargs) 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/oslo_db/api.py", line 144, in wrapper 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource with excutils.save_and_reraise_exception() as ectxt: 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 227, in __exit__ 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource self.force_reraise() 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 200, in force_reraise 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource raise self.value 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/oslo_db/api.py", line 142, in wrapper 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource return f(*args, **kwargs) 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/neutron_lib/db/api.py", line 186, in wrapped 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource with excutils.save_and_reraise_exception(): 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 227, in __exit__ 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource self.force_reraise() 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 200, in force_reraise 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource raise 
self.value 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/neutron_lib/db/api.py", line 184, in wrapped 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource return f(*dup_args, **dup_kwargs) 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/neutron/api/v2/base.py", line 253, in _handle_action 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource ret_value = getattr(self._plugin, name)(*arg_list, **kwargs) 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/neutron/services/ovn_l3/plugin.py", line 260, in remove_router_interface 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource with excutils.save_and_reraise_exception(): 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 227, in __exit__ 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource self.force_reraise() 2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource File "/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 200, in force_reraise 2024-04-03 21:13:09.804 10 ERROR
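For illustration, the kind of defensive handling that would avoid the traceback above, assuming an ovsdbapp-style NB API handle (`nb_idl`) and treating an already-removed logical router port as success:

```python
from ovsdbapp.backend.ovs_idl import idlutils

def remove_lrp(nb_idl, lrp_name, lrouter_name):
    # Tolerate the LRP already being gone: ask for if_exists semantics and
    # swallow a RowNotFound raised if another worker deleted the row between
    # our lookup and the transaction commit.
    try:
        nb_idl.lrp_del(lrp_name, lrouter=lrouter_name,
                       if_exists=True).execute(check_error=True)
    except idlutils.RowNotFound:
        pass
```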
[Yahoo-eng-team] [Bug 2059716] [NEW] [ovn] Multihomed backend (IPv4 + IPv6) with floating IP unreachable
Public bug reported:

We've got an interesting scenario where one of the backends of a load balancer is not reachable, given the following test environment:

2x networks
- provider network, IPv4 + IPv6 subnets
- tenant network (Geneve), IPv4 + IPv6 subnets

3x VMs
- 2x single port, 2 IP addresses on the tenant network
- 1x single port, 2 IP addresses on the tenant network + floating IP (IPv4 only) attached

Load balancer:
- Using the single tenant network, with floating IP (IPv4 only) attached
- OVN provider

With the setup above, the VM with the floating IP attached will not be reachable by the load balancer (i.e. hitting it multiple times will time out 1/3 of the time). If you remove the floating IP and re-attach it, it works.

In troubleshooting, we've noticed that removing the IPv6 subnet from the tenant network resolves this, so I suspect it is somehow related to that.

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/2059716
[Yahoo-eng-team] [Bug 2052915] Re: "neutron-ovs-grenade-multinode" and "neutron-ovn-grenade-multinode" failing in 2023.1 and Zed
Nova is also affected by this.

** Also affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/2052915

Title: "neutron-ovs-grenade-multinode" and "neutron-ovn-grenade-multinode" failing in 2023.1 and Zed

Status in neutron: Triaged
Status in OpenStack Compute (nova): New

Bug description:

The issue seems to be in the neutron-lib version installed:

2024-02-07 16:19:35.155231 | compute1 | ERROR: neutron 21.2.1.dev38 has requirement neutron-lib>=3.1.0, but you'll have neutron-lib 2.20.2 which is incompatible.

That leads to an error when starting the Neutron API (an API definition is not found) [1]:

Feb 07 16:13:54.385467 np0036680724 neutron-server[67288]: ERROR neutron ImportError: cannot import name 'port_mac_address_override' from 'neutron_lib.api.definitions' (/usr/local/lib/python3.8/dist-packages/neutron_lib/api/definitions/__init__.py)

Setting priority to Critical because this affects the CI.

[1] https://9faad8159db8d6994977-b587eccfce0a645f527dfcbc49e54bb4.ssl.cf2.rackcdn.com/891397/4/check/neutron-ovs-grenade-multinode/ba47cef/controller/logs/screen-q-svc.txt
[Yahoo-eng-team] [Bug 2053274] [NEW] [ovn] mtu for metadata veth interface is not set
Public bug reported:

When using OVN, the `veth` interfaces which get created inside the network namespace (and the other half that goes into the OVS bridge) both do not get an MTU configured when they are provisioned:

https://github.com/openstack/neutron/blob/stable/zed/neutron/agent/ovn/metadata/agent.py#L589-L594

This can cause annoying, hard-to-diagnose errors with packets being dropped if a user issues large requests against the metadata service. The ideal solution would be to configure the correct MTU for the interfaces to avoid this issue.

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/2053274
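For illustration, a sketch of what setting the MTU on both halves of the veth pair could look like with neutron's ip_lib (device names, namespace and MTU value are placeholders):

```python
from neutron.agent.linux import ip_lib

def set_veth_mtu(ns_dev_name, ovs_dev_name, namespace, network_mtu):
    # Apply the network's MTU to both the namespace end and the OVS end of
    # the metadata veth pair so large responses are not silently dropped.
    ns_dev = ip_lib.IPDevice(ns_dev_name, namespace=namespace)
    ovs_dev = ip_lib.IPDevice(ovs_dev_name)
    ns_dev.link.set_mtu(network_mtu)
    ovs_dev.link.set_mtu(network_mtu)
```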
[Yahoo-eng-team] [Bug 2042362] [NEW] Listing instances gives unnecessary error if flavor is deleted
Public bug reported:

If an instance's flavor has been deleted, listing instances raises an unnecessary alert similar to this: "Unable to retrieve instance size information."

Reference: https://github.com/vexxhost/atmosphere/issues/574

** Affects: horizon
   Importance: Undecided
   Status: In Progress

https://bugs.launchpad.net/bugs/2042362
[Yahoo-eng-team] [Bug 2038978] [NEW] [OVN] ARP + Floating IP issues
Public bug reported:

When using OVN, if you have a virtual router with a gateway that is in subnet A, and a port with a floating IP attached to it from subnet B, the floating IP does not seem to be reachable.

https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385253.html

A fix for this was brought into OVN not long ago; it introduces an option, `options:add_route`, which can be set to `true`. See:

https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385255.html

I think we should set this in order to mirror the behaviour of ML2/OVS, since there we install scope-link routes.

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/2038978
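For illustration, a sketch of setting that option from Python via ovsdbapp, assuming (as in the referenced OVN patch) that `add_route` lives in the options column of the NAT (dnat_and_snat) row for the floating IP; `nb_idl` and `nat_uuid` are placeholders:

```python
# Hypothetical sketch: have OVN install a route for the floating IP's NAT
# entry, mirroring the scope-link routes installed by ML2/OVS.
def enable_fip_route(nb_idl, nat_uuid):
    nb_idl.db_set(
        'NAT', nat_uuid,
        ('options', {'add_route': 'true'}),
    ).execute(check_error=True)
```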
[Yahoo-eng-team] [Bug 2037585] [NEW] VM fails to delete with trunk + subports
Public bug reported: When using Neutron, it will prevent you to delete a port if the subports are still attached: https://review.opendev.org/c/openstack/neutron/+/885154 Because of this, if you delete a VM with subports attached, you will end up with a VM in ERROR state: ``` 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] Traceback (most recent call last): 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 1768, in _delete_ports 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] neutron.delete_port(port) 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] ret = obj(*args, **kwargs) 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 833, in delete_port 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] return self.delete(self.port_path % (port)) 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] ret = obj(*args, **kwargs) 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 352, in delete 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] return self.retry_request("DELETE", action, body=body, 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] ret = obj(*args, **kwargs) 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 333, in retry_request 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] return self.do_request(method, action, body=body, 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] ret = obj(*args, **kwargs) 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 297, in do_request 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] self._handle_fault_response(status_code, replybody, resp) 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] ret = obj(*args, **kwargs) 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 272, in _handle_fault_response 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] exception_handler_v20(status_code, error_body) 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] File "/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 90, in exception_handler_v20 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884] raise client_exc(message=error_message, 2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 08ca5cf4-c86a-4446-a031-a3b84ff47884]
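A possible workaround is to detach the subports before deleting the VM. A rough sketch with python-neutronclient (assuming `neutron` is an authenticated client; the helper name is made up):

```python
def detach_subports(neutron, trunk_id):
    # Remove every subport from the trunk so the parent port (and therefore
    # the VM) can be deleted without tripping the "subports attached" check.
    trunk = neutron.show_trunk(trunk_id)['trunk']
    subports = [{'port_id': sp['port_id']} for sp in trunk['sub_ports']]
    if subports:
        neutron.trunk_remove_subports(trunk_id, {'sub_ports': subports})
```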
[Yahoo-eng-team] [Bug 2028442] [NEW] Support DNS for ovn_{nb, sb}_connection
Public bug reported:

At the moment, when using a DNS hostname for `ovn_nb_connection` or `ovn_sb_connection`, the connection never seems to come up. It appears that the `ovs` library does not resolve the hostname to an IP address before proceeding. I'm not sure if we should be resolving the names ourselves and passing the resolved addresses on to OVS, or trying to look for a fix upstream.

This is pretty critical for HA deployments that rely on multiple replicas with hostnames (i.e. a Kubernetes StatefulSet).

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/2028442
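For illustration, a stdlib-only sketch of resolving the hostnames ourselves before handing the connection string to the ovs library (the helper and the exact endpoint-format handling are assumptions):

```python
import socket

def resolve_ovn_connection(connection):
    # "tcp:nb.example.org:6641,tcp:nb-1.example.org:6641" -> the same string
    # with each hostname replaced by its first resolved address, so the ovs
    # library only ever sees literal IPs.
    resolved = []
    for endpoint in connection.split(','):
        proto, host, port = endpoint.split(':')
        addrinfo = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        ip = addrinfo[0][4][0]
        if ':' in ip:  # bracket IPv6 literals as OVS expects
            ip = '[%s]' % ip
        resolved.append('%s:%s:%s' % (proto, ip, port))
    return ','.join(resolved)
```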
[Yahoo-eng-team] [Bug 2015894] [NEW] VMs failing to go up with RBD volume + volume_use_multipath
Public bug reported:

For a VM that uses an RBD volume, the VM will fail to start when `volume_use_multipath` is set to true.

https://github.com/openstack/os-brick/blob/28ffcdbfa138859859beca2f80627c076269be56/os_brick/initiator/linuxscsi.py#L212-L233

It seems that we always call os-brick with enforce_multipath=True, so if multipath is enabled in the configuration, the call ends up failing for all newly provisioned VMs, even when multipath is not in use or necessary. Ideally, we should be able to safely ignore it when plugging a backend that doesn't use or support multipath.

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/2015894
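For illustration, a sketch of the call with the enforcement relaxed (argument names follow os-brick's get_connector_properties; whether this is the right place to relax it is exactly what needs discussing):

```python
from os_brick.initiator import connector

def get_connector_props(root_helper, my_ip, use_multipath):
    # With enforce_multipath=False os-brick falls back to non-multipath when
    # multipathd is unavailable instead of raising, which is what we want for
    # backends (such as RBD) that never use multipath in the first place.
    return connector.get_connector_properties(
        root_helper, my_ip,
        multipath=use_multipath,
        enforce_multipath=False)
```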
[Yahoo-eng-team] [Bug 1992186] [NEW] "int object is not iterable" when using numerical group names
Public bug reported:

When using federation, if the value of `groups` in the mapping is set to a number, it will be parsed into a number and authentication then fails with:

```
{"error":{"code":400,"message":"'int' object is not iterable","title":"Bad Request"}}
```

I believe the problematic code is here:

https://github.com/openstack/keystone/blob/326b014434cc760ba08763e1870ac057f7917e98/keystone/federation/utils.py#L650-L661

** Affects: keystone
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1992186
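For illustration, a small normalisation step of the kind that would avoid the failure (pure Python, not the actual keystone fix):

```python
def normalize_group_names(value):
    # The mapping engine iterates over the `groups` value; if the assertion
    # was parsed as a bare number (a numeric group name), wrap it so callers
    # always receive a list of strings.
    if isinstance(value, (int, float)):
        return [str(value)]
    if isinstance(value, str):
        return [value]
    return [str(v) for v in value]
```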
[Yahoo-eng-team] [Bug 1447651] Re: Find many duplicate rules in memory by using iptables_manager
This is no longer relevant and I do not see these warnings; closing because of age.

** Changed in: neutron
   Status: New => Invalid

https://bugs.launchpad.net/bugs/1447651

Title: Find many duplicate rules in memory by using iptables_manager

Status in neutron: Invalid

Bug description:

I installed VPNaaS in my devstack and I find many duplicate iptables rules in memory. The rule is:

2015-04-23 10:55:15.380 ERROR neutron.agent.linux.iptables_manager [-] ## rule is -A neutron-vpn-agen-POSTROUTING -s 192.168.10.0/24 -d 192.168.20.1/24 -m policy --dir out --pol ipsec -j ACCEPT

(I added this log line in 'agent/linux/iptables_manager.py' after '_modify_rules'.)

Why are there duplicate iptables rules? Does iptables_manager weed out duplicate rules?
[Yahoo-eng-team] [Bug 1598652] Re: Neutron VPNaaS API CI is not enabled
** Changed in: neutron
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1598652

Title: Neutron VPNaaS API CI is not enabled

Status in neutron: Fix Released

Bug description:

The VPNaaS API CI test is not enabled because the API test code had an issue. Our team has now fixed it, and VPNaaS also needs these CI tests, so add a new CI job to enable it.
[Yahoo-eng-team] [Bug 1645516] Re: openswan package is not available in Ubuntu 16.04
Invalid now, 16.04 is long gone :)

** Changed in: neutron
   Status: Confirmed => Invalid

https://bugs.launchpad.net/bugs/1645516

Title: openswan package is not available in Ubuntu 16.04

Status in neutron: Invalid

Bug description:

We plan to launch the VPNaaS service on a xenial node in the rally CI; however, it fails because the openswan package is not available, and it appears openswan is not available at all in Ubuntu 16.04.

The error output: http://paste.openstack.org/show/590601/

Issue with openswan: http://askubuntu.com/questions/801860/openswan-shows-no-installation-candidate-after-running-apt-get-update
[Yahoo-eng-team] [Bug 1680484] Re: neutron-vpnaas:error when creating IPSec Site Connection using strongswan on centos
Correct, resolved by the comment Dmitriy added. ** Changed in: neutron Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1680484 Title: neutron-vpnaas:error when creating IPSec Site Connection using strongswan on centos Status in neutron: Fix Released Bug description: Operating system: CentOS Linux release 7.3.1611 (Core) Kernel: 3.10.0-514.el7.x86_64 Packages: python-neutron-vpnaas-9.0.0-1.el7.noarch openstack-neutron-ml2-9.2.0-1.el7.noarch python2-neutronclient-6.0.0-2.el7.noarch python-neutron-lib-0.4.0-1.el7.noarch openstack-neutron-common-9.2.0-1.el7.noarch openstack-neutron-openvswitch-9.2.0-1.el7.noarch python-neutron-9.2.0-1.el7.noarch openstack-neutron-9.2.0-1.el7.noarch openstack-neutron-vpnaas-9.0.0-1.el7.noarch strongswan-5.4.0-2.el7.x86_64 Configuration options for vpnaass: service_provider = VPN:strongswan:neutron_vpnaas.services.vpn.service_drivers.ipsec.IPsecVPNDriver:default vpn_device_driver = neutron_vpnaas.services.vpn.device_drivers.fedora_strongswan_ipsec.FedoraStrongSwanDriver After I create an IPSec Site Connection use commands as follows: 1) neutron vpn-ikepolicy-create ikepolicy 2) neutron vpn-ipsecpolicy-create ipsecpolicy 3) neutron vpn-service-create --name vpn0 --description "My vpn service0" vpn0 vpn0-subnet 4) neutron vpn-service-create --name vpn1 --description "My vpn service1" vpn1 vpn1-subnet 5) neutron ipsec-site-connection-create --name vpnconnection0 --vpnservice-id vpn0 --ikepolicy-id ikepolicy --ipsecpolicy-id ipsecpolicy --peer-address 10.0.149.16 --peer-id 10.0.149.16 --peer-cidr 10.3.0.0/24 --psk secret 6) neutron ipsec-site-connection-create --name vpnconnection1 --vpnservice-id vpn1 --ikepolicy-id ikepolicy --ipsecpolicy-id ipsecpolicy --peer-address 10.0.149.3 --peer-id 10.0.149.3 --peer-cidr 10.1.0.0/24 --psk secret Then the status of vpnconnection0 and vpnconnection1 always keep PENDING_CREATE. 
Logs in /var/log/neutron/vpn-agent.log: 2017-04-06 13:42:12.134 16118 INFO oslo_rootwrap.client [req-1441bb58-bfa2-4b5b-bd57-71a9501f8716 07e158a349474724abc69f8651850b18 de65099dfaba4a4f8cb3c49911980e5c - - -] cmd: ['cp', '-a', '/usr/share/strongswan/templates/config/strongswan.d/../plugins', '/var/lib/neutron/ipsec/a2e0c9b9-51fd-4054-a4f9-d2b53adce83a/etc/strongswan/strongswan.d/charon'] 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server [req-1441bb58-bfa2-4b5b-bd57-71a9501f8716 07e158a349474724abc69f8651850b18 de65099dfaba4a4f8cb3c49911980e5c - - -] Exception during message handling 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 884, in vpnservice_updated 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server self.sync(context, [router] if router else []) 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server return f(*args, **kwargs) 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 1045, in sync 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server self._sync_vpn_processes(vpnservices, sync_router_ids) 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 1069, in _sync_vpn_processes 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server process.update() 2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 286, in update 2017-04-06
[Yahoo-eng-team] [Bug 1972028] [NEW] _get_pci_passthrough_devices prone to race condition
Public bug reported:

At the moment, the `_get_pci_passthrough_devices` function is prone to race conditions.

This specific code calls `listCaps()`; however, it is possible that the device has disappeared by the time the method is called:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7949-L7959

This results in the following traceback:

2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager [req-51b7c1c4-2b4a-46cc-9baa-8bf61801c48d - - - - -] Error updating resources for node .: libvirt.libvirtError: Node device not found: no node device with matching name 'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager Traceback (most recent call last):
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 9946, in _update_available_resource_for_node
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     self.rt.update_available_resource(context, nodename,
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 879, in update_available_resource
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 8937, in get_available_resource
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     data['pci_passthrough_devices'] = self._get_pci_passthrough_devices()
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7663, in _get_pci_passthrough_devices
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     vdpa_devs = [
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7664, in <listcomp>
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     dev for dev in devices.values() if "vdpa" in dev.listCaps()
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/libvirt.py", line 6276, in listCaps
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     raise libvirtError('virNodeDeviceListCaps() failed')
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager libvirt.libvirtError: Node device not found: no node device with matching name 'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager

I think the cleaner way is to loop over all the items and skip a device if it raises an error indicating that the device is not found.

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1972028
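For illustration, a sketch of the suggested approach, skipping node devices that vanish between enumeration and the capability query (a standalone function, not the actual nova patch):

```python
import libvirt

def list_vdpa_devices(devices):
    # Ignore devices that disappeared between listing them and querying their
    # capabilities instead of letting the periodic resource update blow up.
    vdpa_devs = []
    for dev in devices.values():
        try:
            caps = dev.listCaps()
        except libvirt.libvirtError as ex:
            if ex.get_error_code() == libvirt.VIR_ERR_NO_NODE_DEVICE:
                continue
            raise
        if 'vdpa' in caps:
            vdpa_devs.append(dev)
    return vdpa_devs
```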
[Yahoo-eng-team] [Bug 1972023] [NEW] Failed (but retryable) device detaches are logged as ERROR
Public bug reported:

At the moment, if a device detach attempt times out (using libvirt), we log a message at ERROR level:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2570-L2573

However, this is not yet a failure: we retry the detach a few more times depending on configuration, and only if it ultimately fails do we report that:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2504

In high-load environments where this timeout can be hit, this produces "ERROR" messages that look alarming to the operator even though the follow-up attempt succeeds and no attention is needed. The message should be logged as a WARNING; the operator only needs to intervene when the final failure to detach the device is logged as an ERROR.

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1972023
[Yahoo-eng-team] [Bug 1971760] [NEW] nova-compute leaks green threads
Public bug reported: At the moment, if the cloud sustain a large number of VIF plugging timeouts, it will lead into a ton of leaked green threads which can cause the nova-compute process to stop reporting/responding. The tracebacks that would occur would be: === 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] Traceback (most recent call last): 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7230, in _create_guest_with_network 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] guest = self._create_guest( 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__ 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] next(self.gen) 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 479, in wait_for_instance_event 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] actual_event = event.wait() 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/eventlet/event.py", line 125, in wait 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] result = hub.switch() 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] return self.greenlet.switch() 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] eventlet.timeout.Timeout: 300 seconds 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] During handling of the above exception, another exception occurred: 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] Traceback (most recent call last): 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 2409, in _build_and_run_instance 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] self.driver.spawn(context, instance, image_meta, 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 4193, in spawn 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] self._create_guest_with_network( 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager 
[instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7256, in _create_guest_with_network 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] raise exception.VirtualInterfaceCreateException() 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed 2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] === Eventually, with enough of these, the nova-compute process would hang. The output of GMR shows nearly 6094 threads, with around 3038 of them having the traceback below: === --Green Thread-- /var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py:355 in run `self.fire_timers(self.clock())` /var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py:476 in fire_timers `timer()` /var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/timer.py:59 in __call__ `cb(*args, **kw)` /var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/__init__.py:151 in _timeout `current.throw(exc)` === In addition, 3039 of
[Yahoo-eng-team] [Bug 1917645] Re: Nova can't create instances if RabbitMQ notification cluster is down
As per sean-k-mooney's advice, I've added oslo.messaging as an affected project since this is more of an issue there than in Nova.

** Also affects: oslo.messaging
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1917645

Title: Nova can't create instances if RabbitMQ notification cluster is down

Status in OpenStack Compute (nova): Confirmed
Status in oslo.messaging: New

Bug description:

We use independent RabbitMQ clusters for each OpenStack project, Nova Cells and also for notifications. Recently, I noticed in our test infrastructure that if the RabbitMQ cluster for notifications has an outage, Nova can't create new instances. Possibly other operations will also hang.

Not being able to send a notification / connect to the RabbitMQ cluster shouldn't stop new instances from being created. (If this is actually a use case for some deployments, the operator should have the possibility to configure it.)

Tested against the master branch. If the notification RabbitMQ is stopped, nova-scheduler gets stuck when creating an instance:

```
Mar 01 21:16:28 devstack nova-scheduler[18384]: DEBUG nova.scheduler.request_filter [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Request filter 'accelerators_filter' took 0.0 seconds {{(pid=18384) wrapper /opt/stack/nova/nova/scheduler/request_filter.py:46}}
Mar 01 21:16:32 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 2.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:35 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 4.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:42 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 6.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:51 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 8.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:17:02 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 10.0 seconds): OSError: [Errno 113] EHOSTUNREACH
(...)
```

Because the notification RabbitMQ cluster is down, Nova gets stuck in:
https://github.com/openstack/nova/blob/5b66caab870558b8a7f7b662c01587b959ad3d41/nova/scheduler/filter_scheduler.py#L85

because oslo.messaging never gives up:
https://github.com/openstack/oslo.messaging/blob/5aa645b38b4c1cf08b00e687eb6c7c4b8a0211fc/oslo_messaging/_drivers/impl_rabbit.py#L736
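For illustration, one way an operator-facing knob could look: oslo.messaging's Notifier accepts a bounded `retry` count, so a sketch like the following would fail the notification quickly instead of blocking the call path (names and the chosen retry value are only examples):

```python
import oslo_messaging

def get_notifier(conf, publisher_id):
    # retry=-1 (the default) retries forever; a small bound lets instance
    # creation proceed even when the notification cluster is unreachable.
    transport = oslo_messaging.get_notification_transport(conf)
    return oslo_messaging.Notifier(
        transport, publisher_id=publisher_id, driver='messagingv2', retry=2)
```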
[Yahoo-eng-team] [Bug 1931908] [NEW] Default CORS allow_headers missing X-OpenStack-Nova-API-Version
Public bug reported:

When CORS is enabled, the `X-OpenStack-Nova-API-Version` header is not included in the default allowed headers. It should be, because it is critical to the operation of the OpenStack Nova API.

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1931908
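For illustration, a sketch of registering the header as a latent CORS default via oslo.middleware (the header and method lists are examples, not the final patch):

```python
from oslo_middleware import cors

def set_cors_defaults():
    # Make the microversion header part of the defaults so deployments that
    # enable CORS do not have to add it to allow_headers themselves.
    version_headers = ['X-OpenStack-Nova-API-Version', 'OpenStack-API-Version']
    cors.set_defaults(
        allow_headers=['X-Auth-Token'] + version_headers,
        expose_headers=version_headers,
        allow_methods=['GET', 'PUT', 'POST', 'DELETE', 'PATCH'])
```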
[Yahoo-eng-team] [Bug 1892370] [NEW] Database migrations fail when schema includes dash
Public bug reported:

In our database migrations, we run the following:

'ALTER DATABASE %s DEFAULT CHARACTER SET utf8'

If using a database name that includes a dash, the migration fails because the name needs to be wrapped in backticks (`).

** Affects: nova
   Importance: Undecided
   Assignee: Mohammed Naser (mnaser)
   Status: In Progress

https://bugs.launchpad.net/bugs/1892370
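For illustration, a sketch of the quoting fix (the surrounding migration plumbing is omitted):

```python
def alter_database_charset(connection, db_name):
    # Backtick-quote the schema name (escaping any embedded backticks) so
    # names containing a dash, e.g. "nova-api", are accepted by MySQL/MariaDB.
    quoted = '`%s`' % db_name.replace('`', '``')
    connection.execute('ALTER DATABASE %s DEFAULT CHARACTER SET utf8' % quoted)
```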
[Yahoo-eng-team] [Bug 1890057] Re: EC2 instance_id_mappings are never deleted
*** This bug is a duplicate of bug 1786298 ***
    https://bugs.launchpad.net/bugs/1786298

** This bug has been marked a duplicate of bug 1786298
   nova-manage db archive_deleted_rows does not cleanup table instance_id_mappings

https://bugs.launchpad.net/bugs/1890057

Title: EC2 instance_id_mappings are never deleted

Status in OpenStack Compute (nova): New
[Yahoo-eng-team] [Bug 1890057] [NEW] EC2 instance_id_mappings are never deleted
Public bug reported:

It looks like whenever we create an instance, we create an EC2 instance ID mapping for it:

https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/db/sqlalchemy/api.py#L1137-L1138

which is used by the EC2 objects:

https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/objects/ec2.py

These are not really used by much in the API, though there is even a mechanism to 'soft-create' mappings when they pop up:

https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/objects/ec2.py#L63-L74

A lot of this code seems unreferenced, so I am not sure what its state is. The problem, however, is that the mappings never get soft-deleted anywhere in the code, which can lead to...

```
MariaDB [nova]> SELECT COUNT(*) FROM instance_id_mappings;
+----------+
| COUNT(*) |
+----------+
|  3941119 |
+----------+
```

...for something entirely unused. I think the fix could be two parts (though I don't understand why the EC2 API is still referenced):

1. Mappings should be created on demand (which in my case means they never will be).
2. Mappings should be soft-deleted on instance delete (which would make archiving work).

I'm happy to try and help drive this if we come up with a solution.

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1890057
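For illustration, the soft delete in part 2 amounts to something like the following (shown against a raw MySQL DB-API cursor purely for clarity; in-tree it would hang off the instance destroy path):

```python
def soft_delete_ec2_mapping(cursor, instance_uuid):
    # Mark the mapping deleted when its instance is deleted, so
    # `nova-manage db archive_deleted_rows` can move it out of the live table.
    cursor.execute(
        "UPDATE instance_id_mappings"
        " SET deleted = id, deleted_at = NOW()"
        " WHERE uuid = %s AND deleted = 0",
        (instance_uuid,))
```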
[Yahoo-eng-team] [Bug 1889454] [NEW] br-int has an unpredictable MTU
Public bug reported: We have an environment where users can plug their VMs both to tenant and provider networks on the hypervisor. This environment does not have jumbo frames. The MTU for VMs plugged directly into provider networks is 1500 (physical network), however it is 1450 for tenant networks (VXLAN). https://github.com/openstack/neutron/blob/2ac52607c266e593700be0784ebadc77789070ff/neutron/agent/common/ovs_lib.py#L299-L319 The code which creates the br-int bridge does not factor in an MTU, which means depending on what gets plugged in first, you could end up with 1500 MTU interfaces connected to br-int, which would give things like this in the system logs: br-int: dropped over-mtu packet: 1500 > 1458 I'm not sure what the best solution inside Neutron is. Should we perhaps set br-int to the MTU of the largest physical network attachable on the agent? I'm happy to pick up the work. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1889454 Title: br-int has an unpredictable MTU Status in neutron: New Bug description: We have an environment where users can plug their VMs both to tenant and provider networks on the hypervisor. This environment does not have jumbo frames. The MTU for VMs plugged directly into provider networks is 1500 (physical network), however it is 1450 for tenant networks (VXLAN). https://github.com/openstack/neutron/blob/2ac52607c266e593700be0784ebadc77789070ff/neutron/agent/common/ovs_lib.py#L299-L319 The code which creates the br-int bridge does not factor in an MTU, which means depending on what gets plugged in first, you could end up with 1500 MTU interfaces connected to br-int, which would give things like this in the system logs: br-int: dropped over-mtu packet: 1500 > 1458 I'm not sure what the best solution inside Neutron is. Should we perhaps set br-int to the MTU of the largest physical network attachable on the agent? I'm happy to pick up the work. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1889454/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
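As a stopgap on the agent side, the MTU of br-int can be pinned explicitly so it no longer depends on plug order; a minimal sketch, assuming ovs-vsctl is available and the running Open vSwitch supports the Interface mtu_request column (OVS 2.6+), with 1500 standing in for the largest physical-network MTU mentioned above:
```python
#!/usr/bin/env python3
# Sketch: pin br-int's MTU so it no longer depends on which port happens to
# be plugged first. Assumes ovs-vsctl is on PATH and that the running Open
# vSwitch supports the Interface "mtu_request" column.
import subprocess

LARGEST_PHYSNET_MTU = 1500  # largest MTU of any network attachable on this agent

subprocess.check_call([
    'ovs-vsctl', 'set', 'Interface', 'br-int',
    'mtu_request=%d' % LARGEST_PHYSNET_MTU,
])
```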
[Yahoo-eng-team] [Bug 1887523] [NEW] Deadlock detection code can be stale
Public bug reported: oslo.db has plenty of infrastructure for detecting deadlocks, however, it seems that at the moment, neutron has its own implementation of it which is missing a bunch of deadlocks, causing issues when doing work at scale. This bug is to track the work in refactoring all of this to use the native oslo retry. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1887523 Title: Deadlock detection code can be stale Status in neutron: New Bug description: oslo.db has plenty of infrastructure for detecting deadlocks, however, it seems that at the moment, neutron has its own implementation of it which is missing a bunch of deadlocks, causing issues when doing work at scale. This bug is to track the work in refactoring all of this to use the native oslo retry. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1887523/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1883969] [NEW] Nova doesn't fail at API layer when image_size > volume_size with BFV
Public bug reported: When trying to boot an instance where the image size is larger than the volume size, there seems to be no 'protection' mechanism stopping you from doing that; it ends up failing in the compute manager layer, making it more complicated for the user to debug. We should probably fail early in the API (just like we do for non-BFV instances). ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1883969 Title: Nova doesn't fail at API layer when image_size > volume_size with BFV Status in OpenStack Compute (nova): New Bug description: When trying to boot an instance where the image size is larger than the volume size, there seems to be no 'protection' mechanism stopping you from doing that; it ends up failing in the compute manager layer, making it more complicated for the user to debug. We should probably fail early in the API (just like we do for non-BFV instances). To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1883969/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
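A rough sketch of the kind of early check being asked for, using illustrative names rather than Nova's actual API-layer code; the core of it is just comparing the requested volume size against the image's size and min_disk:
```python
# Sketch of the proposed early validation (names are illustrative): reject
# the request before it reaches the compute manager when the image cannot
# fit on the requested boot volume.
def validate_bfv_volume_size(image, volume_size_gb):
    GiB = 1024 ** 3
    image_gb = (image.get('size', 0) + GiB - 1) // GiB   # round up to whole GiB
    required_gb = max(image.get('min_disk', 0), image_gb)
    if volume_size_gb < required_gb:
        raise ValueError('Volume of %d GiB is smaller than the %d GiB the '
                         'image requires' % (volume_size_gb, required_gb))

# A 3 GiB image on a 2 GiB volume should fail fast, at the API layer.
try:
    validate_bfv_volume_size({'size': 3 * 1024 ** 3, 'min_disk': 0}, 2)
except ValueError as exc:
    print(exc)
```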
[Yahoo-eng-team] [Bug 1878979] [NEW] Quota code does not respect [api]/instance_list_per_project_cells
Public bug reported: The function which counts resources using the legacy method involves getting a list of all cell mappings assigned to a specific project: https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/quota.py#L1170-L1209 This code can be very heavy on a database which contains a lot of instances (but not a lot of mappings), potentially scanning millions of rows to gather 1-2 cell mappings. In a single cell environment, it is just extra CPU usage with exactly the same outcome. The [api]/instance_list_per_project_cells option was introduced to work around this: https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/compute/instance_list.py#L146-L153 However, the quota code does not implement it, which means quota counts take a big toll on the database server. We should ideally mirror the same behaviour in the quota code. ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1878979 Title: Quota code does not respect [api]/instance_list_per_project_cells Status in OpenStack Compute (nova): New Bug description: The function which counts resources using the legacy method involves getting a list of all cell mappings assigned to a specific project: https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/quota.py#L1170-L1209 This code can be very heavy on a database which contains a lot of instances (but not a lot of mappings), potentially scanning millions of rows to gather 1-2 cell mappings. In a single cell environment, it is just extra CPU usage with exactly the same outcome. The [api]/instance_list_per_project_cells option was introduced to work around this: https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/compute/instance_list.py#L146-L153 However, the quota code does not implement it, which means quota counts take a big toll on the database server. We should ideally mirror the same behaviour in the quota code. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878979/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1862205] [NEW] Instances not visible when hidden=NULL
Public bug reported: During an upgrade of a cloud from Stein to Train, there is a migration which adds the `hidden` field to the database. In that migration, it was assumed that it does not backfill all of the columns. However, upon verifying, it actually does backfill all columns and the order of operations *seems* to be: 1. Create new column for `hidden` 2. Update database migration version 3. Start backfilling all existing instances with hidden=0 In my case, the migration did create the column but failed to backfill all existing instances because of the large number of instances. However, running the migrations again seems to simply continue and not block on that migration, but leaving all columns with hidden=NULL. Feb 06 14:06:13 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:06:13.566 10596 INFO migrate.versioning.api [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] 398 -> 399... Feb 06 14:07:18 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:07:18.129 10596 ERROR oslo_db.sqlalchemy.exc_filters [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] DBAPIError exception wrapped from (pymysql.err.InternalError) (1180, 'Got error 90 "Message too long" during COMMIT') Feb 06 14:07:18 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:07:18.132 10596 ERROR oslo_db.sqlalchemy.exc_filters [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] DB exception wrapped.: sqlalchemy.exc.ResourceClosedError: This Connection is closed Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.930 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 398 -> 399... Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.985 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.985 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 399 -> 400... Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.995 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.995 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 400 -> 401... Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.145 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.145 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 401 -> 402... Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.244 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done This issue is two-part, because now it seems that Nova does not assume that hidden=NULL means that the instance is not hidden and no longer displays the instance via API or any other operations. 
The "very silly" confirmation of this backfilling behaviour is that my attempt at patching things up resulted in the same error: === MariaDB [nova]> update instances set hidden=0; ERROR 1180 (HY000): Got error 90 "Message too long" during COMMIT === Ideally, Nova shouldn't try to backfill values and it should treat hidden=NULL as 0. ** Affects: nova Importance: Undecided Status: New ** Tags: db upgrade -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1862205 Title: Instances not visible when hidden=NULL Status in OpenStack Compute (nova): New Bug description: During an upgrade of a cloud from Stein to Train, there is a migration which adds the `hidden` field to the database. In that migration, it was assumed that it does not backfill all of the columns. However, upon verifying, it actually does backfill all columns and the order of operations *seems* to be: 1. Create new column for `hidden` 2. Update database migration version 3. Start backfilling all existing instances with hidden=0 In my case, the migration did create the column but failed to backfill all existing instances because of the large number of instances. However, running the migrations again seems to simply continue and not block on that migration, but leaving all columns with hidden=NULL. Feb 06 14:06:13 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:06:13.566 10596 INFO
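Until Nova treats hidden=NULL as not hidden, the backfill itself can be done manually in small batches so that a Galera cluster does not reject one giant writeset with the "Message too long" error above; a sketch in the same pymysql style as the other scripts in this digest (the batch size is arbitrary):
```python
#!/usr/bin/env python3
# Sketch: backfill hidden=0 in small batches so the commit stays under the
# cluster's writeset limit instead of one enormous UPDATE.
import pymysql

conn = pymysql.connect(host='127.0.0.1', user='nova', passwd='xxx', db='nova')
cur = conn.cursor()

BATCH = 10000
while True:
    updated = cur.execute(
        "UPDATE instances SET hidden = 0 WHERE hidden IS NULL LIMIT %s",
        (BATCH,))
    conn.commit()
    print("backfilled", updated, "rows")
    if updated < BATCH:
        break
```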
[Yahoo-eng-team] [Bug 1852121] [NEW] Delete archived records instantly
Public bug reported: At the moment, in order to clean up a database, you will have to archive first and then run the delete afterwards. If the operator doesn't care about the ability of restoring deleted instances, it means that the archive step is useless for them. It would be nice if we added an option to archive to simply purge directly instead of archive (then use purge records command afterwards). ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1852121 Title: Delete archived records instantly Status in OpenStack Compute (nova): New Bug description: At the moment, in order to clean up a database, you will have to archive first and then run the delete afterwards. If the operator doesn't care about the ability of restoring deleted instances, it means that the archive step is useless for them. It would be nice if we added an option to archive to simply purge directly instead of archive (then use purge records command afterwards). To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1852121/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1839560] [NEW] ironic: moving node to maintenance makes it unusable afterwards
Public bug reported: If you use the Ironic API to set a node into maintenance (for whatever reason), it will no longer be included in the list of available nodes to Nova. When Nova refreshes its resources periodically, it will find that it is no longer in the list of available nodes and delete it from the database. Once you enable the node again and Nova attempts to create the ComputeNode again, it fails due to the duplicate UUID in the database, because the old record is soft deleted and had the same UUID. ref: https://github.com/openstack/nova/commit/9f28727eb75e05e07bad51b6eecce667d09dfb65 - this made computenode.uuid match the baremetal uuid https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8304-L8316 - this soft-deletes the computenode record when it doesn't see it in the list of active nodes traces: 2019-08-08 17:20:13.921 6379 INFO nova.compute.manager [req-c71e5c81-eb34-4f72-a260-6aa7e802f490 - - - - -] Deleting orphan compute node 31 hypervisor host is 77788ad5-f1a4-46ac-8132-2d88dbd4e594, nodes are set([u'6d556617-2bdc-42b3-a3fe-b9218a1ebf0e', u'a634fab2-ecea-4cfa-be09-032dce6eaf51', u'2dee290d-ef73-46bc-8fc2-af248841ca12']) ... 2019-08-08 22:21:25.284 82770 WARNING nova.compute.resource_tracker [req-a58eb5e2-9be0-4503-bf68-dff32ff87a3a - - - - -] No compute node record for ctl1-:77788ad5-f1a4-46ac-8132-2d88dbd4e594: ComputeHostNotFound_Remote: Compute host ctl1- could not be found. Remote error: DBDuplicateEntry (pymysql.err.IntegrityError) (1062, u"Duplicate entry '77788ad5-f1a4-46ac-8132-2d88dbd4e594' for key 'compute_nodes_uuid_idx'") ** Affects: nova Importance: High Status: Triaged ** Tags: compute ironic -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839560 Title: ironic: moving node to maintenance makes it unusable afterwards Status in OpenStack Compute (nova): Triaged Bug description: If you use the Ironic API to set a node into maintenance (for whatever reason), it will no longer be included in the list of available nodes to Nova. When Nova refreshes its resources periodically, it will find that it is no longer in the list of available nodes and delete it from the database. Once you enable the node again and Nova attempts to create the ComputeNode again, it fails due to the duplicate UUID in the database, because the old record is soft deleted and had the same UUID. ref: https://github.com/openstack/nova/commit/9f28727eb75e05e07bad51b6eecce667d09dfb65 - this made computenode.uuid match the baremetal uuid https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8304-L8316 - this soft-deletes the computenode record when it doesn't see it in the list of active nodes traces: 2019-08-08 17:20:13.921 6379 INFO nova.compute.manager [req-c71e5c81-eb34-4f72-a260-6aa7e802f490 - - - - -] Deleting orphan compute node 31 hypervisor host is 77788ad5-f1a4-46ac-8132-2d88dbd4e594, nodes are set([u'6d556617-2bdc-42b3-a3fe-b9218a1ebf0e', u'a634fab2-ecea-4cfa-be09-032dce6eaf51', u'2dee290d-ef73-46bc-8fc2-af248841ca12']) ... 2019-08-08 22:21:25.284 82770 WARNING nova.compute.resource_tracker [req-a58eb5e2-9be0-4503-bf68-dff32ff87a3a - - - - -] No compute node record for ctl1-:77788ad5-f1a4-46ac-8132-2d88dbd4e594: ComputeHostNotFound_Remote: Compute host ctl1- could not be found.
Remote error: DBDuplicateEntry (pymysql.err.IntegrityError) (1062, u"Duplicate entry '77788ad5-f1a4-46ac-8132-2d88dbd4e594' for key 'compute_nodes_uuid_idx'") To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1839560/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
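A manual workaround with the same effect as the eventual fix (restore the old record rather than re-create it) is to un-soft-delete the existing compute_nodes row; a sketch, assuming the standard cell database schema and using the UUID from the trace above as a placeholder:
```python
#!/usr/bin/env python3
# Sketch: restore the soft-deleted compute_nodes row for a given node UUID so
# the resource tracker stops tripping over compute_nodes_uuid_idx.
import pymysql

NODE_UUID = '77788ad5-f1a4-46ac-8132-2d88dbd4e594'  # placeholder

conn = pymysql.connect(host='127.0.0.1', user='nova', passwd='xxx', db='nova')
cur = conn.cursor()
cur.execute(
    "UPDATE compute_nodes SET deleted = 0, deleted_at = NULL "
    "WHERE uuid = %s AND deleted != 0",
    (NODE_UUID,))
print("restored rows:", cur.rowcount)
conn.commit()
```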
[Yahoo-eng-team] [Bug 1825386] Re: nova is looking for OVMF file no longer provided by CentOS 7.6
** Also affects: openstack-ansible Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1825386 Title: nova is looking for OVMF file no longer provided by CentOS 7.6 Status in OpenStack Compute (nova): New Status in openstack-ansible: In Progress Bug description: In nova/virt/libvirt/driver.py the code looks for a hardcoded path "/usr/share/OVMF/OVMF_CODE.fd". It appears that centos 7.6 has modified the OVMF-20180508-3 rpm to no longer contain this file. Instead it now seems to be named /usr/share/OVMF/OVMF_CODE.secboot.fd This will break the ability to boot guests using UEFI. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1825386/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1822676] [NEW] novnc no longer sets token inside cookie
Public bug reported: For a long time, noVNC set the token inside a cookie so that when the /websockify request came in, we had it in the cookies and we could look it up from there and return the correct host. However, since the following commit, they've removed this behavior https://github.com/novnc/noVNC/commit/51f9f0098d306bbc67cc8e02ae547921b6f6585c #diff-1d6838e3812778e95699b90d530543a1L173 This means that we're unable to use latest noVNC with Nova. There is a really gross workaround of using the 'path' override in the URL for something like this http://foo/vnc_lite.html?path=?token=foo That feels pretty lame to me and it will have all deployment tools change their settings. Also, this wasn't caught in CI because we deploy novnc from packages. ** Affects: nova Importance: High Assignee: melanie witt (melwitt) Status: Confirmed ** Affects: openstack-ansible Importance: Undecided Status: New ** Tags: console ** Also affects: openstack-ansible Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1822676 Title: novnc no longer sets token inside cookie Status in OpenStack Compute (nova): Confirmed Status in openstack-ansible: New Bug description: For a long time, noVNC set the token inside a cookie so that when the /websockify request came in, we had it in the cookies and we could look it up from there and return the correct host. However, since the following commit, they've removed this behavior https://github.com/novnc/noVNC/commit/51f9f0098d306bbc67cc8e02ae547921b6f6585c #diff-1d6838e3812778e95699b90d530543a1L173 This means that we're unable to use latest noVNC with Nova. There is a really gross workaround of using the 'path' override in the URL for something like this http://foo/vnc_lite.html?path=?token=foo That feels pretty lame to me and it will have all deployment tools change their settings. Also, this wasn't caught in CI because we deploy novnc from packages. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1822676/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1822613] [NEW] Inefficient queries inside online_data_migrations
Public bug reported: The online_data_migrations should be run after an upgrade and contains a list of tasks to do to backfill information after an upgrade, however, some of those queries are extremely inefficient, which results in the online data migrations taking an unacceptable amount of time. The SQL query in question that takes a really long time: > SELECT count(*) AS count_1 > FROM (SELECT instance_extra.created_at AS instance_extra_created_at, > instance_extra.updated_at AS instance_extra_updated_at, > instance_extra.deleted_at AS instance_extra_deleted_at, > instance_extra.deleted AS instance_extra_deleted, instance_extra.id AS > instance_extra_id, instance_extra.instance_uuid AS > instance_extra_instance_uuid > FROM instance_extra > WHERE instance_extra.keypairs IS NULL AND instance_extra.deleted = 0) AS > anon_1 It would also be good for us to *not* run a data migration again if we know we've already gotten found=0 when online_data_migrations is running in "forever-until-complete". Also, the value of 50 rows per run in that mode is quite small. ref: http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004397.html ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1822613 Title: Inefficient queries inside online_data_migrations Status in OpenStack Compute (nova): New Bug description: The online_data_migrations should be run after an upgrade and contains a list of tasks to do to backfill information after an upgrade, however, some of those queries are extremely inefficient, which results in the online data migrations taking an unacceptable amount of time. The SQL query in question that takes a really long time: > SELECT count(*) AS count_1 > FROM (SELECT instance_extra.created_at AS instance_extra_created_at, > instance_extra.updated_at AS instance_extra_updated_at, > instance_extra.deleted_at AS instance_extra_deleted_at, > instance_extra.deleted AS instance_extra_deleted, instance_extra.id AS > instance_extra_id, instance_extra.instance_uuid AS > instance_extra_instance_uuid > FROM instance_extra > WHERE instance_extra.keypairs IS NULL AND instance_extra.deleted = 0) AS anon_1 It would also be good for us to *not* run a data migration again if we know we've already gotten found=0 when online_data_migrations is running in "forever-until-complete". Also, the value of 50 rows per run in that mode is quite small. ref: http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004397.html To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1822613/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
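The expensive part is the COUNT(*) over a subquery that drags every column along; the migration loop only needs to know whether any work remains, which an EXISTS probe can answer from the first matching row. A sketch of the cheaper check (illustrative, not the actual query builder change):
```python
#!/usr/bin/env python3
# Sketch: replace "how many rows still need migrating?" with "is there at
# least one row left?", which the database can answer without a full scan.
import pymysql

conn = pymysql.connect(host='127.0.0.1', user='nova', passwd='xxx', db='nova')
cur = conn.cursor()
cur.execute(
    "SELECT EXISTS("
    "  SELECT 1 FROM instance_extra"
    "  WHERE keypairs IS NULL AND deleted = 0"
    ")")
work_remaining = bool(cur.fetchone()[0])
print("keypair backfill still needed:", work_remaining)
```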
[Yahoo-eng-team] [Bug 1821244] [NEW] Failed volume creation can result in invalid `connection_info` field
Public bug reported: If a volume fails to create, this can result in `connection_info` having the literal value of 'null' which breaks things down the road that expect it to be a dictionary, an example of a breakage: https://github.com/openstack/nova/blob/a5e3054e1d6df248fc4c00b9abd7289dde160393/nova/compute/utils.py#L1260 This would fail with: AttributeError: 'NoneType' object has no attribute 'get' ** Affects: nova Importance: Undecided Assignee: Mohammed Naser (mnaser) Status: New ** Changed in: nova Assignee: (unassigned) => Mohammed Naser (mnaser) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1821244 Title: Failed volume creation can result in invalid `connection_info` field Status in OpenStack Compute (nova): New Bug description: If a volume fails to create, this can result in `connection_info` having the literal value of 'null' which breaks things down the road that expect it to be a dictionary, an example of a breakage: https://github.com/openstack/nova/blob/a5e3054e1d6df248fc4c00b9abd7289dde160393/nova/compute/utils.py#L1260 This would fail with: AttributeError: 'NoneType' object has no attribute 'get' To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1821244/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
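On the consuming side this is easy to guard against, since JSON 'null' simply loads as None; a small defensive sketch (the function name is illustrative):
```python
# Sketch: treat a missing or literal-"null" connection_info as an empty dict
# instead of letting .get() blow up on None.
import json

def load_connection_info(raw):
    """raw is the connection_info column: may be None, 'null' or a JSON object."""
    info = json.loads(raw) if raw else {}
    return info or {}  # json.loads('null') returns None, normalise that too

print(load_connection_info(None))                             # {}
print(load_connection_info('null'))                           # {}
print(load_connection_info('{"driver_volume_type": "rbd"}'))  # {'driver_volume_type': 'rbd'}
```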
[Yahoo-eng-team] [Bug 1820752] [NEW] Implement reader/member/admin roles
Public bug reported: Keystone has introduced roles for reader/member/admin which we should leverage in order to be able to provide an easy way for read-only access to APIs. ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1820752 Title: Implement reader/member/admin roles Status in OpenStack Compute (nova): New Bug description: Keystone has introduced roles for reader/member/admin which we should leverage in order to be able to provide an easy way for read-only access to APIs. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1820752/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1715374] Re: Reloading compute with SIGHUP prevents instances from booting
** Also affects: oslo.service Importance: Undecided Status: New ** Also affects: openstack-ansible Importance: Undecided Status: New ** Changed in: openstack-ansible Status: New => Confirmed ** Changed in: openstack-ansible Importance: Undecided => Critical ** Changed in: openstack-ansible Assignee: (unassigned) => Mohammed Naser (mnaser) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1715374 Title: Reloading compute with SIGHUP prevents instances from booting Status in OpenStack Compute (nova): In Progress Status in openstack-ansible: Confirmed Status in oslo.service: In Progress Status in tripleo: Won't Fix Bug description: When trying to boot a new instance at a compute-node, where nova- compute received SIGHUP(the SIGHUP is used as a trigger for reloading mutable options), it always failed. == nova/compute/manager.py == def cancel_all_events(self): if self._events is None: LOG.debug('Unexpected attempt to cancel events during shutdown.') return our_events = self._events # NOTE(danms): Block new events self._events = None<--- Set self._events to "None" ... = This will cause a NovaException when prepare_for_instance_event() was called. It's the cause of the failure of network allocation. == nova/compute/manager.py == def prepare_for_instance_event(self, instance, event_name): ... if self._events is None: # NOTE(danms): We really should have a more specific error # here, but this is what we use for our default error case raise exception.NovaException('In shutdown, no new events ' 'can be scheduled') = To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1715374/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1749574] Re: [tracking] removal and migration of pycrypto
** Changed in: openstack-ansible Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1749574 Title: [tracking] removal and migration of pycrypto Status in Barbican: In Progress Status in Compass: New Status in daisycloud: New Status in OpenStack Backup/Restore and DR (Freezer): New Status in Fuel for OpenStack: New Status in OpenStack Compute (nova): Triaged Status in openstack-ansible: Fix Released Status in OpenStack Global Requirements: Fix Released Status in pyghmi: Fix Committed Status in Solum: Fix Released Status in Tatu: New Status in OpenStack DBaaS (Trove): Fix Released Bug description: trove tatu barbican compass daisycloud freezer fuel nova openstack-ansible - https://review.openstack.org/544516 pyghmi - https://review.openstack.org/569073 solum To manage notifications about this bug go to: https://bugs.launchpad.net/barbican/+bug/1749574/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1807400] Re: networksegments table in neutron can not be cleared automatically
** Also affects: neutron Importance: Undecided Status: New ** Changed in: neutron Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1807400 Title: networksegments table in neutron can not be cleared automatically Status in neutron: Invalid Status in openstack-ansible: New Bug description: _process_port_binding function in neutron/plugins/ml2/plugin.py used clear_binding_levels to clear ml2_port_binding_levels table, but it will not do anything to networksegments under hierarchical port bonding condition @db_api.context_manager.writer def clear_binding_levels(context, port_id, host): if host: for l in (context.session.query(models.PortBindingLevel). filter_by(port_id=port_id, host=host)): context.session.delete(l) LOG.debug("For port %(port_id)s, host %(host)s, " "cleared binding levels", {'port_id': port_id, 'host': host}) To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1807400/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1800511] [NEW] VMs started before Rocky upgrade cannot be live migrated
Public bug reported: In Rocky, the following patch introduced adding MTU to the network for VMs: https://github.com/openstack/nova/commit/f02b3800051234ecc14f3117d5987b1a8ef75877 However, this didn't affect live migrations much because Nova didn't touch the network bits of the XML during live migration, until this patch: https://github.com/openstack/nova/commit/2b52cde565d542c03f004b48ee9c1a6a25f5b7cd With that change, the MTU is added to the configuration, which means that the destination is launched with host_mtu=N, which apparently changes the guest ABI (see: https://bugzilla.redhat.com/show_bug.cgi?id=1449346). This means the live migration will fail with an error looking like this: 2018-10-29 14:59:15.126+: 5289: error : qemuProcessReportLogError:1914 : internal error: qemu unexpectedly closed the monitor: 2018-10-29T14:59:14.977084Z qemu-kvm: get_pci_config_device: Bad config data: i=0x10 read: 61 device: 1 cmask: ff wmask: c0 w1cmask:0 2018-10-29T14:59:14.977105Z qemu-kvm: Failed to load PCIDevice:config 2018-10-29T14:59:14.977109Z qemu-kvm: Failed to load virtio-net:virtio 2018-10-29T14:59:14.977112Z qemu-kvm: error while loading state for instance 0x0 of device ‘:00:03.0/virtio-net’ 2018-10-29T14:59:14.977283Z qemu-kvm: load of migration failed: Invalid argument I was able to further verify this by seeing that `host_mtu` exists in the command line when looking at the destination host instance logs in /var/log/libvirt/qemu/instance-foo.log ** Affects: nova Importance: High Assignee: Mohammed Naser (mnaser) Status: Triaged ** Tags: libvirt live-migration upgrade ** Changed in: nova Assignee: (unassigned) => Mohammed Naser (mnaser) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1800511 Title: VMs started before Rocky upgrade cannot be live migrated Status in OpenStack Compute (nova): Triaged Bug description: In Rocky, the following patch introduced adding MTU to the network for VMs: https://github.com/openstack/nova/commit/f02b3800051234ecc14f3117d5987b1a8ef75877 However, this didn't affect live migrations much because Nova didn't touch the network bits of the XML during live migration, until this patch: https://github.com/openstack/nova/commit/2b52cde565d542c03f004b48ee9c1a6a25f5b7cd With that change, the MTU is added to the configuration, which means that the destination is launched with host_mtu=N, which apparently changes the guest ABI (see: https://bugzilla.redhat.com/show_bug.cgi?id=1449346). 
This means the live migration will fail with an error looking like this: 2018-10-29 14:59:15.126+: 5289: error : qemuProcessReportLogError:1914 : internal error: qemu unexpectedly closed the monitor: 2018-10-29T14:59:14.977084Z qemu-kvm: get_pci_config_device: Bad config data: i=0x10 read: 61 device: 1 cmask: ff wmask: c0 w1cmask:0 2018-10-29T14:59:14.977105Z qemu-kvm: Failed to load PCIDevice:config 2018-10-29T14:59:14.977109Z qemu-kvm: Failed to load virtio-net:virtio 2018-10-29T14:59:14.977112Z qemu-kvm: error while loading state for instance 0x0 of device ‘:00:03.0/virtio-net’ 2018-10-29T14:59:14.977283Z qemu-kvm: load of migration failed: Invalid argument I was able to further verify this by seeing that `host_mtu` exists in the command line when looking at the destination host instance logs in /var/log/libvirt/qemu/instance-foo.log To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1800511/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
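One way to tell in advance whether a given running guest will hit this is to check its live domain XML for an <mtu> element on the interfaces, since guests started before the upgrade will not have one; a small sketch using virsh, with a placeholder domain name:
```python
#!/usr/bin/env python3
# Sketch: report whether a running guest's interfaces carry an <mtu> element
# in the live domain XML, i.e. whether qemu was started with host_mtu.
# Assumes virsh is available on the source hypervisor.
import subprocess
import xml.etree.ElementTree as ET

DOMAIN = 'instance-00000001'  # placeholder

xml = subprocess.check_output(['virsh', 'dumpxml', DOMAIN]).decode()
for iface in ET.fromstring(xml).findall('./devices/interface'):
    mac = iface.find('mac').get('address')
    mtu = iface.find('mtu')
    if mtu is None:
        print('%s: no <mtu> element (guest predates the host_mtu change)' % mac)
    else:
        print('%s: mtu size=%s' % (mac, mtu.get('size')))
```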
[Yahoo-eng-team] [Bug 1799892] [NEW] Placement API crashes with 500s in Rocky upgrade with downed compute nodes
Public bug reported: I ran into this upgrading another environment into Rocky, deleted the problematic resource provider, but just ran into it again in another upgrade of another environment so there's something wonky. Here's the traceback: = 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap [req-8ad1c999-7646-4b0a-91c0-cd26a3581766 b61d42657d364008bfdc6fa715e67daf a894e8109af3430aa7ae03e0c49a0aa0 - default default] Placement API unexpected error: 19: KeyError: 19 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap Traceback (most recent call last): 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/fault_wrap.py", line 40, in __call__ 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return self.application(environ, start_response) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__ 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap resp = self.call_func(req, *args, **kw) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return self.func(req, *args, **kwargs) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/microversion_parse/middleware.py", line 80, in __call__ 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap response = req.get_response(self.application) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/request.py", line 1313, in send 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap application, catch_exc_info=False) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/request.py", line 1277, in call_application 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap app_iter = application(self.environ, start_response) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handler.py", line 209, in __call__ 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return dispatch(environ, start_response, self._map) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handler.py", line 146, in dispatch 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return handler(environ, start_response) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__ 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap resp = self.call_func(req, *args, **kw) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi_wrapper.py", line 29, in call_func 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap super(PlacementWsgify, self).call_func(req, *args, **kwargs) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File 
"/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return self.func(req, *args, **kwargs) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/microversion.py", line 164, in decorated_func 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return _find_method(f, version, status_code)(req, *args, **kwargs) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/util.py", line 81, in decorated_function 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return f(req) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handlers/allocation_candidate.py", line 316, in list_allocation_candidates 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap context, requests, limit=limit, group_policy=group_policy) 2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File
[Yahoo-eng-team] [Bug 1798188] [NEW] VNC stops working in rolling upgrade by default
Public bug reported: During a rolling upgrade, once the control plane is upgraded and running on Rocky (but computes still in Queens), the consoles will stop working. It is not obvious however it seems that the following is missing: ``` [workarounds] enable_consoleauth = True ``` There isn't a really obvious document or anything explaining this, leaving the user confused ** Affects: nova Importance: High Assignee: melanie witt (melwitt) Status: Confirmed ** Tags: console upgrade -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1798188 Title: VNC stops working in rolling upgrade by default Status in OpenStack Compute (nova): Confirmed Bug description: During a rolling upgrade, once the control plane is upgraded and running on Rocky (but computes still in Queens), the consoles will stop working. It is not obvious however it seems that the following is missing: ``` [workarounds] enable_consoleauth = True ``` There isn't a really obvious document or anything explaining this, leaving the user confused To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1798188/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1794811] [NEW] Lack of allocation candidates is only logged in DEBUG
Public bug reported: If the placement service returns allocation candidates, the scheduler goes through all the filters and, if it ends up with 0 compute nodes, logs it at INFO: https://github.com/openstack/nova/blob/c6218428e9b29a2c52808ec7d27b4b21aadc0299/nova/filters.py#L130 However, if no allocation candidates match at all, it throws a message at DEBUG and exits out, leaving the operator without important information. https://github.com/openstack/nova/blob/c3fe54a74d8a3b5d5338a902e3562733a2b9a564/nova/scheduler/manager.py#L150-L153 ** Affects: nova Importance: Undecided Assignee: Mohammed Naser (mnaser) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1794811 Title: Lack of allocation candidates is only logged in DEBUG Status in OpenStack Compute (nova): In Progress Bug description: If the placement service returns allocation candidates, the scheduler goes through all the filters and, if it ends up with 0 compute nodes, logs it at INFO: https://github.com/openstack/nova/blob/c6218428e9b29a2c52808ec7d27b4b21aadc0299/nova/filters.py#L130 However, if no allocation candidates match at all, it throws a message at DEBUG and exits out, leaving the operator without important information. https://github.com/openstack/nova/blob/c3fe54a74d8a3b5d5338a902e3562733a2b9a564/nova/scheduler/manager.py#L150-L153 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1794811/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
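The change being asked for is essentially a log-level bump with enough context to act on; a sketch of its shape (the wording and function name are illustrative, not the actual patch):
```python
import logging

LOG = logging.getLogger(__name__)

def check_allocation_candidates(instance_uuid, alloc_reqs):
    # Sketch of the requested change: when placement returns nothing, say so
    # at INFO instead of DEBUG, since production deployments rarely run the
    # scheduler with debug logging enabled.
    if not alloc_reqs:
        LOG.info("Placement returned no allocation candidates for instance "
                 "%s; no resource provider has enough free capacity for the "
                 "requested resources.", instance_uuid)
        return []
    return alloc_reqs
```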
[Yahoo-eng-team] [Bug 1793569] [NEW] Add placement audit commands
Public bug reported: It is possible that placement gets out of sync, which can cause scheduling problems that would go unnoticed. I've built out this script which would be nice to have as `nova-manage placement audit`: #!/usr/bin/env python import argparse import sys from openstack import connection import openstack.config config = openstack.config.OpenStackConfig() parser = argparse.ArgumentParser() config.register_argparse_arguments(parser, sys.argv) options = parser.parse_args() cloud_region = config.get_one(argparse=options) conn = connection.Connection(config=cloud_region) # Grab list of all hypervisors and their servers hypervisors = conn.compute.get('/os-hypervisors?with_servers=true', microversion='2.53').json().get('hypervisors') # Generate a dictionary mapping of hypervisor => [instances] hypervisor_mapping = {h['id']: [s['uuid'] for s in h.get('servers', [])] for h in hypervisors} hypervisor_names = {h['id']: h['hypervisor_hostname'] for h in hypervisors} # Grab list of all resource providers resource_providers = conn.placement.get('/resource_providers').json().get('resource_providers') for rp in resource_providers: # Check if RP has VCPU in inventory (i.e. compute node) inventories = conn.placement.get('/resource_providers/%s/inventories' % rp['uuid']).json().get('inventories') # Skip those without VCPU and MEMORY_MB (non computes) if 'MEMORY_MB' not in inventories and 'VCPU' not in inventories: continue # Get all allocations for RP allocations = conn.placement.get('/resource_providers/%s/allocations' % rp['uuid']).json().get('allocations') # Is there a compute node for this RP? if rp['uuid'] not in hypervisor_mapping: print "openstack resource provider delete %s # resource provider does not have matching provider" % rp['uuid'] continue for allocation_id, info in allocations.iteritems(): # The instance does not exist where placement says it should be. if allocation_id not in hypervisor_mapping[rp['uuid']]: hypervisor = None # Try to find where it's hiding. for hyp, instances in hypervisor_mapping.iteritems(): if allocation_id in instances: hypervisor = hyp break # We found it. if hypervisor: classes = ','.join(["%s=%s" % (key, value) for key, value in info.get('resources').iteritems()]) print "openstack resource provider allocation set --allocation rp=%s,%s %s # instance allocated on wrong rp" % (hypervisor, classes, allocation_id) continue # We don't know where this is. Let's see if it exists in Nova. server = conn.compute.get('/servers/%s' % allocation_id) if server.status_code == 404: print "openstack resource provider allocation delete %s # instance deleted" % allocation_id continue # TODO: idk? edge cases? raise It would likely need to be rewritten to use the built-in placement HTTP client and objects to avoid extra API calls. ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1793569 Title: Add placement audit commands Status in OpenStack Compute (nova): New Bug description: It is possible that placement gets out of sync, which can cause scheduling problems that would go unnoticed.
I've built out this script which would be nice to have as `nova-manage placement audit`: #!/usr/bin/env python import argparse import sys from openstack import connection import openstack.config config = openstack.config.OpenStackConfig() parser = argparse.ArgumentParser() config.register_argparse_arguments(parser, sys.argv) options = parser.parse_args() cloud_region = config.get_one(argparse=options) conn = connection.Connection(config=cloud_region) # Grab list of all hypervisors and their servers hypervisors = conn.compute.get('/os-hypervisors?with_servers=true', microversion='2.53').json().get('hypervisors') # Generate a dictionary mapping of hypervisor => [instances] hypervisor_mapping = {h['id']: [s['uuid'] for s in h.get('servers', [])] for h in hypervisors} hypervisor_names = {h['id']: h['hypervisor_hostname'] for h in hypervisors} # Grab list of all resource providers resource_providers = conn.placement.get('/resource_providers').json().get('resource_providers') for rp in resource_providers: # Check if RP has VCPU in inventory (i.e. compute node) inventories = conn.placement.get('/resource_providers/%s/inventories' % rp['uuid']).json().get('inventories') # Skip those without VCPU and MEMORY_MB (non computes)
[Yahoo-eng-team] [Bug 1793533] [NEW] Deleting a service with nova-compute binary doesn't remove compute node
Public bug reported: If you are taking a nova-compute service out of service permanently, the logical steps would be: 1) Take down the service 2) Delete it from the service list (nova service-delete ) However, this does not delete the compute node record which stays forever, leading the scheduler to always complain about it as well: 2018-09-20 13:15:45.312 131035 WARNING nova.scheduler.host_manager [req-c4a7c383-c606-48a7-b870-cc143710114a 234412d3482f4707877ca696e105bf5b acb15d2ffaae4eda98580c7b874d7f89 - default default] No compute service record found for host .vexxhost.net https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L716-L720 We should be deleting the compute node if a nova-compute binary is deleted, or that section should automatically clean up while warning (because service records can be rebuilt anyway?) ** Affects: nova Importance: Undecided Status: Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1793533 Title: Deleting a service with nova-compute binary doesn't remove compute node Status in OpenStack Compute (nova): Invalid Bug description: If you are taking a nova-compute service out of service permanently, the logical steps would be: 1) Take down the service 2) Delete it from the service list (nova service-delete ) However, this does not delete the compute node record which stays forever, leading the scheduler to always complain about it as well: 2018-09-20 13:15:45.312 131035 WARNING nova.scheduler.host_manager [req-c4a7c383-c606-48a7-b870-cc143710114a 234412d3482f4707877ca696e105bf5b acb15d2ffaae4eda98580c7b874d7f89 - default default] No compute service record found for host .vexxhost.net https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L716-L720 We should be deleting the compute node if a nova-compute binary is deleted, or that section should automatically clean up while warning (because service records can be rebuilt anyway?) To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1793533/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
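Until the API does this itself, the leftover record can be found and soft-deleted by hand; a sketch in the same pymysql style as the other scripts in this digest, with a placeholder host name and the usual caveat to verify the schema assumptions (deleted = id convention, host column) before running it:
```python
#!/usr/bin/env python3
# Sketch: soft-delete compute_nodes rows for a hypervisor whose nova-compute
# service record has already been deleted, so the scheduler stops warning.
import pymysql

HOST = 'compute-01.example.net'  # placeholder for the removed hypervisor

conn = pymysql.connect(host='127.0.0.1', user='nova', passwd='xxx', db='nova')
cur = conn.cursor()
cur.execute(
    "UPDATE compute_nodes SET deleted = id, deleted_at = NOW() "
    "WHERE host = %s AND deleted = 0",
    (HOST,))
print("orphaned compute node rows soft-deleted:", cur.rowcount)
conn.commit()
```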
[Yahoo-eng-team] [Bug 1784074] [NEW] Instances end up with no cell assigned in instance_mappings
Public bug reported: There have been situations where, due to an unrelated issue such as an RPC or DB problem, the nova_api instance_mappings table can end up with instances that have cell_id set to NULL which can cause annoying and weird behaviour such as undeletable instances, etc. This seems to be an issue only during times where these external infrastructure components had issues. I have come up with the following script which loops over all cells and checks where they are, and spits out a mysql query to run to fix. This would be nice to have as a nova-manage cell_v2 command to help if any other users run into this, unfortunately I'm a bit short on time so I don't have time to nova-ify it, but it's here: #!/usr/bin/env python import urlparse import pymysql # Connect to databases api_conn = pymysql.connect(host='', port=3306, user='nova_api', passwd='xxx', db='nova_api') api_cur = api_conn.cursor() def _get_conn(db): parsed_url = urlparse.urlparse(db) conn = pymysql.connect(host=parsed_url.hostname, user=parsed_url.username, passwd=parsed_url.password, db=parsed_url.path[1:]) return conn.cursor() # Get list of all cells api_cur.execute("SELECT uuid, name, database_connection FROM cell_mappings") CELLS = [{'uuid': uuid, 'name': name, 'db': _get_conn(db)} for uuid, name, db in api_cur.fetchall()] # Get list of all unmapped instances api_cur.execute("SELECT instance_uuid FROM instance_mappings WHERE cell_id IS NULL") print "Number of unmapped instances: %s" % api_cur.rowcount # Go over all unmapped instances for (instance_uuid,) in api_cur.fetchall(): instance_cell = None # Check which cell contains this instance for cell in CELLS: cell['db'].execute("SELECT id FROM instances WHERE uuid = %s", (instance_uuid,)) if cell['db'].rowcount != 0: instance_cell = cell break # Update to the correct cell if instance_cell: print "UPDATE instance_mappings SET cell_id = '%s' WHERE instance_uuid = '%s'" % (instance_cell['uuid'], instance_uuid) continue # If we reach this point, it's not in any cell?! print "%s: not found in any cell" % (instance_uuid) ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1784074 Title: Instances end up with no cell assigned in instance_mappings Status in OpenStack Compute (nova): New Bug description: There have been situations where, due to an unrelated issue such as an RPC or DB problem, the nova_api instance_mappings table can end up with instances that have cell_id set to NULL which can cause annoying and weird behaviour such as undeletable instances, etc. This seems to be an issue only during times where these external infrastructure components had issues. I have come up with the following script which loops over all cells and checks where they are, and spits out a mysql query to run to fix. 
This would be nice to have as a nova-manage cell_v2 command to help if any other users run into this, unfortunately I'm a bit short on time so I don't have time to nova-ify it, but it's here: #!/usr/bin/env python import urlparse import pymysql # Connect to databases api_conn = pymysql.connect(host='', port=3306, user='nova_api', passwd='xxx', db='nova_api') api_cur = api_conn.cursor() def _get_conn(db): parsed_url = urlparse.urlparse(db) conn = pymysql.connect(host=parsed_url.hostname, user=parsed_url.username, passwd=parsed_url.password, db=parsed_url.path[1:]) return conn.cursor() # Get list of all cells api_cur.execute("SELECT uuid, name, database_connection FROM cell_mappings") CELLS = [{'uuid': uuid, 'name': name, 'db': _get_conn(db)} for uuid, name, db in api_cur.fetchall()] # Get list of all unmapped instances api_cur.execute("SELECT instance_uuid FROM instance_mappings WHERE cell_id IS NULL") print "Number of unmapped instances: %s" % api_cur.rowcount # Go over all unmapped instances for (instance_uuid,) in api_cur.fetchall(): instance_cell = None # Check which cell contains this instance for cell in CELLS: cell['db'].execute("SELECT id FROM instances WHERE uuid = %s", (instance_uuid,)) if cell['db'].rowcount != 0: instance_cell = cell break # Update to the correct cell if instance_cell: print "UPDATE instance_mappings SET cell_id = '%s' WHERE instance_uuid = '%s'" % (instance_cell['uuid'], instance_uuid) continue # If we reach this point, it's not in any cell?! print "%s: not found in any cell" % (instance_uuid)
[Yahoo-eng-team] [Bug 1769283] [NEW] ImagePropertiesFilter has no default value resulting in unpredictable scheduling
Public bug reported: When using ImagePropertiesFilter for something like hardware architecture, it can cause very unpredictable behaviour because of the lack of a default value. In our case, a public cloud user will most likely upload an image without `hw_architecture` defined anywhere (as most instructions and general OpenStack documentation suggest). However, in a case where there are multiple architectures available, the images tagged with a specific architecture will go towards hypervisors with that specific architecture. However, those which are not tagged will go to *any* hypervisor. Because of how popular certain architectures are, it should be possible to set a 'default' value for the architecture, as it is the implied one, with the ability to override it if a user wants a specific one. ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1769283 Title: ImagePropertiesFilter has no default value resulting in unpredictable scheduling Status in OpenStack Compute (nova): New Bug description: When using ImagePropertiesFilter for something like hardware architecture, it can cause very unpredictable behaviour because of the lack of a default value. In our case, a public cloud user will most likely upload an image without `hw_architecture` defined anywhere (as most instructions and general OpenStack documentation suggest). However, in a case where there are multiple architectures available, the images tagged with a specific architecture will go towards hypervisors with that specific architecture. However, those which are not tagged will go to *any* hypervisor. Because of how popular certain architectures are, it should be possible to set a 'default' value for the architecture, as it is the implied one, with the ability to override it if a user wants a specific one. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1769283/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
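A sketch of the proposed fallback behaviour; the default shown is hypothetical (it would presumably come from a new configuration option), and the helper name is illustrative rather than the filter's real code:
```python
# Sketch of the proposed fallback: untagged images stop matching *any*
# architecture and instead imply the operator-chosen default, while tagged
# images keep their explicit value. DEFAULT_ARCHITECTURE is hypothetical.
DEFAULT_ARCHITECTURE = 'x86_64'

def effective_architecture(image_props, default=DEFAULT_ARCHITECTURE):
    """Return the architecture the filter should match for this image."""
    return image_props.get('hw_architecture', default)

print(effective_architecture({}))                              # x86_64
print(effective_architecture({'hw_architecture': 'aarch64'}))  # aarch64
```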
[Yahoo-eng-team] [Bug 1755890] [NEW] Instances fail to hard reboot when using OpenDaylight
Public bug reported: When using OpenDaylight with Open vSwitch, the Neutron Open vSwitch agent does not exist in the environment anymore. When an instance is started up for the first time, OpenDaylight will successfully bind the port and send the vif plugged notification. However, since the introduction of the following patch: https://review.openstack.org/#/q/Ib08afad3822f2ca95cfeea18d7f4fc4cb407b4d6 It now expects the vif plugged event to happen on hard reboots, which for certain environments (such as using ODL with OVS, it will not come in). This results in all instance starts after the first one failing. Discussion: http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-14.log.html#t2018-03-14T18:12:48 ODL issue: https://jira.opendaylight.org/projects/NETVIRT/issues/NETVIRT-512 ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1755890 Title: Instances fail to hard reboot when using OpenDaylight Status in OpenStack Compute (nova): New Bug description: When using OpenDaylight with Open vSwitch, the Neutron Open vSwitch agent does not exist in the environment anymore. When an instance is started up for the first time, OpenDaylight will successfully bind the port and send the vif plugged notification. However, since the introduction of the following patch: https://review.openstack.org/#/q/Ib08afad3822f2ca95cfeea18d7f4fc4cb407b4d6 It now expects the vif plugged event to happen on hard reboots, which for certain environments (such as using ODL with OVS, it will not come in). This results in all instance starts after the first one failing. Discussion: http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-14.log.html#t2018-03-14T18:12:48 ODL issue: https://jira.opendaylight.org/projects/NETVIRT/issues/NETVIRT-512 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1755890/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
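A workaround operators commonly reach for while the event handling is unresolved is to relax Nova's vif-plugging expectations on the affected compute nodes. These are standard nova.conf options, but treat the values as an assumption to validate for your deployment, since they also mask genuine plugging failures:

    [DEFAULT]
    vif_plugging_is_fatal = False
    vif_plugging_timeout = 0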
[Yahoo-eng-team] [Bug 1752736] [NEW] Nova compute dies if it cannot authenticate to RabbitMQ
Public bug reported: At the moment, nova-compute will die if it fails to authenticate to the messaging cluster and it will not retry on start. It is possible that the vhost is not ready yet so it should be handled here: https://github.com/openstack/nova/blob/stable/pike/nova/conductor/api.py#L61-L78 ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1752736 Title: Nova compute dies if it cannot authenticate to RabbitMQ Status in OpenStack Compute (nova): New Bug description: At the moment, nova-compute will die if it fails to authenticate to the messaging cluster and it will not retry on start. It is possible that the vhost is not ready yet so it should be handled here: https://github.com/openstack/nova/blob/stable/pike/nova/conductor/api.py#L61-L78 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1752736/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
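The shape of the fix being asked for is a bounded retry around the initial messaging connection rather than exiting on the first failure. A minimal, hypothetical sketch (not the actual nova code; the function and parameter names are made up):

    import time

    def wait_for_messaging(connect, retries=60, interval=10):
        # Keep retrying the first RPC connection; the vhost or credentials
        # may simply not be ready yet when the service starts.
        last_exc = None
        for _ in range(retries):
            try:
                return connect()
            except Exception as exc:
                last_exc = exc
                time.sleep(interval)
        raise last_exc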
[Yahoo-eng-team] [Bug 1750666] [NEW] Deleting an instance before scheduling with BFV fails to detach volume
Public bug reported: If you try to boot an instance and delete it early before scheduling, the '_delete_while_booting' codepath hits `_attempt_delete_of_buildrequest` which tries to remove the block device mappings. However, if the cloud contains compute nodes before Pike, no block device mappings will be present in the database (because they are only saved if using the new attachment flow), which means the attachment IDs are empty and the volume delete fails: 2018-02-20 16:02:25,063 WARNING [nova.compute.api] Ignoring volume cleanup failure due to Object action obj_load_attr failed because: attribute attachment_id not lazy-loadable ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1750666 Title: Deleting an instance before scheduling with BFV fails to detach volume Status in OpenStack Compute (nova): New Bug description: If you try to boot an instance and delete it early before scheduling, the '_delete_while_booting' codepath hits `_attempt_delete_of_buildrequest` which tries to remove the block device mappings. However, if the cloud contains compute nodes before Pike, no block device mappings will be present in the database (because they are only saved if using the new attachment flow), which means the attachment IDs are empty and the volume delete fails: 2018-02-20 16:02:25,063 WARNING [nova.compute.api] Ignoring volume cleanup failure due to Object action obj_load_attr failed because: attribute attachment_id not lazy-loadable To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1750666/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
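The defensive pattern implied here is to treat a missing attachment_id as "nothing to clean up" instead of letting obj_load_attr break the delete path. A hypothetical sketch, not the actual nova code:

    def safe_attachment_id(bdm):
        # Block device mappings created before the new attach flow never
        # stored an attachment_id, so loading it raises; treat that as
        # "no Cinder attachment to delete".
        try:
            return bdm.attachment_id
        except Exception:
            return None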
[Yahoo-eng-team] [Bug 1739325] [NEW] Server operations fail to complete with versioned notifications if payload contains unset non-nullable fields
Public bug reported: With versioned notifications, the instance payload tries to attach a flavor payload which it looks up from the instance. It uses the one which is attached in instance_extras however there seems to be a scenario where the disabled field is missing in the database, causing all operations to fail in the notification stage. The JSON string for the flavor in the database is attached below (note this is a cloud with a long lifetime so it might be some weird conversion at some point in the life time of the cloud). The temporary workaround as suggested by Matt was to switch to unversioned notification which did the trick. == flavor == {"new": null, "old": null, "cur": {"nova_object.version": "1.1", "nova_object.changes": ["root_gb", "name", "ephemeral_gb", "memory_mb", "vcpus", "extra_specs", "swap", "rxtx_factor", "flavorid", "vcpu_weight", "id"], "nova_object.name": "Flavor", "nova_object.data": {"root_gb": 80, "name": "nb.2G", "ephemeral_gb": 0, "memory_mb": 2048, "vcpus": 4, "extra_specs": {}, "swap": 0, "rxtx_factor": 1.0, "flavorid": "8c6a8477-20cb-4db9-ad1d-be3bc05cdae9", "vcpu_weight": null, "id": 8}, "nova_object.namespace": "nova"}} == flavor == == stack == 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server [req-edc9fb83-63ff-4c4b-b6c6-704d331905a8 604d5fd332904975a26b6e89c60a9d51 d6ebcbe536f848b3af4403f922377f80 - default default] Exception during message handling: ValueError: Field `disabled' cannot be None 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in wrapped 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server function_name, call_dict, binary) 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server self.force_reraise() 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in wrapped 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw) 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 189, in decorated_function 2017-10-23 14:49:21.117 40200 ERROR 
oslo_messaging.rpc.server "Error: %s", e, instance=instance) 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server self.force_reraise() 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 159, in decorated_function 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs) 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 874, in decorated_function 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs) 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 217, in decorated_function 2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server
[Yahoo-eng-team] [Bug 1739323] [NEW] KeyError in host_manager for _get_host_states
Public bug reported: https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L674-L718 In _get_host_states, a list of all computes nodes is retrieved with a `state_key` of `(host, node)`. https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L692 https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L708 The small piece of code here removes all of the dead compute nodes from host_state_map https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L708 However, the result is returned by iterating over all seen nodes and using that index for host_state_map, some of which have been deleted by the code above, resulting in a KeyError. https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L718 ** Affects: nova Importance: Undecided Status: New ** Tags: scheduler -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1739323 Title: KeyError in host_manager for _get_host_states Status in OpenStack Compute (nova): New Bug description: https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L674-L718 In _get_host_states, a list of all computes nodes is retrieved with a `state_key` of `(host, node)`. https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L692 https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L708 The small piece of code here removes all of the dead compute nodes from host_state_map https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L708 However, the result is returned by iterating over all seen nodes and using that index for host_state_map, some of which have been deleted by the code above, resulting in a KeyError. https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L718 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1739323/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
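A stripped-down illustration of the pattern described above (not the actual nova code): entries for dead compute nodes are pruned from host_state_map, but the result is still built from the full set of seen keys, so the lookup can raise KeyError. Guarding the lookup (or iterating over the pruned map itself) avoids the crash:

    host_state_map = {('hostA', 'nodeA'): 'state-a', ('hostB', 'nodeB'): 'state-b'}
    seen_nodes = {('hostA', 'nodeA'), ('hostB', 'nodeB')}

    del host_state_map[('hostB', 'nodeB')]  # dead compute node pruned

    # Failure mode: KeyError on ('hostB', 'nodeB')
    # result = [host_state_map[key] for key in seen_nodes]

    # Only return states for hosts that survived the pruning
    result = [host_state_map[key] for key in seen_nodes if key in host_state_map]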
[Yahoo-eng-team] [Bug 1739318] [NEW] Online data migration context does not contain project_id
Public bug reported: The online data migration generates a context in order to be able to execute migrations: https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L747 However, this context does not contain a `project_id` when running this via CLI. https://github.com/openstack/nova/blob/master/nova/context.py#L279-L290 During the creation of RequestSpec's for old instances, the context which contains no `project_id`. https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L611-L622 This means that a RequestSpec gets created with `project_id` set to `null`. During the day-to-day operations, things work okay, however, when attempting to do a live migration, the `project_id` is set to `null` when trying to claim resources which the placement API refuses. https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L791 This will give errors as such: 400 Bad Request 400 Bad Request The server could not comply with the request since it is either malformed or otherwise incorrect. JSON does not validate: None is not of type 'string' Failed validating 'type' in schema['properties']['project_id']: {'maxLength': 255, 'minLength': 1, 'type': 'string'} On instance['project_id']: None ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1739318 Title: Online data migration context does not contain project_id Status in OpenStack Compute (nova): New Bug description: The online data migration generates a context in order to be able to execute migrations: https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L747 However, this context does not contain a `project_id` when running this via CLI. https://github.com/openstack/nova/blob/master/nova/context.py#L279-L290 During the creation of RequestSpec's for old instances, the context which contains no `project_id`. https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L611-L622 This means that a RequestSpec gets created with `project_id` set to `null`. During the day-to-day operations, things work okay, however, when attempting to do a live migration, the `project_id` is set to `null` when trying to claim resources which the placement API refuses. https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L791 This will give errors as such: 400 Bad Request 400 Bad Request The server could not comply with the request since it is either malformed or otherwise incorrect. JSON does not validate: None is not of type 'string' Failed validating 'type' in schema['properties']['project_id']: {'maxLength': 255, 'minLength': 1, 'type': 'string'} On instance['project_id']: None To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1739318/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
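A quick way to see whether a deployment is affected is to look for RequestSpecs that were serialized with a null project_id. This assumes direct access to the nova_api database and that the spec column stores the JSON layout shown in this bug; adjust the pattern if it differs on your release:

    SELECT COUNT(*) FROM request_specs WHERE spec LIKE '%"project_id": null%';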
[Yahoo-eng-team] [Bug 1715462] [NEW] Instances failing quota recheck end up with no assigned cell
Public bug reported: When an instance fails the quota rechecks codebase which is here: https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L992-L1006 It raises an exception, however, the cell mapping is only saved much later (thanks help of dansmith for finding this): https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L1037-L1043 This results in an instance with an unassigned cell, where it should technically be the cell it was scheduled into. ** Affects: nova Importance: Undecided Status: New ** Tags: cells quotas -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1715462 Title: Instances failing quota recheck end up with no assigned cell Status in OpenStack Compute (nova): New Bug description: When an instance fails the quota rechecks codebase which is here: https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L992-L1006 It raises an exception, however, the cell mapping is only saved much later (thanks help of dansmith for finding this): https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L1037-L1043 This results in an instance with an unassigned cell, where it should technically be the cell it was scheduled into. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1715462/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
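Instances that hit this failure show up in the nova_api database as instance mappings with no cell assigned. Assuming direct MySQL access, they can be listed with:

    SELECT instance_uuid FROM instance_mappings WHERE cell_id IS NULL;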
[Yahoo-eng-team] [Bug 1564182] [NEW] CPU Metrics not working
Public bug reported: The metrics collection on compute nodes is currently not working. When the compute node creates the object to save, it is divided to be a value inside [0,1]. However, at the same time, when the scheduler needs to pull out the numbers, it divides it once again as it pulls the objects: https://github.com/openstack/nova/blob/stable/liberty/nova/compute/resource_tracker.py#L437 https://github.com/openstack/nova/blob/stable/liberty/nova/compute/monitors/base.py#L60-L63 https://github.com/openstack/nova/blob/stable/liberty/nova/objects/monitor_metric.py#L68-L71 This essentially means that it always returns a value of zero as a metric, because it divides a small number again by 100. ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1564182 Title: CPU Metrics not working Status in OpenStack Compute (nova): New Bug description: The metrics collection on compute nodes is currently not working. When the compute node creates the object to save, it is divided to be a value inside [0,1]. However, at the same time, when the scheduler needs to pull out the numbers, it divides it once again as it pulls the objects: https://github.com/openstack/nova/blob/stable/liberty/nova/compute/resource_tracker.py#L437 https://github.com/openstack/nova/blob/stable/liberty/nova/compute/monitors/base.py#L60-L63 https://github.com/openstack/nova/blob/stable/liberty/nova/objects/monitor_metric.py#L68-L71 This essentially means that it always returns a value of zero as a metric, because it divides a small number again by 100. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1564182/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
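The arithmetic makes the problem obvious. Illustrative numbers only, assuming a host at 85% CPU utilisation:

    value_saved = 85 / 100.0          # compute node normalises to 0.85 before saving
    value_used = value_saved / 100.0  # scheduler divides again on load -> 0.0085
    # Every host therefore reports a metric of roughly zero, so weighing by
    # CPU metrics stops differentiating between hosts.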
[Yahoo-eng-team] [Bug 1528894] [NEW] Native ovsdb implementation not working
Public bug reported: When trying to use the new native OVSDB provider, the connectivity never goes up due to the fact that what seems to be the db_set operation failing to change the patch ports from "nonexistant-peer" to the correct peer, therefore not linking the bridges together. https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1119 The system must be running the latest Liberty release, python- openvswitch package installed and the following command executed: # ovs-vsctl set-manager ptcp:6640:127.0.0.1 Once that's all done, the openvswitch agent configuration should be changed to the following: [OVS] ovsdb_interface = ovsdb Restarting the OVS agent will setup everything but leave your network in a failed state because the correct patch ports aren't updated: # ovs-vsctl show Bridge br-ex Port br-ex Interface br-ex type: internal Port "em1" Interface "em1" Port phy-br-ex Interface phy-br-ex type: patch options: {peer=nonexistent-peer} Bridge br-int fail_mode: secure Port "qvo25d28228-9c" tag: 1 Interface "qvo25d28228-9c" ... Port int-br-ex Interface int-br-ex type: patch options: {peer=nonexistent-peer} Reverting to the regular old forked implementation works with no problems. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1528894 Title: Native ovsdb implementation not working Status in neutron: New Bug description: When trying to use the new native OVSDB provider, the connectivity never goes up due to the fact that what seems to be the db_set operation failing to change the patch ports from "nonexistant-peer" to the correct peer, therefore not linking the bridges together. https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1119 The system must be running the latest Liberty release, python- openvswitch package installed and the following command executed: # ovs-vsctl set-manager ptcp:6640:127.0.0.1 Once that's all done, the openvswitch agent configuration should be changed to the following: [OVS] ovsdb_interface = ovsdb Restarting the OVS agent will setup everything but leave your network in a failed state because the correct patch ports aren't updated: # ovs-vsctl show Bridge br-ex Port br-ex Interface br-ex type: internal Port "em1" Interface "em1" Port phy-br-ex Interface phy-br-ex type: patch options: {peer=nonexistent-peer} Bridge br-int fail_mode: secure Port "qvo25d28228-9c" tag: 1 Interface "qvo25d28228-9c" ... Port int-br-ex Interface int-br-ex type: patch options: {peer=nonexistent-peer} Reverting to the regular old forked implementation works with no problems. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1528894/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
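Reverting to the shelled-out implementation, which the report says works, is a single option change in the OVS agent configuration. On most releases the legacy value is `vsctl`; treat the exact value as an assumption to check against your release:

    [OVS]
    ovsdb_interface = vsctl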
[Yahoo-eng-team] [Bug 1528895] [NEW] Timeouts in update_device_list (too slow with large # of VIFs)
Public bug reported: In our environment, we have some large compute nodes with a large number of VIFs. When the update_device_list call happens on the agent start up: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L842 This takes a very long time as it seems to loop on each port at the server side, contact Nova and much more. The default rpc timeout of 60 seconds is not enough and it ends up failing on a server with around 120 VIFs. When raising the timeout to 120, it seems to work with no problems. 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-1e6cc46d-eb52-4d99-bd77-bf2e8424a1ea - - - - -] Error while processing VIF ports 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last): 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1752, in rpc_loop 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent ovs_restarted) 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1507, in process_network_ports 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self._bind_devices(need_binding_devices) 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 847, in _bind_devices 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self.conf.host) 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/neutron/agent/rpc.py", line 179, in update_device_list 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent agent_id=agent_id, host=host) 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent retry=self.retry) 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent timeout=timeout, retry=retry) 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent retry=retry) 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 420, in _send 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result = 
self._waiter.wait(msg_id, timeout) 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 318, in wait 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent message = self.waiters.get(msg_id, timeout=timeout) 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 223, in get 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 'to message ID %s' % msg_id) 2015-12-23 15:27:27.373 38588 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID c42c1ffc801b41ca89aa4472696bbf1a I don't think that an RPC call should ever take that long, the neutron- server is not loaded or anything and adding new ones doesn't seem to resolve it, due to the fact a single RPC responder answers this. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
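The stop-gap described above is raising the RPC timeout for the agent so the oversized update_device_list call has time to complete. This is the standard oslo.messaging option in the agent's configuration file:

    [DEFAULT]
    rpc_response_timeout = 120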
[Yahoo-eng-team] [Bug 1518016] [NEW] Nova kilo requires concurrency 1.8.2 or better
Public bug reported: OpenStack Nova Kilo release requires 1.8.2 or higher, this is due to the addition of on_execute and on_completion to the execute(..) function. The latest Ubuntu OpenStack Kilo packages currently have code that depend on this new updated release. This results in a crash in some operations like resizes or migrations. 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] Traceback (most recent call last): 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 6459, in _error_out_instance_on_exception 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] yield 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4054, in resize_instance 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] timeout, retry_interval) 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6353, in migrate_disk_and_power_off 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] shared_storage) 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__ 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] six.reraise(self.type_, self.value, self.tb) 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6342, in migrate_disk_and_power_off 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] on_completion=on_completion) 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/utils.py", line 329, in copy_image 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] on_execute=on_execute, on_completion=on_completion) 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/utils.py", line 55, in execute 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] return utils.execute(*args, **kwargs) 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 207, in execute 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] return processutils.execute(*cmd, **kwargs) 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] File "/usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 174, in execute 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] raise UnknownArgumentError(_('Got unknown keyword args: %r') % 
kwargs) 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] UnknownArgumentError: Got unknown keyword args: {'on_execute': at 0x7f3a64527050>, 'on_completion': at 0x7f39ff6ddf50>} 2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] https://github.com/openstack/requirements/commit/2fd00d00db5fce57d9589643801942d0332b1670 This commit above shows that OpenStack now requires 1.8.2 instead of 1.8.0. We would appreciate if the 1.8.2 upstream release can be brought in to resolve this bug. Thank you. ** Affects: nova Importance: Undecided Status: New ** Affects: python-oslo.concurrency (Ubuntu) Importance: Undecided Status: New ** Also affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1518016 Title: Nova kilo requires concurrency 1.8.2 or better Status in OpenStack Compute (nova): New Status in python-oslo.concurrency package in Ubuntu: New Bug description: OpenStack Nova Kilo release requires 1.8.2 or higher, this is due to the addition of on_execute and on_completion to the execute(..) function.
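Until fixed packages are available, installing the required library version directly is the straightforward workaround (assuming pip is acceptable on the affected nodes):

    pip install 'oslo.concurrency>=1.8.2'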
[Yahoo-eng-team] [Bug 1338614] [NEW] Backgrounded resizing does not work
Public bug reported: When setting resize_rootfs to 'noblock', cloud-init should fork a new process and continue with it's own initialization process. However, it seems that this is currently broken, as you see from these logs that it still blocks on it: Jul 7 12:34:20 localhost [CLOUDINIT] cc_resizefs.py[DEBUG]: Resizing (via forking) root filesystem (type=ext4, val=noblock) Jul 7 12:34:20 localhost [CLOUDINIT] util.py[WARNING]: Failed forking and calling callback NoneType Jul 7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: Failed forking and calling callback NoneType#012Traceback (most recent call last):#012 File /usr/lib/python2.6/site-packages/cloudinit/util.py, line 220, in fork_cb#012 child_cb(*args)#012TypeError: 'NoneType' object is not callable Also, when looking at timings, you can see that it was blocked on it for the whole time Jul 7 12:33:38 localhost [CLOUDINIT] util.py[DEBUG]: Cloud-init v. 0.7.4 running 'init' at Mon, 07 Jul 2014 12:33:38 +. Up 5.67 seconds. Jul 7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: backgrounded Resizing took 41.487 seconds Jul 7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: cloud-init mode 'init' took 41.799 seconds (41.80) ** Affects: cloud-init Importance: Undecided Status: Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1338614 Title: Backgrounded resizing does not work Status in Init scripts for use on cloud images: Confirmed Bug description: When setting resize_rootfs to 'noblock', cloud-init should fork a new process and continue with it's own initialization process. However, it seems that this is currently broken, as you see from these logs that it still blocks on it: Jul 7 12:34:20 localhost [CLOUDINIT] cc_resizefs.py[DEBUG]: Resizing (via forking) root filesystem (type=ext4, val=noblock) Jul 7 12:34:20 localhost [CLOUDINIT] util.py[WARNING]: Failed forking and calling callback NoneType Jul 7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: Failed forking and calling callback NoneType#012Traceback (most recent call last):#012 File /usr/lib/python2.6/site-packages/cloudinit/util.py, line 220, in fork_cb#012 child_cb(*args)#012TypeError: 'NoneType' object is not callable Also, when looking at timings, you can see that it was blocked on it for the whole time Jul 7 12:33:38 localhost [CLOUDINIT] util.py[DEBUG]: Cloud-init v. 0.7.4 running 'init' at Mon, 07 Jul 2014 12:33:38 +. Up 5.67 seconds. Jul 7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: backgrounded Resizing took 41.487 seconds Jul 7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: cloud-init mode 'init' took 41.799 seconds (41.80) To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1338614/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
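For reference, the behaviour described here is triggered by the following user-data, which is standard cloud-config syntax for backgrounded root filesystem resizing:

    #cloud-config
    resize_rootfs: noblock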
[Yahoo-eng-team] [Bug 1311778] [NEW] Unit tests fail with MessagingTimeout errors
Public bug reported: There is an issue that is causing unit tests to fail with the following error: MessagingTimeout: No reply on topic conductor MessagingTimeout: No reply on topic scheduler 2014-04-23 13:45:52.017 | Traceback (most recent call last): 2014-04-23 13:45:52.017 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py, line 133, in _dispatch_and_reply 2014-04-23 13:45:52.017 | incoming.message)) 2014-04-23 13:45:52.017 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py, line 176, in _dispatch 2014-04-23 13:45:52.017 | return self._do_dispatch(endpoint, method, ctxt, args) 2014-04-23 13:45:52.017 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py, line 122, in _do_dispatch 2014-04-23 13:45:52.017 | result = getattr(endpoint, method)(ctxt, **new_args) 2014-04-23 13:45:52.018 | File nova/conductor/manager.py, line 798, in build_instances 2014-04-23 13:45:52.018 | legacy_bdm_in_spec=legacy_bdm) 2014-04-23 13:51:50.628 | File nlibvir: error : internal error could not initialize domain event timer 2014-04-23 13:54:57.953 | ova/scheduler/rpcapi.py, line 120, in run_instance 2014-04-23 13:54:57.953 | cctxt.cast(ctxt, 'run_instance', **msg_kwargs) 2014-04-23 13:54:57.953 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/rpc/client.py, line 150, in call 2014-04-23 13:54:57.953 | wait_for_reply=True, timeout=timeout) 2014-04-23 13:54:57.953 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/transport.py, line 90, in _send 2014-04-23 13:54:57.953 | timeout=timeout) 2014-04-23 13:54:57.954 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_fake.py, line 166, in send 2014-04-23 13:54:57.954 | return self._send(target, ctxt, message, wait_for_reply, timeout) 2014-04-23 13:54:57.954 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_fake.py, line 161, in _send 2014-04-23 13:54:57.954 | 'No reply on topic %s' % target.topic) 2014-04-23 13:54:57.954 | MessagingTimeout: No reply on topic scheduler 2014-04-23 13:45:52.008 | Traceback (most recent call last): 2014-04-23 13:45:52.008 | File nova/api/openstack/__init__.py, line 125, in __call__ 2014-04-23 13:45:52.008 | return req.get_response(self.application) 2014-04-23 13:45:52.009 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/request.py, line 1320, in send 2014-04-23 13:45:52.009 | application, catch_exc_info=False) 2014-04-23 13:45:52.009 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/request.py, line 1284, in call_application 2014-04-23 13:45:52.009 | app_iter = application(self.environ, start_response) 2014-04-23 13:45:52.009 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py, line 144, in __call__ 2014-04-23 13:45:52.009 | return resp(environ, start_response) 2014-04-23 13:45:52.009 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py, line 144, in __call__ 2014-04-23 13:45:52.010 | return resp(environ, start_response) 2014-04-23 13:45:52.010 | File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py, line 144, in __call__ 2014-04-23 13:45:52.010 | return resp(environ, start_response) 2014-04-23 13:45:52.010 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py, line 144, in __call__ 2014-04-23 13:45:52.010 | return resp(environ, start_response) 2014-04-23 13:45:52.010 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/routes/middleware.py, line 131, in __call__ 2014-04-23 13:45:52.010 | response = self.app(environ, start_response) 2014-04-23 13:45:52.011 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py, line 144, in __call__ 2014-04-23 13:45:52.011 | return resp(environ, start_response) 2014-04-23 13:45:52.011 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py, line 130, in __call__ 2014-04-23 13:45:52.011 | resp = self.call_func(req, *args, **self.kwargs) 2014-04-23 13:45:52.011 | File /home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py, line 195, in call_func 2014-04-23 13:45:52.011 | return self.func(req, *args, **kwargs) 2014-04-23 13:45:52.012 | File
[Yahoo-eng-team] [Bug 1309043] [NEW] NetworkCommandsTestCase unit test failing
Public bug reported: Change-Id I663bd06eb50872f16fc9889dde917277739fefce introduced a race condition where if another test doesn't properly reset the _IS_NEUTRON flag, it will fail because it will think that it is using Neutron and error out. ** Affects: nova Importance: Undecided Assignee: Mohammed Naser (mnaser) Status: In Progress ** Changed in: nova Assignee: (unassigned) = Mohammed Naser (mnaser) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1309043 Title: NetworkCommandsTestCase unit test failing Status in OpenStack Compute (Nova): In Progress Bug description: Change-Id I663bd06eb50872f16fc9889dde917277739fefce introduced a race condition where if another test doesn't properly reset the _IS_NEUTRON flag, it will fail because it will think that it is using Neutron and error out. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1309043/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
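The usual fix for this kind of leakage is to reset the cached module-level flag around every test. A hypothetical sketch only; the module path, base class, and attribute location are assumptions:

    # Hypothetical sketch -- module paths and class names are assumptions.
    import nova.network
    from nova import test

    class NetworkCommandsTestCase(test.TestCase):
        def setUp(self):
            super(NetworkCommandsTestCase, self).setUp()
            # Undo any cached "are we using Neutron?" answer after each test
            # so one test's state cannot leak into the next.
            self.addCleanup(setattr, nova.network, '_IS_NEUTRON', None)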
[Yahoo-eng-team] [Bug 1309334] [NEW] Version aliases not updated for Icehouse
Public bug reported: With the release of Icehouse, the RPC APIs were not updated for their version aliases. ** Affects: nova Importance: Undecided Assignee: Mohammed Naser (mnaser) Status: New ** Changed in: nova Assignee: (unassigned) = Mohammed Naser (mnaser) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1309334 Title: Version aliases not updated for Icehouse Status in OpenStack Compute (Nova): New Bug description: With the release of Icehouse, the RPC APIs were not updated for their version aliases. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1309334/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
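For context, these aliases are simple mappings from a release name to the RPC API version that release ships, kept in each service's rpcapi module. The numbers below are illustrative only, not the real Icehouse values:

    # Illustrative only -- the real alias map and version numbers live in each
    # service's rpcapi module and differ per RPC API.
    VERSION_ALIASES = {
        'grizzly': '2.27',
        'havana': '3.23',
        'icehouse': '3.30',  # the kind of entry this bug says was never added
    }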
[Yahoo-eng-team] [Bug 1240197] Re: Add support for 'auto' number of API or conductor workers
This has been taken care of in this merged review https://review.openstack.org/#/c/69266/ ** Changed in: nova Status: Confirmed = Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1240197 Title: Add support for 'auto' number of API or conductor workers Status in OpenStack Compute (Nova): Fix Released Bug description: Nova has some configuration options that allow you to have some services start multiple worker processes. [general] ec2_workers= osapi_compute_workers= metadata_workers= [conductor] workers= Swift has a similar workers option. In Swift, you can set this option to 'auto', and it will use the number of CPU cores. We should add support for 'auto' to all of the workers options in Nova. https://git.openstack.org/cgit/openstack/swift/tree/etc/proxy- server.conf-sample To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1240197/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1025481] Re: Instance usage audit fails under PostgreSQL
*** This bug is a duplicate of bug 1102477 *** https://bugs.launchpad.net/bugs/1102477 ** Changed in: nova Status: Triaged = Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1025481 Title: Instance usage audit fails under PostgreSQL Status in OpenStack Compute (Nova): Fix Released Bug description: The instance_usage_audit calls are not working when using PostgreSQL (not sure about other DB implementations) because SQLAlchemy sends it as a date when it expects a varchar. Stacktrace: 2012-07-17 00:00:07 DEBUG nova.manager [-] Running periodic task ComputeManager._instance_usage_audit from (pid=6658) periodic_tasks /usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/manager.py:164 2012-07-17 00:00:07 ERROR nova.manager [-] Error during ComputeManager._instance_usage_audit: (ProgrammingError) operator does not exist: character varying = timestamp without time zone LINE 3: ...stance_usage_audit' AND task_log.period_beginning = '2012-06... ^ HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. 'SELECT task_log.created_at AS task_log_created_at, task_log.updated_at AS task_log_updated_at, task_log.deleted_at AS task_log_deleted_at, task_log.deleted AS task_log_deleted, task_log.id AS task_log_id, task_log.task_name AS task_log_task_name, task_log.state AS task_log_state, task_log.host AS task_log_host, task_log.period_beginning AS task_log_period_beginning, task_log.period_ending AS task_log_period_ending, task_log.message AS task_log_message, task_log.task_items AS task_log_task_items, task_log.errors AS task_log_errors \nFROM task_log \nWHERE task_log.deleted = %(deleted_1)s AND task_log.task_name = %(task_name_1)s AND task_log.period_beginning = %(period_beginning_1)s AND task_log.period_ending = %(period_ending_1)s AND task_log.host = %(host_1)s \n LIMIT %(param_1)s' {'host_1': 'compute2', 'param_1': 1, 'deleted_1': False, 'period_ending_1': datetime.datetime(2012, 7, 1, 0, 0), 'task_name_1': 'instance_usage_audit', 'period_beginning_1': datetime.datetime(2012, 6, 1, 0, 0)} 2012-07-17 00:00:07 TRACE nova.manager Traceback (most recent call last): 2012-07-17 00:00:07 TRACE nova.manager File /usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/manager.py, line 167, in periodic_tasks 2012-07-17 00:00:07 TRACE nova.manager task(self, context) 2012-07-17 00:00:07 TRACE nova.manager File /usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/compute/manager.py, line 2381, in _instance_usage_audit 2012-07-17 00:00:07 TRACE nova.manager if not compute_utils.has_audit_been_run(context, self.host): 2012-07-17 00:00:07 TRACE nova.manager File /usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/compute/utils.py, line 116, in has_audit_been_run 2012-07-17 00:00:07 TRACE nova.manager begin, end, host) 2012-07-17 00:00:07 TRACE nova.manager File /usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/db/api.py, line 1879, in task_log_get 2012-07-17 00:00:07 TRACE nova.manager period_ending, host, state, session) 2012-07-17 00:00:07 TRACE nova.manager File /usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/db/sqlalchemy/api.py, line 114, in wrapper 2012-07-17 00:00:07 TRACE nova.manager return f(*args, **kwargs) 2012-07-17 00:00:07 TRACE nova.manager File /usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/db/sqlalchemy/api.py, line 
4971, in task_log_get 2012-07-17 00:00:07 TRACE nova.manager return query.first() 2012-07-17 00:00:07 TRACE nova.manager File /usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py, line 2156, in first 2012-07-17 00:00:07 TRACE nova.manager ret = list(self[0:1]) 2012-07-17 00:00:07 TRACE nova.manager File /usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py, line 2023, in __getitem__ 2012-07-17 00:00:07 TRACE nova.manager return list(res) 2012-07-17 00:00:07 TRACE nova.manager File /usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py, line 2227, in __iter__ 2012-07-17 00:00:07 TRACE nova.manager return self._execute_and_instances(context) 2012-07-17 00:00:07 TRACE nova.manager File /usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py, line 2242, in _execute_and_instances 2012-07-17 00:00:07 TRACE nova.manager result = conn.execute(querycontext.statement, self._params) 2012-07-17 00:00:07 TRACE nova.manager File /usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py, line 1449, in execute 2012-07-17 00:00:07 TRACE nova.manager params) 2012-07-17 00:00:07 TRACE nova.manager File