[Yahoo-eng-team] [Bug 2063463] [NEW] [ovn-octavia-provider] hairpin_snat_ip not set

2024-04-25 Thread Mohammed Naser
Public bug reported:

At the moment, the OVN Octavia provider does not set `hairpin_snat_ip`
out of the box. This means that if a backend server sends a request to a
load balancer that it is also a member of, it receives the hairpinned
request with the floating IP of the service as the source IP.

The issue is that there are two possible source IPs, one floating and
one fixed, and the behaviour is non-deterministic when `hairpin_snat_ip`
is not set.

We should ideally set `hairpin_snat_ip` to the internal IP so that the
traffic always hairpins from that one IP rather than from several
possible IPs, which also makes security groups easier to manage.
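
A minimal sketch of what that could look like, assuming OVN's
Load_Balancer table supports an `options:hairpin_snat_ip` key and using
ovsdbapp's generic `db_set` command; the `ovn_nbdb_api` handle and the
row UUID are placeholders for what the provider already has:

```
def set_hairpin_snat_ip(ovn_nbdb_api, lb_uuid, vip_fixed_ip):
    # ovn_nbdb_api: an ovsdbapp OVN NB API handle (the provider already
    # keeps one); lb_uuid: UUID of the Load_Balancer row for this LB.
    ovn_nbdb_api.db_set(
        'Load_Balancer', lb_uuid,
        ('options', {'hairpin_snat_ip': vip_fixed_ip})
    ).execute(check_error=True)
```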

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2063463

Title:
  [ovn-octavia-provider] hairpin_snat_ip not set

Status in neutron:
  New

Bug description:
  At the moment, the OVN Octavia provider does not set `hairpin_snat_ip`
  out of the box. This means that if a backend server sends a request to
  a load balancer that it is also a member of, it receives the
  hairpinned request with the floating IP of the service as the source
  IP.

  The issue is that there are two possible source IPs, one floating and
  one fixed, and the behaviour is non-deterministic when
  `hairpin_snat_ip` is not set.

  We should ideally set `hairpin_snat_ip` to the internal IP so that the
  traffic always hairpins from that one IP rather than from several
  possible IPs, which also makes security groups easier to manage.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2063463/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2062385] [NEW] [ovn-octavia-provider] Member with FIP not reachable

2024-04-18 Thread Mohammed Naser
Public bug reported:

We've noticed an issue with the OVN Octavia provider and narrowed it
down to the following:

- Member with floating IP not reachable through load balancer

We've noticed that, at first, the member loses all connectivity.  Once
the floating IP is removed and re-added, the VM regains direct
connectivity.  However, that member continues to be unreachable via the
load balancer (while other members without floating IPs keep working).

DVR is enabled in this case.

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2062385

Title:
  [ovn-octavia-provider] Member with FIP not reachable

Status in neutron:
  New

Bug description:
  We've noticed an issue with the OVN Octavia provider and narrowed it
  down to the following:

  - Member with floating IP not reachable through load balancer

  We've noticed that, at first, the member loses all connectivity.  Once
  the floating IP is removed and re-added, the VM regains direct
  connectivity.  However, that member continues to be unreachable via
  the load balancer (while other members without floating IPs keep
  working).

  DVR is enabled in this case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2062385/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2060163] [NEW] [ovn] race condition with add/remove router interface

2024-04-03 Thread Mohammed Naser
Public bug reported:

We're running into an issue in our CI with Atmosphere where we
frequently see failures when an interface is removed from a router; the
traceback is the following:

==
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource [None 
req-e5d08cdd-28e6-4231-a50c-7eafc1b8f942 70fc3b55af8c4386b80207dad11db5da 
dcec54844db44eedbd9667951a5ceb6b - - - -] remove_router_interface failed: No 
details.: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find 
Logical_Router_Port with name=lrp-7e0debbb-893c-420a-8569-d8fb98e6a16e
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource Traceback (most recent 
call last):
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutron/api/v2/resource.py", 
line 98, in resource
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource result = 
method(request=request, **args)
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutron_lib/db/api.py", line 
140, in wrapped
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource with 
excutils.save_and_reraise_exception():
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 
227, in __exit__
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource 
self.force_reraise()
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 
200, in force_reraise
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource raise self.value
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutron_lib/db/api.py", line 
138, in wrapped
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource return f(*args, 
**kwargs)
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/oslo_db/api.py", line 144, in 
wrapper
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource with 
excutils.save_and_reraise_exception() as ectxt:
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 
227, in __exit__
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource 
self.force_reraise()
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 
200, in force_reraise
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource raise self.value
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/oslo_db/api.py", line 142, in 
wrapper
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource return f(*args, 
**kwargs)
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutron_lib/db/api.py", line 
186, in wrapped
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource with 
excutils.save_and_reraise_exception():
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 
227, in __exit__
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource 
self.force_reraise()
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 
200, in force_reraise
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource raise self.value
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutron_lib/db/api.py", line 
184, in wrapped
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource return 
f(*dup_args, **dup_kwargs)
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutron/api/v2/base.py", line 
253, in _handle_action
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource ret_value = 
getattr(self._plugin, name)(*arg_list, **kwargs)
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutron/services/ovn_l3/plugin.py",
 line 260, in remove_router_interface
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource with 
excutils.save_and_reraise_exception():
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 
227, in __exit__
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource 
self.force_reraise()
2024-04-03 21:13:09.804 10 ERROR neutron.api.v2.resource   File 
"/var/lib/openstack/lib/python3.10/site-packages/oslo_utils/excutils.py", line 
200, in force_reraise
2024-04-03 21:13:09.804 10 ERROR 
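
The RowNotFound above suggests the Logical_Router_Port had already been
removed by a concurrent operation.  A minimal sketch of tolerating that
race, assuming ovsdbapp's `lrp_del` command and an existing OVN NB API
handle (`nb_idl` here is a placeholder for Neutron's own handle):

```
from ovsdbapp.backend.ovs_idl import idlutils

def remove_lrp(nb_idl, lrp_name):
    try:
        nb_idl.lrp_del(lrp_name).execute(check_error=True)
    except idlutils.RowNotFound:
        # The port was already removed by a concurrent operation; treat
        # the delete as a no-op instead of failing the API request.
        pass
```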

[Yahoo-eng-team] [Bug 2059716] [NEW] [ovn] Multihomed backend (IPv4 + IPv6) with floating IP unreachable

2024-03-28 Thread Mohammed Naser
Public bug reported:

We've got an interesting scenario where one of the backends of a load
balancer is not reachable given the following test environment:

2x networks
- provider network, IPv4 + IPv6 subnets
- tenant network (Geneve), IPv4 + IPv6 subnets

3x VMs
- 2x single port, 2 IP addresses on the tenant network
- 1x single port, 2 IP addresses on the tenant network + floating IP (IPv4 
only) attached

Load balancer:
- Using single tenant network, with floating IP (IPv4 only) attached
- OVN provider

With the setup above, the VM with the floating IP attached will not be
reachable by the load balancer (i.e., hitting it multiple times will
time out roughly 1/3 of the time).  If you remove the floating IP and
re-attach it, it works.

While troubleshooting, we've noticed that removing the IPv6 subnet from
the tenant network resolves this, so I suspect it is somehow related to
that.

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2059716

Title:
  [ovn] Multihomed backend (IPv4 + IPv6) with floating IP unreachable

Status in neutron:
  New

Bug description:
  We've got an interesting scenario where one of the backends of a load
  balancer is not reachable given the following test environment:

  2x networks
  - provider network, IPv4 + IPv6 subnets
  - tenant network (Geneve), IPv4 + IPv6 subnets

  3x VMs
  - 2x single port, 2 IP addresses on the tenant network
  - 1x single port, 2 IP addresses on the tenant network + floating IP (IPv4 
only) attached

  Load balancer:
  - Using single tenant network, with floating IP (IPv4 only) attached
  - OVN provider

  With the setup above, the VM with the floating IP attached will not be
  reachable by the load balancer (i.e., hitting it multiple times will
  time out roughly 1/3 of the time).  If you remove the floating IP and
  re-attach it, it works.

  While troubleshooting, we've noticed that removing the IPv6 subnet
  from the tenant network resolves this, so I suspect it is somehow
  related to that.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2059716/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2052915] Re: "neutron-ovs-grenade-multinode" and "neutron-ovn-grenade-multinode" failing in 2023.1 and Zed

2024-02-22 Thread Mohammed Naser
Nova is also affected by this.

** Also affects: nova
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2052915

Title:
  "neutron-ovs-grenade-multinode" and "neutron-ovn-grenade-multinode"
  failing in 2023.1 and Zed

Status in neutron:
  Triaged
Status in OpenStack Compute (nova):
  New

Bug description:
  The issue seems to be in the neutron-lib version installed:
  2024-02-07 16:19:35.155231 | compute1 | ERROR: neutron 21.2.1.dev38 has 
requirement neutron-lib>=3.1.0, but you'll have neutron-lib 2.20.2 which is 
incompatible.

  That leads to an error when starting the Neutron API (an API definition is 
not found) [1]:
  Feb 07 16:13:54.385467 np0036680724 neutron-server[67288]: ERROR neutron 
ImportError: cannot import name 'port_mac_address_override' from 
'neutron_lib.api.definitions' 
(/usr/local/lib/python3.8/dist-packages/neutron_lib/api/definitions/__init__.py)

  Setting priority to Critical because that affects the CI.

  
[1]https://9faad8159db8d6994977-b587eccfce0a645f527dfcbc49e54bb4.ssl.cf2.rackcdn.com/891397/4/check/neutron-
  ovs-grenade-multinode/ba47cef/controller/logs/screen-q-svc.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2052915/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2053274] [NEW] [ovn] mtu for metadata veth interface is not set

2024-02-15 Thread Mohammed Naser
Public bug reported:

When using OVN, neither the `veth` interface created inside the network
namespace nor its peer that goes into the OVS bridge gets an MTU
configured when they are provisioned.

https://github.com/openstack/neutron/blob/stable/zed/neutron/agent/ovn/metadata/agent.py#L589-L594

This can cause obscure and annoying errors, with packets being dropped
when a user issues large requests against the metadata service.  The
ideal solution would be to configure the correct MTU on the interfaces
to avoid this issue.
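
A minimal sketch of what the fix could look like in the metadata agent,
using Neutron's `ip_lib` helpers; the device names, namespace and MTU
below are placeholders, not the actual agent code:

```
from neutron.agent.linux import ip_lib

# Placeholders: the agent already knows these values when it provisions
# the veth pair for the metadata namespace.
namespace = 'ovnmeta-example'
ns_dev_name = 'tap-ns-side'
ovs_dev_name = 'tap-ovs-side'
mtu = 1442  # the network's MTU

ip_lib.IPDevice(ns_dev_name, namespace=namespace).link.set_mtu(mtu)
ip_lib.IPDevice(ovs_dev_name).link.set_mtu(mtu)
```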

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2053274

Title:
  [ovn] mtu for metadata veth interface is not set

Status in neutron:
  New

Bug description:
  When using OVN, neither the `veth` interface created inside the
  network namespace nor its peer that goes into the OVS bridge gets an
  MTU configured when they are provisioned.

  
https://github.com/openstack/neutron/blob/stable/zed/neutron/agent/ovn/metadata/agent.py#L589-L594

  This can cause obscure and annoying errors, with packets being dropped
  when a user issues large requests against the metadata service.  The
  ideal solution would be to configure the correct MTU on the interfaces
  to avoid this issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2053274/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2042362] [NEW] Listing instances gives unnecessary error if flavor is deleted

2023-10-31 Thread Mohammed Naser
Public bug reported:

You'll get an alert similar to this:

"Unable to retrieve instance size information."

reference: https://github.com/vexxhost/atmosphere/issues/574

** Affects: horizon
 Importance: Undecided
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/2042362

Title:
  Listing instances gives unnecessary error if flavor is deleted

Status in OpenStack Dashboard (Horizon):
  In Progress

Bug description:
  You'll get an alert similar to this:

  "Unable to retrieve instance size information."

  reference: https://github.com/vexxhost/atmosphere/issues/574

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/2042362/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2038978] [NEW] [OVN] ARP + Floating IP issues

2023-10-10 Thread Mohammed Naser
Public bug reported:

When using OVN, if you have a virtual router whose gateway is in
subnet A, and a port behind it has a floating IP attached from
subnet B, the floating IP does not seem to be reachable.

https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385253.html

A fix for this was brought into OVN not long ago; it introduces an
option, `options:add_route`, which can be set to `true`.

see: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385255.html

I think we should do this in order to mirror the behaviour of ML2/OVS,
where we install scope link routes.

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2038978

Title:
  [OVN] ARP + Floating IP issues

Status in neutron:
  New

Bug description:
  When using OVN, if you have a virtual router whose gateway is in
  subnet A, and a port behind it has a floating IP attached from
  subnet B, the floating IP does not seem to be reachable.

  https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385253.html

  A fix for this was brought into OVN not long ago; it introduces an
  option, `options:add_route`, which can be set to `true`.

  see: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385255.html

  I think we should do this in order to mirror the behaviour of ML2/OVS,
  where we install scope link routes.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2038978/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2037585] [NEW] VM fails to delete with trunk + subports

2023-09-27 Thread Mohammed Naser
Public bug reported:

When using Neutron, it will prevent you from deleting a port if its
subports are still attached:

https://review.opendev.org/c/openstack/neutron/+/885154

Because of this, if you delete a VM whose trunk still has subports
attached, the VM will end up in ERROR state:

```
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] Traceback (most recent call last):
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 
1768, in _delete_ports
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] neutron.delete_port(port)
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 
196, in wrapper
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] ret = obj(*args, **kwargs)
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", 
line 833, in delete_port
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] return self.delete(self.port_path % 
(port))
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 
196, in wrapper
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] ret = obj(*args, **kwargs)
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", 
line 352, in delete
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] return self.retry_request("DELETE", 
action, body=body,
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 
196, in wrapper
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] ret = obj(*args, **kwargs)
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", 
line 333, in retry_request
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] return self.do_request(method, 
action, body=body,
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 
196, in wrapper
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] ret = obj(*args, **kwargs)
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", 
line 297, in do_request
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] 
self._handle_fault_response(status_code, replybody, resp)
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/nova/network/neutron.py", line 
196, in wrapper
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] ret = obj(*args, **kwargs)
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", 
line 272, in _handle_fault_response
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] exception_handler_v20(status_code, 
error_body)
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884]   File 
"/var/lib/openstack/lib/python3.10/site-packages/neutronclient/v2_0/client.py", 
line 90, in exception_handler_v20
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] raise 
client_exc(message=error_message,
2023-09-27 18:31:01.056 328858 ERROR nova.network.neutron [instance: 
08ca5cf4-c86a-4446-a031-a3b84ff47884] 

[Yahoo-eng-team] [Bug 2028442] [NEW] Support DNS for ovn_{nb, sb}_connection

2023-07-22 Thread Mohammed Naser
Public bug reported:

At the moment, when using a DNS hostname for `ovn_nb_connection` or
`ovn_sb_connection`, the connection never seems to come up.

It seems that the `ovs` library does not resolve the hostname to an IP
address before proceeding.  I'm not sure whether we should resolve the
hostnames ourselves and pass the resolved addresses on to OVS, or try to
look for a fix upstream.
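
A minimal sketch of the "resolve it ourselves" option, rewriting a
`tcp:host:port` connection string before handing it to the OVS library;
this only illustrates the idea (IPv6 literals would additionally need
bracketing) and is not a proposed patch:

```
import socket

def resolve_ovn_connection(conn_str):
    # e.g. 'tcp:nb.ovn.example:6641' -> 'tcp:203.0.113.10:6641'
    proto, host, port = conn_str.split(':')
    addrinfo = socket.getaddrinfo(host, port, socket.AF_INET,
                                  socket.SOCK_STREAM)
    resolved_ip = addrinfo[0][4][0]
    return '%s:%s:%s' % (proto, resolved_ip, port)
```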

This is pretty critical for HA deployments that rely on multiple
replicas with hostnames (i.e. a Kubernetes StatefulSet).

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2028442

Title:
  Support DNS for ovn_{nb,sb}_connection

Status in neutron:
  New

Bug description:
  At the moment, when using a DNS hostname for `ovn_nb_connection` or
  `ovn_sb_connection`, the connection never seems to come up.

  It seems that the `ovs` library does not resolve the hostname to an IP
  address before proceeding.  I'm not sure whether we should resolve the
  hostnames ourselves and pass the resolved addresses on to OVS, or try
  to look for a fix upstream.

  This is pretty critical for HA deployments that rely on multiple
  replicas with hostnames (i.e. a Kubernetes StatefulSet).

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2028442/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2015894] [NEW] VMs failing to go up with RBD volume + volume_use_multipath

2023-04-11 Thread Mohammed Naser
Public bug reported:

When `volume_use_multipath` is set to true, a VM that uses an RBD volume
will fail to come up.

https://github.com/openstack/os-
brick/blob/28ffcdbfa138859859beca2f80627c076269be56/os_brick/initiator/linuxscsi.py#L212-L233

It seems like we always call os-brick with `enforce_multipath=True`, so
if `volume_use_multipath` is enabled, it ends up failing for all newly
provisioned VMs, even if multipath is not in use or necessary.

Ideally, we should be able to safely ignore it when we're trying to plug
a backend that doesn't use or support multipath.
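
For illustration, this is the os-brick call in question with the
enforcement relaxed; a minimal sketch with placeholder values for the
host-specific arguments Nova normally supplies:

```
from os_brick.initiator import connector

connector_properties = connector.get_connector_properties(
    root_helper='sudo',          # placeholder root helper
    my_ip='192.0.2.10',          # placeholder host IP
    multipath=True,
    enforce_multipath=False)     # do not hard-fail when multipathd is unusable
```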

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2015894

Title:
  VMs failing to go up with RBD volume + volume_use_multipath

Status in OpenStack Compute (nova):
  New

Bug description:
  When `volume_use_multipath` is set to true, a VM that uses an RBD
  volume will fail to come up.

  https://github.com/openstack/os-
  
brick/blob/28ffcdbfa138859859beca2f80627c076269be56/os_brick/initiator/linuxscsi.py#L212-L233

  It seems like we always call os-brick with `enforce_multipath=True`,
  so if `volume_use_multipath` is enabled, it ends up failing for all
  newly provisioned VMs, even if multipath is not in use or necessary.

  Ideally, we should be able to safely ignore it when we're trying to
  plug a backend that doesn't use or support multipath.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2015894/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1992186] [NEW] "int object is not iterable" when using numerical group names

2022-10-07 Thread Mohammed Naser
Public bug reported:

When using federation, if the value of `groups` in the mapping is set to
a number, it gets parsed as a number and authentication then fails with:

```
{"error":{"code":400,"message":"'int' object is not iterable","title":"Bad 
Request"}}
```

I believe the bad bit is here:

https://github.com/openstack/keystone/blob/326b014434cc760ba08763e1870ac057f7917e98/keystone/federation/utils.py#L650-L661
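
A minimal sketch of the kind of normalization that would avoid iterating
over an `int`; this is a hypothetical helper to illustrate the idea, not
the keystone code itself:

```
def normalize_group_names(groups):
    # Mapped values may arrive as an int (e.g. a numeric group name),
    # a single string, or a list; always return a list of strings.
    if isinstance(groups, (int, float)):
        return [str(groups)]
    if isinstance(groups, str):
        return [g.strip() for g in groups.split(';') if g.strip()]
    return [str(g) for g in groups]
```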

** Affects: keystone
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1992186

Title:
  "int object is not iterable" when using numerical group names

Status in OpenStack Identity (keystone):
  New

Bug description:
  When using federation, if the value of `groups` in the mapping is set
  to a number, it gets parsed as a number and authentication then fails
  with:

  ```
  {"error":{"code":400,"message":"'int' object is not iterable","title":"Bad 
Request"}}
  ```

  I believe the bad bit is here:

  
https://github.com/openstack/keystone/blob/326b014434cc760ba08763e1870ac057f7917e98/keystone/federation/utils.py#L650-L661

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1992186/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1447651] Re: Find many duplicate rules in memory by using iptables_manager

2022-05-30 Thread Mohammed Naser
This is no longer relevant and I do not see these warnings; closing
because of age.

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1447651

Title:
  Find many duplicate rules in memory by using iptables_manager

Status in neutron:
  Invalid

Bug description:
  I installed VPNaas In my devstack. I find many duplicate iptables
  rules in memory. The rule is ' 2015-04-23 10:55:15.380 ERROR
  neutron.agent.linux.iptables_manager [-] ## rule is -A neutron-
  vpn-agen-POSTROUTING -s 192.168.10.0/24 -d 192.168.20.1/24 -m policy
  --dir out --pol ipsec -j ACCEPT ', and I add this log in
  'agent/linux/iptables_manager.py  ' after ' _modify_rules '. Why there
  are duplicate iptables rules?  Does iptables_manager weed out
  duplicate rules?

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1447651/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1598652] Re: Neutron VPNaaS API CI is not enabled

2022-05-30 Thread Mohammed Naser
** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1598652

Title:
  Neutron VPNaaS API CI is not enabled

Status in neutron:
  Fix Released

Bug description:
  VPNaaS API CI test is not enabled since the api test code has issue,
  now our team fixed it and vpnaas also need these CI tests. So add a
  new CI job to enable it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1598652/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1645516] Re: openswan package is not available in Ubuntu 16.04

2022-05-30 Thread Mohammed Naser
Invalid now, 16.04 is long gone :)

** Changed in: neutron
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1645516

Title:
  openswan package is not available  in Ubuntu  16.04

Status in neutron:
  Invalid

Bug description:
  We plan to launch vpnaas service on xenial node in rally ci, however ,
  it is failed because openswan package is not available, and we found
  openswan package is likely not available in Ubuntu 16.04.

  the error print:http://paste.openstack.org/show/590601/

  issue with openswan :http://askubuntu.com/questions/801860/openswan-
  shows-no-installation-candidate-after-running-apt-get-update

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1645516/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1680484] Re: neutron-vpnaas:error when creating IPSec Site Connection using strongswan on centos

2022-05-30 Thread Mohammed Naser
Correct, resolved by the comment Dmitriy added.

** Changed in: neutron
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1680484

Title:
  neutron-vpnaas:error when creating IPSec Site Connection using
  strongswan on centos

Status in neutron:
  Fix Released

Bug description:
  Operating system:
  CentOS Linux release 7.3.1611 (Core)

  Kernel:
  3.10.0-514.el7.x86_64

  Packages:
  python-neutron-vpnaas-9.0.0-1.el7.noarch
  openstack-neutron-ml2-9.2.0-1.el7.noarch
  python2-neutronclient-6.0.0-2.el7.noarch
  python-neutron-lib-0.4.0-1.el7.noarch
  openstack-neutron-common-9.2.0-1.el7.noarch
  openstack-neutron-openvswitch-9.2.0-1.el7.noarch
  python-neutron-9.2.0-1.el7.noarch
  openstack-neutron-9.2.0-1.el7.noarch
  openstack-neutron-vpnaas-9.0.0-1.el7.noarch
  strongswan-5.4.0-2.el7.x86_64

  Configuration options for vpnaass:
  service_provider = 
VPN:strongswan:neutron_vpnaas.services.vpn.service_drivers.ipsec.IPsecVPNDriver:default
  vpn_device_driver = 
neutron_vpnaas.services.vpn.device_drivers.fedora_strongswan_ipsec.FedoraStrongSwanDriver

  After I create an IPSec Site Connection use commands as follows:

  1) neutron vpn-ikepolicy-create ikepolicy
  2) neutron vpn-ipsecpolicy-create ipsecpolicy
  3) neutron vpn-service-create --name vpn0 --description "My vpn service0" 
vpn0 vpn0-subnet
  4) neutron vpn-service-create --name vpn1 --description "My vpn service1" 
vpn1 vpn1-subnet
  5) neutron ipsec-site-connection-create --name vpnconnection0 --vpnservice-id 
vpn0 --ikepolicy-id ikepolicy --ipsecpolicy-id ipsecpolicy --peer-address 
10.0.149.16 --peer-id 10.0.149.16 --peer-cidr 10.3.0.0/24 --psk secret
  6) neutron ipsec-site-connection-create --name vpnconnection1 --vpnservice-id 
vpn1 --ikepolicy-id ikepolicy --ipsecpolicy-id ipsecpolicy --peer-address 
10.0.149.3 --peer-id 10.0.149.3 --peer-cidr 10.1.0.0/24 --psk secret

  Then the status of vpnconnection0 and vpnconnection1 always keep
  PENDING_CREATE.

  Logs in /var/log/neutron/vpn-agent.log:

  2017-04-06 13:42:12.134 16118 INFO oslo_rootwrap.client 
[req-1441bb58-bfa2-4b5b-bd57-71a9501f8716 07e158a349474724abc69f8651850b18 
de65099dfaba4a4f8cb3c49911980e5c - - -] cmd: ['cp', '-a', 
'/usr/share/strongswan/templates/config/strongswan.d/../plugins', 
'/var/lib/neutron/ipsec/a2e0c9b9-51fd-4054-a4f9-d2b53adce83a/etc/strongswan/strongswan.d/charon']
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server 
[req-1441bb58-bfa2-4b5b-bd57-71a9501f8716 07e158a349474724abc69f8651850b18 
de65099dfaba4a4f8cb3c49911980e5c - - -] Exception during message handling
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server Traceback (most 
recent call last):
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in 
_process_incoming
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server res = 
self.dispatcher.dispatch(message)
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, 
in dispatch
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server return 
self._do_dispatch(endpoint, method, ctxt, args)
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, 
in _do_dispatch
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server result = 
func(ctxt, **new_args)
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/neutron_vpnaas/services/vpn/device_drivers/ipsec.py",
 line 884, in vpnservice_updated
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server 
self.sync(context, [router] if router else [])
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in 
inner
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server return 
f(*args, **kwargs)
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/neutron_vpnaas/services/vpn/device_drivers/ipsec.py",
 line 1045, in sync
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server 
self._sync_vpn_processes(vpnservices, sync_router_ids)
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/neutron_vpnaas/services/vpn/device_drivers/ipsec.py",
 line 1069, in _sync_vpn_processes
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server 
process.update()
  2017-04-06 13:42:12.135 16118 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/neutron_vpnaas/services/vpn/device_drivers/ipsec.py",
 line 286, in update
  2017-04-06 

[Yahoo-eng-team] [Bug 1972028] [NEW] _get_pci_passthrough_devices prone to race condition

2022-05-06 Thread Mohammed Naser
Public bug reported:

At the moment, the `_get_pci_passthrough_devices` function is prone to
race conditions.

This specific code calls `listCaps()`; however, it is possible that the
device has disappeared by the time the method is called:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7949-L7959

Which would result in the following traceback:

2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager 
[req-51b7c1c4-2b4a-46cc-9baa-8bf61801c48d - - - - -] Error updating resources 
for node .: libvirt.libvirtError: Node device not found: no node device 
with matching name 'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager Traceback (most 
recent call last):
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 
9946, in _update_available_resource_for_node
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager 
self.rt.update_available_resource(context, nodename,
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py",
 line 879, in update_available_resource
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager resources = 
self.driver.get_available_resource(nodename)
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", 
line 8937, in get_available_resource
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager 
data['pci_passthrough_devices'] = self._get_pci_passthrough_devices()
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", 
line 7663, in _get_pci_passthrough_devices
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager vdpa_devs = [
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", 
line 7664, in 
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager dev for dev in 
devices.values() if "vdpa" in dev.listCaps()
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File 
"/var/lib/openstack/lib/python3.8/site-packages/libvirt.py", line 6276, in 
listCaps
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager raise 
libvirtError('virNodeDeviceListCaps() failed')
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager 
libvirt.libvirtError: Node device not found: no node device with matching name 
'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'
2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager 

I think the cleaner way is to loop over all the devices and skip any
device that raises a "device not found" error.
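
A minimal sketch of that approach; `devices` stands in for the
name-to-virNodeDevice mapping the driver already builds, so treat the
surrounding names as placeholders:

```
import libvirt

devices = {}  # placeholder: name -> virNodeDevice mapping from the driver
vdpa_devs = []
for dev in devices.values():
    try:
        caps = dev.listCaps()
    except libvirt.libvirtError as e:
        if e.get_error_code() == libvirt.VIR_ERR_NO_NODE_DEVICE:
            # The device disappeared between listing and inspection;
            # skip it instead of failing the whole resource update.
            continue
        raise
    if 'vdpa' in caps:
        vdpa_devs.append(dev)
```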

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1972028

Title:
  _get_pci_passthrough_devices prone to race condition

Status in OpenStack Compute (nova):
  New

Bug description:
  At the moment, the `_get_pci_passthrough_devices` function is prone to
  race conditions.

  This specific code calls `listCaps()`; however, it is possible that
  the device has disappeared by the time the method is called:

  
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7949-L7959

  Which would result in the following traceback:

  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager 
[req-51b7c1c4-2b4a-46cc-9baa-8bf61801c48d - - - - -] Error updating resources 
for node .: libvirt.libvirtError: Node device not found: no node device 
with matching name 'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager Traceback (most 
recent call last):
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 
9946, in _update_available_resource_for_node
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager 
self.rt.update_available_resource(context, nodename,
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py",
 line 879, in update_available_resource
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager resources = 
self.driver.get_available_resource(nodename)
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", 
line 8937, in get_available_resource
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager 
data['pci_passthrough_devices'] = self._get_pci_passthrough_devices()
  2022-05-06 20:16:16.110 4053032 ERROR 

[Yahoo-eng-team] [Bug 1972023] [NEW] Failed (but retryable) device detaches are logged as ERROR

2022-05-06 Thread Mohammed Naser
Public bug reported:

At the moment, if a device detach attempt times out (using libvirt), we
log a message:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2570-L2573

However, this is not a final failure, since we actually retry the
process a few more times depending on configuration; only if it fails
completely do we report that:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2504

In high-load environments where this timeout might be hit, this triggers
"ERROR" messages that might seem problematic to the operator, even
though the follow-up attempt succeeds and no attention is needed.  This
message should be logged as a WARNING, since the operator only needs to
intervene when an ERROR is logged for a complete failure to detach the
device.
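
A minimal sketch of the suggested change, with placeholder variable
names standing in for whatever the driver has at hand:

```
from oslo_log import log as logging

LOG = logging.getLogger(__name__)

device_alias = 'virtio-disk1'   # placeholder
max_attempts = 8                # placeholder

# Per-attempt timeouts are retried, so they only warrant a warning:
LOG.warning('Timed out waiting for the detach event of device %s; '
            'retrying.', device_alias)

# Reserve ERROR for the case where every retry has been exhausted:
LOG.error('Giving up on detaching device %s after %d attempts.',
          device_alias, max_attempts)
```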

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1972023

Title:
  Failed (but retryable) device detaches are logged as ERROR

Status in OpenStack Compute (nova):
  New

Bug description:
  At the moment, if a device detach attempt times out (using libvirt),
  we log a message:

  
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2570-L2573

  However, this is not a final failure, since we actually retry the
  process a few more times depending on configuration; only if it fails
  completely do we report that:

  
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2504

  In high-load environments where this timeout might be hit, this
  triggers "ERROR" messages that might seem problematic to the operator,
  even though the follow-up attempt succeeds and no attention is needed.
  This message should be logged as a WARNING, since the operator only
  needs to intervene when an ERROR is logged for a complete failure to
  detach the device.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1972023/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1971760] [NEW] nova-compute leaks green threads

2022-05-05 Thread Mohammed Naser
Public bug reported:

At the moment, if the cloud sustains a large number of VIF plugging
timeouts, it leads to a large number of leaked green threads, which can
cause the nova-compute process to stop reporting/responding.

The tracebacks that would occur would be:

===
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] Traceback (most recent call last):
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", 
line 7230, in _create_guest_with_network
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] guest = self._create_guest(
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/usr/lib/python3.8/contextlib.py", line 120, in __exit__
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] next(self.gen)
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 
479, in wait_for_instance_event
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] actual_event = event.wait()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/eventlet/event.py", line 125, 
in wait
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] result = hub.switch()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 
313, in switch
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] return self.greenlet.switch()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] eventlet.timeout.Timeout: 300 seconds
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] 
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] During handling of the above exception, 
another exception occurred:
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] 
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] Traceback (most recent call last):
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 
2409, in _build_and_run_instance
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] self.driver.spawn(context, instance, 
image_meta,
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", 
line 4193, in spawn
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] self._create_guest_with_network(
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", 
line 7256, in _create_guest_with_network
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] raise 
exception.VirtualInterfaceCreateException()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] 
nova.exception.VirtualInterfaceCreateException: Virtual Interface creation 
failed
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] 
===

Eventually, with enough of these, the nova-compute process would hang.
The output of GMR shows nearly 6094 threads, with around 3038 of them
having the traceback below:

===
--Green Thread--

/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py:355 in run
`self.fire_timers(self.clock())`

/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py:476 in 
fire_timers
`timer()`

/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/timer.py:59 in 
__call__
`cb(*args, **kw)`

/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/__init__.py:151 in 
_timeout
`current.throw(exc)`
===

In addition, 3039 of 
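
For context, the usual pattern that keeps an armed eventlet timer from
outliving its waiter looks roughly like the sketch below; `do_wait` and
`handle_timeout` are placeholders, and this illustrates the pattern
rather than the Nova fix:

```
import eventlet

def do_wait():
    # placeholder for the blocking wait (e.g. waiting for a VIF event)
    return eventlet.sleep(0)

def handle_timeout():
    # placeholder for the timeout handling path
    pass

timeout = eventlet.timeout.Timeout(300)
try:
    result = do_wait()
except eventlet.timeout.Timeout as t:
    if t is not timeout:
        raise                 # some other timer fired, not ours
    handle_timeout()
finally:
    timeout.cancel()          # always disarm the timer so it cannot leak
```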

[Yahoo-eng-team] [Bug 1917645] Re: Nova can't create instances if RabbitMQ notification cluster is down

2021-11-23 Thread Mohammed Naser
As per sean-k-mooney's advice, I've marked this as also affecting
oslo.messaging, since it's more of an issue there than in Nova.

** Also affects: oslo.messaging
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1917645

Title:
  Nova can't create instances if RabbitMQ notification cluster is down

Status in OpenStack Compute (nova):
  Confirmed
Status in oslo.messaging:
  New

Bug description:
  We use independent RabbitMQ clusters for each OpenStack project, Nova
  Cells and also for notifications. Recently, I noticed in our test
  infrastructure that if the RabbitMQ cluster for notifications has an
  outage, Nova can't create new instances. Possibly other operations
  will also hang.

  Not being able to send a notification or connect to the RabbitMQ
  cluster shouldn't stop new instances from being created. (If this is
  actually a use case for some deployments, the operator should have the
  option to configure it.)

  Tested against the master branch.

  If the notification RabbitMQ is stopped, when creating an instance,
  nova-scheduler gets stuck with:

  ```
  Mar 01 21:16:28 devstack nova-scheduler[18384]: DEBUG 
nova.scheduler.request_filter [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 
demo demo] Request filter 'accelerators_filter' took 0.0 seconds {{(pid=18384) 
wrapper /opt/stack/nova/nova/scheduler/request_filter.py:46}}
  Mar 01 21:16:32 devstack nova-scheduler[18384]: ERROR 
oslo.messaging._drivers.impl_rabbit [None 
req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 
113] EHOSTUNREACH (retrying in 2.0 seconds): OSError: [Errno 113] EHOSTUNREACH
  Mar 01 21:16:35 devstack nova-scheduler[18384]: ERROR 
oslo.messaging._drivers.impl_rabbit [None 
req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 
113] EHOSTUNREACH (retrying in 4.0 seconds): OSError: [Errno 113] EHOSTUNREACH
  Mar 01 21:16:42 devstack nova-scheduler[18384]: ERROR 
oslo.messaging._drivers.impl_rabbit [None 
req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 
113] EHOSTUNREACH (retrying in 6.0 seconds): OSError: [Errno 113] EHOSTUNREACH
  Mar 01 21:16:51 devstack nova-scheduler[18384]: ERROR 
oslo.messaging._drivers.impl_rabbit [None 
req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 
113] EHOSTUNREACH (retrying in 8.0 seconds): OSError: [Errno 113] EHOSTUNREACH
  Mar 01 21:17:02 devstack nova-scheduler[18384]: ERROR 
oslo.messaging._drivers.impl_rabbit [None 
req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 
113] EHOSTUNREACH (retrying in 10.0 seconds): OSError: [Errno 113] EHOSTUNREACH
  (...)
  ```

  Because the notification RabbitMQ cluster is down, Nova gets stuck in:

  
https://github.com/openstack/nova/blob/5b66caab870558b8a7f7b662c01587b959ad3d41/nova/scheduler/filter_scheduler.py#L85

  because oslo messaging never gives up:

  
https://github.com/openstack/oslo.messaging/blob/5aa645b38b4c1cf08b00e687eb6c7c4b8a0211fc/oslo_messaging/_drivers/impl_rabbit.py#L736
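
  As an illustration of the kind of knob being asked for: oslo.messaging's
  `Notifier` accepts a `retry` count, so a sketch of a bounded-retry
  notifier could look like the following (the publisher id, topic and
  retry value are assumptions about a deployment, not Nova's current
  behaviour, and CONF is assumed to already carry the transport settings):

  ```
  from oslo_config import cfg
  import oslo_messaging

  CONF = cfg.CONF  # assumed to be populated with the notification transport

  transport = oslo_messaging.get_notification_transport(CONF)
  notifier = oslo_messaging.Notifier(
      transport,
      publisher_id='compute.host1',   # placeholder publisher id
      driver='messagingv2',
      topics=['notifications'],
      retry=5)  # give up after a bounded number of connection retries
                # instead of blocking the caller forever
  ```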

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1917645/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1931908] [NEW] Default CORS allow_headers missing X-OpenStack-Nova-API-Version

2021-06-14 Thread Mohammed Naser
Public bug reported:

When enabling CORS, the `X-OpenStack-Nova-API-Version` header is not
included in the allowed headers by default.  It should be, because it's
critical to the operation of the OpenStack Nova API.
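
A minimal sketch of the kind of default that could be set through
oslo.middleware's CORS helpers; the header and method lists here are
illustrative only, not the full set Nova would actually need:

```
from oslo_middleware import cors

# Illustrative defaults only; the real change would extend Nova's
# existing lists rather than replace them.
cors.set_defaults(
    allow_headers=['X-Auth-Token',
                   'X-OpenStack-Nova-API-Version'],
    expose_headers=['X-OpenStack-Nova-API-Version'],
    allow_methods=['GET', 'PUT', 'POST', 'DELETE', 'PATCH'])
```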

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1931908

Title:
  Default CORS allow_headers missing X-OpenStack-Nova-API-Version

Status in OpenStack Compute (nova):
  New

Bug description:
  When enabling CORS, the `X-OpenStack-Nova-API-Version` header is not
  included in the allowed headers by default.  It should be, because
  it's critical to the operation of the OpenStack Nova API.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1931908/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1892370] [NEW] Database migrations fail when schema includes dash

2020-08-20 Thread Mohammed Naser
Public bug reported:

In our database migrations, we run the following:

'ALTER DATABASE %s DEFAULT CHARACTER SET utf8'

If using a database name that includes a dash, the migration fails
because the name needs to be wrapped in backticks (`).
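
For illustration, the quoted form of the statement (with a placeholder
schema name) would be:

```
# Sketch: backtick-quote the schema name so names containing a dash are
# accepted by MySQL/MariaDB; `db_name` is a placeholder.
db_name = 'nova-api'
stmt = 'ALTER DATABASE `%s` DEFAULT CHARACTER SET utf8' % db_name
```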

** Affects: nova
 Importance: Undecided
 Assignee: Mohammed Naser (mnaser)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1892370

Title:
  Database migrations fail when schema includes dash

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  In our database migrations, we run the following:

  'ALTER DATABASE %s DEFAULT CHARACTER SET utf8'

  If using a database name that includes a dash, the migration fails
  because the name needs to be wrapped in backticks (`).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1892370/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1890057] Re: EC2 instance_id_mappings are never deleted

2020-08-03 Thread Mohammed Naser
*** This bug is a duplicate of bug 1786298 ***
https://bugs.launchpad.net/bugs/1786298

** This bug has been marked a duplicate of bug 1786298
   nova-manage db archive_deleted_rows does not cleanup table 
instance_id_mappings

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1890057

Title:
  EC2 instance_id_mappings are never deleted

Status in OpenStack Compute (nova):
  New

Bug description:
  It looks like whenever we create an instance, we create an EC2
  instance ID mapping for it:

  
https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/db/sqlalchemy/api.py#L1137-L1138

  Which is used by the EC2 objects:

  
https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/objects/ec2.py

  Which is not really used by much in the API, but it even has a
  mechanism to 'soft-create' mappings when they pop up:

  
https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/objects/ec2.py#L63-L74

  but a lot of the code seems unreferenced, so I am not sure of its
  current state.  The problem is that these mappings never get soft
  deleted anywhere in the code, which can lead to...

  ```
  MariaDB [nova]> SELECT COUNT(*) FROM instance_id_mappings;
  +--+
  | COUNT(*) |
  +--+
  |  3941119 |
  +--+
  ```

  All of this for something that is essentially unused.  I think the fix
  could come in two parts (though I don't understand why the EC2 API is
  still referenced):

  1. Mappings should be created on-demand (which in my case they never will)
  2. Mappings should be soft deleted on instance delete (which should make the
archiving work).

  I'm happy to try and help drive this if we come up with a solution.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1890057/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1890057] [NEW] EC2 instance_id_mappings are never deleted

2020-08-02 Thread Mohammed Naser
Public bug reported:

It looks like whenever we create an instance, we create an EC2 instance
ID mapping for it:

https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/db/sqlalchemy/api.py#L1137-L1138

Which is used by the EC2 objects:

https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/objects/ec2.py

Which is not really used by much in the API, but it even has a mechanism
to 'soft-create' mappings when they pop up:

https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/objects/ec2.py#L63-L74

but a lot of the code seems unreferenced, so I am not sure of its current
state.  The problem is that these mappings never get soft deleted anywhere
in the code, which can lead to...

```
MariaDB [nova]> SELECT COUNT(*) FROM instance_id_mappings;
+--+
| COUNT(*) |
+--+
|  3941119 |
+--+
```

All of this for something that is essentially unused.  I think the fix
could come in two parts (though I don't understand why the EC2 API is
still referenced):

1. Mappings should be created on-demand (which in my case they never will)
2. Mappings should be soft deleted on instance delete (which should make the
archiving work).

I'm happy to try and help drive this if we come up with a solution.
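
Until that happens, a hedged manual clean-up sketch, assuming that
`instance_id_mappings.uuid` holds the instance UUID and that Nova's usual
soft-delete convention (deleted = id) applies to this table:

```
-- mark mappings of deleted instances as soft deleted so that
-- `nova-manage db archive_deleted_rows` can finally move them out
UPDATE instance_id_mappings m
JOIN instances i ON i.uuid = m.uuid
SET m.deleted = m.id, m.deleted_at = NOW()
WHERE m.deleted = 0 AND i.deleted != 0;
```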

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1890057

Title:
  EC2 instance_id_mappings are never deleted

Status in OpenStack Compute (nova):
  New

Bug description:
  It looks like whenever we create an instance, we create an EC2
  instance ID mapping for it:

  
https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/db/sqlalchemy/api.py#L1137-L1138

  Which is used by the EC2 objects:

  
https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/objects/ec2.py

  Which is not really used by much in the API, but it even has a
  mechanism to 'soft-create' mappings when they pop up:

  
https://github.com/openstack/nova/blob/df49ad9b29afcafa847b83df445b6627350721b5/nova/objects/ec2.py#L63-L74

  but a lot of the code seems unreferenced, so I am not sure of its
  current state.  The problem is that these mappings never get soft
  deleted anywhere in the code, which can lead to...

  ```
  MariaDB [nova]> SELECT COUNT(*) FROM instance_id_mappings;
  +--+
  | COUNT(*) |
  +--+
  |  3941119 |
  +--+
  ```

  All of this for something that is essentially unused.  I think the fix
  could come in two parts (though I don't understand why the EC2 API is
  still referenced):

  1. Mappings should be created on-demand (which in my case they never will)
  2. Mappings should be soft deleted on instance delete (which should make the
archiving work).

  I'm happy to try and help drive this if we come up with a solution.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1890057/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1889454] [NEW] br-int has an unpredictable MTU

2020-07-29 Thread Mohammed Naser
Public bug reported:

We have an environment where users can plug their VMs both to tenant and
provider networks on the hypervisor.  This environment does not have
jumbo frames.  The MTU for VMs plugged directly into provider networks
is 1500 (physical network), however it is 1450 for tenant networks
(VXLAN).

https://github.com/openstack/neutron/blob/2ac52607c266e593700be0784ebadc77789070ff/neutron/agent/common/ovs_lib.py#L299-L319

The code which creates the br-int bridge does not factor in an MTU,
which means depending on what gets plugged in first, you could end up
with 1500 MTU interfaces connected to br-int, which would give things
like this in the system logs:

br-int: dropped over-mtu packet: 1500 > 1458

I'm not sure what the best solution inside Neutron would be.  Should we
perhaps set br-int to the MTU of the largest physical network attachable
on the agent?  I'm happy to pick up the work.

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1889454

Title:
  br-int has an unpredictable MTU

Status in neutron:
  New

Bug description:
  We have an environment where users can plug their VMs both to tenant
  and provider networks on the hypervisor.  This environment does not
  have jumbo frames.  The MTU for VMs plugged directly into provider
  networks is 1500 (physical network), however it is 1450 for tenant
  networks (VXLAN).

  
https://github.com/openstack/neutron/blob/2ac52607c266e593700be0784ebadc77789070ff/neutron/agent/common/ovs_lib.py#L299-L319

  The code which creates the br-int bridge does not factor in an MTU,
  which means depending on what gets plugged in first, you could end up
  with 1500 MTU interfaces connected to br-int, which would give things
  like this in the system logs:

  br-int: dropped over-mtu packet: 1500 > 1458

  I'm not sure what the best solution inside Neutron would be.  Should we
  perhaps set br-int to the MTU of the largest physical network
  attachable on the agent?  I'm happy to pick up the work.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1889454/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1887523] [NEW] Deadlock detection code can be stale

2020-07-14 Thread Mohammed Naser
Public bug reported:

oslo.db has plenty of infrastructure for detecting deadlocks, however,
it seems that at the moment, neutron has its own implementation of it
which is missing a bunch of deadlocks, causing issues when doing work at
scale.

this bug is to track the work in refactoring all of this to use the
native oslo retry.
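
For reference, a minimal sketch of what using the native oslo.db retry
helper looks like; the decorated function name here is made up, only
`wrap_db_retry` itself comes from oslo.db:

```
from oslo_db import api as oslo_db_api


@oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
def update_port_status(context, port_id, status):
    # any DBDeadlock raised in here is retried with backoff by oslo.db
    ...
```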

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1887523

Title:
  Deadlock detection code can be stale

Status in neutron:
  New

Bug description:
  oslo.db has plenty of infrastructure for detecting deadlocks, however,
  it seems that at the moment, neutron has its own implementation of it
  which is missing a bunch of deadlocks, causing issues when doing work
  at scale.

  this bug is to track the work in refactoring all of this to use the
  native oslo retry.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1887523/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1883969] [NEW] Nova doesn't fail at API layer when image_size > volume_size with BFV

2020-06-17 Thread Mohammed Naser
Public bug reported:

When trying to boot an instance where the image size is larger than the
volume size, there seems to be no 'protection' mechanism stopping you
from doing that; it ends up failing in the compute manager layer, making
it more complicated for the user to debug.

We should probably fail early in the API (just like we do for non-BFV
instances).
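
A hedged sketch of the kind of API-layer guard this is asking for; the
names are illustrative and not the actual Nova code:

```
GiB = 1024 ** 3


def check_bfv_request(image_size_bytes, volume_size_gb):
    """Reject boot-from-volume requests whose image cannot fit the volume."""
    if image_size_bytes > volume_size_gb * GiB:
        # in Nova this would surface as a 400 from the API instead of a
        # late failure in the compute manager
        raise ValueError('image (%d bytes) does not fit in a %d GiB volume'
                         % (image_size_bytes, volume_size_gb))
```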

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1883969

Title:
  Nova doesn't fail at API layer when image_size > volume_size with BFV

Status in OpenStack Compute (nova):
  New

Bug description:
  When trying to boot an instance where the image size is larger than
  the volume size, there seems to be no 'protection' mechanism stopping
  you from doing that; it ends up failing in the compute manager layer,
  making it more complicated for the user to debug.

  We should probably fail early in the API (just like we do for non-BFV
  instances).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1883969/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1878979] [NEW] Quota code does not respect [api]/instance_list_per_project_cells

2020-05-15 Thread Mohammed Naser
Public bug reported:

The function which counts resources using the legacy method involves
getting a list of all cell mappings assigned to a specific project:

https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/quota.py#L1170-L1209

This code can be very heavy on a database which contains a lot of
instances (but not a lot of mappings), potentially scanning millions of
rows to gather 1-2 cell mappings.  In a single cell environment, it is
just extra CPU usage with exactly the same outcome.

The [api]/instance_list_per_project_cells option was introduced to work around
this:

https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/compute/instance_list.py#L146-L153

However, the quota code does not implement it, which means quota
counting takes a big toll on the database server.  We should ideally mirror the
same behaviour in the quota code.
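
A hedged sketch of what mirroring that behaviour in the quota code could
look like; the option and object names follow the links above, but treat
the exact calls as assumptions:

```
from oslo_config import cfg

from nova import objects

CONF = cfg.CONF


def _cells_for_project(context, project_id):
    # only scan the cells that actually contain instances for this
    # project when the operator has opted in, mirroring what the
    # instance list code already does
    if CONF.api.instance_list_per_project_cells:
        return objects.CellMappingList.get_by_project_id(context, project_id)
    return objects.CellMappingList.get_all(context)
```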

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1878979

Title:
  Quota code does not respect [api]/instance_list_per_project_cells

Status in OpenStack Compute (nova):
  New

Bug description:
  The function which counts resources using the legacy method involves
  getting a list of all cell mappings assigned to a specific project:

  
https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/quota.py#L1170-L1209

  This code can be very heavy on a database which contains a lot of
  instances (but not a lot of mappings), potentially scanning millions
  of rows to gather 1-2 cell mappings.  In a single cell environment, it
  is just extra CPU usage with exactly the same outcome.

  The [api]/instance_list_per_project_cells option was introduced to work around
  this:

  
https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/compute/instance_list.py#L146-L153

  However, the quota code does not implement it, which means quota
  counting takes a big toll on the database server.  We should ideally mirror the
  same behaviour in the quota code.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1878979/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1862205] [NEW] Instances not visible when hidden=NULL

2020-02-06 Thread Mohammed Naser
Public bug reported:

During an upgrade of a cloud from Stein to Train, there is a migration
which adds the `hidden` field to the database.

In that migration, it was assumed that it does not backfill all of the
existing rows.  However, upon verifying, it actually does backfill all rows,
and the order of operations *seems* to be:

1. Create new column for `hidden`
2. Update database migration version
3. Start backfilling all existing instances with hidden=0

In my case, the migration did create the column but failed to backfill
all existing instances because of the large number of instances.
However, running the migrations again seems to simply continue and not
block on that migration, leaving all rows with hidden=NULL.


Feb 06 14:06:13 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 
2020-02-06 14:06:13.566 10596 INFO migrate.versioning.api 
[req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] 398 -> 399... 
Feb 06 14:07:18 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 
2020-02-06 14:07:18.129 10596 ERROR oslo_db.sqlalchemy.exc_filters 
[req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] DBAPIError exception 
wrapped from (pymysql.err.InternalError) (1180, 'Got error 90 "Message too 
long" during COMMIT')
Feb 06 14:07:18 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 
2020-02-06 14:07:18.132 10596 ERROR oslo_db.sqlalchemy.exc_filters 
[req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] DB exception wrapped.: 
sqlalchemy.exc.ResourceClosedError: This Connection is closed
Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 
2020-02-06 14:10:22.930 14139 INFO migrate.versioning.api 
[req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 398 -> 399... 
Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 
2020-02-06 14:10:22.985 14139 INFO migrate.versioning.api 
[req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 
2020-02-06 14:10:22.985 14139 INFO migrate.versioning.api 
[req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 399 -> 400... 
Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 
2020-02-06 14:10:22.995 14139 INFO migrate.versioning.api 
[req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 
2020-02-06 14:10:22.995 14139 INFO migrate.versioning.api 
[req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 400 -> 401... 
Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 
2020-02-06 14:10:23.145 14139 INFO migrate.versioning.api 
[req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 
2020-02-06 14:10:23.145 14139 INFO migrate.versioning.api 
[req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 401 -> 402... 
Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 
2020-02-06 14:10:23.244 14139 INFO migrate.versioning.api 
[req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done


This issue is two-part, because now it seems that Nova does not assume
that hidden=NULL means that the instance is not hidden and no longer
displays the instance via API or any other operations.

The "very silly" confirmation of this backfilling behaviour is that my
attempt at patching things up by hand resulted in the same error:

==
MariaDB [nova]> update instances set hidden=0;
ERROR 1180 (HY000): Got error 90 "Message too long" during COMMIT
===

Ideally, Nova shouldn't try and backfill values and it should treat
hidden=NULL as 0.
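
For operators who hit the same wall, a hedged manual backfill that assumes
the "Message too long" failure comes from the size of a single huge
transaction (the Galera writeset limit) and therefore updates in batches;
the batch size is arbitrary:

```
-- repeat until it reports 0 rows affected
UPDATE instances SET hidden = 0 WHERE hidden IS NULL LIMIT 10000;
```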

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: db upgrade

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1862205

Title:
  Instances not visible when hidden=NULL

Status in OpenStack Compute (nova):
  New

Bug description:
  During an upgrade of a cloud from Stein to Train, there is a migration
  which adds the `hidden` field to the database.

  In that migration, it was assumed that it does not backfill all of the
  existing rows.  However, upon verifying, it actually does backfill all
  rows, and the order of operations *seems* to be:

  1. Create new column for `hidden`
  2. Update database migration version
  3. Start backfilling all existing instances with hidden=0

  In my case, the migration did create the column but failed to backfill
  all existing instances because of the large number of instances.
  However, running the migrations again seems to simply continue and not
  block on that migration, leaving all rows with hidden=NULL.

  
  Feb 06 14:06:13 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 
2020-02-06 14:06:13.566 10596 INFO 

[Yahoo-eng-team] [Bug 1852121] [NEW] Delete archived records instantly

2019-11-11 Thread Mohammed Naser
Public bug reported:

At the moment, in order to clean up a database, you will have to archive
first and then run the delete afterwards.

If the operator doesn't care about the ability to restore deleted
instances, it means that the archive step is useless for them.

It would be nice if we added an option to the archive command to purge the
records directly, instead of archiving first and then running the purge
records command afterwards.

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1852121

Title:
  Delete archived records instantly

Status in OpenStack Compute (nova):
  New

Bug description:
  At the moment, in order to clean up a database, you will have to
  archive first and then run the delete afterwards.

  If the operator doesn't care about the ability to restore deleted
  instances, it means that the archive step is useless for them.

  It would be nice if we added an option to the archive command to purge
  the records directly, instead of archiving first and then running the
  purge records command afterwards.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1852121/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1839560] [NEW] ironic: moving node to maintenance makes it unusable afterwards

2019-08-08 Thread Mohammed Naser
Public bug reported:

If you use the Ironic API to set a node into maintenance (for whatever
reason), it will no longer be included in the list of available nodes to
Nova.

When Nova refreshes its resources periodically, it will find that it is
no longer in the list of available nodes and delete it from the
database.

Once you enable the node again and Nova attempts to create the
ComputeNode again, it fails due to the duplicate UUID in the database,
because the old record is soft deleted and had the same UUID.

ref:
https://github.com/openstack/nova/commit/9f28727eb75e05e07bad51b6eecce667d09dfb65
- this made computenode.uuid match the baremetal uuid

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8304-L8316
- this soft-deletes the computenode record when it doesn't see it in the list 
of active nodes


traces:
2019-08-08 17:20:13.921 6379 INFO nova.compute.manager 
[req-c71e5c81-eb34-4f72-a260-6aa7e802f490 - - - - -] Deleting orphan compute 
node 31 hypervisor host is 77788ad5-f1a4-46ac-8132-2d88dbd4e594, nodes are 
set([u'6d556617-2bdc-42b3-a3fe-b9218a1ebf0e', 
u'a634fab2-ecea-4cfa-be09-032dce6eaf51', 
u'2dee290d-ef73-46bc-8fc2-af248841ca12'])
...
2019-08-08 22:21:25.284 82770 WARNING nova.compute.resource_tracker 
[req-a58eb5e2-9be0-4503-bf68-dff32ff87a3a - - - - -] No compute node record for 
ctl1-:77788ad5-f1a4-46ac-8132-2d88dbd4e594: ComputeHostNotFound_Remote: 
Compute host ctl1- could not be found.

Remote error: DBDuplicateEntry (pymysql.err.IntegrityError) (1062, u"Duplicate 
entry '77788ad5-f1a4-46ac-8132-2d88dbd4e594' for key 'compute_nodes_uuid_idx'")


** Affects: nova
 Importance: High
 Status: Triaged


** Tags: compute ironic

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839560

Title:
  ironic: moving node to maintenance makes it unusable afterwards

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  If you use the Ironic API to set a node into maintenance (for
  whatever reason), it will no longer be included in the list of
  available nodes to Nova.

  When Nova refreshes its resources periodically, it will find that it
  is no longer in the list of available nodes and delete it from the
  database.

  Once you enable the node again and Nova attempts to create the
  ComputeNode again, it fails due to the duplicate UUID in the database,
  because the old record is soft deleted and had the same UUID.

  ref:
  
https://github.com/openstack/nova/commit/9f28727eb75e05e07bad51b6eecce667d09dfb65
  - this made computenode.uuid match the baremetal uuid

  
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8304-L8316
  - this soft-deletes the computenode record when it doesn't see it in the list 
of active nodes

  
  traces:
  2019-08-08 17:20:13.921 6379 INFO nova.compute.manager 
[req-c71e5c81-eb34-4f72-a260-6aa7e802f490 - - - - -] Deleting orphan compute 
node 31 hypervisor host is 77788ad5-f1a4-46ac-8132-2d88dbd4e594, nodes are 
set([u'6d556617-2bdc-42b3-a3fe-b9218a1ebf0e', 
u'a634fab2-ecea-4cfa-be09-032dce6eaf51', 
u'2dee290d-ef73-46bc-8fc2-af248841ca12'])
  ...
  2019-08-08 22:21:25.284 82770 WARNING nova.compute.resource_tracker 
[req-a58eb5e2-9be0-4503-bf68-dff32ff87a3a - - - - -] No compute node record for 
ctl1-:77788ad5-f1a4-46ac-8132-2d88dbd4e594: ComputeHostNotFound_Remote: 
Compute host ctl1- could not be found.
  
  Remote error: DBDuplicateEntry (pymysql.err.IntegrityError) (1062, 
u"Duplicate entry '77788ad5-f1a4-46ac-8132-2d88dbd4e594' for key 
'compute_nodes_uuid_idx'")
  

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839560/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1825386] Re: nova is looking for OVMF file no longer provided by CentOS 7.6

2019-05-31 Thread Mohammed Naser
** Also affects: openstack-ansible
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1825386

Title:
  nova is looking for OVMF file no longer provided by CentOS 7.6

Status in OpenStack Compute (nova):
  New
Status in openstack-ansible:
  In Progress

Bug description:
  In nova/virt/libvirt/driver.py the code looks for a hardcoded path
  "/usr/share/OVMF/OVMF_CODE.fd".

  It appears that centos 7.6 has modified the OVMF-20180508-3 rpm to no
  longer contain this file.  Instead it now seems to be named
  /usr/share/OVMF/OVMF_CODE.secboot.fd

  This will break the ability to boot guests using UEFI.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1825386/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1822676] [NEW] novnc no longer sets token inside cookie

2019-04-01 Thread Mohammed Naser
Public bug reported:

For a long time, noVNC set the token inside a cookie so that when the
/websockify request came in, we had it in the cookies and we could look
it up from there and return the correct host.

However, since the following commit, they've removed this behavior

https://github.com/novnc/noVNC/commit/51f9f0098d306bbc67cc8e02ae547921b6f6585c
#diff-1d6838e3812778e95699b90d530543a1L173

This means that we're unable to use latest noVNC with Nova.  There is a
really gross workaround of using the 'path' override in the URL for
something like this

http://foo/vnc_lite.html?path=?token=foo

That feels pretty lame to me and it will have all deployment tools
change their settings.  Also, this wasn't caught in CI because we deploy
novnc from packages.

** Affects: nova
 Importance: High
 Assignee: melanie witt (melwitt)
 Status: Confirmed

** Affects: openstack-ansible
 Importance: Undecided
 Status: New


** Tags: console

** Also affects: openstack-ansible
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1822676

Title:
  novnc no longer sets token inside cookie

Status in OpenStack Compute (nova):
  Confirmed
Status in openstack-ansible:
  New

Bug description:
  For a long time, noVNC set the token inside a cookie so that when the
  /websockify request came in, we had it in the cookies and we could
  look it up from there and return the correct host.

  However, since the following commit, they've removed this behavior

  https://github.com/novnc/noVNC/commit/51f9f0098d306bbc67cc8e02ae547921b6f6585c
  #diff-1d6838e3812778e95699b90d530543a1L173

  This means that we're unable to use latest noVNC with Nova.  There is
  a really gross workaround of using the 'path' override in the URL for
  something like this

  http://foo/vnc_lite.html?path=?token=foo

  That feels pretty lame to me and it will have all deployment tools
  change their settings.  Also, this wasn't caught in CI because we
  deploy novnc from packages.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1822676/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1822613] [NEW] Inefficient queries inside online_data_migrations

2019-04-01 Thread Mohammed Naser
Public bug reported:

The online_data_migrations are supposed to be run after an upgrade and
contain a list of tasks that backfill information after an upgrade; however,
some of those queries are extremely inefficient, which results in the
online data migrations taking an unacceptably long time.  The SQL
query in question that takes a really long time:

> SELECT count(*) AS count_1
> FROM (SELECT instance_extra.created_at AS instance_extra_created_at,
> instance_extra.updated_at AS instance_extra_updated_at,
> instance_extra.deleted_at AS instance_extra_deleted_at,
> instance_extra.deleted AS instance_extra_deleted, instance_extra.id AS
> instance_extra_id, instance_extra.instance_uuid AS
> instance_extra_instance_uuid
> FROM instance_extra
> WHERE instance_extra.keypairs IS NULL AND instance_extra.deleted = 0) AS 
> anon_1

It would also be good for us to *not* run a data migration again if we
know we've already gotten found=0 when online_data_migrations is running
in "forever-until-complete".  Also, the value of 50 rows per run in that
mode is quite small.

ref: http://lists.openstack.org/pipermail/openstack-
discuss/2019-April/004397.html
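
For comparison, a hedged sketch of a cheaper "is there anything left to
migrate" probe that stops at the first matching row instead of counting
all of them; illustrative SQL, not the actual migration code:

```
SELECT 1
FROM instance_extra
WHERE keypairs IS NULL AND deleted = 0
LIMIT 1;
```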

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1822613

Title:
  Inefficient queries inside online_data_migrations

Status in OpenStack Compute (nova):
  New

Bug description:
  The online_data_migrations are supposed to be run after an upgrade and
  contain a list of tasks that backfill information after an upgrade;
  however, some of those queries are extremely inefficient, which results
  in the online data migrations taking an unacceptably long time.
  The SQL query in question that takes a really long time:

  > SELECT count(*) AS count_1
  > FROM (SELECT instance_extra.created_at AS instance_extra_created_at,
  > instance_extra.updated_at AS instance_extra_updated_at,
  > instance_extra.deleted_at AS instance_extra_deleted_at,
  > instance_extra.deleted AS instance_extra_deleted, instance_extra.id AS
  > instance_extra_id, instance_extra.instance_uuid AS
  > instance_extra_instance_uuid
  > FROM instance_extra
  > WHERE instance_extra.keypairs IS NULL AND instance_extra.deleted = 0) AS 
anon_1

  It would also be good for us to *not* run a data migration again if we
  know we've already gotten found=0 when online_data_migrations is
  running in "forever-until-complete".  Also, the value of 50 rows per
  run in that mode is quite small.

  ref: http://lists.openstack.org/pipermail/openstack-
  discuss/2019-April/004397.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1822613/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1821244] [NEW] Failed volume creation can result in invalid `connection_info` field

2019-03-21 Thread Mohammed Naser
Public bug reported:

If a volume fails to create, this can result in `connection_info` having
the literal value of 'null' which breaks things down the road that
expect it to be a dictionary, an example of a breakage:

https://github.com/openstack/nova/blob/a5e3054e1d6df248fc4c00b9abd7289dde160393/nova/compute/utils.py#L1260

This would fail with:

AttributeError: 'NoneType' object has no attribute 'get'
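
A hedged sketch of the kind of defensive handling this implies; note that
jsonutils.loads('null') returns None, so the fallback to an empty dict is
the important part:

```
from oslo_serialization import jsonutils


def load_connection_info(raw):
    # bdm.connection_info may be NULL, '' or the literal JSON 'null';
    # always hand callers a dict so .get() is safe
    info = jsonutils.loads(raw) if raw else None
    return info or {}
```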

** Affects: nova
 Importance: Undecided
 Assignee: Mohammed Naser (mnaser)
 Status: New

** Changed in: nova
 Assignee: (unassigned) => Mohammed Naser (mnaser)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1821244

Title:
  Failed volume creation can result in invalid `connection_info` field

Status in OpenStack Compute (nova):
  New

Bug description:
  If a volume fails to create, this can result in `connection_info`
  having the literal value of 'null' which breaks things down the road
  that expect it to be a dictionary, an example of a breakage:

  
https://github.com/openstack/nova/blob/a5e3054e1d6df248fc4c00b9abd7289dde160393/nova/compute/utils.py#L1260

  This would fail with:

  AttributeError: 'NoneType' object has no attribute 'get'

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1821244/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1820752] [NEW] Implement reader/member/admin roles

2019-03-18 Thread Mohammed Naser
Public bug reported:

Keystone has introduced roles for reader/member/admin which we should
leverage in order to be able to provide an easy way for read-only access
to APIs.
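
A hedged sketch of what a reader-aware default could look like with
oslo.policy; the rule name and check string are illustrative, not Nova's
actual defaults:

```
from oslo_policy import policy

rules = [
    policy.DocumentedRuleDefault(
        name='os_compute_api:servers:index',
        check_str='role:reader and project_id:%(project_id)s',
        description='List servers (read-only access is enough)',
        operations=[{'method': 'GET', 'path': '/servers'}],
    ),
]
```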

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1820752

Title:
  Implement reader/member/admin roles

Status in OpenStack Compute (nova):
  New

Bug description:
  Keystone has introduced roles for reader/member/admin which we should
  leverage in order to be able to provide an easy way for read-only
  access to APIs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1820752/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1715374] Re: Reloading compute with SIGHUP prevents instances from booting

2019-03-07 Thread Mohammed Naser
** Also affects: oslo.service
   Importance: Undecided
   Status: New

** Also affects: openstack-ansible
   Importance: Undecided
   Status: New

** Changed in: openstack-ansible
   Status: New => Confirmed

** Changed in: openstack-ansible
   Importance: Undecided => Critical

** Changed in: openstack-ansible
 Assignee: (unassigned) => Mohammed Naser (mnaser)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1715374

Title:
  Reloading compute with SIGHUP prevents instances from booting

Status in OpenStack Compute (nova):
  In Progress
Status in openstack-ansible:
  Confirmed
Status in oslo.service:
  In Progress
Status in tripleo:
  Won't Fix

Bug description:
  When trying to boot a new instance on a compute node where nova-
  compute received SIGHUP (the SIGHUP is used as a trigger for reloading
  mutable options), it always failed.

== nova/compute/manager.py ==
  def cancel_all_events(self):
  if self._events is None:
  LOG.debug('Unexpected attempt to cancel events during shutdown.')
  return
  our_events = self._events
  # NOTE(danms): Block new events
  self._events = None<--- Set self._events to 
"None" 
  ...
  =

This will cause a NovaException when prepare_for_instance_event() was 
called.
It's the cause of the failure of network allocation.

  == nova/compute/manager.py ==
  def prepare_for_instance_event(self, instance, event_name):
  ...
  if self._events is None:
  # NOTE(danms): We really should have a more specific error
  # here, but this is what we use for our default error case
  raise exception.NovaException('In shutdown, no new events '
'can be scheduled')
  =

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1715374/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1749574] Re: [tracking] removal and migration of pycrypto

2019-02-13 Thread Mohammed Naser
** Changed in: openstack-ansible
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1749574

Title:
  [tracking] removal and migration of pycrypto

Status in Barbican:
  In Progress
Status in Compass:
  New
Status in daisycloud:
  New
Status in OpenStack Backup/Restore and DR (Freezer):
  New
Status in Fuel for OpenStack:
  New
Status in OpenStack Compute (nova):
  Triaged
Status in openstack-ansible:
  Fix Released
Status in OpenStack Global Requirements:
  Fix Released
Status in pyghmi:
  Fix Committed
Status in Solum:
  Fix Released
Status in Tatu:
  New
Status in OpenStack DBaaS (Trove):
  Fix Released

Bug description:
  trove
  tatu
  barbican
  compass
  daisycloud
  freezer
  fuel
  nova
  openstack-ansible - https://review.openstack.org/544516
  pyghmi - https://review.openstack.org/569073
  solum

To manage notifications about this bug go to:
https://bugs.launchpad.net/barbican/+bug/1749574/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1807400] Re: networksegments table in neutron can not be cleared automatically

2018-12-18 Thread Mohammed Naser
** Also affects: neutron
   Importance: Undecided
   Status: New

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1807400

Title:
  networksegments table in neutron can not be cleared automatically

Status in neutron:
  Invalid
Status in openstack-ansible:
  New

Bug description:
  The _process_port_binding function in neutron/plugins/ml2/plugin.py uses
  clear_binding_levels to clear the ml2_port_binding_levels table, but it
  will not do anything to networksegments under the hierarchical port
  binding condition.

  @db_api.context_manager.writer
  def clear_binding_levels(context, port_id, host):
  if host:
  for l in (context.session.query(models.PortBindingLevel).
filter_by(port_id=port_id, host=host)):
  context.session.delete(l)
  LOG.debug("For port %(port_id)s, host %(host)s, "
"cleared binding levels",
{'port_id': port_id,
 'host': host})

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1807400/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1800511] [NEW] VMs started before Rocky upgrade cannot be live migrated

2018-10-29 Thread Mohammed Naser
Public bug reported:

In Rocky, the following patch introduced adding MTU to the network for
VMs:

https://github.com/openstack/nova/commit/f02b3800051234ecc14f3117d5987b1a8ef75877

However, this didn't affect live migrations much because Nova didn't
touch the network bits of the XML during live migration, until this
patch:

https://github.com/openstack/nova/commit/2b52cde565d542c03f004b48ee9c1a6a25f5b7cd

With that change, the MTU is added to the configuration, which means
that the destination is launched with host_mtu=N, which apparently
changes the guest ABI (see:
https://bugzilla.redhat.com/show_bug.cgi?id=1449346).  This means the
live migration will fail with an error looking like this:

2018-10-29 14:59:15.126+: 5289: error : qemuProcessReportLogError:1914 : 
internal error: qemu unexpectedly closed the monitor: 
2018-10-29T14:59:14.977084Z qemu-kvm: get_pci_config_device: Bad config data: 
i=0x10 read: 61 device: 1 cmask: ff wmask: c0 w1cmask:0
2018-10-29T14:59:14.977105Z qemu-kvm: Failed to load PCIDevice:config
2018-10-29T14:59:14.977109Z qemu-kvm: Failed to load virtio-net:virtio
2018-10-29T14:59:14.977112Z qemu-kvm: error while loading state for instance 
0x0 of device ‘:00:03.0/virtio-net’
2018-10-29T14:59:14.977283Z qemu-kvm: load of migration failed: Invalid argument

I was able to further verify this by seeing that `host_mtu` exists in
the command line when looking at the destination host instance logs in
/var/log/libvirt/qemu/instance-foo.log

** Affects: nova
 Importance: High
 Assignee: Mohammed Naser (mnaser)
 Status: Triaged


** Tags: libvirt live-migration upgrade

** Changed in: nova
 Assignee: (unassigned) => Mohammed Naser (mnaser)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1800511

Title:
  VMs started before Rocky upgrade cannot be live migrated

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  In Rocky, the following patch introduced adding MTU to the network for
  VMs:

  
https://github.com/openstack/nova/commit/f02b3800051234ecc14f3117d5987b1a8ef75877

  However, this didn't affect live migrations much because Nova didn't
  touch the network bits of the XML during live migration, until this
  patch:

  
https://github.com/openstack/nova/commit/2b52cde565d542c03f004b48ee9c1a6a25f5b7cd

  With that change, the MTU is added to the configuration, which means
  that the destination is launched with host_mtu=N, which apparently
  changes the guest ABI (see:
  https://bugzilla.redhat.com/show_bug.cgi?id=1449346).  This means the
  live migration will fail with an error looking like this:

  2018-10-29 14:59:15.126+: 5289: error : qemuProcessReportLogError:1914 : 
internal error: qemu unexpectedly closed the monitor: 
2018-10-29T14:59:14.977084Z qemu-kvm: get_pci_config_device: Bad config data: 
i=0x10 read: 61 device: 1 cmask: ff wmask: c0 w1cmask:0
  2018-10-29T14:59:14.977105Z qemu-kvm: Failed to load PCIDevice:config
  2018-10-29T14:59:14.977109Z qemu-kvm: Failed to load virtio-net:virtio
  2018-10-29T14:59:14.977112Z qemu-kvm: error while loading state for instance 
0x0 of device ‘:00:03.0/virtio-net’
  2018-10-29T14:59:14.977283Z qemu-kvm: load of migration failed: Invalid 
argument

  I was able to further verify this by seeing that `host_mtu` exists in
  the command line when looking at the destination host instance logs in
  /var/log/libvirt/qemu/instance-foo.log

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1800511/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799892] [NEW] Placement API crashes with 500s in Rocky upgrade with downed compute nodes

2018-10-25 Thread Mohammed Naser
Public bug reported:

I ran into this upgrading another environment into Rocky, deleted the
problematic resource provider, but just ran into it again in another
upgrade of another environment so there's something wonky.  Here's the
traceback:

=
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
[req-8ad1c999-7646-4b0a-91c0-cd26a3581766 b61d42657d364008bfdc6fa715e67daf 
a894e8109af3430aa7ae03e0c49a0aa0 - default default] Placement API unexpected 
error: 19: KeyError: 19
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
Traceback (most recent call last):
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/placement/fault_wrap.py", 
line 40, in __call__
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
return self.application(environ, start_response)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
resp = self.call_func(req, *args, **kw)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
return self.func(req, *args, **kwargs)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File "/usr/lib/python2.7/site-packages/microversion_parse/middleware.py", line 
80, in __call__
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
response = req.get_response(self.application)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File "/usr/lib/python2.7/site-packages/webob/request.py", line 1313, in send
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
application, catch_exc_info=False)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File "/usr/lib/python2.7/site-packages/webob/request.py", line 1277, in 
call_application
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
app_iter = application(self.environ, start_response)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handler.py", 
line 209, in __call__
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
return dispatch(environ, start_response, self._map)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handler.py", 
line 146, in dispatch
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
return handler(environ, start_response)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
resp = self.call_func(req, *args, **kw)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi_wrapper.py",
 line 29, in call_func
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
super(PlacementWsgify, self).call_func(req, *args, **kwargs)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
return self.func(req, *args, **kwargs)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/placement/microversion.py",
 line 164, in decorated_func
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
return _find_method(f, version, status_code)(req, *args, **kwargs)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/util.py", 
line 81, in decorated_function
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
return f(req)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handlers/allocation_candidate.py",
 line 316, in list_allocation_candidates
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
context, requests, limit=limit, group_policy=group_policy)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   
File 

[Yahoo-eng-team] [Bug 1798188] [NEW] VNC stops working in rolling upgrade by default

2018-10-16 Thread Mohammed Naser
Public bug reported:

During a rolling upgrade, once the control plane is upgraded and running
on Rocky (but computes still in Queens), the consoles will stop working.

It is not obvious, but it seems that the following is missing:

```
[workarounds]
enable_consoleauth = True
```

There isn't a really obvious document or anything explaining this,
leaving the user confused.

** Affects: nova
 Importance: High
 Assignee: melanie witt (melwitt)
 Status: Confirmed


** Tags: console upgrade

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1798188

Title:
  VNC stops working in rolling upgrade by default

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  During a rolling upgrade, once the control plane is upgraded and
  running on Rocky (but computes still in Queens), the consoles will
  stop working.

  It is not obvious, but it seems that the following is missing:

  ```
  [workarounds]
  enable_consoleauth = True
  ```

  There isn't a really obvious document or anything explaining this,
  leaving the user confused.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1798188/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1794811] [NEW] Lack of allocation candidates is only logged in DEBUG

2018-09-27 Thread Mohammed Naser
Public bug reported:

If the scheduler gets allocation candidates from placement, it goes through
all the filters, and if it ends up with 0 compute nodes it logs that at INFO:

https://github.com/openstack/nova/blob/c6218428e9b29a2c52808ec7d27b4b21aadc0299/nova/filters.py#L130

However, if no allocation candidates are returned at all, it only throws a
message in DEBUG and exits, leaving out important information for the operator.

https://github.com/openstack/nova/blob/c3fe54a74d8a3b5d5338a902e3562733a2b9a564/nova/scheduler/manager.py#L150-L153

** Affects: nova
 Importance: Undecided
 Assignee: Mohammed Naser (mnaser)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1794811

Title:
  Lack of allocation candidates is only logged in DEBUG

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  If the scheduler gets allocation candidates from placement, it goes
  through all the filters, and if it ends up with 0 compute nodes it logs
  that at INFO:

  
https://github.com/openstack/nova/blob/c6218428e9b29a2c52808ec7d27b4b21aadc0299/nova/filters.py#L130

  However, if no allocation candidates are returned at all, it only throws
  a message in DEBUG and exits, leaving out important information for the
  operator.

  
https://github.com/openstack/nova/blob/c3fe54a74d8a3b5d5338a902e3562733a2b9a564/nova/scheduler/manager.py#L150-L153

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1794811/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1793569] [NEW] Add placement audit commands

2018-09-20 Thread Mohammed Naser
Public bug reported:

It is possible that placement gets out of sync, which can cause
scheduling problems that would otherwise go unnoticed.  I've built out this
script, which would be nice to have as `nova-manage placement audit`:


#!/usr/bin/env python

import argparse
import sys

from openstack import connection
import openstack.config

config = openstack.config.OpenStackConfig()
parser = argparse.ArgumentParser()
config.register_argparse_arguments(parser, sys.argv)

options = parser.parse_args()

cloud_region = config.get_one(argparse=options)
conn = connection.Connection(config=cloud_region)

# Grab list of all hypervisors and their servers
hypervisors = conn.compute.get('/os-hypervisors?with_servers=true', 
microversion='2.53').json().get('hypervisors')

# Generate a dictionary mapping of hypervisor => [instances]
hypervisor_mapping = {h['id']: [s['uuid'] for s in h.get('servers', [])] for h 
in hypervisors}
hypervisor_names = {h['id']: h['hypervisor_hostname'] for h in hypervisors}

# Grab list of all resource providers
resource_providers = 
conn.placement.get('/resource_providers').json().get('resource_providers')
for rp in resource_providers:
  # Check if RP has VCPU in inventory (i.e. compute node)
  inventories = conn.placement.get('/resource_providers/%s/inventories' % 
rp['uuid']).json().get('inventories')

  # Skip those without VCPU and MEMORY_MB (non computes)
  if 'MEMORY_MB' not in inventories and 'VCPU' not in inventories:
continue

  # Get all allocations for RP
  allocations = conn.placement.get('/resource_providers/%s/allocations' % 
rp['uuid']).json().get('allocations')

  # Is there a compute node for this RP?
  if rp['uuid'] not in hypervisor_mapping:
print "openstack resource provider delete %s # resource provider does not 
have matching provider" % rp['uuid']
continue

  for allocation_id, info in allocations.iteritems():
# The instance does not exist where placement says it should be.
if allocation_id not in hypervisor_mapping[rp['uuid']]:
  hypervisor = None

  # Try to find where it's hiding.
  for hyp, instances in hypervisor_mapping.iteritems():
if allocation_id in instances:
  hypervisor = hyp
  break

  # We found it.
  if hypervisor:
classes = ','.join(["%s=%s" % (key, value) for key, value in 
info.get('resources').iteritems()])
print "openstack resource provider allocation set --allocation rp=%s,%s 
%s # instance allocated on wrong rp" % (hypervisor, classes, allocation_id)
continue

  # We don't know where this is.  Let's see if it exists in Nova.
  server = conn.compute.get('/servers/%s' % allocation_id)  # /servers is a compute API endpoint
  if server.status_code == 404:
print "openstack resource provider allocation delete %s # instance 
deleted" % allocation_id
continue

  # TODO: idk? edge cases?
  raise


It would likely need to be rewritten to use the built-in placement HTTP
client and objects to avoid extra API calls.

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1793569

Title:
  Add placement audit commands

Status in OpenStack Compute (nova):
  New

Bug description:
  It is possible that placement gets out of sync, which can cause
  scheduling problems that would otherwise go unnoticed.  I've built out
  this script, which would be nice to have as `nova-manage placement audit`:

  

  #!/usr/bin/env python

  import argparse
  import sys

  from openstack import connection
  import openstack.config

  config = openstack.config.OpenStackConfig()
  parser = argparse.ArgumentParser()
  config.register_argparse_arguments(parser, sys.argv)

  options = parser.parse_args()

  cloud_region = config.get_one(argparse=options)
  conn = connection.Connection(config=cloud_region)

  # Grab list of all hypervisors and their servers
  hypervisors = conn.compute.get('/os-hypervisors?with_servers=true', 
microversion='2.53').json().get('hypervisors')

  # Generate a dictionary mapping of hypervisor => [instances]
  hypervisor_mapping = {h['id']: [s['uuid'] for s in h.get('servers', [])] for 
h in hypervisors}
  hypervisor_names = {h['id']: h['hypervisor_hostname'] for h in hypervisors}

  # Grab list of all resource providers
  resource_providers = 
conn.placement.get('/resource_providers').json().get('resource_providers')
  for rp in resource_providers:
# Check if RP has VCPU in inventory (i.e. compute node)
inventories = conn.placement.get('/resource_providers/%s/inventories' % 
rp['uuid']).json().get('inventories')

# Skip those without VCPU and MEMORY_MB (non computes)
   

[Yahoo-eng-team] [Bug 1793533] [NEW] Deleting a service with nova-compute binary doesn't remove compute node

2018-09-20 Thread Mohammed Naser
Public bug reported:

If you are taking a nova-compute service out of service permanently, the
logical steps would be:

1) Take down the service
2) Delete it from the service list (nova service-delete )

However, this does not delete the compute node record, which stays
around forever, leading the scheduler to always complain about it as well:

2018-09-20 13:15:45.312 131035 WARNING nova.scheduler.host_manager [req-
c4a7c383-c606-48a7-b870-cc143710114a 234412d3482f4707877ca696e105bf5b
acb15d2ffaae4eda98580c7b874d7f89 - default default] No compute service
record found for host .vexxhost.net

https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L716-L720

We should be deleting the compute node if a nova-compute binary is
deleted, or that section should automatically clean up while warning
(because service records can be rebuilt anyways?)

** Affects: nova
 Importance: Undecided
 Status: Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1793533

Title:
  Deleting a service with nova-compute binary doesn't remove compute
  node

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  If you are taking a nova-compute service out of service permanently,
  the logical steps would be:

  1) Take down the service
  2) Delete it from the service list (nova service-delete )

  However, this does not delete the compute node record, which stays
  around forever, leading the scheduler to always complain about it as well:

  2018-09-20 13:15:45.312 131035 WARNING nova.scheduler.host_manager
  [req-c4a7c383-c606-48a7-b870-cc143710114a
  234412d3482f4707877ca696e105bf5b acb15d2ffaae4eda98580c7b874d7f89 -
  default default] No compute service record found for host
  .vexxhost.net

  
https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L716-L720

  We should be deleting the compute node if a nova-compute binary is
  deleted, or that section should automatically clean up while warning
  (because service records can be rebuilt anyways?)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1793533/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1784074] [NEW] Instances end up with no cell assigned in instance_mappings

2018-07-27 Thread Mohammed Naser
Public bug reported:

There have been situations where, due to an unrelated issue such as an RPC
or DB problem, the nova_api instance_mappings table can end up with
instances that have cell_id set to NULL, which can cause annoying and
weird behaviour such as undeletable instances, etc.

This seems to be an issue only during times when these external
infrastructure components had issues.  I have come up with the following
script, which loops over all cells, checks which cell each instance
actually lives in, and spits out the MySQL queries to run to fix the
mappings.

This would be nice to have as a nova-manage cell_v2 command to help if
any other users run into this; unfortunately I'm a bit short on time, so
I don't have time to nova-ify it, but it's here:


#!/usr/bin/env python

import urlparse

import pymysql


# Connect to databases
api_conn = pymysql.connect(host='', port=3306, user='nova_api', 
passwd='xxx', db='nova_api')
api_cur = api_conn.cursor()

def _get_conn(db):
  parsed_url = urlparse.urlparse(db)
  conn = pymysql.connect(host=parsed_url.hostname, user=parsed_url.username, 
passwd=parsed_url.password, db=parsed_url.path[1:])
  return conn.cursor()

# Get list of all cells
api_cur.execute("SELECT uuid, name, database_connection FROM cell_mappings")
CELLS = [{'uuid': uuid, 'name': name, 'db': _get_conn(db)} for uuid, name, db 
in api_cur.fetchall()]

# Get list of all unmapped instances
api_cur.execute("SELECT instance_uuid FROM instance_mappings WHERE cell_id IS 
NULL")
print "Number of unmapped instances: %s" % api_cur.rowcount

# Go over all unmapped instances
for (instance_uuid,) in api_cur.fetchall():
  instance_cell = None

  # Check which cell contains this instance
  for cell in CELLS:
cell['db'].execute("SELECT id FROM instances WHERE uuid = %s", 
(instance_uuid,))

if cell['db'].rowcount != 0:
  instance_cell = cell
  break

  # Update to the correct cell
  if instance_cell:
print "UPDATE instance_mappings SET cell_id = '%s' WHERE instance_uuid = 
'%s'" % (instance_cell['uuid'], instance_uuid)
continue

  # If we reach this point, it's not in any cell?!
  print "%s: not found in any cell" % (instance_uuid)


** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784074

Title:
  Instances end up with no cell assigned in instance_mappings

Status in OpenStack Compute (nova):
  New

Bug description:
  There have been situations where, due to an unrelated issue such as an
  RPC or DB problem, the nova_api instance_mappings table can end up
  with instances that have cell_id set to NULL, which can cause annoying
  and weird behaviour such as undeletable instances, etc.

  This seems to be an issue only during times when these external
  infrastructure components had issues.  I have come up with the
  following script, which loops over all cells, checks which cell each
  instance actually lives in, and spits out the MySQL queries to run to
  fix the mappings.

  This would be nice to have as a nova-manage cell_v2 command to help if
  any other users run into this; unfortunately I'm a bit short on time,
  so I don't have time to nova-ify it, but it's here:

  
  #!/usr/bin/env python

  import urlparse

  import pymysql

  
  # Connect to databases
  api_conn = pymysql.connect(host='', port=3306, user='nova_api', 
passwd='xxx', db='nova_api')
  api_cur = api_conn.cursor()

  def _get_conn(db):
parsed_url = urlparse.urlparse(db)
conn = pymysql.connect(host=parsed_url.hostname, user=parsed_url.username, 
passwd=parsed_url.password, db=parsed_url.path[1:])
return conn.cursor()

  # Get list of all cells
  api_cur.execute("SELECT uuid, name, database_connection FROM cell_mappings")
  CELLS = [{'uuid': uuid, 'name': name, 'db': _get_conn(db)} for uuid, name, db 
in api_cur.fetchall()]

  # Get list of all unmapped instances
  api_cur.execute("SELECT instance_uuid FROM instance_mappings WHERE cell_id IS 
NULL")
  print "Number of unmapped instances: %s" % api_cur.rowcount

  # Go over all unmapped instances
  for (instance_uuid,) in api_cur.fetchall():
instance_cell = None

# Check which cell contains this instance
for cell in CELLS:
  cell['db'].execute("SELECT id FROM instances WHERE uuid = %s", 
(instance_uuid,))

  if cell['db'].rowcount != 0:
instance_cell = cell
break

# Update to the correct cell
if instance_cell:
  print "UPDATE instance_mappings SET cell_id = '%s' WHERE instance_uuid = 
'%s'" % (instance_cell['uuid'], instance_uuid)
  continue

# If we reach this point, it's not in any cell?!
print "%s: not found in any cell" % (instance_uuid)
  

[Yahoo-eng-team] [Bug 1769283] [NEW] ImagePropertiesFilter has no default value resulting in unpredictable scheduling

2018-05-04 Thread Mohammed Naser
Public bug reported:

When using ImagePropertiesFilter for something like hardware
architecture, it can cause very unpredictable behaviour because of the
lack of default value.

In our case, a public cloud user will most likely upload an image
without `hw_architecture` defined anywhere (as most instructions and
general OpenStack documentation suggest).

However, in a case where there are multiple architectures available, the
images tagged with a specific architecture will go towards hypervisors
with that specific architecture, while those which are not tagged will
go to *any* hypervisor.

Because of how popular certain architectures are, it should be possible
to set a 'default' value for the architecture, as it is the implied one,
with the ability to override it if a user wants a specific one.
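
A sketch of the kind of defaulting being asked for (the constant and
helper below are made up for illustration; there is no such nova option
today):

# Hypothetical deployment-wide default applied when the image does not
# declare an architecture of its own.
DEFAULT_IMAGE_ARCHITECTURE = 'x86_64'

def effective_architecture(image_props):
    return image_props.get('hw_architecture') or DEFAULT_IMAGE_ARCHITECTURE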

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1769283

Title:
  ImagePropertiesFilter has no default value resulting in unpredictable
  scheduling

Status in OpenStack Compute (nova):
  New

Bug description:
  When using ImagePropertiesFilter for something like hardware
  architecture, it can cause very unpredictable behaviour because of the
  lack of default value.

  In our case, a public cloud user will most likely upload an image
  without `hw_architecture` defined anywhere (as most instructions and
  general OpenStack documentation suggest).

  However, in a case where there are multiple architectures available,
  the images tagged with a specific architecture will go towards
  hypervisors with that specific architecture, while those which are not
  tagged will go to *any* hypervisor.

  Because of how popular certain architectures are, it should be
  possible to set a 'default' value for the architecture, as it is the
  implied one, with the ability to override it if a user wants a
  specific one.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1769283/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1755890] [NEW] Instances fail to hard reboot when using OpenDaylight

2018-03-14 Thread Mohammed Naser
Public bug reported:

When using OpenDaylight with Open vSwitch, the Neutron Open vSwitch
agent does not exist in the environment anymore.

When an instance is started up for the first time, OpenDaylight will
successfully bind the port and send the vif plugged notification.
However, since the introduction of the following patch:

https://review.openstack.org/#/q/Ib08afad3822f2ca95cfeea18d7f4fc4cb407b4d6

It now expects the vif plugged event to happen on hard reboots, but in
certain environments (such as ODL with OVS) that event will not come
in.  This results in all instance starts after the first one failing.

Discussion:
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-14.log.html#t2018-03-14T18:12:48

ODL issue:
https://jira.opendaylight.org/projects/NETVIRT/issues/NETVIRT-512
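
A mitigation that is sometimes applied on the nova side while the
missing event is sorted out (it papers over the problem rather than
fixing it) is to relax the plug wait in nova.conf:

[DEFAULT]
vif_plugging_is_fatal = False
vif_plugging_timeout = 0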

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1755890

Title:
  Instances fail to hard reboot when using OpenDaylight

Status in OpenStack Compute (nova):
  New

Bug description:
  When using OpenDaylight with Open vSwitch, the Neutron Open vSwitch
  agent does not exist in the environment anymore.

  When an instance is started up for the first time, OpenDaylight will
  successfully bind the port and send the vif plugged notification.
  However, since the introduction of the following patch:

  https://review.openstack.org/#/q/Ib08afad3822f2ca95cfeea18d7f4fc4cb407b4d6

  It now expects the vif plugged event to happen on hard reboots, but in
  certain environments (such as ODL with OVS) that event will not come
  in.  This results in all instance starts after the first one failing.

  Discussion:
  
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-14.log.html#t2018-03-14T18:12:48

  ODL issue:
  https://jira.opendaylight.org/projects/NETVIRT/issues/NETVIRT-512

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1755890/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1752736] [NEW] Nova compute dies if it cannot authenticate to RabbitMQ

2018-03-01 Thread Mohammed Naser
Public bug reported:

At the moment, nova-compute will die if it fails to authenticate to the
messaging cluster and it will not retry on start.  It is possible that
the vhost is not ready yet so it should be handled here:

https://github.com/openstack/nova/blob/stable/pike/nova/conductor/api.py#L61-L78
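
A rough sketch of the retry-on-start behaviour being asked for (attempt
counts, intervals and exception handling are assumptions, not nova's
actual code):

import time

def wait_for_messaging(ping, attempts=30, interval=10):
    """Keep retrying the initial RPC ping instead of dying outright."""
    for _ in range(attempts):
        try:
            ping()          # e.g. the conductor ping issued at service start
            return True
        except Exception:   # auth refused because the vhost is not ready yet
            time.sleep(interval)
    return False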

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1752736

Title:
  Nova compute dies if it cannot authenticate to RabbitMQ

Status in OpenStack Compute (nova):
  New

Bug description:
  At the moment, nova-compute will die if it fails to authenticate to
  the messaging cluster and it will not retry on start.  It is possible
  that the vhost is not ready yet so it should be handled here:

  
https://github.com/openstack/nova/blob/stable/pike/nova/conductor/api.py#L61-L78

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1752736/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1750666] [NEW] Deleting an instance before scheduling with BFV fails to detach volume

2018-02-20 Thread Mohammed Naser
Public bug reported:

If you try to boot an instance and delete it early, before scheduling,
the '_delete_while_booting' codepath hits
`_attempt_delete_of_buildrequest`, which tries to remove the block device
mappings.

However, if the cloud contains compute nodes before Pike, no block
device mappings will be present in the database (because they are only
saved if using the new attachment flow), which means the attachment IDs
are empty and the volume delete fails:

2018-02-20 16:02:25,063 WARNING [nova.compute.api] Ignoring volume
cleanup failure due to Object action obj_load_attr failed because:
attribute attachment_id not lazy-loadable
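
A sketch of the guard the warning suggests (method and attribute names
are taken from the log above and from oslo.versionedobjects, not
verified against any actual nova fix):

def cleanup_volume(volume_api, context, bdm):
    # Only use the new-style attachment delete when an attachment was
    # actually recorded; otherwise fall back to the old detach path.
    if bdm.obj_attr_is_set('attachment_id') and bdm.attachment_id:
        volume_api.attachment_delete(context, bdm.attachment_id)
    else:
        volume_api.detach(context, bdm.volume_id)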

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750666

Title:
  Deleting an instance before scheduling with BFV fails to detach volume

Status in OpenStack Compute (nova):
  New

Bug description:
  If you try to boot an instance and delete it early, before scheduling,
  the '_delete_while_booting' codepath hits
  `_attempt_delete_of_buildrequest`, which tries to remove the block
  device mappings.

  However, if the cloud contains compute nodes before Pike, no block
  device mappings will be present in the database (because they are only
  saved if using the new attachment flow), which means the attachment
  IDs are empty and the volume delete fails:

  2018-02-20 16:02:25,063 WARNING [nova.compute.api] Ignoring volume
  cleanup failure due to Object action obj_load_attr failed because:
  attribute attachment_id not lazy-loadable

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750666/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1739325] [NEW] Server operations fail to complete with versioned notifications if payload contains unset non-nullable fields

2017-12-19 Thread Mohammed Naser
Public bug reported:

With versioned notifications, the instance payload tries to attach a
flavor payload which it looks up from the instance.  It uses the one
attached in instance_extras; however, there seems to be a scenario where
the disabled field is missing in the database, causing all operations to
fail in the notification stage.

The JSON string for the flavor in the database is attached below (note
this is a cloud with a long lifetime, so it might be some weird
conversion at some point in the lifetime of the cloud).

The temporary workaround, as suggested by Matt, was to switch to
unversioned notifications, which did the trick.
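
For reference, that workaround maps to the following nova.conf setting
(group placement as of recent releases; older ones may expose it under
[DEFAULT]):

[notifications]
notification_format = unversioned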

== flavor ==
{"new": null, "old": null, "cur": {"nova_object.version": "1.1", 
"nova_object.changes": ["root_gb", "name", "ephemeral_gb", "memory_mb", 
"vcpus", "extra_specs", "swap", "rxtx_factor", "flavorid", "vcpu_weight", 
"id"], "nova_object.name": "Flavor", "nova_object.data": {"root_gb": 80, 
"name": "nb.2G", "ephemeral_gb": 0, "memory_mb": 2048, "vcpus": 4, 
"extra_specs": {}, "swap": 0, "rxtx_factor": 1.0, "flavorid": 
"8c6a8477-20cb-4db9-ad1d-be3bc05cdae9", "vcpu_weight": null, "id": 8}, 
"nova_object.namespace": "nova"}}
== flavor ==

== stack ==
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server 
[req-edc9fb83-63ff-4c4b-b6c6-704d331905a8 604d5fd332904975a26b6e89c60a9d51 
d6ebcbe536f848b3af4403f922377f80 - default default] Exception during message 
handling: ValueError: Field `disabled' cannot be None
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server Traceback (most 
recent call last):
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 160, in 
_process_incoming
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server res = 
self.dispatcher.dispatch(message)
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 213, 
in dispatch
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server return 
self._do_dispatch(endpoint, method, ctxt, args)
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, 
in _do_dispatch
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server result = 
func(ctxt, **new_args)
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in 
wrapped
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server 
function_name, call_dict, binary)
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server 
self.force_reraise()
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server 
six.reraise(self.type_, self.value, self.tb)
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in 
wrapped
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server return 
f(self, context, *args, **kw)
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 189, in 
decorated_function
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server "Error: %s", 
e, instance=instance)
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server 
self.force_reraise()
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server 
six.reraise(self.type_, self.value, self.tb)
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 159, in 
decorated_function
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server return 
function(self, context, *args, **kwargs)
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 874, in 
decorated_function
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server return 
function(self, context, *args, **kwargs)
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 217, in 
decorated_function
2017-10-23 14:49:21.117 40200 ERROR oslo_messaging.rpc.server 

[Yahoo-eng-team] [Bug 1739323] [NEW] KeyError in host_manager for _get_host_states

2017-12-19 Thread Mohammed Naser
Public bug reported:

https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L674-L718

In _get_host_states, a list of all compute nodes is retrieved with a
`state_key` of `(host, node)`.

https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L692
https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L708

The small piece of code here removes all of the dead compute nodes from
host_state_map

https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L708

However, the result is returned by iterating over all seen nodes and
using each one as an index into host_state_map, some entries of which
have been deleted by the code above, resulting in a KeyError.

https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L718
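
A minimal sketch of the guard that avoids the KeyError (names follow the
report; this is an illustration, not necessarily the eventual nova fix):

def iter_live_host_states(host_state_map, seen_nodes):
    # Only yield states that survived the dead-node cleanup instead of
    # blindly indexing host_state_map with every seen (host, node) key.
    return (host_state_map[key] for key in seen_nodes
            if key in host_state_map)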

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1739323

Title:
  KeyError in host_manager for _get_host_states

Status in OpenStack Compute (nova):
  New

Bug description:
  
https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L674-L718

  In _get_host_states, a list of all compute nodes is retrieved with a
  `state_key` of `(host, node)`.

  
https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L692
  
https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L708

  The small piece of code here removes all of the dead compute nodes
  from host_state_map

  
https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L708

  However, the result is returned by iterating over all seen nodes and
  using each one as an index into host_state_map, some entries of which
  have been deleted by the code above, resulting in a KeyError.

  
https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L718

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1739323/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1739318] [NEW] Online data migration context does not contain project_id

2017-12-19 Thread Mohammed Naser
Public bug reported:

The online data migration generates a context in order to be able to
execute migrations:

https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L747

However, this context does not contain a `project_id` when running this
via CLI.

https://github.com/openstack/nova/blob/master/nova/context.py#L279-L290

During the creation of RequestSpecs for old instances, it is this
context, which contains no `project_id`, that gets used.

https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L611-L622

This means that a RequestSpec gets created with `project_id` set to
`null`.  During day-to-day operations things work okay; however, when
attempting a live migration, the `project_id` is sent as `null` when
trying to claim resources, which the placement API refuses.

https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L791
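
A minimal sketch of the backfill idea (attribute names follow the
report; this is not nova's actual patch): prefer the instance's own
project when the admin context created by the CLI carries none.

def spec_project_id(context, instance):
    # Never save a RequestSpec with project_id=NULL.
    return context.project_id or instance.project_id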

This will give errors as such:

400 Bad Request
The server could not comply with the request since it is either malformed or
otherwise incorrect.
JSON does not validate: None is not of type 'string'

Failed validating 'type' in schema['properties']['project_id']:
{'maxLength': 255, 'minLength': 1, 'type': 'string'}

On instance['project_id']:
None


** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1739318

Title:
  Online data migration context does not contain project_id

Status in OpenStack Compute (nova):
  New

Bug description:
  The online data migration generates a context in order to be able to
  execute migrations:

  https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L747

  However, this context does not contain a `project_id` when running
  this via CLI.

  https://github.com/openstack/nova/blob/master/nova/context.py#L279-L290

  During the creation of RequestSpecs for old instances, it is this
  context, which contains no `project_id`, that gets used.

  
https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L611-L622

  This means that a RequestSpec gets created with `project_id` set to
  `null`.  During day-to-day operations things work okay; however, when
  attempting a live migration, the `project_id` is sent as `null` when
  trying to claim resources, which the placement API refuses.

  https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L791

  This will give errors as such:

  400 Bad Request
  The server could not comply with the request since it is either malformed
  or otherwise incorrect.
  JSON does not validate: None is not of type 'string'

  Failed validating 'type' in schema['properties']['project_id']:
  {'maxLength': 255, 'minLength': 1, 'type': 'string'}

  On instance['project_id']:
  None

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1739318/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1715462] [NEW] Instances failing quota recheck end up with no assigned cell

2017-09-06 Thread Mohammed Naser
Public bug reported:

When an instance fails the quota recheck in the code here:

https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L992-L1006

It raises an exception, however, the cell mapping is only saved much
later (thanks help of dansmith for finding this):

https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L1037-L1043

This results in an instance with an unassigned cell, where it should
technically be the cell it was scheduled into.
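
A minimal sketch of the ordering change implied above (names are
placeholders, not nova's actual code): persist the chosen cell before
the quota recheck has a chance to raise.

def assign_then_recheck(inst_mapping, cell, recheck_quota):
    inst_mapping.cell_mapping = cell
    inst_mapping.save()      # record the scheduled cell first
    recheck_quota()          # may still raise, but the mapping is saved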

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: cells quotas

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1715462

Title:
  Instances failing quota recheck end up with no assigned cell

Status in OpenStack Compute (nova):
  New

Bug description:
  When an instance fails the quota recheck in the code here:

  
https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L992-L1006

  It raises an exception, however, the cell mapping is only saved much
  later (thanks help of dansmith for finding this):

  
https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L1037-L1043

  This results in an instance with an unassigned cell, where it should
  technically be the cell it was scheduled into.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1715462/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1564182] [NEW] CPU Metrics not working

2016-03-30 Thread Mohammed Naser
Public bug reported:

The metrics collection on compute nodes is currently not working.

When the compute node creates the object to save, the value is divided
so that it falls inside [0, 1].  However, when the scheduler needs to
pull the numbers back out, it divides them once again as it loads the
objects:

https://github.com/openstack/nova/blob/stable/liberty/nova/compute/resource_tracker.py#L437
https://github.com/openstack/nova/blob/stable/liberty/nova/compute/monitors/base.py#L60-L63
https://github.com/openstack/nova/blob/stable/liberty/nova/objects/monitor_metric.py#L68-L71

This essentially means that it always returns a value of zero as a
metric, because it divides a small number again by 100.
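
A quick numeric illustration of the double scaling:

raw = 42.0               # hypervisor-reported CPU utilisation in percent
stored = raw / 100.0     # resource tracker stores 0.42, already in [0, 1]
loaded = stored / 100.0  # scheduler divides again when loading the object
print(loaded)            # 0.0042 -- effectively zero for weighing purposes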

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1564182

Title:
  CPU Metrics not working

Status in OpenStack Compute (nova):
  New

Bug description:
  The metrics collection on compute nodes is currently not working.

  When the compute node creates the object to save, the value is divided
  so that it falls inside [0, 1].  However, when the scheduler needs to
  pull the numbers back out, it divides them once again as it loads the
  objects:

  
https://github.com/openstack/nova/blob/stable/liberty/nova/compute/resource_tracker.py#L437
  
https://github.com/openstack/nova/blob/stable/liberty/nova/compute/monitors/base.py#L60-L63
  
https://github.com/openstack/nova/blob/stable/liberty/nova/objects/monitor_metric.py#L68-L71

  This essentially means that it always returns a value of zero as a
  metric, because it divides a small number again by 100.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1564182/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1528894] [NEW] Native ovsdb implementation not working

2015-12-23 Thread Mohammed Naser
Public bug reported:

When trying to use the new native OVSDB provider, connectivity never
comes up because what appears to be the db_set operation fails to change
the patch ports from "nonexistent-peer" to the correct peer, so the
bridges are never linked together.

https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1119

The system must be running the latest Liberty release, with the python-
openvswitch package installed and the following command executed:

# ovs-vsctl set-manager ptcp:6640:127.0.0.1

Once that's all done, the openvswitch agent configuration should be
changed to the following:

[OVS]
ovsdb_interface = ovsdb

Restarting the OVS agent will set up everything but leave your network in
a failed state because the correct patch ports aren't updated:

# ovs-vsctl show
Bridge br-ex
Port br-ex
Interface br-ex
type: internal
Port "em1"
Interface "em1"
Port phy-br-ex
Interface phy-br-ex
type: patch
options: {peer=nonexistent-peer}
Bridge br-int
fail_mode: secure
Port "qvo25d28228-9c"
tag: 1
Interface "qvo25d28228-9c"
...
Port int-br-ex
Interface int-br-ex
type: patch
options: {peer=nonexistent-peer}

Reverting to the regular old forked implementation works with no
problems.
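
A quick way to confirm the symptom is to read the option back directly;
on an affected host it still shows the placeholder rather than the
partner patch port:

# ovs-vsctl get Interface int-br-ex options:peer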

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1528894

Title:
  Native ovsdb implementation not working

Status in neutron:
  New

Bug description:
  When trying to use the new native OVSDB provider, connectivity never
  comes up because what appears to be the db_set operation fails to
  change the patch ports from "nonexistent-peer" to the correct peer, so
  the bridges are never linked together.

  
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1119

  The system must be running the latest Liberty release, with the
  python-openvswitch package installed and the following command
  executed:

  # ovs-vsctl set-manager ptcp:6640:127.0.0.1

  Once that's all done, the openvswitch agent configuration should be
  changed to the following:

  [OVS]
  ovsdb_interface = ovsdb

  Restarting the OVS agent will set up everything but leave your network
  in a failed state because the correct patch ports aren't updated:

  # ovs-vsctl show
  Bridge br-ex
  Port br-ex
  Interface br-ex
  type: internal
  Port "em1"
  Interface "em1"
  Port phy-br-ex
  Interface phy-br-ex
  type: patch
  options: {peer=nonexistent-peer}
  Bridge br-int
  fail_mode: secure
  Port "qvo25d28228-9c"
  tag: 1
  Interface "qvo25d28228-9c"
  ...
  Port int-br-ex
  Interface int-br-ex
  type: patch
  options: {peer=nonexistent-peer}

  Reverting to the regular old forked implementation works with no
  problems.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1528894/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1528895] [NEW] Timeouts in update_device_list (too slow with large # of VIFs)

2015-12-23 Thread Mohammed Naser
Public bug reported:

In our environment, we have some large compute nodes with a large number
of VIFs.  When the update_device_list call happens on agent start-up:

https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L842

This takes a very long time, as it seems to loop over each port on the
server side, contact Nova, and much more.  The default RPC timeout of 60
seconds is not enough, and it ends up failing on a server with around 120
VIFs.  When raising the timeout to 120, it seems to work with no
problems.

2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
[req-1e6cc46d-eb52-4d99-bd77-bf2e8424a1ea - - - - -] Error while processing VIF 
ports
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most 
recent call last):
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 1752, in rpc_loop
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
ovs_restarted)
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 1507, in process_network_ports
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
self._bind_devices(need_binding_devices)
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 847, in _bind_devices
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
self.conf.host)
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/rpc.py", line 179, in 
update_device_list
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
agent_id=agent_id, host=host)
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in 
call
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
retry=self.retry)
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in 
_send
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
timeout=timeout, retry=retry)
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 
431, in send
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent retry=retry)
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 
420, in _send
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result = 
self._waiter.wait(msg_id, timeout)
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 
318, in wait
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent message = 
self.waiters.get(msg_id, timeout=timeout)
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 
223, in get
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 'to message 
ID %s' % msg_id)
2015-12-23 15:27:27.373 38588 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
MessagingTimeout: Timed out waiting for a reply to message ID 
c42c1ffc801b41ca89aa4472696bbf1a

I don't think an RPC call should ever take that long; the neutron-server
is not loaded or anything, and adding more servers doesn't seem to
resolve it because a single RPC responder answers this call.
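
The stop-gap described above corresponds to raising the oslo.messaging
timeout in the agent's configuration, e.g.:

[DEFAULT]
rpc_response_timeout = 120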

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.

[Yahoo-eng-team] [Bug 1518016] [NEW] Nova kilo requires concurrency 1.8.2 or better

2015-11-19 Thread Mohammed Naser
Public bug reported:

The OpenStack Nova Kilo release requires oslo.concurrency 1.8.2 or
higher; this is due to the addition of on_execute and on_completion to
the execute(..) function.  The latest Ubuntu OpenStack Kilo packages
currently have code that depends on this newer release, which results in
a crash in some operations like resizes or migrations.

2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] Traceback (most recent call last):
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24]   File 
"/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 6459, in 
_error_out_instance_on_exception
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] yield
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24]   File 
"/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4054, in 
resize_instance
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] timeout, retry_interval)
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24]   File 
"/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6353, in 
migrate_disk_and_power_off
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] shared_storage)
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24]   File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] six.reraise(self.type_, self.value, 
self.tb)
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24]   File 
"/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6342, in 
migrate_disk_and_power_off
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] on_completion=on_completion)
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24]   File 
"/usr/lib/python2.7/dist-packages/nova/virt/libvirt/utils.py", line 329, in 
copy_image
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] on_execute=on_execute, 
on_completion=on_completion)
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24]   File 
"/usr/lib/python2.7/dist-packages/nova/virt/libvirt/utils.py", line 55, in 
execute
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] return utils.execute(*args, **kwargs)
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24]   File 
"/usr/lib/python2.7/dist-packages/nova/utils.py", line 207, in execute
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] return processutils.execute(*cmd, 
**kwargs)
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24]   File 
"/usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 174, 
in execute
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] raise UnknownArgumentError(_('Got 
unknown keyword args: %r') % kwargs)
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] UnknownArgumentError: Got unknown keyword 
args: {'on_execute':  at 0x7f3a64527050>, 'on_completion': 
 at 0x7f39ff6ddf50>}
2015-11-19 16:26:24.103 7779 TRACE nova.compute.manager [instance: 
c04c1cf3-fbd9-40fd-be2e-e7dc06eb9f24] 

https://github.com/openstack/requirements/commit/2fd00d00db5fce57d9589643801942d0332b1670

The commit above shows that OpenStack now requires 1.8.2 instead of
1.8.0.  We would appreciate it if the 1.8.2 upstream release could be
brought into the packages to resolve this bug.

Thank you.
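
For anyone checking which version is actually installed, a quick probe
(pkg_resources ships with setuptools, so nothing extra is needed):

import pkg_resources

print(pkg_resources.get_distribution('oslo.concurrency').version)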

** Affects: nova
 Importance: Undecided
 Status: New

** Affects: python-oslo.concurrency (Ubuntu)
 Importance: Undecided
 Status: New

** Also affects: nova
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1518016

Title:
  Nova kilo requires concurrency 1.8.2 or better

Status in OpenStack Compute (nova):
  New
Status in python-oslo.concurrency package in Ubuntu:
  New

Bug description:
  The OpenStack Nova Kilo release requires oslo.concurrency 1.8.2 or
  higher; this is due to the addition of on_execute and on_completion to
  the execute(..) function.

[Yahoo-eng-team] [Bug 1338614] [NEW] Backgrounded resizing does not work

2014-07-07 Thread Mohammed Naser
Public bug reported:

When setting resize_rootfs to 'noblock', cloud-init should fork a new
process and continue with its own initialization.  However, it seems
that this is currently broken; as you can see from these logs, it still
blocks on it:

Jul  7 12:34:20 localhost [CLOUDINIT] cc_resizefs.py[DEBUG]: Resizing (via 
forking) root filesystem (type=ext4, val=noblock)
Jul  7 12:34:20 localhost [CLOUDINIT] util.py[WARNING]: Failed forking and 
calling callback NoneType
Jul  7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: Failed forking and 
calling callback NoneType#012Traceback (most recent call last):#012  File 
/usr/lib/python2.6/site-packages/cloudinit/util.py, line 220, in fork_cb#012  
  child_cb(*args)#012TypeError: 'NoneType' object is not callable

Also, when looking at timings, you can see that it was blocked on it for
the whole time

Jul  7 12:33:38 localhost [CLOUDINIT] util.py[DEBUG]: Cloud-init v. 0.7.4 
running 'init' at Mon, 07 Jul 2014 12:33:38 +. Up 5.67 seconds.
Jul  7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: backgrounded Resizing 
took 41.487 seconds
Jul  7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: cloud-init mode 'init' 
took 41.799 seconds (41.80)
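
A minimal sketch of the guard the traceback points at (the function
shape is inferred from the cloudinit/util.py frame quoted above, not the
project's actual fix):

import os

def fork_cb(child_cb, *args):
    if not callable(child_cb):
        raise TypeError('fork_cb needs a callable, got %r' % (child_cb,))
    if os.fork() == 0:       # child: run the callback and exit
        child_cb(*args)
        os._exit(0)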

** Affects: cloud-init
 Importance: Undecided
 Status: Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1338614

Title:
  Backgrounded resizing does not work

Status in Init scripts for use on cloud images:
  Confirmed

Bug description:
  When setting resize_rootfs to 'noblock', cloud-init should fork a new
  process and continue with its own initialization.  However, it seems
  that this is currently broken; as you can see from these logs, it
  still blocks on it:

  Jul  7 12:34:20 localhost [CLOUDINIT] cc_resizefs.py[DEBUG]: Resizing (via 
forking) root filesystem (type=ext4, val=noblock)
  Jul  7 12:34:20 localhost [CLOUDINIT] util.py[WARNING]: Failed forking and 
calling callback NoneType
  Jul  7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: Failed forking and 
calling callback NoneType#012Traceback (most recent call last):#012  File 
/usr/lib/python2.6/site-packages/cloudinit/util.py, line 220, in fork_cb#012  
  child_cb(*args)#012TypeError: 'NoneType' object is not callable

  Also, when looking at timings, you can see that it was blocked on it
  for the whole time

  Jul  7 12:33:38 localhost [CLOUDINIT] util.py[DEBUG]: Cloud-init v. 0.7.4 
running 'init' at Mon, 07 Jul 2014 12:33:38 +. Up 5.67 seconds.
  Jul  7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: backgrounded Resizing 
took 41.487 seconds
  Jul  7 12:34:20 localhost [CLOUDINIT] util.py[DEBUG]: cloud-init mode 'init' 
took 41.799 seconds (41.80)

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1338614/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1311778] [NEW] Unit tests fail with MessagingTimeout errors

2014-04-23 Thread Mohammed Naser
Public bug reported:

There is an issue that is causing unit tests to fail with the following
error:

MessagingTimeout: No reply on topic conductor
MessagingTimeout: No reply on topic scheduler

2014-04-23 13:45:52.017 | Traceback (most recent call last):
2014-04-23 13:45:52.017 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py,
 line 133, in _dispatch_and_reply
2014-04-23 13:45:52.017 | incoming.message))
2014-04-23 13:45:52.017 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py,
 line 176, in _dispatch
2014-04-23 13:45:52.017 | return self._do_dispatch(endpoint, method, ctxt, 
args)
2014-04-23 13:45:52.017 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py,
 line 122, in _do_dispatch
2014-04-23 13:45:52.017 | result = getattr(endpoint, method)(ctxt, 
**new_args)
2014-04-23 13:45:52.018 |   File nova/conductor/manager.py, line 798, in 
build_instances
2014-04-23 13:45:52.018 | legacy_bdm_in_spec=legacy_bdm)
2014-04-23 13:51:50.628 |   File nlibvir:  error : internal error could not 
initialize domain event timer
2014-04-23 13:54:57.953 | ova/scheduler/rpcapi.py, line 120, in run_instance
2014-04-23 13:54:57.953 | cctxt.cast(ctxt, 'run_instance', **msg_kwargs)
2014-04-23 13:54:57.953 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/rpc/client.py,
 line 150, in call
2014-04-23 13:54:57.953 | wait_for_reply=True, timeout=timeout)
2014-04-23 13:54:57.953 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/transport.py,
 line 90, in _send
2014-04-23 13:54:57.953 | timeout=timeout)
2014-04-23 13:54:57.954 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_fake.py,
 line 166, in send
2014-04-23 13:54:57.954 | return self._send(target, ctxt, message, 
wait_for_reply, timeout)
2014-04-23 13:54:57.954 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_fake.py,
 line 161, in _send
2014-04-23 13:54:57.954 | 'No reply on topic %s' % target.topic)
2014-04-23 13:54:57.954 | MessagingTimeout: No reply on topic scheduler



2014-04-23 13:45:52.008 | Traceback (most recent call last):
2014-04-23 13:45:52.008 |   File nova/api/openstack/__init__.py, line 125, in 
__call__
2014-04-23 13:45:52.008 | return req.get_response(self.application)
2014-04-23 13:45:52.009 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/request.py,
 line 1320, in send
2014-04-23 13:45:52.009 | application, catch_exc_info=False)
2014-04-23 13:45:52.009 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/request.py,
 line 1284, in call_application
2014-04-23 13:45:52.009 | app_iter = application(self.environ, 
start_response)
2014-04-23 13:45:52.009 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py,
 line 144, in __call__
2014-04-23 13:45:52.009 | return resp(environ, start_response)
2014-04-23 13:45:52.009 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py,
 line 144, in __call__
2014-04-23 13:45:52.010 | return resp(environ, start_response)
2014-04-23 13:45:52.010 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py,
 line 144, in __call__
2014-04-23 13:45:52.010 | return resp(environ, start_response)
2014-04-23 13:45:52.010 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py,
 line 144, in __call__
2014-04-23 13:45:52.010 | return resp(environ, start_response)
2014-04-23 13:45:52.010 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/routes/middleware.py,
 line 131, in __call__
2014-04-23 13:45:52.010 | response = self.app(environ, start_response)
2014-04-23 13:45:52.011 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py,
 line 144, in __call__
2014-04-23 13:45:52.011 | return resp(environ, start_response)
2014-04-23 13:45:52.011 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py,
 line 130, in __call__
2014-04-23 13:45:52.011 | resp = self.call_func(req, *args, **self.kwargs)
2014-04-23 13:45:52.011 |   File 
/home/jenkins/workspace/gate-nova-python26/.tox/py26/lib/python2.6/site-packages/webob/dec.py,
 line 195, in call_func
2014-04-23 13:45:52.011 | return self.func(req, *args, **kwargs)
2014-04-23 13:45:52.012 |   File 

[Yahoo-eng-team] [Bug 1309043] [NEW] NetworkCommandsTestCase unit test failing

2014-04-17 Thread Mohammed Naser
Public bug reported:

Change-Id I663bd06eb50872f16fc9889dde917277739fefce introduced a race
condition where if another test doesn't properly reset the _IS_NEUTRON
flag, it will fail because it will think that it is using Neutron and
error out.

** Affects: nova
 Importance: Undecided
 Assignee: Mohammed Naser (mnaser)
 Status: In Progress

** Changed in: nova
 Assignee: (unassigned) = Mohammed Naser (mnaser)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1309043

Title:
  NetworkCommandsTestCase unit test failing

Status in OpenStack Compute (Nova):
  In Progress

Bug description:
  Change-Id I663bd06eb50872f16fc9889dde917277739fefce introduced a race
  condition where if another test doesn't properly reset the _IS_NEUTRON
  flag, it will fail because it will think that it is using Neutron and
  error out.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1309043/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1309334] [NEW] Version aliases not updated for Icehouse

2014-04-17 Thread Mohammed Naser
Public bug reported:

With the release of Icehouse, the RPC APIs were not updated for their
version aliases.

** Affects: nova
 Importance: Undecided
 Assignee: Mohammed Naser (mnaser)
 Status: New

** Changed in: nova
 Assignee: (unassigned) = Mohammed Naser (mnaser)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1309334

Title:
  Version aliases not updated for Icehouse

Status in OpenStack Compute (Nova):
  New

Bug description:
  With the release of Icehouse, the RPC APIs were not updated for their
  version aliases.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1309334/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1240197] Re: Add support for 'auto' number of API or conductor workers

2014-04-16 Thread Mohammed Naser
This has been taken care of in this merged review

https://review.openstack.org/#/c/69266/

** Changed in: nova
   Status: Confirmed = Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1240197

Title:
  Add support for 'auto' number of API or conductor workers

Status in OpenStack Compute (Nova):
  Fix Released

Bug description:
  Nova has some configuration options that allow you to have some
  services start multiple worker processes.

  [general]
  ec2_workers=
  osapi_compute_workers=
  metadata_workers=
  
  [conductor]
  workers=

  Swift has a similar workers option.  In Swift, you can set this
  option to 'auto', and it will use the number of CPU cores.  We should
  add support for 'auto' to all of the workers options in Nova.

  https://git.openstack.org/cgit/openstack/swift/tree/etc/proxy-
  server.conf-sample

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1240197/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1025481] Re: Instance usage audit fails under PostgreSQL

2014-04-13 Thread Mohammed Naser
*** This bug is a duplicate of bug 1102477 ***
https://bugs.launchpad.net/bugs/1102477

** Changed in: nova
   Status: Triaged = Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1025481

Title:
  Instance usage audit fails under PostgreSQL

Status in OpenStack Compute (Nova):
  Fix Released

Bug description:
  The instance_usage_audit calls are not working when using PostgreSQL
  (not sure about other DB implementations) because SQLAlchemy sends the
  audit period boundaries as timestamps while the task_log columns
  expect a varchar.
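
  A sketch of the workaround the HINT in the trace points at: pass the
  audit period boundaries as strings so they compare cleanly against the
  varchar columns (variable and column names follow the stacktrace; this
  is not the actual nova fix):

  query = query.filter_by(period_beginning=str(period_beginning),
                          period_ending=str(period_ending))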

  Stacktrace:
  2012-07-17 00:00:07 DEBUG nova.manager [-] Running periodic task 
ComputeManager._instance_usage_audit from (pid=6658) periodic_tasks 
/usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/manager.py:164
  2012-07-17 00:00:07 ERROR nova.manager [-] Error during 
ComputeManager._instance_usage_audit: (ProgrammingError) operator does not 
exist: character varying = timestamp without time zone
  LINE 3: ...stance_usage_audit' AND task_log.period_beginning = '2012-06...
   ^
  HINT:  No operator matches the given name and argument type(s). You might 
need to add explicit type casts.
   'SELECT task_log.created_at AS task_log_created_at, task_log.updated_at AS 
task_log_updated_at, task_log.deleted_at AS task_log_deleted_at, 
task_log.deleted AS task_log_deleted, task_log.id AS task_log_id, 
task_log.task_name AS task_log_task_name, task_log.state AS task_log_state, 
task_log.host AS task_log_host, task_log.period_beginning AS 
task_log_period_beginning, task_log.period_ending AS task_log_period_ending, 
task_log.message AS task_log_message, task_log.task_items AS 
task_log_task_items, task_log.errors AS task_log_errors \nFROM task_log \nWHERE 
task_log.deleted = %(deleted_1)s AND task_log.task_name = %(task_name_1)s AND 
task_log.period_beginning = %(period_beginning_1)s AND task_log.period_ending = 
%(period_ending_1)s AND task_log.host = %(host_1)s \n LIMIT %(param_1)s' 
{'host_1': 'compute2', 'param_1': 1, 'deleted_1': False, 'period_ending_1': 
datetime.datetime(2012, 7, 1, 0, 0), 'task_name_1': 'instance_usage_audit', 
'period_beginning_1': datetime.datetime(2012, 6, 1,
  0, 0)}
  2012-07-17 00:00:07 TRACE nova.manager Traceback (most recent call last):
  2012-07-17 00:00:07 TRACE nova.manager   File 
/usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/manager.py, 
line 167, in periodic_tasks
  2012-07-17 00:00:07 TRACE nova.manager task(self, context)
  2012-07-17 00:00:07 TRACE nova.manager   File 
/usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/compute/manager.py,
 line 2381, in _instance_usage_audit
  2012-07-17 00:00:07 TRACE nova.manager if not 
compute_utils.has_audit_been_run(context, self.host):
  2012-07-17 00:00:07 TRACE nova.manager   File 
/usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/compute/utils.py,
 line 116, in has_audit_been_run
  2012-07-17 00:00:07 TRACE nova.manager begin, end, host)
  2012-07-17 00:00:07 TRACE nova.manager   File 
/usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/db/api.py, 
line 1879, in task_log_get
  2012-07-17 00:00:07 TRACE nova.manager period_ending, host, state, 
session)
  2012-07-17 00:00:07 TRACE nova.manager   File 
/usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/db/sqlalchemy/api.py,
 line 114, in wrapper
  2012-07-17 00:00:07 TRACE nova.manager return f(*args, **kwargs)
  2012-07-17 00:00:07 TRACE nova.manager   File 
/usr/local/lib/python2.7/dist-packages/nova-2012.2-py2.7.egg/nova/db/sqlalchemy/api.py,
 line 4971, in task_log_get
  2012-07-17 00:00:07 TRACE nova.manager return query.first()
  2012-07-17 00:00:07 TRACE nova.manager   File 
/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py, line 2156, in 
first
  2012-07-17 00:00:07 TRACE nova.manager ret = list(self[0:1])
  2012-07-17 00:00:07 TRACE nova.manager   File 
/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py, line 2023, in 
__getitem__
  2012-07-17 00:00:07 TRACE nova.manager return list(res)
  2012-07-17 00:00:07 TRACE nova.manager   File 
/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py, line 2227, in 
__iter__
  2012-07-17 00:00:07 TRACE nova.manager return 
self._execute_and_instances(context)
  2012-07-17 00:00:07 TRACE nova.manager   File 
/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py, line 2242, in 
_execute_and_instances
  2012-07-17 00:00:07 TRACE nova.manager result = 
conn.execute(querycontext.statement, self._params)
  2012-07-17 00:00:07 TRACE nova.manager   File 
/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py, line 1449, 
in execute
  2012-07-17 00:00:07 TRACE nova.manager params)
  2012-07-17 00:00:07 TRACE nova.manager   File