[Yahoo-eng-team] [Bug 1992139] [NEW] Wrong url : compute admin guide

2022-10-07 Thread Hyeokjin Doo
Public bug reported:

In https://docs.openstack.org/api-guide/compute/users.html, in the "Role
based access control" section, the compute admin guide link is broken:
-> https://docs.openstack.org/nova/latest//admin/arch#projects-users-and-roles
I think the correct URL is this:
-> https://docs.openstack.org/nova/latest//admin/architecture.html#projects-users-and-roles
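
A quick way to confirm, as a minimal sketch (a plain HTTP status check; requests is an arbitrary client choice):

```python
# The #fragment is resolved client-side, so only the page paths are checked.
import requests

urls = [
    "https://docs.openstack.org/nova/latest//admin/arch#projects-users-and-roles",
    "https://docs.openstack.org/nova/latest//admin/architecture.html#projects-users-and-roles",
]
for url in urls:
    print(requests.head(url, allow_redirects=True).status_code, url)
# Expected: 404 for the first (broken) link, 200 for the second.
```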

This bug tracker is for errors with the documentation; use the following
as a template and remove or add fields as you see fit. Convert [ ] into
[x] to check boxes:

- [x] This doc is inaccurate in this way: __
- [ ] This is a doc addition request.
- [ ] I have a fix to the document that I can paste below including example: input and output.

If you have a troubleshooting or support issue, use the following
resources:

 - The mailing list: https://lists.openstack.org
 - IRC: 'openstack' channel on OFTC

---
Release: 2.1.0 on 2022-08-24 10:31:37
SHA: 51ae2e6baf7d33853f496ac7b69edf46549bfc02
Source: https://opendev.org/openstack/nova/src/api-guide/source/users.rst
URL: https://docs.openstack.org/api-guide/compute/users.html

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: api-guide

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1992139

Title:
  Wrong url : compute admin guide

Status in OpenStack Compute (nova):
  New

Bug description:
  in https://docs.openstack.org/api-guide/compute/users.html,
  in Role based access control 
  compute admin guide link is crashed. 
  -> https://docs.openstack.org/nova/latest//admin/arch#projects-users-and-roles
   I think correct url is this.
  -> 
https://docs.openstack.org/nova/latest//admin/architecture.html#projects-users-and-roles

  This bug tracker is for errors with the documentation, use the
  following as a template and remove or add fields as you see fit.
  Convert [ ] into [x] to check boxes:

  - [x] This doc is inaccurate in this way: __
  - [ ] This is a doc addition request.
  - [ ] I have a fix to the document that I can paste below including example: 
input and output. 

  If you have a troubleshooting or support issue, use the following
  resources:

   - The mailing list: https://lists.openstack.org
   - IRC: 'openstack' channel on OFTC

  ---
  Release: 2.1.0 on 2022-08-24 10:31:37
  SHA: 51ae2e6baf7d33853f496ac7b69edf46549bfc02
  Source: https://opendev.org/openstack/nova/src/api-guide/source/users.rst
  URL: https://docs.openstack.org/api-guide/compute/users.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1992139/+subscriptions




[Yahoo-eng-team] [Bug 1992161] [NEW] Unknown quota resource security_group_rule in neutron-rpc-server

2022-10-07 Thread Johannes Kulik
Public bug reported:

When restarting our linuxbridge-agents, we see exceptions for some of the
networks: Unknown quota resources ['security_group_rule']. This stops the
linuxbridge-agent from fully bringing up those networks.

Prerequisites:
* run the api-server and rpc-server in different processes
  We have neutron-server running with uWSGI and start the neutron-rpc-server in another container.

Steps to reproduce:
* have a project with server/network/ports
* have an unused default security group
* delete the default security group
* restart the appropriate linuxbridge-agent
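
For illustration, a minimal sketch of step 3 with openstacksdk (the cloud name is an assumption; the equivalent CLI call works just as well):

```python
# Delete the project's unused default security group, then restart the agent.
import openstack

conn = openstack.connect(cloud="mycloud")  # hypothetical clouds.yaml entry for the project

sg = conn.network.find_security_group("default")
if sg:
    conn.network.delete_security_group(sg)
# Restarting the linuxbridge-agent afterwards is deployment-specific
# (e.g. via the init system on the affected host).
```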

Version:
* Ussuri with custom patches on top: https://github.com/sapcc/neutron

Expected behavior:
linuxbridge-agent should bring up all networks even if the user deleted the
default security group.

Either don't create a default security-group when called via the
linuxbridge-agent instead of the API or make the quota available in the
rpc-server so the default security-group can be created.

Creating/updating a port or creating a network via the API will create the
default security group and fix the problem on the linuxbridge-agent, too. I
just don't think it's acceptable to require the user/admin to perform API
actions because the user did something they maybe shouldn't have.

We've also seen the same exception from a dhcp-agent. Attached are tracebacks
from both the linuxbridge-agent and the dhcp-agent.

Trying to debug this, we found that no quota resources are registered in
neutron-rpc-server. This can be seen via the eventlet backdoor with these
commands:
  from neutron.quota import resource_registry
  resource_registry.get_all_resources()  # returns an empty registry in the rpc-server

** Affects: neutron
 Importance: Undecided
 Status: New

** Attachment added: "tracebacks from dhcp-agent and linuxbridge agent calling 
neutron-rpc-server"
   
https://bugs.launchpad.net/bugs/1992161/+attachment/5622035/+files/rpc-no-default-security-group-creation.txt

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1992161

Title:
  Unknown quota resource security_group_rule in neutron-rpc-server

Status in neutron:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1992161/+subscriptions




[Yahoo-eng-team] [Bug 1992169] [NEW] instance_faults entries are created on InstanceInvalidState exceptions

2022-10-07 Thread Pavlo Shchelokovskyy
Public bug reported:

This might somewhat be related to
https://bugs.launchpad.net/nova/+bug/1800755 and discussion there.

Recently the following problem was reported in one of our clouds:

- a homegrown monitoring script that polls server diagnostics
- the monitoring script is naive and does not check the server state before requesting diagnostics
- several servers in shutdown state
- the instance_faults table is growing, ballooning the database size on disk

While handling a GET /servers/{server_id}/diagnostics call for anything but a
RUNNING instance, nova raises an InstanceInvalidState exception which is then:
- stored in the instance_faults table;
- returned as HTTP 409 Conflict to the user.

https://opendev.org/openstack/nova/src/commit/03d2715ed492350fa11908aea0fdd0265993e284/nova/compute/manager.py#L6550-L6558
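
For illustration, a minimal reproduction sketch against the raw API (endpoint, token, and server id are placeholders):

```python
# GET the diagnostics of a non-RUNNING server; expect 409 Conflict and,
# per the report above, a new row in instance_faults.
import requests

COMPUTE = "http://controller:8774/v2.1"  # hypothetical compute endpoint
TOKEN = "..."                            # a valid admin-scoped token
SERVER_ID = "..."                        # id of a SHUTOFF server

resp = requests.get(f"{COMPUTE}/servers/{SERVER_ID}/diagnostics",
                    headers={"X-Auth-Token": TOKEN})
print(resp.status_code)  # 409 for anything but a RUNNING instance
```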

Effectively, benign 'read-only' GET requests are recorded in the DB.
Also, these instance_faults entries cannot be purged by standard means
since the instance is not deleted yet. What's more, they won't be shown
in any API at all, since the server is also not in ERROR state.

This got me thinking - should InstanceInvalidState be saved as an
instance fault at all?
After all, this exception usually indicates not a problem (fault) with the
instance, but a mismatch between the instance's state and the action
requested on it, which might not warrant storing it.

There's also a slight DoS potential here, but since the default policy for
the get-diagnostics call is admin-only, this is probably not worth worrying
about.

** Affects: nova
 Importance: Undecided
 Assignee: Pavlo Shchelokovskyy (pshchelo)
 Status: New

** Changed in: nova
 Assignee: (unassigned) => Pavlo Shchelokovskyy (pshchelo)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1992169

Title:
  instance_faults entries are created on InstanceInvalidState exceptions

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1992169/+subscriptions




[Yahoo-eng-team] [Bug 1991509] Re: [ovn-octavia-provider] router gateway unset + set breaks ovn lb connectivity

2022-10-07 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/ovn-octavia-provider/+/858363
Committed: 
https://opendev.org/openstack/ovn-octavia-provider/commit/d889812f3c2deaf630892d65c90cdbafd4435d21
Submitter: "Zuul (22348)"
Branch: master

commit d889812f3c2deaf630892d65c90cdbafd4435d21
Author: Luis Tomas Bolivar 
Date:   Fri Sep 30 20:13:49 2022 +0200

Ensure lbs are properly configured for router gateway set/unset

Before adding support for lbs with VIPs on the provider networks,
there was no need to react to gateway chassis creation events, and
nothing was done for their deletion. However, after adding that
support, the creation event for that type of port needs to be
handled properly.

For the loadbalancer VIPs on the provider network, processing the
event and triggering the lb_creat_lrp_assoc_handler means that the
information about the logical router will be added, i.e., the router
is added to the loadbalancer external_ids as a lr_ref, while the
loadbalancer is also added into the logical_router object
(loadbalancer entry). In addition, the lb is also added to the logical
switches connected to the router.

For the loadbalancer VIPs on tenant networks (which should not be
affected by the gateway port creation/deletion), this patch ensures
the lb is not added to the logical switch representing the provider
network that is connected to the router. It therefore differentiates
between lrp ports which have gateway_chassis and the ones that don't,
i.e., adding the lb to the switch when the lrp port is the one
connecting a subnet to the router, and not doing so when it is the
router's gateway port to the provider network.

Closes-Bug: #1991509

Change-Id: Iddd96fd9015230b3dd75aa2182055cf43eb608c1
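
A hedged sketch of the distinction the commit describes (the helper name and port representation are hypothetical, not the provider's actual code):

```python
# Attach the lb only to switches reached via router-to-subnet lrp ports;
# skip the router's gateway port to the provider network, which is the
# one carrying gateway_chassis.
def should_add_lb_to_switch(lrp_port: dict) -> bool:
    return not lrp_port.get("gateway_chassis")

print(should_add_lb_to_switch({"name": "lrp-subnet1"}))                         # True
print(should_add_lb_to_switch({"name": "lrp-gw", "gateway_chassis": ["ch1"]}))  # False
```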


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1991509

Title:
  [ovn-octavia-provider] router gateway unset + set breaks ovn lb
  connectivity

Status in neutron:
  Fix Released

Bug description:
  The LogicalRouterPortEvent for gateway_chassis ports is skipped [1];
  however, if the ovn lb VIPs are on a provider network, the create event
  needs to be handled so that the loadbalancer gets properly configured
  and added to the router.

  
  [1] 
https://opendev.org/openstack/ovn-octavia-provider/src/commit/acbf6e7f3e223c088582390475c84464bc27227d/ovn_octavia_provider/event.py#L39

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1991509/+subscriptions




[Yahoo-eng-team] [Bug 1992183] [NEW] Openstack: Application credential token remains valid longer than expected

2022-10-07 Thread David Wilde
*** This bug is a security vulnerability ***

Public security bug reported:

Description of problem:
Keystone issues tokens with the default lifespan regardless of the lifespan
of the application credentials used to issue them.
If the configured lifespan of an identity token is 1h and the application
credentials expire 1 minute from now, a newly issued token will outlive the
application credentials used to issue it by 59 minutes.

How reproducible: 100%

Steps to Reproduce:
1. Create application credentials with a short expiration time (e.g. 10 seconds)
2. openstack token issue
--> the returned token has the standard expiration, for example 1 hour. The
script below confirms that the token continues to be valid after the
application credentials have expired.

```bash
#!/usr/bin/env bash

set -Eeuo pipefail

openstack image create --disk-format=raw --container-format=bare \
    --file <(echo 'I am a Glance image') testimage -f json > image.json

image_url="$(openstack catalog show glance -f json \
    | jq -r '.endpoints[] | select(.interface=="public").url')$(jq -r '.file' image.json)"

openstack application credential create \
    --expiration="$(date --utc --date '+10 second' +%Y-%m-%dT%H:%M:%S)" \
    token_test \
    -f json \
    > appcreds.json

cat <<EOF > clouds.yaml
clouds:
  ${OS_CLOUD}:
    auth:
      auth_url: 
      application_credential_id: '$(jq -r '.id' appcreds.json)'
      application_credential_secret: '$(jq -r '.secret' appcreds.json)'
    auth_type: "v3applicationcredential"
    identity_api_version: 3
    interface: public
    region_name: 
EOF
# Override ~/.config/openstack/secure.yaml
touch secure.yaml

openstack token issue -f json > token.json

echo "appcreds expiration: $(jq -r '.expires_at' appcreds.json)"
for i in {1..10}; do
    sleep 100
    echo -ne "$(date --utc --rfc-3339=seconds)\t"
    curl -isS -H "X-Auth-Token: $(jq -r '.id' token.json)" --url "$image_url" | head -n1
done
```

Actual results (on a cloud with a token duration of 24h):
appcreds expiration: 2022-07-08T13:55:02.00
2022-07-08 13:56:38+00:00   HTTP/1.1 200 OK
2022-07-08 13:58:19+00:00   HTTP/1.1 200 OK
2022-07-08 14:00:00+00:00   HTTP/1.1 200 OK
2022-07-08 14:01:42+00:00   HTTP/1.1 200 OK
2022-07-08 14:03:23+00:00   HTTP/1.1 200 OK
2022-07-08 14:05:07+00:00   HTTP/1.1 200 OK
2022-07-08 14:06:49+00:00   HTTP/1.1 200 OK
2022-07-08 14:08:37+00:00   HTTP/1.1 200 OK
2022-07-08 14:10:18+00:00   HTTP/1.1 200 OK
2022-07-08 14:12:00+00:00   HTTP/1.1 200 OK

Expected results:
appcreds expiration: 2022-07-08T13:55:02.00
2022-07-08 13:54:38+00:00   HTTP/1.1 200 OK
2022-07-08 13:58:19+00:00   HTTP/1.1 401 Unauthorized
2022-07-08 14:00:00+00:00   HTTP/1.1 401 Unauthorized
2022-07-08 14:01:42+00:00   HTTP/1.1 401 Unauthorized
2022-07-08 14:03:23+00:00   HTTP/1.1 401 Unauthorized
2022-07-08 14:05:07+00:00   HTTP/1.1 401 Unauthorized
2022-07-08 14:06:49+00:00   HTTP/1.1 401 Unauthorized
2022-07-08 14:08:37+00:00   HTTP/1.1 401 Unauthorized
2022-07-08 14:10:18+00:00   HTTP/1.1 401 Unauthorized
2022-07-08 14:12:00+00:00   HTTP/1.1 401 Unauthorized
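
A compact way to see the mismatch from the files the script writes (field names as emitted by `-f json`; this is a sketch, not part of the original script):

```python
# Compare when the token and the application credential expire.
import json

with open("token.json") as f:
    token_expires = json.load(f)["expires"]
with open("appcreds.json") as f:
    appcred_expires = json.load(f)["expires_at"]

print("token expires:   ", token_expires)    # ~24h out on the cloud above
print("appcreds expired:", appcred_expires)  # ~10s after creation
```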

** Affects: keystone
 Importance: High
 Assignee: David Wilde (dave-wilde)
 Status: New

** Affects: ossa
 Importance: Undecided
 Assignee: David Wilde (dave-wilde)
 Status: New

** Also affects: ossa
   Importance: Undecided
   Status: New

** Changed in: ossa
 Assignee: (unassigned) => David Wilde (dave-wilde)

** Changed in: keystone
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1992183

Title:
  Openstack: Application credential token remains valid longer than
  expected

Status in OpenStack Identity (keystone):
  New
Status in OpenStack Security Advisory:
  New


[Yahoo-eng-team] [Bug 1992186] [NEW] "int object is not iterable" when using numerical group names

2022-10-07 Thread Mohammed Naser
Public bug reported:

When using federation, if the value of `groups` in the mapping resolves to
a number, it will be parsed as a number and authentication will then fail:

```
{"error":{"code":400,"message":"'int' object is not iterable","title":"Bad 
Request"}}
```

I believe the bad bit is here:

https://github.com/openstack/keystone/blob/326b014434cc760ba08763e1870ac057f7917e98/keystone/federation/utils.py#L650-L661
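
A minimal illustration of the failure mode outside keystone (the numeric value stands in for a mapped `groups` attribute):

```python
# A numeric group name survives JSON parsing as an int; iterating it
# raises exactly the reported error.
import json

group_value = json.loads("1234")  # what a numeric `groups` value parses to
try:
    list(group_value)
except TypeError as exc:
    print(exc)  # 'int' object is not iterable
```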

** Affects: keystone
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1992186

Title:
  "int object is not iterable" when using numerical group names

Status in OpenStack Identity (keystone):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1992186/+subscriptions




[Yahoo-eng-team] [Bug 1933802] Re: missing global_request_id in neutron_lib context from_dict method

2022-10-07 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/neutron-lib/+/859813
Committed: 
https://opendev.org/openstack/neutron-lib/commit/9ecd5995b6c598cee931087bf13fdd166f404034
Submitter: "Zuul (22348)"
Branch: master

commit 9ecd5995b6c598cee931087bf13fdd166f404034
Author: Kiran Pawar 
Date:   Thu Sep 29 08:40:30 2022 +

Use oslo_context.from_dict() for context generation

Use RequestContext.from_dict in oslo_context to generate the context,
and add/update needed attrs such as user_id, tenant_id, tenant_name
and timestamp.

Closes-bug: #1933802
Change-Id: I0527eb5fa8d32d97ca45e44d1b154b6529b3f847


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1933802

Title:
  missing global_request_id in neutron_lib context from_dict method

Status in neutron:
  Fix Released

Bug description:
  code:
  @classmethod
  def from_dict(cls, values):
      return cls(user_id=values.get('user_id', values.get('user')),
                 tenant_id=values.get('tenant_id', values.get('project_id')),
                 is_admin=values.get('is_admin'),
                 roles=values.get('roles'),
                 timestamp=values.get('timestamp'),
                 request_id=values.get('request_id'),
                 #global_request_id=values.get('global_request_id'),
                 tenant_name=values.get('tenant_name'),
                 user_name=values.get('user_name'),
                 auth_token=values.get('auth_token'))

  project: neutron_lib
  path: neutron_lib/context.py

  please note the commented-out line: global_request_id should have been
  passed through.
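
  A hedged sketch of the committed approach (simplified; the class name and
  the extra attribute handling here are illustrative, see the change above
  for the real code):

  ```python
  # Delegate the common fields, including global_request_id, to
  # oslo.context instead of rebuilding the kwargs by hand.
  from oslo_context import context as oslo_context

  class Context(oslo_context.RequestContext):
      @classmethod
      def from_dict(cls, values):
          # oslo.context restores request_id, global_request_id, roles, etc.
          ctx = super(Context, cls).from_dict(values)
          # neutron-specific extras are re-applied on top.
          ctx.timestamp = values.get('timestamp')
          return ctx
  ```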

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1933802/+subscriptions




[Yahoo-eng-team] [Bug 1982284] Re: libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"

2022-10-07 Thread melanie witt
** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/victoria
   Importance: Undecided
   Status: New

** Also affects: nova/xena
   Importance: Undecided
   Status: New

** Also affects: nova/yoga
   Importance: Undecided
   Status: New

** Also affects: nova/wallaby
   Importance: Undecided
   Status: New

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Also affects: nova/zed
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1982284

Title:
  libvirt live migration sometimes fails with "libvirt.libvirtError:
  internal error: migration was active, but no RAM info was set"

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  In Progress
Status in OpenStack Compute (nova) ussuri series:
  In Progress
Status in OpenStack Compute (nova) victoria series:
  In Progress
Status in OpenStack Compute (nova) wallaby series:
  In Progress
Status in OpenStack Compute (nova) xena series:
  In Progress
Status in OpenStack Compute (nova) yoga series:
  In Progress
Status in OpenStack Compute (nova) zed series:
  In Progress

Bug description:
  We have seen this downstream where live migration randomly fails with
  the following error [1]:

libvirt.libvirtError: internal error: migration was active, but no
  RAM info was set

  Discussion on [1] gravitated toward a possible race condition issue in
  qemu around the query-migrate command [2]. The query-migrate command
  is used (indirectly) by the libvirt driver during monitoring of live
  migrations [3][4][5].

  While searching for info about this error, I found a thread on
  libvir-list from the past [6] where someone else encountered the same error
  and for them it happened if they called query-migrate *after* a live
  migration had completed.

  Based on this, it seemed possible that our live migration monitoring
  thread sometimes races and calls jobStats() after the migration has
  completed, resulting in this error being raised and the migration
  being considered failed when it was actually complete.

  A patch has since been proposed and committed [7] to address the
  possible issue.

  Meanwhile, on our side in nova, we can mitigate this problematic
  behavior by catching the specific error from libvirt and ignoring it
  so that a live migration in this situation will be considered
  completed by the libvirt driver.
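
  A hedged sketch of that mitigation (not nova's actual patch; the wrapper
  function is illustrative):

  ```python
  # Treat the spurious qemu error as "migration already completed".
  import libvirt

  def job_stats_ignoring_ram_info_race(dom):
      try:
          return dom.jobStats()
      except libvirt.libvirtError as exc:
          msg = exc.get_error_message() or ""
          if "migration was active, but no RAM info was set" in msg:
              return {}  # the migration finished before the query landed
          raise
  ```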

  Doing this would improve the experience for users that are hitting
  this error and getting erroneous live migration failures.

  [1] https://bugzilla.redhat.com/show_bug.cgi?id=2074205
  [2] 
https://qemu.readthedocs.io/en/latest/interop/qemu-qmp-ref.html#qapidoc-1848
  [3] 
https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/driver.py#L10123
  [4] 
https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/guest.py#L655
  [5] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainGetJobStats
  [6] https://listman.redhat.com/archives/libvir-list/2021-January/213631.html
  [7] 
https://github.com/qemu/qemu/commit/552de79bfdd5e9e53847eb3c6d6e4cd898a4370e

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1982284/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp