[Yahoo-eng-team] [Bug 1992139] [NEW] Wrong url : compute admin guide
Public bug reported:

In https://docs.openstack.org/api-guide/compute/users.html, in the "Role based access control" section, the "compute admin guide" link is broken:
https://docs.openstack.org/nova/latest//admin/arch#projects-users-and-roles

I think the correct URL is this:
https://docs.openstack.org/nova/latest//admin/architecture.html#projects-users-and-roles

This bug tracker is for errors with the documentation; use the following as a template and remove or add fields as you see fit. Convert [ ] into [x] to check boxes:

- [x] This doc is inaccurate in this way: __
- [ ] This is a doc addition request.
- [ ] I have a fix to the document that I can paste below including example: input and output.

If you have a troubleshooting or support issue, use the following resources:

- The mailing list: https://lists.openstack.org
- IRC: 'openstack' channel on OFTC

---
Release: 2.1.0 on 2022-08-24 10:31:37
SHA: 51ae2e6baf7d33853f496ac7b69edf46549bfc02
Source: https://opendev.org/openstack/nova/src/api-guide/source/users.rst
URL: https://docs.openstack.org/api-guide/compute/users.html

** Affects: nova
   Importance: Undecided
       Status: New

** Tags: api-guide

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1992139

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1992139/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1992161] [NEW] Unknown quota resource security_group_rule in neutron-rpc-server
Public bug reported:

When restarting our linuxbridge-agents, we see exceptions for some of the networks: Unknown quota resources ['security_group_rule']. This stops the linuxbridge-agent from fully bringing up that network.

Prerequisites:
* run api-server and rpc-server in different processes

We have neutron-server running with uWSGI and start the neutron-rpc-server in another container.

Steps to reproduce:
* have a project with server/network/ports
* have an unused default security group
* delete the default security group
* restart the appropriate linuxbridge-agent

Version:
* Ussuri with custom patches on top: https://github.com/sapcc/neutron

Expected behavior:
The linuxbridge-agent should bring up all networks even if the user deleted the default security group. Either don't create a default security group when called via the linuxbridge-agent instead of the API, or make the quota available in the rpc-server so the default security group can be created.

Creating/updating a port or creating a network via the API will create the default security group and fix the problem on the linuxbridge-agent, too. I just don't think it's acceptable to require the user/admin to perform API actions in case the user did something they maybe shouldn't have.

We've also seen the same exception from a dhcp-agent. Attached are tracebacks from both the linuxbridge-agent and the dhcp-agent.

Trying to debug this, we found that no quota resources are registered in neutron-rpc-server. This can be seen when using the eventlet backdoor with these commands:

    from neutron.quota import resource_registry; resource_registry.get_all_resources()

** Affects: neutron
   Importance: Undecided
       Status: New

** Attachment added: "tracebacks from dhcp-agent and linuxbridge agent calling neutron-rpc-server"
   https://bugs.launchpad.net/bugs/1992161/+attachment/5622035/+files/rpc-no-default-security-group-creation.txt

https://bugs.launchpad.net/bugs/1992161

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1992161/+subscriptions
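The failure mode described above can be sketched with a toy registry (a minimal illustration, not neutron's actual implementation): a quota check fails for any resource name that was never registered in the current process, which is exactly what a separately started rpc-server that skips registration runs into.

```python
# Hypothetical minimal model of a per-process quota resource registry.
class ResourceRegistry:
    def __init__(self):
        self._resources = {}  # name -> resource descriptor

    def register(self, name):
        self._resources[name] = {"name": name}

    def count_and_check(self, requested):
        # Any resource name never registered in *this process* is unknown.
        unknown = [r for r in requested if r not in self._resources]
        if unknown:
            raise ValueError("Unknown quota resources %s." % unknown)


# In the API process the registry is populated at startup...
api_registry = ResourceRegistry()
api_registry.register("security_group_rule")
api_registry.count_and_check(["security_group_rule"])  # passes

# ...but a separate rpc-server process that never registers resources
# fails the same check, matching the reported exception.
rpc_registry = ResourceRegistry()
try:
    rpc_registry.count_and_check(["security_group_rule"])
except ValueError as exc:
    print(exc)
```

This is why the suggested fix of registering the quota resources in the rpc-server process would let the default security group be created from the agent path.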
[Yahoo-eng-team] [Bug 1992169] [NEW] instance_faults entries are created on InstanceInvalidState exceptions
Public bug reported:

This might be somewhat related to https://bugs.launchpad.net/nova/+bug/1800755 and the discussion there.

Recently the following problem was reported in one of our clouds:
- a homegrown self-written monitoring tool that polls server diagnostics
- the monitoring script is naive and does not check the server state before requesting server diagnostics
- several servers in shutdown state
- the instance_faults table is growing and ballooning database size on disk

During handling of a GET /servers//diagnostics call for anything but a RUNNING instance, nova raises an InstanceInvalidState exception which is then:
- stored in the instance_faults table;
- returned as HTTP 409 Conflict to the user.

https://opendev.org/openstack/nova/src/commit/03d2715ed492350fa11908aea0fdd0265993e284/nova/compute/manager.py#L6550-L6558

Effectively, benign 'read-only' GET requests are recorded in the DB. Also, these instance_faults entries cannot be purged by standard means since the instance is not deleted yet. What's more, they won't be shown in any API at all, since the server is also not in ERROR state.

This got me thinking - should InstanceInvalidState be saved to instance_faults at all? After all, this exception usually indicates not a problem (fault) with the instance, but a mismatch between the instance state and the action requested upon the instance, which might not warrant storing it.

There's also a slight DoS potential here, but since the default policy for the get diagnostics call is admin-only, this is probably not worth worrying about.

** Affects: nova
   Importance: Undecided
     Assignee: Pavlo Shchelokovskyy (pshchelo)
       Status: New

** Changed in: nova
     Assignee: (unassigned) => Pavlo Shchelokovskyy (pshchelo)

https://bugs.launchpad.net/bugs/1992169

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1992169/+subscriptions
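The fix direction the report suggests can be sketched as follows (hypothetical names and structure, not nova's actual code): treat state-mismatch exceptions as pure client errors and skip writing an instance fault, while still recording genuine faults.

```python
# Hypothetical sketch of an exception filter in an action-error handler.
class InstanceInvalidState(Exception):
    """Raised when an action is requested in an incompatible VM state."""

FAULTS = []  # stand-in for the instance_faults table

def record_fault(exc):
    FAULTS.append(type(exc).__name__)

def handle_action_error(exc):
    # A state mismatch is a property of the request, not of the instance,
    # so arguably it should not be persisted as an instance fault.
    if isinstance(exc, InstanceInvalidState):
        return "409 Conflict"
    record_fault(exc)
    return "500 Internal Server Error"

print(handle_action_error(InstanceInvalidState("not RUNNING")))  # no fault row
print(handle_action_error(RuntimeError("hypervisor died")))      # fault row
print(FAULTS)
```

With a filter like this, the naive polling monitor described above would still get its 409s, but the instance_faults table would stop growing.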
[Yahoo-eng-team] [Bug 1991509] Re: [ovn-octavia-provider] router gateway unset + set breaks ovn lb connectivity
Reviewed:  https://review.opendev.org/c/openstack/ovn-octavia-provider/+/858363
Committed: https://opendev.org/openstack/ovn-octavia-provider/commit/d889812f3c2deaf630892d65c90cdbafd4435d21
Submitter: "Zuul (22348)"
Branch:    master

commit d889812f3c2deaf630892d65c90cdbafd4435d21
Author: Luis Tomas Bolivar
Date:   Fri Sep 30 20:13:49 2022 +0200

    Ensure lbs are properly configured for router gateway set/unset

    Before adding support for lbs with VIPs on the provider networks,
    there was no need to react to gateway chassis creation events, and
    nothing was done for its deletion. However, after adding the support
    for that, there is a need to properly handle the creation event for
    that type of port.

    For the loadbalancer VIPs on the provider network, processing the
    event and triggering the lb_creat_lrp_assoc_handler means that the
    information about the logical router will be added, i.e., the router
    is added to the loadbalancer external_ids as a lr_ref, while the
    loadbalancer is also added into the logical_router object
    (loadbalancer entry). In addition, the lb is also added to the
    logical switches connected to the router.

    For the loadbalancer VIPs on tenant networks (which should not be
    affected by the gateway port creation/deletion), this patch ensures
    the lb is not added to the logical switch representing the provider
    network that is connected to the router. It therefore differentiates
    between lrp ports which have gateway_chassis and the ones that don't,
    i.e., adding the lb to the switch when the lrp port is the one
    connecting a subnet with the router, and not doing so when it is the
    gateway port for the router to the provider network.

    Closes-Bug: #1991509
    Change-Id: Iddd96fd9015230b3dd75aa2182055cf43eb608c1

** Changed in: neutron
       Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1991509

Bug description:
  The LogicalRouterPortEvent for gateway_chassis ports is skipped [1];
  however, if the ovn lb VIPs are on a provider network, the create event
  needs to be handled so that the loadbalancer gets properly configured
  and added to the router.

  [1] https://opendev.org/openstack/ovn-octavia-provider/src/commit/acbf6e7f3e223c088582390475c84464bc27227d/ovn_octavia_provider/event.py#L39

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1991509/+subscriptions
[Yahoo-eng-team] [Bug 1992183] [NEW] Openstack: Application credential token remains valid longer than expected
*** This bug is a security vulnerability ***

Public security bug reported:

Description of problem:
Keystone issues tokens with the default lifespan regardless of the lifespan of the application credentials used to issue them. If the configured lifespan of an identity token is 1h, and the application credentials expire 1 minute from now, a newly issued token will outlive the application credentials used to issue it by 59 minutes.

How reproducible: 100%

Steps to Reproduce:
1. Create application credentials with a short expiration time (e.g. 10 seconds)
2. openstack token issue --> the returned token has the standard expiration, for example 1 hour.

The script below confirms that the token continues being valid after the application credentials have expired.

```bash
#!/usr/bin/env bash
set -Eeuo pipefail

openstack image create --disk-format=raw --container-format=bare \
    --file <(echo 'I am a Glance image') testimage -f json > image.json
image_url="$(openstack catalog show glance -f json | jq -r '.endpoints[] | select(.interface=="public").url')$(jq -r '.file' image.json)"

openstack application credential create \
    --expiration="$(date --utc --date '+10 second' +%Y-%m-%dT%H:%M:%S)" \
    token_test \
    -f json \
    > appcreds.json

cat <<EOF > clouds.yaml
clouds:
  ${OS_CLOUD}:
    auth:
      auth_url:
      application_credential_id: '$(jq -r '.id' appcreds.json)'
      application_credential_secret: '$(jq -r '.secret' appcreds.json)'
    auth_type: "v3applicationcredential"
    identity_api_version: 3
    interface: public
    region_name:
EOF

# Override ~/.config/openstack/secure.yaml
touch secure.yaml

openstack token issue -f json > token.json

echo "appcreds expiration: $(jq -r '.expires_at' appcreds.json)"
for i in {1..10}; do
    sleep 100
    echo -ne "$(date --utc --rfc-3339=seconds)\t"
    curl -isS -H "X-Auth-Token: $(jq -r '.id' token.json)" --url "$image_url" | head -n1
done
```

Actual results (on a cloud with a token duration of 24h):

    appcreds expiration: 2022-07-08T13:55:02.00
    2022-07-08 13:56:38+00:00	HTTP/1.1 200 OK
    2022-07-08 13:58:19+00:00	HTTP/1.1 200 OK
    2022-07-08 14:00:00+00:00	HTTP/1.1 200 OK
    2022-07-08 14:01:42+00:00	HTTP/1.1 200 OK
    2022-07-08 14:03:23+00:00	HTTP/1.1 200 OK
    2022-07-08 14:05:07+00:00	HTTP/1.1 200 OK
    2022-07-08 14:06:49+00:00	HTTP/1.1 200 OK
    2022-07-08 14:08:37+00:00	HTTP/1.1 200 OK
    2022-07-08 14:10:18+00:00	HTTP/1.1 200 OK
    2022-07-08 14:12:00+00:00	HTTP/1.1 200 OK

Expected results:

    appcreds expiration: 2022-07-08T13:55:02.00
    2022-07-08 13:54:38+00:00	HTTP/1.1 200 OK
    2022-07-08 13:58:19+00:00	HTTP/1.1 401 Unauthorized
    2022-07-08 14:00:00+00:00	HTTP/1.1 401 Unauthorized
    2022-07-08 14:01:42+00:00	HTTP/1.1 401 Unauthorized
    2022-07-08 14:03:23+00:00	HTTP/1.1 401 Unauthorized
    2022-07-08 14:05:07+00:00	HTTP/1.1 401 Unauthorized
    2022-07-08 14:06:49+00:00	HTTP/1.1 401 Unauthorized
    2022-07-08 14:08:37+00:00	HTTP/1.1 401 Unauthorized
    2022-07-08 14:10:18+00:00	HTTP/1.1 401 Unauthorized
    2022-07-08 14:12:00+00:00	HTTP/1.1 401 Unauthorized

** Affects: keystone
   Importance: High
     Assignee: David Wilde (dave-wilde)
       Status: New

** Affects: ossa
   Importance: Undecided
     Assignee: David Wilde (dave-wilde)
       Status: New

** Also affects: ossa
   Importance: Undecided
       Status: New

** Changed in: ossa
     Assignee: (unassigned) => David Wilde (dave-wilde)

** Changed in: keystone
   Importance: Undecided => High

https://bugs.launchpad.net/bugs/1992183
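The expected behavior can be expressed as a small sketch (hypothetical, not keystone's actual token-issuance code): an issued token's expiry should be capped at the expiry of the application credential that authenticated the request.

```python
# Hypothetical token-expiry calculation illustrating the missing cap.
from datetime import datetime, timedelta, timezone

TOKEN_LIFESPAN = timedelta(hours=1)  # configured default token lifespan

def token_expiry(now, app_cred_expires_at=None):
    """Return the expiry for a newly issued token."""
    expiry = now + TOKEN_LIFESPAN
    # The reported bug amounts to this capping step being absent, so
    # tokens outlive the application credential used to issue them.
    if app_cred_expires_at is not None:
        expiry = min(expiry, app_cred_expires_at)
    return expiry

now = datetime(2022, 7, 8, 13, 54, 52, tzinfo=timezone.utc)
cred_expiry = now + timedelta(seconds=10)
print(token_expiry(now, cred_expiry))  # capped at the credential expiry
```

With this cap in place, the 401 responses in the "Expected results" column above would begin as soon as the application credential expires.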
[Yahoo-eng-team] [Bug 1992186] [NEW] "int object is not iterable" when using numerical group names
Public bug reported:

When using federation and having the values of `groups` in the mapping set to a number, the value will be parsed into a number and then fail to authenticate:

```
{"error":{"code":400,"message":"'int' object is not iterable","title":"Bad Request"}}
```

I believe the bad bit is here:
https://github.com/openstack/keystone/blob/326b014434cc760ba08763e1870ac057f7917e98/keystone/federation/utils.py#L650-L661

** Affects: keystone
   Importance: Undecided
       Status: New

https://bugs.launchpad.net/bugs/1992186

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1992186/+subscriptions
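A minimal sketch of the failure and a defensive fix (hypothetical helper, not keystone's actual mapping code): if a mapped `groups` value arrives as an int instead of a list or string, iterating over it raises the TypeError that surfaces as the 400 above; coercing scalars into a single-element list of strings avoids it.

```python
# Hypothetical normalization for a mapped `groups` attribute value.
def normalize_groups(value):
    """Coerce a mapped `groups` value into a list of group-name strings."""
    if isinstance(value, (list, tuple)):
        return [str(v) for v in value]
    # Scalars (str, int, ...) become a single-element list instead of
    # being iterated over directly, which fails for ints.
    return [str(value)]

print(normalize_groups(1234))         # ['1234']
print(normalize_groups(["ops", 42]))  # ['ops', '42']
```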
[Yahoo-eng-team] [Bug 1933802] Re: missing global_request_id in neutron_lib context from_dict method
Reviewed:  https://review.opendev.org/c/openstack/neutron-lib/+/859813
Committed: https://opendev.org/openstack/neutron-lib/commit/9ecd5995b6c598cee931087bf13fdd166f404034
Submitter: "Zuul (22348)"
Branch:    master

commit 9ecd5995b6c598cee931087bf13fdd166f404034
Author: Kiran Pawar
Date:   Thu Sep 29 08:40:30 2022 +

    Use oslo_context.from_dict() for context generation

    Use RequestContext.from_dict in oslo_context to generate context, and
    add/update needed attrs such as user_id, tenant_id, tenant_name and
    timestamp.

    Closes-bug: #1933802
    Change-Id: I0527eb5fa8d32d97ca45e44d1b154b6529b3f847

** Changed in: neutron
       Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1933802

Bug description:
  code:

      @classmethod
      def from_dict(cls, values):
          return cls(user_id=values.get('user_id', values.get('user')),
                     tenant_id=values.get('tenant_id', values.get('project_id')),
                     is_admin=values.get('is_admin'),
                     roles=values.get('roles'),
                     timestamp=values.get('timestamp'),
                     request_id=values.get('request_id'),
                     #global_request_id=values.get('global_request_id'),
                     tenant_name=values.get('tenant_name'),
                     user_name=values.get('user_name'),
                     auth_token=values.get('auth_token'))

  project: neutron_lib
  path: neutron_lib/context.py

  Please note the commented line, which should have been passed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1933802/+subscriptions
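The bug and its effect can be shown with a stripped-down illustration (hypothetical classes, not neutron-lib's real ones): a hand-rolled from_dict that enumerates fields silently drops any field it forgets to forward, here global_request_id, which is why delegating to oslo_context's RequestContext.from_dict is the safer pattern.

```python
# Hypothetical context class demonstrating the dropped-field round-trip.
class Context:
    def __init__(self, request_id=None, global_request_id=None, user_id=None):
        self.request_id = request_id
        self.global_request_id = global_request_id
        self.user_id = user_id

    @classmethod
    def from_dict_buggy(cls, values):
        # global_request_id is not forwarded, so it is always None
        # after a to_dict/from_dict round-trip.
        return cls(request_id=values.get('request_id'),
                   user_id=values.get('user_id'))

    @classmethod
    def from_dict_fixed(cls, values):
        return cls(request_id=values.get('request_id'),
                   global_request_id=values.get('global_request_id'),
                   user_id=values.get('user_id'))

values = {'request_id': 'req-1', 'global_request_id': 'req-global-1',
          'user_id': 'u1'}
print(Context.from_dict_buggy(values).global_request_id)  # None
print(Context.from_dict_fixed(values).global_request_id)  # req-global-1
```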
[Yahoo-eng-team] [Bug 1982284] Re: libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"
** Also affects: nova/train
   Importance: Undecided
       Status: New

** Also affects: nova/victoria
   Importance: Undecided
       Status: New

** Also affects: nova/xena
   Importance: Undecided
       Status: New

** Also affects: nova/yoga
   Importance: Undecided
       Status: New

** Also affects: nova/wallaby
   Importance: Undecided
       Status: New

** Also affects: nova/ussuri
   Importance: Undecided
       Status: New

** Also affects: nova/zed
   Importance: Undecided
       Status: New

https://bugs.launchpad.net/bugs/1982284

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) train series: In Progress
Status in OpenStack Compute (nova) ussuri series: In Progress
Status in OpenStack Compute (nova) victoria series: In Progress
Status in OpenStack Compute (nova) wallaby series: In Progress
Status in OpenStack Compute (nova) xena series: In Progress
Status in OpenStack Compute (nova) yoga series: In Progress
Status in OpenStack Compute (nova) zed series: In Progress

Bug description:
  We have seen this downstream, where live migration randomly fails with the following error [1]:

      libvirt.libvirtError: internal error: migration was active, but no RAM info was set

  Discussion on [1] gravitated toward a possible race condition in qemu around the query-migrate command [2]. The query-migrate command is used (indirectly) by the libvirt driver during monitoring of live migrations [3][4][5].

  While searching for info about this error, I found a thread on libvir-list from the past [6] where someone else encountered the same error, and for them it happened if they called query-migrate *after* a live migration had completed.

  Based on this, it seemed possible that our live migration monitoring thread sometimes races and calls jobStats() after the migration has completed, resulting in this error being raised and the migration being considered failed when it was actually complete. A patch has since been proposed and committed [7] to address the possible issue.

  Meanwhile, on our side in nova, we can mitigate this problematic behavior by catching the specific error from libvirt and ignoring it, so that a live migration in this situation will be considered completed by the libvirt driver. Doing this would improve the experience for users who are hitting this error and getting erroneous live migration failures.

  [1] https://bugzilla.redhat.com/show_bug.cgi?id=2074205
  [2] https://qemu.readthedocs.io/en/latest/interop/qemu-qmp-ref.html#qapidoc-1848
  [3] https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/driver.py#L10123
  [4] https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/guest.py#L655
  [5] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainGetJobStats
  [6] https://listman.redhat.com/archives/libvir-list/2021-January/213631.html
  [7] https://github.com/qemu/qemu/commit/552de79bfdd5e9e53847eb3c6d6e4cd898a4370e

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1982284/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
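The mitigation described in the bug can be sketched as follows (a hedged illustration with hypothetical names, not nova's actual patch): during job-stats polling, treat this one specific libvirt error as "migration already completed" rather than as a failure, and re-raise anything else.

```python
# Hypothetical monitoring-loop error filter for the racy libvirt error.
RACY_ERROR = "migration was active, but no RAM info was set"

class libvirtError(Exception):
    """Stand-in for libvirt.libvirtError, to keep this example self-contained."""

def get_job_info(guest):
    try:
        return guest.job_stats()
    except libvirtError as exc:
        if RACY_ERROR in str(exc):
            # The migration most likely finished between our poll and
            # qemu's bookkeeping; report completion rather than failure.
            return {"type": "completed"}
        raise  # any other libvirt error is still a real failure

class FinishedGuest:
    def job_stats(self):
        raise libvirtError("internal error: " + RACY_ERROR)

print(get_job_info(FinishedGuest()))  # {'type': 'completed'}
```

Matching on the error message string is brittle by design here; it narrowly targets the one known-benign error without masking genuine migration failures.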