[Yahoo-eng-team] [Bug 1940834] Re: Horizon not show flavor details in instance and resize is not possible - Flavor ID is not supported by nova

2022-02-16 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/horizon/+/808102
Committed: 
https://opendev.org/openstack/horizon/commit/d269b1640f49e13aa1693a175083d66a3eaf5386
Submitter: "Zuul (22348)"
Branch: master

commit d269b1640f49e13aa1693a175083d66a3eaf5386
Author: Vadym Markov 
Date:   Thu Sep 9 17:52:40 2021 +0300

Fix for "Resize instance" button

Currently, "Resize instance" widget is not working because it relies on
legacy Nova API v2.46, obsoleted in Pike release. Proposed patch make
Horizon use current Nova API (>=2.47).

Closes-Bug: #1940834
Co-Authored-By: Akihiro Motoki 
Change-Id: Id2f38acfc27cdf93cc4341422873e512aaff716a
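The behavioural difference the patch relies on can be sketched as follows; this
is a minimal sketch using python-novaclient with placeholder endpoint and
credentials, not the actual Horizon change:

```python
# With microversion >= 2.47 Nova embeds the full flavor details in the
# server body, so no separate flavor-show call by ID is needed (that
# call breaks for flavors deleted after the instance was booted).
from keystoneauth1 import session
from keystoneauth1.identity import v3
from novaclient import client

auth = v3.Password(auth_url='http://keystone:5000/v3',   # placeholder
                   username='admin', password='secret',  # placeholders
                   project_name='admin',
                   user_domain_id='default', project_domain_id='default')
nova = client.Client('2.47', session=session.Session(auth=auth))

server = nova.servers.get('a872bcc6-0a56-413a-9bea-b27dc006c707')

# <  2.47: server.flavor == {"id": "<flavor-id>", "links": [...]}
# >= 2.47: server.flavor == {"original_name": "m1.small", "vcpus": 1,
#                            "ram": 2048, "disk": 20, ...}
flavor = server.flavor
if 'original_name' in flavor:
    print(flavor['original_name'], flavor['vcpus'], flavor['ram'])
else:
    details = nova.flavors.get(flavor['id'])  # legacy two-step lookup
    print(details.name, details.vcpus, details.ram)
```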


** Changed in: horizon
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1940834

Title:
  Horizon not show flavor details in instance and resize is not possible
  - Flavor ID is not supported by nova

Status in OpenStack Dashboard (Horizon):
  Fix Released

Bug description:
  In Horizon on the Wallaby and Victoria releases, some views and
  functions use the ID value from the flavor section of the instance's
  JSON. The main issue is that resizing an instance fails with the
  output below. The flavor also shows as "Not available" in the Specs
  section of the instance detail page. The all-instances view works
  fine, however; judging by the instance object and its details, that
  view appears to use different methods based on an older API.

  We are running the Wallaby dashboard with the openstack-helm project, with nova-api 2.88.
  Nova version:
  {"versions": [{"id": "v2.0", "status": "SUPPORTED", "version": "", "min_version": "", "updated": "2011-01-21T11:33:21Z", "links": [{"rel": "self", "href": "http://nova.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.1", "status": "CURRENT", "version": "2.88", "min_version": "2.1", "updated": "2013-07-23T11:33:21Z", "links": [{"rel": "self", "href": "http://nova.openstack.svc.cluster.local/v2.1/"}]}]}

  For example, on resize initialization the log output is:

  2021-08-23 12:20:30.308473 Internal Server Error: /project/instances/a872bcc6-0a56-413a-9bea-b27dc006c707/resize
  2021-08-23 12:20:30.308500 Traceback (most recent call last):
  2021-08-23 12:20:30.308503   File "/var/lib/openstack/lib/python3.6/site-packages/horizon/utils/memoized.py", line 107, in wrapped
  2021-08-23 12:20:30.308505 value = cache[key] = cache.pop(key)
  2021-08-23 12:20:30.308507 KeyError: ((,), ())
  2021-08-23 12:20:30.308509
  2021-08-23 12:20:30.308512 During handling of the above exception, another exception occurred:
  2021-08-23 12:20:30.308513
  2021-08-23 12:20:30.308515 Traceback (most recent call last):
  2021-08-23 12:20:30.308517   File "/var/lib/openstack/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
  2021-08-23 12:20:30.308519 response = get_response(request)
  2021-08-23 12:20:30.308521   File "/var/lib/openstack/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
  2021-08-23 12:20:30.308523 response = self.process_exception_by_middleware(e, request)
  2021-08-23 12:20:30.308525   File "/var/lib/openstack/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
  2021-08-23 12:20:30.308527 response = wrapped_callback(request, *callback_args, **callback_kwargs)
  2021-08-23 12:20:30.308529   File "/var/lib/openstack/lib/python3.6/site-packages/horizon/decorators.py", line 52, in dec
  2021-08-23 12:20:30.308531 return view_func(request, *args, **kwargs)
  2021-08-23 12:20:30.308533   File "/var/lib/openstack/lib/python3.6/site-packages/horizon/decorators.py", line 36, in dec
  2021-08-23 12:20:30.308534 return view_func(request, *args, **kwargs)
  2021-08-23 12:20:30.308536   File "/var/lib/openstack/lib/python3.6/site-packages/horizon/decorators.py", line 36, in dec
  2021-08-23 12:20:30.308538 return view_func(request, *args, **kwargs)
  2021-08-23 12:20:30.308540   File "/var/lib/openstack/lib/python3.6/site-packages/horizon/decorators.py", line 112, in dec
  2021-08-23 12:20:30.308542 return view_func(request, *args, **kwargs)
  2021-08-23 12:20:30.308543   File "/var/lib/openstack/lib/python3.6/site-packages/horizon/decorators.py", line 84, in dec
  2021-08-23 12:20:30.308545 return view_func(request, *args, **kwargs)
  2021-08-23 12:20:30.308547   File "/var/lib/openstack/lib/python3.6/site-packages/django/views/generic/base.py", line 71, in view
  2021-08-23 12:20:30.308549 return self.dispatch(request, *args, **kwargs)
  2021-08-23 12:20:30.308551   File "/var/lib/openstack/lib/python3.6/site-packages/django/views/generic/base.py", line 97, in dispatch
  2021-08-23 12:20:30.308553 return handler(request, *args, **kwargs)
  2021-08-23

[Yahoo-eng-team] [Bug 1956965] Re: [FT] Test "test_port_dhcp_options" failing

2022-02-16 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/825530
Committed: 
https://opendev.org/openstack/neutron/commit/654c3b796fb467f92ce06528f64904086b0beb17
Submitter: "Zuul (22348)"
Branch: master

commit 654c3b796fb467f92ce06528f64904086b0beb17
Author: elajkat 
Date:   Thu Jan 20 15:56:58 2022 +0100

OVN TestNBDbResources wait for NB_Global table to be present

In functional jobs (neutron-functional-with-uwsgi),
test_ovn_db_resources.TestNBDbResources.test_port_dhcp_options fails
from time to time during network creation.
Wait for the NB_Global table in the setUp() phase of TestNBDbResources.

Change-Id: I92132233dae77ffbbd5565caa320c7dac19e2194
Closes-Bug: #1956965
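
The shape of the added wait is roughly as follows; a hedged sketch built on
ovsdbapp's generic db_list_rows command, not the literal patch:

```python
# Illustrative only: poll until the OVN NB_Global table is populated,
# so the test no longer races the NB DB becoming available.
import time


def wait_for_nb_global(nb_api, timeout=10, interval=0.5):
    """Block until NB_Global has its row, or raise after `timeout`."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # NB_Global normally holds exactly one row once ovn-nb is up.
            if nb_api.db_list_rows('NB_Global').execute(check_error=True):
                return
        except Exception:
            pass  # schema/table not visible yet; keep polling
        time.sleep(interval)
    raise TimeoutError('NB_Global did not appear within %s seconds' % timeout)
```

Called from setUp() before any network is created.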


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1956965

Title:
  [FT] Test "test_port_dhcp_options" failing

Status in neutron:
  Fix Released

Bug description:
  Log:
  
https://e36203e60051d918bd96-b4b1a7d89013756684de846d3b70c9e9.ssl.cf2.rackcdn.com/823498/4/gate/neutron-functional-with-uwsgi/946416c/testr_results.html

  Snippet: https://paste.opendev.org/show/811996/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1956965/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1961112] [NEW] [ovn] overlapping security group rules break neutron-ovn-db-sync-util

2022-02-16 Thread Daniel Speichert
Public bug reported:

Neutron (Xena) happily accepts equivalent rules with overlapping remote
CIDR prefixes as long as the notation differs, e.g. 10.0.0.0/8 and
10.0.0.1/8.

However, OVN is smarter, normalizes the prefix and figures out that they
both are 10.0.0.0/8.
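
The normalization is easy to reproduce with the standard library
(illustrative; this is neither Neutron nor OVN code):

```python
# 10.0.0.0/8 and 10.0.0.1/8 are the same network written two ways:
# exactly the pair of rules Neutron accepts and OVN collapses into one.
import ipaddress

a = ipaddress.ip_network('10.0.0.0/8')
b = ipaddress.ip_network('10.0.0.1/8', strict=False)  # strict=True raises

print(a == b)  # True: both normalize to 10.0.0.0/8
```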

This does not have any fatal effects in a running OVN deployment
(creating and using such rules does not even trigger a warning) but upon
running neutron-ovn-db-sync-util, it crashes and won't perform a sync.
This is a blocker for upgrades (and other scenarios).


Security group's rules:

$ openstack security group rule list overlap-sgr
+--------------------------------------+-------------+-----------+------------+------------+-----------+-----------------------+----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range   | Port Range | Direction | Remote Security Group | Remote Address Group |
+--------------------------------------+-------------+-----------+------------+------------+-----------+-----------------------+----------------------+
| 3c41fa80-1d23-49c9-9ec1-adf581e07e24 | tcp         | IPv4      | 10.0.0.1/8 |            | ingress   | None                  | None                 |
| 639d263e-6873-47cb-b2c4-17fc824252db | None        | IPv4      | 0.0.0.0/0  |            | egress    | None                  | None                 |
| 96e99039-cbc0-48fe-98fe-ef28d41b9d9b | tcp         | IPv4      | 10.0.0.0/8 |            | ingress   | None                  | None                 |
| bf9160a3-fc9b-467e-85d5-c889811fd6ca | None        | IPv6      | ::/0       |            | egress    | None                  | None                 |
+--------------------------------------+-------------+-----------+------------+------------+-----------+-----------------------+----------------------+


Log excerpt:
16/Feb/2022:20:55:40.568 527216 INFO neutron.cmd.ovn.neutron_ovn_db_sync_util [req-c595a893-db9b-484e-ae8a-bb7dbe8b31f3 - - - - -] Sync for Northbound db started with mode : repair
16/Feb/2022:20:55:42.105 527216 INFO neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.extensions.qos [req-c595a893-db9b-484e-ae8a-bb7dbe8b31f3 - - - - -] Starting OVNClientQosExtension
16/Feb/2022:20:55:42.380 527216 INFO neutron.db.ovn_revision_numbers_db [req-c595a893-db9b-484e-ae8a-bb7dbe8b31f3 - - - - -] Successfully bumped revision number for resource 49b3249a-7624-4711-b271-3e63c6a27658 (type: ports) to 17
16/Feb/2022:20:55:43.205 527216 WARNING neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovn_db_sync [req-c595a893-db9b-484e-ae8a-bb7dbe8b31f3 - - - - -] ACLs-to-be-added 1 ACLs-to-be-removed 0
16/Feb/2022:20:55:43.206 527216 WARNING neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovn_db_sync [req-c595a893-db9b-484e-ae8a-bb7dbe8b31f3 - - - - -] ACL found in Neutron but not in OVN DB for port group pg_e90b68f3_9f8d_4250_9b6a_7531e2249c99
16/Feb/2022:20:55:43.208 527216 ERROR ovsdbapp.backend.ovs_idl.transaction [req-c595a893-db9b-484e-ae8a-bb7dbe8b31f3 - - - - -] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/connection.py", line 131, in run
    txn.results.put(txn.do_commit())
  File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 93, in do_commit
    command.run_idl(txn)
  File "/usr/lib/python3/dist-packages/ovsdbapp/schema/ovn_northbound/commands.py", line 123, in run_idl
    raise RuntimeError("ACL (%s, %s, %s) already exists" % (
RuntimeError: ACL (to-lport, 1002, outport == @pg_e90b68f3_9f8d_4250_9b6a_7531e2249c99 && ip4 && ip4.src == 10.0.0.0/8 && tcp) already exists

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1961112

Title:
  [ovn] overlapping security group rules break neutron-ovn-db-sync-util

Status in neutron:
  New

Bug description:
  Neutron (Xena) happily accepts equivalent rules with overlapping
  remote CIDR prefixes as long as the notation differs, e.g.
  10.0.0.0/8 and 10.0.0.1/8.

  However, OVN is smarter, normalizes the prefix and figures out that
  they both are 10.0.0.0/8.

  This does not have any fatal effects in a running OVN deployment
  (creating and using such rules does not even trigger a warning) but
  upon running neutron-ovn-db-sync-util, it crashes and won't perform a
  sync. This is a blocker for upgrades (and other scenarios).

  
  Security group's rules:

  $ openstack security group rule list overlap-sgr
  
  +--------------------------------------+-------------+-----------+------------+------------+-----------+-----------------------+----------------------+
  | ID                                   | IP Protocol | Ethertype | IP Range   | Port Range | Direction | Remote Security Group | Remote Address Group |
  

[Yahoo-eng-team] [Bug 1960944] Re: cloudinit.sources.DataSourceNotFoundException: Did not find any data source, searched classes

2022-02-16 Thread dann frazier
While we do have sporadic messages like this in our nginx error.log,
they started piling up around the time this issue was reported to us,
starting with this message:

2022/02/15 01:49:24 [error] 3341359#3341359: *1929977 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.229.95.139, server: , request: "POST /MAAS/metadata/status/ww4mgk HTTP/1.1", upstream: "http://10.155.212.2:5240/MAAS/metadata/status/ww4mgk", host: "10.229.32.21:5248"

Around this time we started seeing these pile up in rackd.log:
2022-02-15 01:40:07 provisioningserver.rpc.clusterservice: [critical] Failed to contact region. (While requesting RPC info at http://localhost:5240/MAAS).

Our regiond processes are running, and I don't see anything that seems
abnormal in the regiond log around this time. However, these symptoms
reminded me of a similar issue in bug 1908452, so I started debugging it
similarly. Like bug 1908452, I see one regiond process stuck in a recv
call:

root@maas:/var/snap/maas/common/log# strace -p 3340720
strace: Process 3340720 attached
recvfrom(23, 

All the other regiond processes are making progress, but not this one.

The server it is talking to appears to be this canonical server, which I
can't currently resolve:

root@maas:/var/snap/maas/common/log# lsof -i -a -p 3340720 | grep 23
python3 3340720 root   23u  IPv4 3487880288  0t0  TCP maas:42848->https-services.aerodent.canonical.com:http (ESTABLISHED)
root@maas:/var/snap/maas/common/log# host https-services.aerodent.canonical.com
Host https-services.aerodent.canonical.com not found: 3(NXDOMAIN)

However, I suspect it may be related to image fetching again. In our
regiond logs, I see that the last log entry related to images appears
to have been about an hour before things locked up:

root@maas:/var/snap/maas/common/log# grep image regiond.log | tail -1
2022-02-15 00:38:51 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/streams/v1/maas:v2:download.json HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)

Prior to that, we have log entries every hour, but none after. So maybe
simplestreams has other places that need a timeout?
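
To illustrate the failure mode with a generic example (not simplestreams
code): an HTTP read without a timeout can sit in recv() forever when the peer
goes quiet, which matches the stuck regiond process above.

```python
# A fetch with no timeout can block indefinitely in recv(); passing
# timeout= turns the hang into a catchable exception. The URL is a
# placeholder for an images/simplestreams endpoint.
import urllib.request

try:
    with urllib.request.urlopen(
            'http://localhost:5240/MAAS/images-stream/streams/v1/index.json',
            timeout=30) as resp:
        data = resp.read()
except OSError as exc:  # socket.timeout is an OSError subclass
    print('fetch failed or timed out:', exc)
```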

** Changed in: cloud-init
   Status: New => Invalid

** Also affects: simplestreams
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1960944

Title:
  cloudinit.sources.DataSourceNotFoundException: Did not find any data
  source, searched classes

Status in cloud-init:
  Invalid
Status in MAAS:
  New
Status in simplestreams:
  New

Bug description:
  Not able to deploy baremetal (arm64 and amd64) on a snap-based
  MAAS: 3.1.0 (maas 3.1.0-10901-g.f1f8f1505 18199 3.1/stable)

  From the MAAS event log:
  ```
  Tue, 15 Feb. 2022 17:35:33   Node changed status - From 'Deploying' to 'Failed deployment'
  Tue, 15 Feb. 2022 17:35:33   Marking node failed - Node operation 'Deploying' timed out after 30 minutes.
  Tue, 15 Feb. 2022 17:07:44   Node installation - 'cloudinit' searching for network data from DataSourceMAAS
  Tue, 15 Feb. 2022 17:06:44   Node installation - 'cloudinit' attempting to read from cache [trust]
  Tue, 15 Feb. 2022 17:06:42   Node installation - 'cloudinit' attempting to read from cache [check]
  Tue, 15 Feb. 2022 17:05:29   Performing PXE boot
  Tue, 15 Feb. 2022 17:05:29   PXE Request - installation
  Tue, 15 Feb. 2022 17:03:52   Node powered on
  ```

  
  Server console log shows: 

  ```
  ubuntu login:  Starting Message of the Day...
  [  OK  ] Listening on Socket unix for snap application lxd.daemon.
   Starting Service for snap application lxd.activate...
  [  OK  ] Finished Service for snap application lxd.activate.
  [  OK  ] Started snap.lxd.hook.conf…-4400-96a8-0c5c9e438c51.scope.
   Starting Time & Date Service...
  [  OK  ] Started Time & Date Service.
  [  OK  ] Finished Wait until snapd is fully seeded.
   Starting Apply the settings specified in cloud-config...
  [  OK  ] Reached target Multi-User System.
  [  OK  ] Reached target Graphical Interface.
   Starting Update UTMP about System Runlevel Changes...
  [  OK  ] Finished Update UTMP about System Runlevel Changes.
  [  322.036861] cloud-init[2034]: Can not apply stage config, no datasource found! Likely bad things to come!
  [  322.037477] cloud-init[2034]:
  [  322.037907] cloud-init[2034]: Traceback (most recent call last):
  [  322.038341] cloud-init[2034]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 521, in main_modules
  [  322.038783] cloud-init[2034]: init.fetch(existing="trust")
  [  322.039181] cloud-init[2034]:   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 411, in fetch
  [  322.039584] cloud-init[2034]: return

[Yahoo-eng-team] [Bug 1960902] Re: Wallaby ovb fs001 failing on tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_building_state

2022-02-16 Thread Alan Pevec
** Also affects: nova
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1960902

Title:
  Wallaby ovb fs001 failing on
  
tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_building_state

Status in OpenStack Compute (nova):
  New
Status in tripleo:
  Triaged

Bug description:
  Reporting this because the test fails, and the failure message says it
  is a Nova internal error that should be reported as a bug:
  
  Logs:
  
https://logserver.rdoproject.org/49/39449/2/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-wallaby/6607433/logs/

  
  Error on tempest side:

  ft1.3: tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_building_state[id-9e6e0c87-3352-42f7-9faf-5d6210dbd159]
  testtools.testresult.real._StringException: pythonlogging:'': {{{
  2022-02-14 17:49:07,053 254588 INFO [tempest.lib.common.rest_client] Request (DeleteServersTestJSON:test_delete_server_while_in_building_state): 201 POST https://10.0.0.5:13000/v3/auth/tokens 0.474s
  2022-02-14 17:49:07,054 254588 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json'}
  Body:
  Response - Headers: {'date': 'Mon, 14 Feb 2022 22:49:06 GMT', 'server': 'Apache', 'content-length': '5989', 'x-subject-token': '', 'vary': 'X-Auth-Token', 'x-openstack-request-id': 'req-07da4513-88c6-4ade-a4f7-b1f8b75595c2', 'content-type': 'application/json', 'connection': 'close', 'status': '201', 'content-location': 'https://10.0.0.5:13000/v3/auth/tokens'}
  Body: b'{"token": {"methods": ["password"], "user": {"domain": {"id": "default", "name": "Default"}, "id": "10d3ad43a61641ce8182ca7275eadae3", "name": "tempest-DeleteServersTestJSON-1760533763-project", "password_expires_at": null}, "audit_ids": ["lZA50RCXTlqRxKgejUylog"], "expires_at": "2022-02-14T23:49:07.00Z", "issued_at": "2022-02-14T22:49:07.00Z", "project": {"domain": {"id": "default", "name": "Default"}, "id": "833c1ddd2dfb4db8a31719eba1705a4b", "name": "tempest-DeleteServersTestJSON-1760533763"}, "is_domain": false, "roles": [{"id": "69eeb16b59ff4b6f9cb6e2eb34025513", "name": "reader"}, {"id": "946f9c5be3ca413c9f8ae3261ed391c5", "name": "member"}], "catalog": [
  {"endpoints": [{"id": "20fa93c3887648949dfeb21c594b7c0b", "interface": "admin", "region_id": "regionOne", "url": "http://172.17.0.173:9696", "region": "regionOne"}, {"id": "a710cf1fd64e4293bb60d54e29074a99", "interface": "public", "region_id": "regionOne", "url": "https://10.0.0.5:13696", "region": "regionOne"}, {"id": "f7f132e73d1243f984bd2d4a6db0bedb", "interface": "internal", "region_id": "regionOne", "url": "http://172.17.0.173:9696", "region": "regionOne"}], "id": "0bce0bee2c80453d9b8fe1d47b36a2d0", "type": "network", "name": "neutron"},
  {"endpoints": [{"id": "3c2c1cdd6852421d9905869844fabd34", "interface": "internal", "region_id": "regionOne", "url": "http://172.17.0.173:8000/v1", "region": "regionOne"}, {"id": "d948ad8956a642d5a016f164c8d53c8f", "interface": "admin", "region_id": "regionOne", "url": "http://172.17.0.173:8000/v1", "region": "regionOne"}, {"id": "e7fa7141606c428a9c582ecd93100f3e", "interface": "public", "region_id": "regionOne", "url": "https://10.0.0.5:13005/v1", "region": "regionOne"}], "id": "247706a8fccf414e8e79aed9573e4e4c", "type": "cloudformation", "name": "heat-cfn"},
  {"endpoints": [{"id": "3847d57ab18a413a99629fabf6cfbf95", "interface": "internal", "region_id": "regionOne", "url": "http://172.17.0.173:8778/placement", "region": "regionOne"}, {"id": "73f48d783ddd4c658399e9c5ca4e4524", "interface": "admin", "region_id": "regionOne", "url": "http://172.17.0.173:8778/placement", "region": "regionOne"}, {"id": "ae5a64a560c54899a1d56ec8755e4692", "interface": "public", "region_id": "regionOne", "url": "https://10.0.0.5:13778/placement", "region": "regionOne"}], "id": "3b3d32f96dc2455fa19ebaa1fe46a318", "type": "placement", "name": "placement"},
  {"endpoints": [{"id": "47412821c43b464790d3b9310a27f298", "interface": "internal", "region_id": "regionOne", "url": "http://172.17.0.173:8776/v3/833c1ddd2dfb4db8a31719eba1705a4b", "region": "regionOne"}, {"id": "ae54f76d2c2a4e518ac09c38094e36d0", "interface": "public", "region_id": "regionOne", "url": "https://10.0.0.5:13776/v3/833c1ddd2dfb4db8a31719eba1705a4b", "region": "regionOne"}, {"id": "cead87f50b9040658aa8897f38cb8ff0", "interface": "admin", "region_id": "regionOne", "url": "http://172.17.0.173:8776/v3/833c1ddd2dfb4db8a31719eba1705a4b", "region": "regionOne"}], "id": "53dc49ca65b447ba943e5def068e8859", "type": "volumev3", "name": "cinderv3"},
  {"endpoints": [{"id": "1b82f0b12b474c75bb9c3e4d31fe5ec4", "interface": "public", "region_id": "regionOne", "url":
[Yahoo-eng-team] [Bug 1961068] [NEW] nova-ceph-multistore job fails with mysqld got oom-killed

2022-02-16 Thread Elod Illes
Public bug reported:

Searching through the jobs showed that the nova-ceph-multistore job
fails from time to time with a DB crash due to an out-of-memory error.

In the tempest errors the following message can be seen:

tempest.lib.exceptions.ServerFault: Got server fault
Details: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.


In the mysqld error log (controller/logs/mysql/error_log.txt) the crash
recovery is visible:

2022-02-15T19:26:40.245179Z 0 [System] [MY-010229] [Server] Starting XA crash recovery...
2022-02-15T19:26:40.268204Z 0 [System] [MY-010232] [Server] XA crash recovery finished.

Around that time the out-of-memory messages appear in syslog
(controller/logs/syslog.txt):

Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mysql.service,task=mysqld,pid=67959,uid=116
Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: Out of memory: Killed process 67959 (mysqld) total-vm:5127600kB, anon-rss:756064kB, file-rss:0kB, shmem-rss:0kB, UID:116 pgtables:2388kB oom_score_adj:0
Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: oom_reaper: reaped process 67959 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB


The error occurs only in the nova-ceph-multistore job (see recent
occurrences via logsearch:
https://paste.opendev.org/show/bQNKfoaMafUyNFCyQ0kN/ ). It mostly
happens on the current master branch (yoga), but an example was found
on wallaby as well:
https://zuul.opendev.org/t/openstack/build/d8a6a9c1496346dda6986db00c06a616

** Affects: nova
 Importance: High
 Status: Confirmed


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1961068

Title:
  nova-ceph-multistore job fails with mysqld got oom-killed

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Searching through the jobs showed that the nova-ceph-multistore job
  fails from time to time with a DB crash due to an out-of-memory error.

  In the tempest errors the following message can be seen:

  tempest.lib.exceptions.ServerFault: Got server fault
  Details: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.

  In the mysqld error log (controller/logs/mysql/error_log.txt) the crash
  recovery is visible:

  2022-02-15T19:26:40.245179Z 0 [System] [MY-010229] [Server] Starting XA crash recovery...
  2022-02-15T19:26:40.268204Z 0 [System] [MY-010232] [Server] XA crash recovery finished.

  Around that time the out-of-memory messages appear in syslog
  (controller/logs/syslog.txt):

  Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mysql.service,task=mysqld,pid=67959,uid=116
  Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: Out of memory: Killed process 67959 (mysqld) total-vm:5127600kB, anon-rss:756064kB, file-rss:0kB, shmem-rss:0kB, UID:116 pgtables:2388kB oom_score_adj:0
  Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: oom_reaper: reaped process 67959 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

  
  The error occurs only in the nova-ceph-multistore job (see recent
  occurrences via logsearch:
  https://paste.opendev.org/show/bQNKfoaMafUyNFCyQ0kN/ ). It mostly
  happens on the current master branch (yoga), but an example was found
  on wallaby as well:
  https://zuul.opendev.org/t/openstack/build/d8a6a9c1496346dda6986db00c06a616

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1961068/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1958961] Re: [ovn-octavia-provider] lb create failing with with ValueError: invalid literal for int() with base 10: '24 2001:db8::131/64'

2022-02-16 Thread Luis Tomas Bolivar
This bug is the same as https://bugs.launchpad.net/neutron/+bug/1959903
(or a subset of it).

The fix at https://review.opendev.org/c/openstack/ovn-octavia-provider/+/827670 also solves this problem.

** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1958961

Title:
  [ovn-octavia-provider] lb create failing with with ValueError: invalid
  literal for int() with base 10: '24 2001:db8::131/64'

Status in neutron:
  Fix Released

Bug description:
  When deployed with the ovn-octavia-provider using the local.conf
  below, loadbalancer create (openstack loadbalancer create
  --vip-network-id public --provider ovn) goes into ERROR state.

  From the o-api logs:
  ERROR ovn_octavia_provider.helper Traceback (most recent call last):
  ERROR ovn_octavia_provider.helper   File "/usr/local/lib/python3.8/dist-packages/netaddr/ip/__init__.py", line 811, in parse_ip_network
  ERROR ovn_octavia_provider.helper prefixlen = int(val2)
  ERROR ovn_octavia_provider.helper ValueError: invalid literal for int() with base 10: '24 2001:db8::131/64'

  This seems to be a regression caused by
  https://review.opendev.org/c/openstack/ovn-octavia-provider/+/816868.
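
  The failing input is visible in the first port's external_ids below:
  "neutron:cidrs" carries two space-separated CIDRs. A minimal
  reproduction with the split-first handling (illustrative, not the
  actual fix):

  ```python
  # "neutron:cidrs" can hold several space-separated CIDRs for one port,
  # so the value must be split before each entry is parsed.
  import netaddr

  cidrs = "172.24.4.149/24 2001:db8::131/64"

  # Parsing the raw string reproduces the reported error:
  #   ValueError: invalid literal for int() with base 10: '24 2001:db8::131/64'
  for cidr in cidrs.split():
      net = netaddr.IPNetwork(cidr)
      print(net.ip, net.prefixlen)
  ```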

  # Logical switch ports output
  sudo ovn-nbctl find logical_switch_port type=router
  _uuid   : 4865f50c-a2cd-4a5c-ae4a-bbc911985fb2
  addresses   : [router]
  dhcpv4_options  : []
  dhcpv6_options  : []
  dynamic_addresses   : []
  enabled : true
  external_ids: {"neutron:cidrs"="172.24.4.149/24 2001:db8::131/64", "neutron:device_id"="31a0e24f-6278-4714-b543-cba735a6c49d", "neutron:device_owner"="network:router_gateway", "neutron:network_name"=neutron-4708e992-cff8-4438-8142-1cc2ac7010db, "neutron:port_name"="", "neutron:project_id"="", "neutron:revision_number"="6", "neutron:security_group_ids"=""}
  ha_chassis_group: []
  name: "c18869b9--49a8-bc8a-5d2c51db5b6e"
  options : {mcast_flood_reports="true", nat-addresses=router, requested-chassis=ykarel-devstack, router-port=lrp-c18869b9--49a8-bc8a-5d2c51db5b6e}
  parent_name : []
  port_security   : []
  tag : []
  tag_request : []
  type: router
  up  : true

  _uuid   : f0ed6566-a942-4e2d-94f5-64ccd6bed568
  addresses   : [router]
  dhcpv4_options  : []
  dhcpv6_options  : []
  dynamic_addresses   : []
  enabled : true
  external_ids: {"neutron:cidrs"="fd25:38d5:1d9::1/64", "neutron:device_id"="31a0e24f-6278-4714-b543-cba735a6c49d", "neutron:device_owner"="network:router_interface", "neutron:network_name"=neutron-591d2b8c-3501-49b1-822c-731f2cc9b305, "neutron:port_name"="", "neutron:project_id"=f4c9948020024e13a1a091bd09d1fbba, "neutron:revision_number"="3", "neutron:security_group_ids"=""}
  ha_chassis_group: []
  name: "e778ac75-a15b-441b-b334-6a7579f851fa"
  options : {router-port=lrp-e778ac75-a15b-441b-b334-6a7579f851fa}
  parent_name : []
  port_security   : []
  tag : []
  tag_request : []
  type: router
  up  : true

  _uuid   : 9c2f3327-ac94-4881-a9c5-a6da87acf6a3
  addresses   : [router]
  dhcpv4_options  : []
  dhcpv6_options  : []
  dynamic_addresses   : []
  enabled : true
  external_ids: {"neutron:cidrs"="10.0.0.1/26", "neutron:device_id"="31a0e24f-6278-4714-b543-cba735a6c49d", "neutron:device_owner"="network:router_interface", "neutron:network_name"=neutron-591d2b8c-3501-49b1-822c-731f2cc9b305, "neutron:port_name"="", "neutron:project_id"=f4c9948020024e13a1a091bd09d1fbba, "neutron:revision_number"="3", "neutron:security_group_ids"=""}
  ha_chassis_group: []
  name: "d728e2a3-f9fd-4fff-8a6f-0c55a26bc55c"
  options : {router-port=lrp-d728e2a3-f9fd-4fff-8a6f-0c55a26bc55c}
  parent_name : []
  port_security   : []
  tag : []
  tag_request : []
  type: router
  up  : true

  
  local.conf
  ==========

  [[local|localrc]]
  RECLONE=yes
  DATABASE_PASSWORD=password
  RABBIT_PASSWORD=password
  SERVICE_PASSWORD=password
  SERVICE_TOKEN=password
  ADMIN_PASSWORD=password
  Q_AGENT=ovn
  Q_ML2_PLUGIN_MECHANISM_DRIVERS=ovn,logger
  Q_ML2_PLUGIN_TYPE_DRIVERS=local,flat,vlan,geneve
  Q_ML2_TENANT_NETWORK_TYPE="geneve"
  OVN_BRANCH="v21.06.0"
  OVN_BUILD_FROM_SOURCE="True"
  OVS_BRANCH="branch-2.15"
  OVS_SYSCONFDIR="/usr/local/etc/openvswitch"
  OVN_L3_CREATE_PUBLIC_NETWORK=True
  OCTAVIA_NODE="api"
  DISABLE_AMP_IMAGE_BUILD=True
  enable_plugin barbican https://opendev.org/openstack/barbican
  enable_plugin octavia https://opendev.org/openstack/octavia
  enable_plugin octavia-dashboard