[Yahoo-eng-team] [Bug 1808917] [NEW] RetryRequest shouldn't log stack trace by default, or it should be configurable by the exception
Public bug reported: I see the following littering the logs and it strikes me as wrong: 2018-12-18 01:01:46.259 34 DEBUG neutron.plugins.ml2.managers [req-196ce43f-2408-48f4-9c7e-bb90f66c9c14 - - - - -] DB exception raised by Mechanism driver 'opendaylight_v2' in update_port_precommit _call_on_drivers /usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py:434 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers Traceback (most recent call last): 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py", line 427, in _call_on_drivers 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context) 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/oslo_log/helpers.py", line 67, in wrapper 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers return method(*args, **kwargs) 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/networking_odl/ml2/mech_driver_v2.py", line 117, in update_port_precommit 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers context, odl_const.ODL_PORT, odl_const.ODL_UPDATE) 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/networking_odl/ml2/mech_driver_v2.py", line 87, in _record_in_journal 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers ml2_context=context) 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/networking_odl/journal/journal.py", line 123, in record 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers raise exception.RetryRequest(e) 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers RetryRequest 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers Since this is an explicit request by the operation to retry, and not some unexpected behavior, it shouldn't log the stack trace. If you really want more fine grained control (over not logging the trace completely), a flag can be added to the exception to determine whether the log of it should contain the stack trace or not. The code in question is here (also on master but this rocky url is simpler): https://github.com/openstack/neutron/blob/stable/rocky/neutron/plugins/ml2/managers.py#L433 ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. 
https://bugs.launchpad.net/bugs/1808917 Title: RetryRequest shouldn't log stack trace by default, or it should be configurable by the exception Status in neutron: New Bug description: I see the following littering the logs and it strikes me as wrong: 2018-12-18 01:01:46.259 34 DEBUG neutron.plugins.ml2.managers [req-196ce43f-2408-48f4-9c7e-bb90f66c9c14 - - - - -] DB exception raised by Mechanism driver 'opendaylight_v2' in update_port_precommit _call_on_drivers /usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py:434 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers Traceback (most recent call last): 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py", line 427, in _call_on_drivers 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context) 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/oslo_log/helpers.py", line 67, in wrapper 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers return method(*args, **kwargs) 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/networking_odl/ml2/mech_driver_v2.py", line 117, in update_port_precommit 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers context, odl_const.ODL_PORT, odl_const.ODL_UPDATE) 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/networking_odl/ml2/mech_driver_v2.py", line 87, in _record_in_journal 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers ml2_context=context) 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/networking_odl/journal/journal.py", line 123, in record 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers raise exception.RetryRequest(e) 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers RetryRequest 2018-12-18 01:01:46.259 34 ERROR neutron.plugins.ml2.managers Since this is an explicit request by the operation to retry, and not some unexpected behavior, it shouldn't log the stack trace. If you really want more fine grained control (over not logging the trace completely), a flag can be added to the exception to determine whether the log of it should contain the stack trace or not. The code in question is here (also on master but this rocky url is simpler): https://github.com/openstack/neutron/blob/stable/rocky/neutron/plugins/ml2/managers.py#L433 To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1808917/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
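A minimal sketch of the flag suggested in the report above, assuming a hypothetical log_traceback attribute on the exception and a matching check at the logging site; the names are illustrative only and this is not the actual neutron or oslo code.

    # Hypothetical sketch: let whoever raises RetryRequest decide whether
    # the retry is worth a full traceback in the log.
    import logging

    LOG = logging.getLogger(__name__)


    class RetryRequest(Exception):
        def __init__(self, inner_exc, log_traceback=False):
            super(RetryRequest, self).__init__(str(inner_exc))
            self.inner_exc = inner_exc
            self.log_traceback = log_traceback


    def call_on_driver(driver, method_name, context):
        try:
            getattr(driver, method_name)(context)
        except RetryRequest as retry:
            # A retry is expected behaviour, so only attach exc_info when
            # the raiser explicitly asked for the stack trace.
            LOG.debug("Driver requested a retry in %s", method_name,
                      exc_info=retry.log_traceback)
            raise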
[Yahoo-eng-team] [Bug 1808916] [NEW] Update mailing list from dev to discuss
Public bug reported: Update mailing list from dev to discuss openstack-dev was decommissioned this night in https://review.openstack.org/621258 Update openstack-dev to openstack-discuss ** Affects: neutron Importance: Undecided Assignee: yfzhao (yfzhao) Status: New ** Changed in: neutron Assignee: (unassigned) => yfzhao (yfzhao) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1808916 Title: Update mailing list from dev to discuss Status in neutron: New Bug description: Update mailing list from dev to discuss openstack-dev was decommissioned this night in https://review.openstack.org/621258 Update openstack-dev to openstack-discuss To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1808916/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1798296] Re: openstack console url show
[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.] ** Changed in: nova Status: Incomplete => Expired -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1798296 Title: openstack console url show Status in OpenStack Compute (nova): Expired Bug description: $openstack console url show selfservice-instance $openstack console url show 79718973-747e-4215-9dba-94e3a883a071 error happens below: web@controller:~$ openstack console url show 79718973-747e-4215-9dba-94e3a883a071 Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. (HTTP 500) (Request-ID: req-8954aad1-3731-4733-9db8-7005b7f05aa1) To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1798296/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1808906] [NEW] doc: broken link in user/cells.rst
Public bug reported: https://docs.openstack.org/nova/latest/user/cells.html There is a broken link in user/cells.html. https://www.openstack.org/videos/video/nova-cells-v2-whats-going-on ** Affects: nova Importance: Undecided Assignee: Takashi NATSUME (natsume-takashi) Status: In Progress ** Tags: doc -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1808906 Title: doc: broken link in user/cells.rst Status in OpenStack Compute (nova): In Progress Bug description: https://docs.openstack.org/nova/latest/user/cells.html There is a broken link in user/cells.html. https://www.openstack.org/videos/video/nova-cells-v2-whats-going-on To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1808906/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1808902] [NEW] Support matrix for abort migration is missing
Public bug reported: The support matrix for Delete (Abort) migration is missing. This could be very useful information for users considering this feature, so we should add it. ** Affects: nova Importance: Undecided Assignee: Zhenyu Zheng (zhengzhenyu) Status: New ** Changed in: nova Assignee: (unassigned) => Zhenyu Zheng (zhengzhenyu) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1808902 Title: Support matrix for abort migration is missing Status in OpenStack Compute (nova): New Bug description: The support matrix for Delete (Abort) migration is missing. This could be very useful information for users considering this feature, so we should add it. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1808902/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1808879] [NEW] meta-api request header logging is useless
Public bug reported: The nova metadata-api logs request headers at debug level but it's useless: http://logs.openstack.org/78/624778/1/check/tempest- full/8d3c124/controller/logs/screen-n-api- meta.txt.gz#_Dec_13_08_45_55_316152 Dec 13 08:45:55.316152 ubuntu-bionic-rax-ord-0001170468 devstack@n-api- meta.service[9309]: DEBUG nova.api.metadata.handler [None req- 4b5eab00-e132-4551-b7f8-c80a238727a2 None None] Metadata request headers: {{(pid=9311) __call__ /opt/stack/nova/nova/api/metadata/handler.py:99}} This is because webob.headers.EnvironHeaders is a mutable mapping which acts like an iterator and doesn't yield a nice repr. ** Affects: nova Importance: Low Assignee: Matt Riedemann (mriedem) Status: In Progress ** Tags: api logging serviceability -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1808879 Title: meta-api request header logging is useless Status in OpenStack Compute (nova): In Progress Bug description: The nova metadata-api logs request headers at debug level but it's useless: http://logs.openstack.org/78/624778/1/check/tempest- full/8d3c124/controller/logs/screen-n-api- meta.txt.gz#_Dec_13_08_45_55_316152 Dec 13 08:45:55.316152 ubuntu-bionic-rax-ord-0001170468 devstack@n -api-meta.service[9309]: DEBUG nova.api.metadata.handler [None req- 4b5eab00-e132-4551-b7f8-c80a238727a2 None None] Metadata request headers: {{(pid=9311) __call__ /opt/stack/nova/nova/api/metadata/handler.py:99}} This is because webob.headers.EnvironHeaders is a mutable mapping which acts like an iterator and doesn't yield a nice repr. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1808879/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
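A small sketch of the problem described above and one possible fix (logging a plain dict copy of the headers); this is illustrative only, not the actual nova change, and the request path and header value are made up.

    # Sketch: EnvironHeaders has no useful repr of its contents, so copy it
    # into a dict before interpolating it into the log message.
    import logging

    from webob import Request

    logging.basicConfig(level=logging.DEBUG)
    LOG = logging.getLogger(__name__)

    req = Request.blank('/openstack/latest/meta_data.json',
                        headers={'X-Forwarded-For': '10.0.0.5'})

    LOG.debug("Metadata request headers: %s", req.headers)        # unhelpful
    LOG.debug("Metadata request headers: %s", dict(req.headers))  # readable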
[Yahoo-eng-team] [Bug 1806912] Re: devstack timeout because n-api/g-api takes longer than 60 seconds to start
Looking at one of the failures, g-api starts in ~3 seconds: http://logs.openstack.org/55/62/1/check/tempest- full/60bd495/controller/logs/screen-g-api.txt.gz Dec 17 18:37:34.594832 ubuntu-bionic-rax-dfw-0001240665 devstack@g-api.service[1909]: WSGI app 0 (mountpoint='') ready in 3 seconds on interpreter 0x562a8bbd57c0 pid: 1910 (default app) Dec 17 18:37:34.620388 ubuntu-bionic-rax-dfw-0001240665 devstack@g-api.service[1909]: WSGI app 0 (mountpoint='') ready in 3 seconds on interpreter 0x562a8bbd57c0 pid: 1911 (default app) but it looks like GET /images requests are returning a 503; 2018-12-17 20:34:07.843 | ++ :: : curl -g -k --noproxy '*' -s -o /dev/null -w '%{http_code}' https://10.209.34.117/image 2018-12-17 20:34:07.874 | + :: : [[ 503 == 503 ]] I don't see anything wrong in the glance logs, but I do see proxy errors in the apache logs: http://logs.openstack.org/55/62/1/check/tempest- full/60bd495/controller/logs/apache/error_log.txt.gz [Mon Dec 17 20:33:08.847250 2018] [proxy:error] [pid 8631:tid 140513730557696] (111)Connection refused: AH00957: HTTP: attempt to connect to 127.0.0.1:60998 (127.0.0.1) failed [Mon Dec 17 20:33:08.847344 2018] [proxy_http:error] [pid 8631:tid 140513730557696] [client 10.209.34.117:51988] AH01114: HTTP: failed to make connection to backend: 127.0.0.1 And in the access log: http://logs.openstack.org/55/62/1/check/tempest- full/60bd495/controller/logs/apache/access_log.txt.gz 10.209.34.117 - - [17/Dec/2018:20:33:15 +] "GET /image HTTP/1.1" 503 568 "-" "curl/7.58.0" 10.209.34.117 - - [17/Dec/2018:20:33:16 +] "GET /image HTTP/1.1" 503 568 "-" "curl/7.58.0" 10.209.34.117 - - [17/Dec/2018:20:33:17 +] "GET /image HTTP/1.1" 503 568 "-" "curl/7.58.0" 10.209.34.117 - - [17/Dec/2018:20:33:18 +] "GET /image HTTP/1.1" 503 568 "-" "curl/7.58.0" ** Also affects: glance Importance: Undecided Status: New ** Summary changed: - devstack timeout because n-api/g-api takes longer than 60 seconds to start + devstack timeout because g-api takes longer than 60 seconds to start -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1806912 Title: devstack timeout because g-api takes longer than 60 seconds to start Status in Glance: New Status in OpenStack-Gate: Confirmed Bug description: For example: http://logs.openstack.org/60/620660/1/check/heat-functional-orig- mysql-lbaasv2/bc7ef21/logs/devstacklog.txt#_2018-11-30_20_14_13_418 2018-11-30 20:14:13.418 | + lib/glance:start_glance:353 : die 353 'g-api did not start' The g-api logs show it took 62 seconds to start: http://logs.openstack.org/60/620660/1/check/heat-functional-orig- mysql-lbaasv2/bc7ef21/logs/screen-g-api.txt.gz#_Nov_30_20_14_12_852280 Nov 30 20:14:12.852280 ubuntu-xenial-ovh-bhs1-840057 devstack@g-api.service[7937]: WSGI app 0 (mountpoint='') ready in 62 seconds on interpreter 0xb23d10 pid: 7942 (default app) Looks like this primarily happens on ovh-bhs1 nodes. http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22-api%20did%20not%20start%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1806912/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1808868] [NEW] Useful image properties in glance - hw_cdrom_bus is not documented
Public bug reported: - [x] This is a doc addition request. The useful image properties page does not document the "hw_cdrom_bus" image property, but it's mentioned in a few docs: https://docs.openstack.org/glance/rocky/admin/manage-images.html https://docs.openstack.org/security-guide/compute/hardening-the- virtualization-layers.html Probably the easiest description is "Name of the CDROM bus to use.". The values for the property are defined in nova: https://github.com/openstack/nova/blob/48ad73e1faf966badab1f0344baad9f4f4055abf/nova/objects/fields.py#L320 class DiskBus(BaseNovaEnum): FDC = "fdc" IDE = "ide" SATA = "sata" SCSI = "scsi" USB = "usb" VIRTIO = "virtio" XEN = "xen" LXC = "lxc" UML = "uml" ALL = (FDC, IDE, SATA, SCSI, USB, VIRTIO, XEN, LXC, UML) However, not all libvirt virt types support all disk bus types, that's defined here: https://github.com/openstack/nova/blob/48ad73e1faf966badab1f0344baad9f4f4055abf/nova/virt/libvirt/blockinfo.py#L198 The only compute driver that uses hw_cdrom_bus is "libvirt". --- Release: on 2018-07-30 15:26:07 SHA: 3d52684346f3a7eb56b7d8836b4e0e8efafee647 Source: https://git.openstack.org/cgit/openstack/glance/tree/doc/source/admin/useful-image-properties.rst URL: https://docs.openstack.org/glance/latest/admin/useful-image-properties.html ** Affects: glance Importance: Medium Status: Confirmed ** Tags: documentation -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1808868 Title: Useful image properties in glance - hw_cdrom_bus is not documented Status in Glance: Confirmed Bug description: - [x] This is a doc addition request. The useful image properties page does not document the "hw_cdrom_bus" image property, but it's mentioned in a few docs: https://docs.openstack.org/glance/rocky/admin/manage-images.html https://docs.openstack.org/security-guide/compute/hardening-the- virtualization-layers.html Probably the easiest description is "Name of the CDROM bus to use.". The values for the property are defined in nova: https://github.com/openstack/nova/blob/48ad73e1faf966badab1f0344baad9f4f4055abf/nova/objects/fields.py#L320 class DiskBus(BaseNovaEnum): FDC = "fdc" IDE = "ide" SATA = "sata" SCSI = "scsi" USB = "usb" VIRTIO = "virtio" XEN = "xen" LXC = "lxc" UML = "uml" ALL = (FDC, IDE, SATA, SCSI, USB, VIRTIO, XEN, LXC, UML) However, not all libvirt virt types support all disk bus types, that's defined here: https://github.com/openstack/nova/blob/48ad73e1faf966badab1f0344baad9f4f4055abf/nova/virt/libvirt/blockinfo.py#L198 The only compute driver that uses hw_cdrom_bus is "libvirt". --- Release: on 2018-07-30 15:26:07 SHA: 3d52684346f3a7eb56b7d8836b4e0e8efafee647 Source: https://git.openstack.org/cgit/openstack/glance/tree/doc/source/admin/useful-image-properties.rst URL: https://docs.openstack.org/glance/latest/admin/useful-image-properties.html To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1808868/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
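For reference, a hedged example of how the property described above would typically be attached to an image; the client call is a sketch using python-glanceclient, and the endpoint, token, and image ID are placeholders.

    # Sketch: setting hw_cdrom_bus on an image with python-glanceclient (v2).
    # The value must be one of the DiskBus values above, and the chosen bus
    # must be supported by the libvirt virt type actually in use.
    from glanceclient import Client

    glance = Client('2', endpoint='http://controller:9292', token='TOKEN')
    glance.images.update('IMAGE_UUID', hw_cdrom_bus='sata')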
[Yahoo-eng-team] [Bug 1744079] Re: [SRU] disk over-commit still not correctly calculated during live migration
** Also affects: cloud-archive/ocata Importance: Undecided Status: New ** Also affects: cloud-archive/pike Importance: Undecided Status: New ** Changed in: cloud-archive/pike Importance: Undecided => High ** Changed in: cloud-archive/pike Status: New => Triaged ** Changed in: cloud-archive/ocata Importance: Undecided => High ** Changed in: cloud-archive/ocata Status: New => Triaged -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1744079 Title: [SRU] disk over-commit still not correctly calculated during live migration Status in Ubuntu Cloud Archive: Fix Committed Status in Ubuntu Cloud Archive ocata series: Triaged Status in Ubuntu Cloud Archive pike series: Triaged Status in Ubuntu Cloud Archive queens series: Fix Committed Status in Ubuntu Cloud Archive rocky series: Fix Committed Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: In Progress Status in OpenStack Compute (nova) rocky series: In Progress Status in nova package in Ubuntu: Fix Released Status in nova source package in Xenial: Triaged Status in nova source package in Bionic: Fix Committed Status in nova source package in Cosmic: Fix Committed Status in nova source package in Disco: Fix Released Bug description: [Impact] nova compares disk space with disk_available_least field, which is possible to be negative, due to overcommit. So the migration may fail because of a "Migration pre-check error: Unable to migrate dfcd087a-5dff-439d-8875-2f702f081539: Disk of instance is too large(available on destination host:-3221225472 < need:22806528)" when trying a migration to another compute that has plenty of free space in his disk. [Test Case] Deploy openstack environment. Make sure there is a negative disk_available_least and a adequate free_disk_gb in one test compute node, then migrate a VM to it with disk-overcommit (openstack server migrate --live --block-migration --disk-overcommit ). You will see above migration pre-check error. This is the formula to compute disk_available_least and free_disk_gb. disk_free_gb = disk_info_dict['free'] disk_over_committed = self._get_disk_over_committed_size_total() available_least = disk_free_gb * units.Gi - disk_over_committed data['disk_available_least'] = available_least / units.Gi The following command can be used to query the value of disk_available_least nova hypervisor-show |grep disk Steps to Reproduce: 1. set disk_allocation_ratio config option > 1.0 2. qemu-img resize cirros-0.3.0-x86_64-disk.img +40G 3. glance image-create --disk-format qcow2 ... 4. boot VMs based on resized image 5. we see disk_available_least becomes negative [Regression Potential] Minimal - we're just changing from the following line: disk_available_gb = dst_compute_info['disk_available_least'] to the following codes: if disk_over_commit: disk_available_gb = dst_compute_info['free_disk_gb'] else: disk_available_gb = dst_compute_info['disk_available_least'] When enabling overcommit, disk_available_least is possible to be negative, so we should use free_disk_gb instead of it by backporting the following two fixes. 
https://git.openstack.org/cgit/openstack/nova/commit/?id=e097c001c8e0efe8879da57264fcb7bdfdf2 https://git.openstack.org/cgit/openstack/nova/commit/?id=e2cc275063658b23ed88824100919a6dfccb760d This is the code path for check_can_live_migrate_destination: _migrate_live(os-migrateLive API, migrate_server.py) -> migrate_server -> _live_migrate -> _build_live_migrate_task -> _call_livem_checks_on_host -> check_can_live_migrate_destination BTW, redhat also has a same bug - https://bugzilla.redhat.com/show_bug.cgi?id=1477706 [Original Bug Report] Change I8a705114d47384fcd00955d4a4f204072fed57c2 (written by me... sigh) addressed a bug which prevented live migration to a target host with overcommitted disk when made with microversion <2.25. It achieved this, but the fix is still not correct. We now do: if disk_over_commit: disk_available_gb = dst_compute_info['local_gb'] Unfortunately local_gb is *total* disk, not available disk. We actually want free_disk_gb. Fun fact: due to the way we calculate this for filesystems, without taking into account reserved space, this can also be negative. The test we're currently running is: could we fit this guest's allocated disks on the target if the target disk was empty. This is at least better than it was before, as we don't spuriously fail early. In fact, we're effectively disabling a test which is disabled for microversion >=2.25 anyway. IOW we should fix it, but it's probably not a high priority. To manage notifications about this bug go to: https://bugs.launchpad.n
[Yahoo-eng-team] [Bug 1808859] [NEW] The v3 group API should account for different scopes
Public bug reported: Keystone implemented scope_types for oslo.policy RuleDefault objects in the Queens release [0]. In order to take full advantage of scope_types, keystone is going to have to evolve policy enforcement checks in the group API. This is documented in each patch with FIXMEs [1]. System users should be able to manage groups across all domains in the deployment. Domain users should be able to manage groups within the domain they have authorization on. Project users shouldn't be able to manage groups at all, since group entities are domain-specific. [0] https://review.openstack.org/#/c/525706/ [1] https://git.openstack.org/cgit/openstack/keystone/tree/keystone/common/policies/group.py?id=20f11eb88a7d8bf534fa221ebeae4ae9c87cdc0b#n21 ** Affects: keystone Importance: High Status: Triaged ** Tags: policy system-scope ** Tags added: policy ** Tags added: system-scope ** Description changed: Keystone implemented scope_types for oslo.policy RuleDefault objects in the Queens release [0]. In order to take full advantage of scope_types, keystone is going to have to evolve policy enforcement checks in the group API. This is documented in each patch with FIXMEs [1]. System users should be able to manage groups across all domains in the deployment. Domain users should be able to manage groups within the domain they have authorization on. Project users shouldn't be able to manage groups at all, since group entities are domain-specific. [0] https://review.openstack.org/#/c/525706/ - [1] https://review.openstack.org/#/c/525706/3/keystone/common/policies/group.py + [1] https://git.openstack.org/cgit/openstack/keystone/tree/keystone/common/policies/group.py?id=20f11eb88a7d8bf534fa221ebeae4ae9c87cdc0b#n21 ** Changed in: keystone Status: New => Triaged ** Changed in: keystone Importance: Undecided => High -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone). https://bugs.launchpad.net/bugs/1808859 Title: The v3 group API should account for different scopes Status in OpenStack Identity (keystone): Triaged Bug description: Keystone implemented scope_types for oslo.policy RuleDefault objects in the Queens release [0]. In order to take full advantage of scope_types, keystone is going to have to evolve policy enforcement checks in the group API. This is documented in each patch with FIXMEs [1]. System users should be able to manage groups across all domains in the deployment. Domain users should be able to manage groups within the domain they have authorization on. Project users shouldn't be able to manage groups at all, since group entities are domain-specific. [0] https://review.openstack.org/#/c/525706/ [1] https://git.openstack.org/cgit/openstack/keystone/tree/keystone/common/policies/group.py?id=20f11eb88a7d8bf534fa221ebeae4ae9c87cdc0b#n21 To manage notifications about this bug go to: https://bugs.launchpad.net/keystone/+bug/1808859/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
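As a rough illustration of the direction described above, a scope-aware policy default could look something like the following; the rule name mirrors the group API, but the check string is illustrative rather than keystone's eventual in-tree default.

    # Illustrative only: a group policy default carrying scope_types so that
    # system-scoped and domain-scoped tokens are treated differently.
    from oslo_policy import policy

    list_groups = policy.DocumentedRuleDefault(
        name='identity:list_groups',
        check_str='(role:reader and system_scope:all) or '
                  '(role:reader and domain_id:%(target.group.domain_id)s)',
        scope_types=['system', 'domain'],
        description='List groups.',
        operations=[{'path': '/v3/groups', 'method': 'GET'}])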
[Yahoo-eng-team] [Bug 1744079] Re: [SRU] disk over-commit still not correctly calculated during live migration
** Also affects: nova (Ubuntu Xenial) Importance: Undecided Status: New ** Changed in: nova (Ubuntu Xenial) Importance: Undecided => High ** Changed in: nova (Ubuntu Xenial) Status: New => Triaged -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1744079 Title: [SRU] disk over-commit still not correctly calculated during live migration Status in Ubuntu Cloud Archive: Fix Committed Status in Ubuntu Cloud Archive queens series: Fix Committed Status in Ubuntu Cloud Archive rocky series: Fix Committed Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: In Progress Status in OpenStack Compute (nova) rocky series: In Progress Status in nova package in Ubuntu: Fix Released Status in nova source package in Xenial: Triaged Status in nova source package in Bionic: Fix Committed Status in nova source package in Cosmic: Fix Committed Status in nova source package in Disco: Fix Released Bug description: [Impact] nova compares disk space with disk_available_least field, which is possible to be negative, due to overcommit. So the migration may fail because of a "Migration pre-check error: Unable to migrate dfcd087a-5dff-439d-8875-2f702f081539: Disk of instance is too large(available on destination host:-3221225472 < need:22806528)" when trying a migration to another compute that has plenty of free space in his disk. [Test Case] Deploy openstack environment. Make sure there is a negative disk_available_least and a adequate free_disk_gb in one test compute node, then migrate a VM to it with disk-overcommit (openstack server migrate --live --block-migration --disk-overcommit ). You will see above migration pre-check error. This is the formula to compute disk_available_least and free_disk_gb. disk_free_gb = disk_info_dict['free'] disk_over_committed = self._get_disk_over_committed_size_total() available_least = disk_free_gb * units.Gi - disk_over_committed data['disk_available_least'] = available_least / units.Gi The following command can be used to query the value of disk_available_least nova hypervisor-show |grep disk Steps to Reproduce: 1. set disk_allocation_ratio config option > 1.0 2. qemu-img resize cirros-0.3.0-x86_64-disk.img +40G 3. glance image-create --disk-format qcow2 ... 4. boot VMs based on resized image 5. we see disk_available_least becomes negative [Regression Potential] Minimal - we're just changing from the following line: disk_available_gb = dst_compute_info['disk_available_least'] to the following codes: if disk_over_commit: disk_available_gb = dst_compute_info['free_disk_gb'] else: disk_available_gb = dst_compute_info['disk_available_least'] When enabling overcommit, disk_available_least is possible to be negative, so we should use free_disk_gb instead of it by backporting the following two fixes. https://git.openstack.org/cgit/openstack/nova/commit/?id=e097c001c8e0efe8879da57264fcb7bdfdf2 https://git.openstack.org/cgit/openstack/nova/commit/?id=e2cc275063658b23ed88824100919a6dfccb760d This is the code path for check_can_live_migrate_destination: _migrate_live(os-migrateLive API, migrate_server.py) -> migrate_server -> _live_migrate -> _build_live_migrate_task -> _call_livem_checks_on_host -> check_can_live_migrate_destination BTW, redhat also has a same bug - https://bugzilla.redhat.com/show_bug.cgi?id=1477706 [Original Bug Report] Change I8a705114d47384fcd00955d4a4f204072fed57c2 (written by me... 
sigh) addressed a bug which prevented live migration to a target host with overcommitted disk when made with microversion <2.25. It achieved this, but the fix is still not correct. We now do: if disk_over_commit: disk_available_gb = dst_compute_info['local_gb'] Unfortunately local_gb is *total* disk, not available disk. We actually want free_disk_gb. Fun fact: due to the way we calculate this for filesystems, without taking into account reserved space, this can also be negative. The test we're currently running is: could we fit this guest's allocated disks on the target if the target disk was empty. This is at least better than it was before, as we don't spuriously fail early. In fact, we're effectively disabling a test which is disabled for microversion >=2.25 anyway. IOW we should fix it, but it's probably not a high priority. To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1744079/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
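A worked illustration of the numbers involved in the pre-check described above; the values are invented so that disk_available_least ends up negative, as in the quoted error, and this is not the nova source itself.

    # Illustration only. disk_available_least = free disk - overcommitted
    # guest disk, so it goes negative on an over-committed host.
    GiB = 1024 ** 3

    free_disk_gb = 49                     # real free space on the destination
    disk_over_committed = 52 * GiB        # allocated-but-unwritten guest disk

    disk_available_least = (free_disk_gb * GiB - disk_over_committed) // GiB
    print(disk_available_least)           # -3

    def destination_has_room(disk_over_commit, needed_gb):
        # The backported change: when the caller asked to over-commit,
        # compare against real free space instead of the adjusted value.
        if disk_over_commit:
            available_gb = free_disk_gb
        else:
            available_gb = disk_available_least
        return available_gb >= needed_gb

    print(destination_has_room(True, 22))    # True
    print(destination_has_room(False, 22))   # False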
[Yahoo-eng-team] [Bug 1793255] Re: nova-consoleauth missing from Rocky install guide; unable to use VNC
Reviewed: https://review.openstack.org/605154 Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=983e6ea5518dee956d8054cd67dddbe98eab9f78 Submitter: Zuul Branch:master commit 983e6ea5518dee956d8054cd67dddbe98eab9f78 Author: melanie witt Date: Tue Sep 25 17:12:16 2018 + Restore nova-consoleauth to install docs The installation of the nova-consoleauth service was erroneously removed from the docs prematurely. The nova-consoleauth service is still being used in Rocky, with the removal being possible in Stein. This should have been fixed as part of change Ibbdc7c50c312da2acc59dfe64de95a519f87f123 but was missed. This is also related to the release note update in Rocky under change Ie637b4871df8b870193b5bc07eece15c03860c06. Co-Authored-By: Matt Riedemann Closes-Bug: #1793255 Related-Bug: #1798188 Change-Id: Ied268da9e70bd2807c2dfe7a479181fbec52979d ** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1793255 Title: nova-consoleauth missing from Rocky install guide; unable to use VNC Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) rocky series: In Progress Bug description: You gus doesn't maintain the doc well. Waste people lots of time. openstack-nova-consoleauth should be started, or VNC console won't work. see https://bugs.launchpad.net/nova/+bug/1792674 This bug tracker is for errors with the documentation, use the following as a template and remove or add fields as you see fit. Convert [ ] into [x] to check boxes: - [x] This doc is inaccurate in this way: __ - [ ] This is a doc addition request. - [ ] I have a fix to the document that I can paste below including example: input and output. If you have a troubleshooting or support issue, use the following resources: - Ask OpenStack: http://ask.openstack.org - The mailing list: http://lists.openstack.org - IRC: 'openstack' channel on Freenode --- Release: 18.0.1.dev30 on 2018-09-19 04:24 SHA: a936ecb280de74339e3de8ba8a42bd6e9458b7f5 Source: https://git.openstack.org/cgit/openstack/nova/tree/doc/source/install/controller-install-rdo.rst URL: https://docs.openstack.org/nova/rocky/install/controller-install-rdo.html To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1793255/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1804653] Re: Provide user data to instances in nova should explicitly tell in the docs about the size limit
Reviewed: https://review.openstack.org/620700 Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d63afbf6c59c1565e1c6943747834555852f3bbc Submitter: Zuul Branch:master commit d63afbf6c59c1565e1c6943747834555852f3bbc Author: Matt Riedemann Date: Wed Nov 28 16:13:21 2018 -0500 Mention size limit on user data in docs The API reference for the "user_data" parameter to server create and rebuild mentions the size restriction but the user docs did not, so this adds that note. While doing this, several other user-facing docs that mention user data link back to the overall user-data doc so that we can centralize any further documentation about that topic in a single location. Change-Id: Id4a61d58150337e0dec223d4d2741336ed6d5387 Closes-Bug: #1804653 ** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1804653 Title: Provide user data to instances in nova should explicitly tell in the docs about the size limit Status in OpenStack Compute (nova): Fix Released Bug description: This bug tracker is for errors with the documentation, use the following as a template and remove or add fields as you see fit. Convert [ ] into [x] to check boxes: - [x] This doc is inaccurate in this way: So, you think cloud init is cool and you start building on it. The more you learn about cloud init, the more you like it. Until your cloud init becomes too big to the dislikes of who every design the database schema or API. Then your calls to nova API will fail with: novaclient.exceptions.BadRequest: User data too large. User data must be no larger than 65535 bytes once base64 encoded. Your data is 66080 bytes (HTTP 400) (Request-ID: req-13e1d006-2c77-4ab4-903f- 92f279d64cfc) At this point you invested quite sometime ... and now you have to figure a new strategy. The documentation could at least warn in advance ... https://github.com/openstack/nova/blob/c6218428e9b29a2c52808ec7d27b4b21aadc0299/doc/source/user /user-data.rst Somewhere in there should be a notice about the max userdata size. --- Release: 18.0.4.dev11 on 2018-11-21 20:54 SHA: e90e89219410a771f9b6b0c4200edb0480360afe Source: https://git.openstack.org/cgit/openstack/nova/tree/doc/source/user/user-data.rst URL: https://docs.openstack.org/nova/rocky/user/user-data.html To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1804653/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
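A small, hedged pre-flight check based on the limit quoted in the error message above (65535 bytes once base64 encoded); the file name is a placeholder.

    # Sketch: fail locally before nova rejects the request with HTTP 400.
    import base64

    MAX_B64_USERDATA = 65535

    with open('user-data.yaml', 'rb') as f:
        user_data = f.read()

    encoded = base64.b64encode(user_data)
    if len(encoded) > MAX_B64_USERDATA:
        raise ValueError('user data is %d bytes once base64 encoded; the '
                         'limit is %d' % (len(encoded), MAX_B64_USERDATA))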
[Yahoo-eng-team] [Bug 1807970] Re: test_multi_cell_list fails in python 3.7
Reviewed: https://review.openstack.org/624055 Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2ea552e019bd427b5b1709160a6ed7da9dd23fbd Submitter: Zuul Branch:master commit 2ea552e019bd427b5b1709160a6ed7da9dd23fbd Author: Chris Dent Date: Mon Dec 10 11:22:54 2018 + Add python 3.7 unit and functional tox jobs Without these, if you try to run tox -epy37,functional-py37 you'll get a successful tox run, but no actual tests are run, which is rather misleading. Given the generaly availability of python 3.7 this is a bad thing. Running the tests under python 3.7 identified a few minor tests failures, also fixed here. Each is a result of a change in behavior in python 3.7: * printable unicode changes with a new Unicode 11-based unicodedata package * intentionally raising StopIteration in a generator is now considered a RuntimeError, 'return' should be used instead * an exception message is different beween python 3 and python 2, and the guard for it was mapping python 3.5 and 3.6 but not 3.7. zuul configuration is adjusted to add an experimental job for python 3.7 unit. A functional test job is not added, because we don't have 3.6 yet, and we probably want to get through that first. Closes-Bug: #1807976 Closes-Bug: #1807970 Change-Id: I37779a12d3b36eb3dc7e2733d07fe0ed23ab3da6 ** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1807970 Title: test_multi_cell_list fails in python 3.7 Status in OpenStack Compute (nova): Fix Released Bug description: Generators used in multi cell list handling raise StopIteration, which is not something python 3.7 likes. Efforts to add python 3.7 testing to nova [1] revealed this (and similar for neighboring tests): nova.tests.unit.compute.test_multi_cell_list.TestBaseClass.test_with_failing_cells -- Captured traceback: ~~~ b'Traceback (most recent call last):' b' File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 101, in query_wrapper' b'for record in fn(ctx, *args, **kwargs):' b' File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 348, in do_query' b'**kwargs)' b' File "/mnt/share/cdentsrc/nova/nova/tests/unit/compute/test_multi_cell_list.py", line 356, in get_by_filters' b'raise exception.CellTimeout' b'nova.exception.CellTimeout: Timeout waiting for response from cell' b'' b'During handling of the above exception, another exception occurred:' b'' b'Traceback (most recent call last):' b' File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 108, in query_wrapper' b'raise StopIteration' b'StopIteration' b'' b'The above exception was the direct cause of the following exception:' b'' b'Traceback (most recent call last):' b' File "/mnt/share/cdentsrc/nova/.tox/py37/lib/python3.7/site-packages/mock/mock.py", line 1305, in patched' b'return func(*args, **keywargs)' b' File "/mnt/share/cdentsrc/nova/nova/tests/unit/compute/test_multi_cell_list.py", line 384, in test_with_failing_cells' b'self.assertEqual(50, len(list(result)))' b' File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 400, in get_records_sorted' b'item = next(feeder)' b' File "/mnt/share/cdentsrc/nova/.tox/py37/lib/python3.7/heapq.py", line 359, in merge' b's[0] = next() # raises StopIteration when exhausted' b'RuntimeError: generator raised StopIteration' b'' According to pep 479 [2] the fix for this is to 'return' instead of 'raise 
StopIteration'. [1] https://review.openstack.org/#/c/624055/ [2] https://www.python.org/dev/peps/pep-0479/#writing-backwards-and-forwards-compatible-code To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1807970/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
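A stripped-down illustration of the behaviour change above (not the multi_cell_list code itself): under PEP 479, which became the default in Python 3.7, raising StopIteration inside a generator surfaces as a RuntimeError, so the generator should simply return.

    # Simplified illustration of the PEP 479 change described above.
    def broken(records):
        for record in records:
            if record is None:
                raise StopIteration    # RuntimeError under Python 3.7
            yield record

    def fixed(records):
        for record in records:
            if record is None:
                return                 # ends the generator cleanly everywhere
            yield record

    print(list(fixed([1, 2, None, 3])))    # [1, 2]
    print(list(broken([1, 2, None, 3])))   # raises RuntimeError on 3.7+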
[Yahoo-eng-team] [Bug 1807976] Re: In python 3.7 the definition of a printable character is changed so test_flavors fails
Reviewed: https://review.openstack.org/624055 Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2ea552e019bd427b5b1709160a6ed7da9dd23fbd Submitter: Zuul Branch:master commit 2ea552e019bd427b5b1709160a6ed7da9dd23fbd Author: Chris Dent Date: Mon Dec 10 11:22:54 2018 + Add python 3.7 unit and functional tox jobs Without these, if you try to run tox -epy37,functional-py37 you'll get a successful tox run, but no actual tests are run, which is rather misleading. Given the generaly availability of python 3.7 this is a bad thing. Running the tests under python 3.7 identified a few minor tests failures, also fixed here. Each is a result of a change in behavior in python 3.7: * printable unicode changes with a new Unicode 11-based unicodedata package * intentionally raising StopIteration in a generator is now considered a RuntimeError, 'return' should be used instead * an exception message is different beween python 3 and python 2, and the guard for it was mapping python 3.5 and 3.6 but not 3.7. zuul configuration is adjusted to add an experimental job for python 3.7 unit. A functional test job is not added, because we don't have 3.6 yet, and we probably want to get through that first. Closes-Bug: #1807976 Closes-Bug: #1807970 Change-Id: I37779a12d3b36eb3dc7e2733d07fe0ed23ab3da6 ** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1807976 Title: In python 3.7 the definition of a printable character is changed so test_flavors fails Status in OpenStack Compute (nova): Fix Released Bug description: 'test_name_with_non_printable_characters' in the 'test_flavors' unit tests checks to see that a non-printable character cannot be allowed in a flavor name. This fails in python 3.7. The reason it fails is because in Python 3.7 the 'unicodedata' package was updated [1] to Unicode 11 and what's printable has changed. The fix to the problem is to use a _really_ unprintable unicode char, according unicode 11. [1] https://docs.python.org/3/whatsnew/3.7.html#unicodedata To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1807976/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
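For context, a tiny example of how printability is decided: str.isprintable() follows the Unicode general-category tables in unicodedata, which Python 3.7 updated to Unicode 11, so a handful of code points change answers between interpreter versions. The characters below are just stable examples of the categories involved, not the one used in the nova test.

    # Sketch: printability is derived from the Unicode general category.
    import unicodedata

    for ch in ('a', '\x07', '\u00ad'):
        print(repr(ch), unicodedata.category(ch), ch.isprintable())
    # 'a'     -> Ll, printable
    # '\x07'  -> Cc (control), not printable on any release
    # '\xad'  -> Cf (format, soft hyphen), not printable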
[Yahoo-eng-team] [Bug 1767315] Re: "openstack hypervisor stats show" shows wrong memory information
This is not a bug, at least not as reported. free_ram_mb is not meant to be equal to the free value of "free -m". memory_mb is 4047 and memory_mb_used is 4096; as a result, the total amount of memory you have accounted against guests is 49 MB more than the available RAM, so free_ram_mb, which is calculated as memory_mb - memory_mb_used, is -49: you are oversubscribed by 49 MB. The default RAM allocation ratio is 1.5, which is why you are allowed to allocate more RAM than you have, up to roughly 6 GiB in your case. Note that memory_mb_used is calculated by summing the total RAM of all VMs scheduled to a node plus the reserved_host_memory_mb config value, which I would guess defaults to 4096 in TripleO. https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_memory_mb ** Changed in: nova Status: Confirmed => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1767315 Title: "openstack hypervisor stats show" shows wrong memory information Status in OpenStack Compute (nova): Invalid Bug description: Description === "openstack hypervisor stats show" shows wrong memory information Steps to reproduce == - The deployment is done through tripleo-quickstart for openstack queens (overcloud) [stack@undercloud ~]$ openstack hypervisor stats show +--+---+ | Field| Value | +--+---+ | count| 1 | | current_workload | 0 | | disk_available_least | 43| | free_disk_gb | 49| | free_ram_mb | -49 | | local_gb | 49| | local_gb_used| 0 | | memory_mb| 4047 | | memory_mb_used | 4096 | | running_vms | 0 | | vcpus| 2 | | vcpus_used | 0 | +--+---+ (overcloud) [stack@undercloud ~]$ [root@overcloud-novacompute-0 ~]# free -m totalusedfree shared buff/cache available Mem: 3742 556 203 12982 2688 Swap: 0 0 0 [root@overcloud-novacompute-0 ~]# Expected result === Should leverage the actual available memory. Actual result = Showing wrong memory value. Environment === - Openstack queens [heat-admin@overcloud-novacompute-0 ~]$ rpm -qa | grep nova openstack-nova-placement-api-17.0.3-0.20180420001136.bf0a069.el7.centos.noarch python-nova-17.0.3-0.20180420001136.bf0a069.el7.centos.noarch puppet-nova-12.4.1-0.20180423041756.95ca7cd.el7.centos.noarch openstack-nova-scheduler-17.0.3-0.20180420001136.bf0a069.el7.centos.noarch python2-novaclient-9.1.1-0.20180213141814.a1c0074.el7.centos.noarch openstack-nova-conductor-17.0.3-0.20180420001136.bf0a069.el7.centos.noarch openstack-nova-migration-17.0.3-0.20180420001136.bf0a069.el7.centos.noarch openstack-nova-console-17.0.3-0.20180420001136.bf0a069.el7.centos.noarch openstack-nova-common-17.0.3-0.20180420001136.bf0a069.el7.centos.noarch openstack-nova-compute-17.0.3-0.20180420001136.bf0a069.el7.centos.noarch openstack-nova-api-17.0.3-0.20180420001136.bf0a069.el7.centos.noarch openstack-nova-novncproxy-17.0.3-0.20180420001136.bf0a069.el7.centos.noarch [heat-admin@overcloud-novacompute-0 ~]$ Logs & Configs == - default configuration To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1767315/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
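The arithmetic from the comment above, spelled out with the reporter's numbers; reserved_host_memory_mb=4096 is the commenter's assumption about the TripleO default, not a verified value.

    # The comment's arithmetic with the values from the bug report.
    memory_mb = 4047                  # hypervisor total reported by nova
    reserved_host_memory_mb = 4096    # assumed config value (see comment)
    guest_ram_mb = 0                  # running_vms is 0

    memory_mb_used = guest_ram_mb + reserved_host_memory_mb
    free_ram_mb = memory_mb - memory_mb_used
    print(memory_mb_used, free_ram_mb)   # 4096 -49

    # With the default ram_allocation_ratio of 1.5 the scheduler still
    # accepts guests up to 1.5 * 4047 MB, roughly the 6 GiB mentioned above.
    print(int(1.5 * memory_mb))          # 6070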
[Yahoo-eng-team] [Bug 1808814] [NEW] admin docs: interoperable image import revision for stein
Public bug reported: https://docs.openstack.org/glance/latest/admin/interoperable-image- import.html The image import docs need a revision. I noticed these, there may be more: * remove mention of enable_image_import option and its effect on the v2.6 API * probably leave in the mention of the v1 copy-from (so it's clear that the OSSN doesn't apply to web-download), but change language of the v1 API being deprecated to simply, "Additionally, the Image API v1 was removed in Glance 17.0.0 (Rocky)." ** Affects: glance Importance: Low Status: Triaged -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1808814 Title: admin docs: interoperable image import revision for stein Status in Glance: Triaged Bug description: https://docs.openstack.org/glance/latest/admin/interoperable-image- import.html The image import docs need a revision. I noticed these, there may be more: * remove mention of enable_image_import option and its effect on the v2.6 API * probably leave in the mention of the v1 copy-from (so it's clear that the OSSN doesn't apply to web-download), but change language of the v1 API being deprecated to simply, "Additionally, the Image API v1 was removed in Glance 17.0.0 (Rocky)." To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1808814/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1803643] Re: task_info FK error when running "glance-manage db purge"
Reviewed: https://review.openstack.org/617889 Committed: https://git.openstack.org/cgit/openstack/glance/commit/?id=72159a4a7b267bfe376e84ea754a42b372206325 Submitter: Zuul Branch:master commit 72159a4a7b267bfe376e84ea754a42b372206325 Author: Liang Fang Date: Wed Nov 14 14:18:54 2018 +0800 Fix for FK constraint violation First force purging of records that are not soft deleted but are referencing soft deleted tasks/images records (e.g. task_info records). Then purge all soft deleted records in glance tables in the right order to avoid FK constraint violation. Closes-Bug: #1803643 Change-Id: I1c471adce002545f8965a57ef78a57e1e3031ef0 Co-authored-by: Tee Ngo Signed-off-by: Liang Fang ** Changed in: glance Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1803643 Title: task_info FK error when running "glance-manage db purge" Status in Glance: Fix Released Bug description: "glance-manage db purge" failed when there're tasks in db, with state "deleted=1" and "deleted_at" one month ago. Error logs --- DBError detected when purging from tasks: (pymysql.err.IntegrityError) (1451, u'Cannot delete or update a parent row: a foreign key constraint fails (`glance`.`task_info`, CONSTRAINT `task_info_ibfk_1` FOREIGN KEY (`task_id`) REFERENCES `tasks` (`id`))') [SQL: u'DELETE FROM tasks WHERE tasks.id in (SELECT T1.id FROM (SELECT tasks.id \nFROM tasks \nWHERE tasks.deleted_at < %(deleted_at_1)s ORDER BY tasks.deleted_at \n LIMIT %(param_1)s) as T1)'] [parameters: {u'deleted_at_1': datetime.datetime(2018, 11, 14, 2, 28, 7, 645622), u'param_1': 100}] (Background on this error at: http://sqlalche.me/e/gkpj): DBReferenceError: (pymysql.err.IntegrityError) (1451, u'Cannot delete or update a parent row: a foreign key constraint fails (`glance`.`task_info`, CONSTRAINT `task_info_ibfk_1` FOREIGN KEY (`task_id`) REFERENCES `tasks` (`id`))') [SQL: u'DELETE FROM tasks WHERE tasks.id in (SELECT T1.id FROM (SELECT tasks.id \nFROM tasks \nWHERE tasks.deleted_at < %(deleted_at_1)s ORDER BY tasks.deleted_at \n LIMIT %(param_1)s) as T1)'] [parameters: {u'deleted_at_1': datetime.datetime(2018, 11, 14, 2, 28, 7, 645622), u'param_1': 100}] (Background on this error at: http://sqlalche.me/e/gkpj) Purge command failed, check glance-manage logs for more details. Steps to reproduce --- 1. create a task glance task-create --type "import" --input '{"import_from": "/opt/stack/111.img"}' glance task-create --type "import" --input '{"import_from": "/opt/stack/222.img"}' 2. update the db table "tasks", set deleted=1 and deleted_at a day one month ago e.g. update tasks set deleted=1, deleted_at='2018-10-10 03:18:50' where id='dc76da48-cace-47d4-bcfd-0b62254e52ed'; 3. 
run "glance-manage db purge --age_in_days 2" Database like: mysql> select * from tasks; +--++-+--++-+-+-+-+ | id | type | status | owner | expires_at | created_at | updated_at | deleted_at | deleted | +--++-+--++-+-+-+-+ | dc76da48-cace-47d4-bcfd-0b62254e52ed | import | pending | 60a12b1788ad44468afd983f89a5f8dc | NULL | 2018-11-15 03:18:33 | 2018-11-15 03:18:33 | 2018-10-10 03:18:50 | 1 | | fbd7e46a-0f33-4c98-be87-0ff7112561e1 | import | pending | 60a12b1788ad44468afd983f89a5f8dc | NULL | 2018-11-16 02:18:12 | 2018-11-16 02:18:12 | NULL| 0 | +--++-+--++-+-+-+-+ 2 rows in set (0.00 sec) mysql> select * from task_info; +--+---++-+ | task_id | input | result | message | +--+---++-+ | dc76da48-cace-47d4-bcfd-0b62254e52ed | {"import_from": "/opt/stack/111.img"} | NULL | | | fbd7e46a-0f33-4c98-be87-0ff7112561e1 | {"import_from": "/opt/stack/222.img"} | NULL | | +--+---++-+ To manage notifications about this
[Yahoo-eng-team] [Bug 1807639] Re: Not precise when calculating image size in GB.
Reviewed: https://review.openstack.org/623989 Committed: https://git.openstack.org/cgit/openstack/horizon/commit/?id=274706151c68072a34c21e0b596424c2390dc9b4 Submitter: Zuul Branch:master commit 274706151c68072a34c21e0b596424c2390dc9b4 Author: Yan Chen Date: Mon Dec 10 23:08:22 2018 +0800 Fix precission issue when calculating image size in GB. When calculating imageGb, should use 1073741824.0 (Bytes in a GB) as the divisor. Closes-Bug: 1807639 Change-Id: I096dbf84826866e3e6916474157f8697b9f546ab Signed-off-by: Yan Chen ** Changed in: horizon Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon). https://bugs.launchpad.net/bugs/1807639 Title: Not precise when calculating image size in GB. Status in OpenStack Dashboard (Horizon): Fix Released Bug description: Impacted source code: openstack_dashboard/dashboards/project/static/dashboard/project/workflow/launch-instance/source/source.controller.js, checkVolumeForImage(). When calculating imageGb, should use 1073741824.0 (Bytes in a GB) as the divisor. To manage notifications about this bug go to: https://bugs.launchpad.net/horizon/+bug/1807639/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
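A small illustration of the conversion the report above is about, written in Python rather than the JavaScript controller; the divisor is the one named in the bug, while the rounding step is illustrative.

    # Illustration: convert an image size in bytes to whole GB using
    # 1073741824.0 bytes per GB, rounding up for a minimum volume size.
    import math

    BYTES_PER_GB = 1073741824.0

    image_size_bytes = 2361393152             # e.g. a ~2.2 GB image
    image_gb = int(math.ceil(image_size_bytes / BYTES_PER_GB))
    print(image_gb)                            # 3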
[Yahoo-eng-team] [Bug 1716920] Re: online snapshot deletion breaks backing chain with remotefs drivers
Finally i was able to attribute this to an issue in Quobyte client side caching, not an OpenStack or libvirt issue. ** Changed in: cinder Status: Incomplete => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1716920 Title: online snapshot deletion breaks backing chain with remotefs drivers Status in Cinder: Invalid Status in OpenStack Compute (nova): Expired Bug description: The deletion of online snapshots of remotefs based volumes breaks the .info file/backing chain of these volumes. Logs can be seen in any current Quobyte CI run in Cinder/Nova/OS-Brick . Afaics the the other driver using this (VZstorage) has it's CI skip the affected tests (e.g. test_snapshot_create_delete_with_volume_in_use). I ran a lot of tests and so far i can say that the first deletion of a member in the backing chain works (snapshot is deleted) but seemingly leaves the .info files content and/or the backing chain of the volume file in a broken state. The error can be identified e.g. by the following log pattern: This is the first snapshot deletion that runs successfully (the snapshots id is 91755e5f-e573-4ddb-84af-3712d69a). The ID of the snapshot and its snapshot_file name do match their uuids: 2017-09-13 08:28:59.436 20467 DEBUG cinder.volume.drivers.remotefs [req-eda7ddf5-217d-490d-a8d4-1813df68d8db tempest-VolumesSnapshotTestJSON-708947401 -] Deleting online snapshot 91755e5f-e573-4ddb-84af-3712d69a fc89 of volume 94598844-418c-4b5d-b034-5330e24e7421 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1099 2017-09-13 08:28:59.487 20467 DEBUG cinder.volume.drivers.remotefs [req-eda7ddf5-217d-490d-a8d4-1813df68d8db tempest-VolumesSnapshotTestJSON-708947401 -] snapshot_file for this snap is: volume-94598844-418c-4b5d-b034-5330e24e7421.91755e5f-e573-4ddb-84af-3712d69afc89 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1124 The next snapshot to be deleted (138a1f62-7582-4aaa-9d72-9eada34b) shows that a wrong snapshot_file is read from the volumes .info file. In fact it shows the file of the previous snapshot : 2017-09-13 08:29:01.857 20467 DEBUG cinder.volume.drivers.remotefs [req-6ad4add9-34b8-41b9-a1f0-7dc2d6bb1862 tempest-VolumesSnapshotTestJSON-708947401 -] Deleting online snapshot 138a1f62-7582-4aaa-9d72-9eada34b eeaf of volume 94598844-418c-4b5d-b034-5330e24e7421 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1099 2017-09-13 08:29:01.872 20467 DEBUG cinder.volume.drivers.remotefs [req-6ad4add9-34b8-41b9-a1f0-7dc2d6bb1862 tempest-VolumesSnapshotTestJSON-708947401 -] snapshot_file for this snap is: volume-94598844-418c-4b5d-b034-5330e24e7421.91755e5f-e573-4ddb-84af-3712d69afc89 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1124 Now this second snapshot deletion fails because the snapshot file for 138a1f62-7582-4aaa-9d72-9eada34b no longer exists: 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server ProcessExecutionError: Unexpected error while running command. 
2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Command: /usr/bin/python -m oslo_concurrency.prlimit --as=1073741824 --cpu=8 -- env LC_ALL=C qemu-img info /opt/stack/data/cinder/mnt/a1e3635ffba9fce1b854369f1a255d7b/volume-94598844-418c-4b5d-b034-5330e24e7421.138a1f62-7582-4aaa-9d72-9eada34beeaf 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Exit code: 1 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Stdout: u'' 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Stderr: u"qemu-img: Could not open '/opt/stack/data/cinder/mnt/a1e3635ffba9fce1b854369f1a255d7b/volume-94598844-418c-4b5d-b034-5330e24e7421.138a1f62-7582-4aaa-9d72-9eada34beeaf': Could not open '/opt/stack/data/cinder/mnt/a1e3635ffba9fce1b854369f1a255d7b/volume-94598844-418c-4b5d-b034-5330e24e7421.138a1f62-7582-4aaa-9d72-9eada34beeaf': No such file or directory\n" The referenced tempest test fails 100% of the time in our CIs. I manually tested the scenario and found the same results. Furthermore i was able, by creating three consecutive snapshots from a single volume and deleting them one after the other, to create a snapshot file with a broken backing file link. In the end i was left with a volume file and an overlay file referencing a removed backing file (previous snapshot of the same volume). I was able to run the scenario without issues when using offline snapshots. Thus this seems to be related to the usage of the online snapshot deletion via the Nova API. To manage notifications about this bug go to: https://bugs.launchpad.net/cinder/+bug/1716920/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~y
[Yahoo-eng-team] [Bug 1808594] Re: [RFE] Limit Geneve to within Neutron availability zones
** Tags added: rfe ** Also affects: networking-ovn Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1808594 Title: [RFE] Limit Geneve to within Neutron availability zones Status in networking-ovn: New Status in neutron: New Bug description: Creating multiple Neutron availability zones allows the operator to schedule DHCP and L3 agents within a single AZ. Neutron OVN will still try to form a Geneve mesh between all nodes in all availability zones, which creates inter-AZ dependencies and may not work when strict firewalls are placed between AZs. Note that this RFE is a clone of https://bugs.launchpad.net/neutron/+bug/1808062 but applies to Neutron OVN instead of ML2/OVS. This behavior should be configurable, so that L2 may be limited to a particular AZ, and no tunnels are formed between different AZs. This will prevent Neutron from trying to form tunnels when the tunnel cannot function, and may enhance security when AZs are in different security zones. The desired end-state configuration would have separate DHCP and L3 agents hosted in each AZ, along with tunnels formed only inside the AZ. This would allow, for instance, multiple edge sites within a single deployment that each performed local networking only. Any particular Neutron network would be limited to one AZ. A new flag would allow AZs to be truly autonomous and remove cross-AZ dependencies. Note that it appears that NSX-T has a concept called "Transport Zones" that enables the feature that is being requested here. Compute nodes within a given transport zone will only be able to communicate with compute nodes within that same transport zone. This prevents network traffic from being sent between zones. More information here: https://docs.vmware.com/en/VMware-NSX-T-Data- Center/2.3/com.vmware.nsxt.install.doc/GUID-F47989B2-2B9D-4214-B3BA- 5DDF66A1B0E6.html NSX-T also supports Availability Zones, but it appears that those are separate from the Transport Zone functionality: https://docs.vmware.com/en/VMware-Integrated- OpenStack/5.0/com.vmware.openstack.admin.doc/GUID-37F0E9DE-BD19-4AB0 -964C-D1D12B06345C.html It's possible that limiting tunneling traffic to a particular AZ may be outside the intended functions of Neutron AZs, but I think this is a valid use case. To manage notifications about this bug go to: https://bugs.launchpad.net/networking-ovn/+bug/1808594/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp