[Yahoo-eng-team] [Bug 1930750] [NEW] pyroute2 >= 0.6.2 fails in pep8 import analysis

2021-06-03 Thread Rodolfo Alonso
Public bug reported:

Since version 0.6.2, the pyroute2 library dynamically imports the needed
modules when loaded. Static analysis therefore fails when checking the
import references.

Example: https://c918cbae52d07f0b694c-
87cfb8a8e579ae39cc41214d7e8b69d2.ssl.cf1.rackcdn.com/793735/2/check
/openstack-tox-pep8/62e482e/job-output.txt

Snippet: http://paste.openstack.org/show/806340/

** Affects: neutron
 Importance: Critical
 Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
 Status: In Progress

** Changed in: neutron
   Importance: Undecided => Critical

** Changed in: neutron
 Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1930750

Title:
  pyroute2 >= 0.6.2 fails in pep8 import analysis

Status in neutron:
  In Progress

Bug description:
  Since version 0.6.2, the pyroute2 library dynamically imports the needed
  modules when loaded. Static analysis therefore fails when checking the
  import references.

  Example: https://c918cbae52d07f0b694c-
  87cfb8a8e579ae39cc41214d7e8b69d2.ssl.cf1.rackcdn.com/793735/2/check
  /openstack-tox-pep8/62e482e/job-output.txt

  Snippet: http://paste.openstack.org/show/806340/
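
  A rough illustration of the failure mode (a minimal sketch, not pyroute2's
  actual code): the submodules only materialise at runtime via a lazy loader,
  so a static checker walking the source cannot resolve the references.

  import importlib

  class _LazyPackage(object):
      """Stand-in for a package that imports its submodules on first access."""

      def __init__(self, name):
          self._name = name

      def __getattr__(self, attr):
          # The real module only appears here, at runtime.
          module = importlib.import_module('%s.%s' % (self._name, attr))
          setattr(self, attr, module)
          return module

  pyroute2 = _LazyPackage('pyroute2')
  # Works when executed (if pyroute2 is installed), but pep8 jobs that
  # statically analyse imports see no such attribute and report an error:
  # pyroute2.netns.listnetns()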

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1930750/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1837995] Re: "Unexpected API Error" when use "openstack usage show" command

2021-06-03 Thread Elod Illes
** Changed in: nova/victoria
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1837995

Title:
  "Unexpected API Error" when use "openstack usage show" command

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) train series:
  New
Status in OpenStack Compute (nova) ussuri series:
  New
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  Description
  ===
  For a non-admin project with an instance launched, try to query the
  usage information in the GUI by clicking Overview, or on the CLI with
  "openstack usage show".

  You will get "Error: Unable to retrieve usage information." in the GUI, and
  the following ERROR as CLI output:

  $ openstack usage show
  Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ 
and attach the Nova API log if possible.
   (HTTP 500) (Request-ID: 
req-cbea9542-ecce-42fd-b660-fc5f996ea3c3)

  Steps to reproduce
  ==
  Execute the "openstack usage show" command,
  or click the Project - Compute - Overview button in the GUI.
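
  For reference, the CLI command maps onto nova's os-simple-tenant-usage API;
  a minimal reproduction sketch (endpoint, token and project id below are
  placeholders, not values taken from this report):

  import requests

  NOVA_ENDPOINT = 'http://controller:8774/v2.1'   # placeholder
  TOKEN = '<keystone-token>'                      # placeholder
  PROJECT_ID = '<project-uuid>'                   # placeholder

  # "openstack usage show" issues a request along these lines; on an affected
  # deployment it returns HTTP 500 with the "Unexpected API Error" body.
  resp = requests.get(
      '%s/os-simple-tenant-usage/%s' % (NOVA_ENDPOINT, PROJECT_ID),
      headers={'X-Auth-Token': TOKEN},
      params={'start': '2019-07-01T00:00:00', 'end': '2019-07-26T00:00:00'})
  print(resp.status_code)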

  
  Expected result
  ===
  No Error report and the usage information shown

  
  Actual result
  =
  Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ 
and attach the Nova API log if possible.
   (HTTP 500) (Request-ID: 
req-cbea9542-ecce-42fd-b660-fc5f996ea3c3)

  
  Environment
  ===
  1. Exact version of OpenStack you are running. 
  Openstack Stein on CentOS7

  $ rpm -qa | grep nova
  openstack-nova-api-19.0.1-1.el7.noarch
  puppet-nova-14.4.0-1.el7.noarch
  python2-nova-19.0.1-1.el7.noarch
  openstack-nova-conductor-19.0.1-1.el7.noarch
  openstack-nova-novncproxy-19.0.1-1.el7.noarch
  openstack-nova-migration-19.0.1-1.el7.noarch
  openstack-nova-common-19.0.1-1.el7.noarch
  openstack-nova-scheduler-19.0.1-1.el7.noarch
  openstack-nova-console-19.0.1-1.el7.noarch
  python2-novaclient-13.0.1-1.el7.noarch
  openstack-nova-placement-api-19.0.1-1.el7.noarch
  openstack-nova-compute-19.0.1-1.el7.noarch

  2. Which hypervisor did you use?
 Libvirt + KVM
 $ rpm -qa | grep kvm
  qemu-kvm-ev-2.12.0-18.el7_6.5.1.x86_64
  libvirt-daemon-kvm-4.5.0-10.el7_6.12.x86_64
  qemu-kvm-common-ev-2.12.0-18.el7_6.5.1.x86_64
  $ rpm -qa | grep libvirt
  libvirt-gconfig-1.0.0-1.el7.x86_64
  libvirt-daemon-driver-nwfilter-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-interface-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-config-nwfilter-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-mpath-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-core-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-secret-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-lxc-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-rbd-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-kvm-4.5.0-10.el7_6.12.x86_64
  libvirt-bash-completion-4.5.0-10.el7_6.12.x86_64
  libvirt-4.5.0-10.el7_6.12.x86_64
  libvirt-glib-1.0.0-1.el7.x86_64
  libvirt-daemon-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-qemu-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-config-network-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-disk-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-4.5.0-10.el7_6.12.x86_64
  libvirt-python-4.5.0-1.el7.x86_64
  libvirt-libs-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-scsi-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-network-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-nodedev-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-logical-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-iscsi-4.5.0-10.el7_6.12.x86_64
  libvirt-client-4.5.0-10.el7_6.12.x86_64
  libvirt-gobject-1.0.0-1.el7.x86_64

  
  Logs & Configs
  ==

  nova-api.log

  
  2019-07-26 16:12:53.967 8673 INFO nova.osapi_compute.wsgi.server 
[req-69d7df76-7dd9-4d42-8eeb-347ef1c9d0a5 f887cc44f21043dca85438d74a47d68d 
0d47cfd5b9c94a5790fa4472e576cba6 - default default] c5f::e2 "GET 
/v2.1/0d47cfd5b9c94a5790fa4472e576cba6/servers/detail?all_tenants=True=2019-07-26T08%3A07%3A55.280119%2B00%3A00
 HTTP/1.1" status: 200 len: 413 time: 0.0639658
  2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi 
[req-cbea9542-ecce-42fd-b660-fc5f996ea3c3 1e45ea9a7d5647a6a938c2ac027822f2 
85dd8936d21b46a8878ed59678c7ad9a - default default] Unexpected exception in API 
method: OrphanedObjectError: Cannot call obj_load_attr on orphaned Instance 
object
  2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi Traceback (most 
recent call last):
  2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 671, in 
wrapped
  2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi return 
f(*args, 

[Yahoo-eng-team] [Bug 1732428] Re: Unshelving a VM breaks instance metadata when using qcow2 backed images

2021-06-03 Thread Elod Illes
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1732428

Title:
  Unshelving a VM breaks instance metadata when using qcow2 backed
  images

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Confirmed
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) train series:
  Fix Committed
Status in OpenStack Compute (nova) ussuri series:
  Fix Released

Bug description:
  If you unshelve instances on compute nodes that use qcow2 backed
  instances, the instance image_ref will point to the original image the
  VM was launched from. The base file for
  /var/lib/nova/instances/uuid/disk will be the snapshot which was used
  for shelving. This causes errors with e.g. resizes and migrations.

  Steps to reproduce/what happens:
  Have at least 2 compute nodes configured with the standard qcow2 backed 
images.

  1) Launch an instance.
  2) Shelve the instance. In the background this should in practice create a 
flattened snapshot of the VM.

  3) Unshelve the instance. The instance will boot on one of the compute
  nodes. The /var/lib/nova/instances/uuid/disk should now have the
  snapshot as its base file. The instance metadata still claims that the
  image_ref is the original image which the VM was launched from, not
  the snapshot.

  4) Resize/migrate the instance. /var/lib/nova/instances/uuid/disk
  should be copied to the other compute node. If you resize to an image
  with the same size disk, go to 5), if you resize to flavor with a
  larger disk, it probably causes an error here when it tries to grow
  the disk.

  5a) If the instance was running: When nova tries to start the VM, it
  will copy the original base image to the new compute node, not the
  snapshot base image. The instance can't boot, since it doesn't find
  its actual base file, and it goes to an ERROR state.

  5b) If the instance was shutdown: You can confirm the resize, but the
  VM won't start. The snapshot base file may be removed from the source
  machine causing dataloss.
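
  A small diagnostic sketch for step 3 (the path and helper name are
  illustrative only, this is not nova code): compare the qcow2 backing file
  with the instance's image_ref.

  import json
  import subprocess

  def qcow2_backing_file(disk_path):
      """Return the backing file qemu-img reports for a qcow2 disk."""
      out = subprocess.check_output(
          ['qemu-img', 'info', '--output=json', disk_path])
      return json.loads(out).get('backing-filename')

  # On an affected node this points at the shelve snapshot, while
  # "openstack server show <uuid> -f value -c image" still shows the
  # original image the VM was launched from.
  print(qcow2_backing_file('/var/lib/nova/instances/<uuid>/disk'))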

  What should have happened:
  Either the instance image_ref should be updated to the snapshot image, or
  the snapshot image should be rebased to the original image, or it should
  force a raw-only image after unshelve, or something else you smart people
  come up with.

  Environment:
  RDO Neutron with KVM

  rpm -qa |grep nova
  openstack-nova-common-14.0.6-1.el7.noarch
  python2-novaclient-6.0.1-1.el7.noarch
  python-nova-14.0.6-1.el7.noarch
  openstack-nova-compute-14.0.6-1.el7.noarch

  Also a big thank you to Toni Peltonen and Anton Aksola from nebula.fi
  for discovering and debugging this issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1732428/+subscriptions


[Yahoo-eng-team] [Bug 1904446] Re: 'GetPMEMNamespacesFailed' is not a valid exception

2021-06-03 Thread Elod Illes
** Changed in: nova/ussuri
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1904446

Title:
  'GetPMEMNamespacesFailed' is not a valid exception

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  In Progress
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  Attempting to retrieve a non-existent PMEM device results in the
  following traceback:

  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova During handling 
of the above exception, another exception occurred:
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova Traceback (most 
recent call last):
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File 
"/usr/bin/nova-compute", line 10, in <module>
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova 
sys.exit(main())
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File 
"/usr/lib/python3.6/site-packages/nova/cmd/compute.py", line 57, in main
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova 
topic=compute_rpcapi.RPC_TOPIC)
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File 
"/usr/lib/python3.6/site-packages/nova/service.py", line 271, in create
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova 
periodic_interval_max=periodic_interval_max)
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File 
"/usr/lib/python3.6/site-packages/nova/service.py", line 129, in __init__
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova 
self.manager = manager_class(host=self.host, *args, **kwargs)
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File 
"/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 571, in 
__init__
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova self.driver 
= driver.load_compute_driver(self.virtapi, compute_driver)
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File 
"/usr/lib/python3.6/site-packages/nova/virt/driver.py", line 1911, in 
load_compute_driver
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova virtapi)
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File 
"/usr/lib/python3.6/site-packages/oslo_utils/importutils.py", line 44, in 
import_object
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova return 
import_class(import_str)(*args, **kwargs)
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File 
"/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 446, in 
__init__
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova 
vpmem_conf=CONF.libvirt.pmem_namespaces)
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File 
"/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 477, in 
_discover_vpmems
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova vpmems_host 
= self._get_vpmems_on_host()
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova   File 
"/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 512, in 
_get_vpmems_on_host
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova raise 
exception.GetPMEMNamespacesFailed(reason=reason)
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova AttributeError: 
module 'nova.exception' has no attribute 'GetPMEMNamespacesFailed'
  ./nova-compute.log.1:2020-11-16 16:01:22.704 7 ERROR nova

  It seems a typo was introduced when this code was added. The code
  referenced 'GetPMEMNamespacesFailed', but the actual exception, which
  has since been removed as "unused", was called
  'GetPMEMNamespaceFailed'.
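
  A minimal standalone illustration of the failure mode (not nova code):
  referencing a removed exception class on a module only fails at raise time,
  and the resulting AttributeError masks the original error.

  import types

  exception = types.ModuleType('exception')     # stand-in for nova.exception

  class GetPMEMNamespaceFailed(Exception):      # the name that actually existed
      pass

  exception.GetPMEMNamespaceFailed = GetPMEMNamespaceFailed

  try:
      # Same pattern as _get_vpmems_on_host(): note the extra 's' in the name.
      raise exception.GetPMEMNamespacesFailed(reason='no namespaces')
  except AttributeError as exc:
      print(exc)  # module 'exception' has no attribute 'GetPMEMNamespacesFailed'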

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1904446/+subscriptions


[Yahoo-eng-team] [Bug 1900006] Re: Asking for different vGPU types is racey

2021-06-03 Thread Elod Illes
** Changed in: nova/victoria
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1900006

Title:
  Asking for different vGPU types is racey

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  When testing virtual GPUs on Victoria, I wanted to have different
  types:

  [devices]
  enabled_vgpu_types = nvidia-320,nvidia-321

  [vgpu_nvidia-320]
  device_addresses = 0000:04:02.1,0000:04:02.2

  [vgpu_nvidia-321]
  device_addresses = 0000:04:02.3

  
  Unfortunately, I saw that only the first type was used.
  When restarting the nova-compute service, we got the following log:
  WARNING nova.virt.libvirt.driver [None 
req-a23d9cb4-6554-499c-9fcf-d7f9706535ef None None] The vGPU type 'nvidia-320' 
was listed in '[devices] enabled_vgpu_types' but no corresponding 
'[vgpu_nvidia-320]' group or '[vgpu_nvidia-320] device_addresses' option was 
defined. Only the first type 'nvidia-320' will be used.

  
  This is due to the fact that we call _get_supported_vgpu_types() first, when
  creating the libvirt implementation [1], while we only register the new CONF
  options in init_host() [2], which is called afterwards.

  
  [1] 
https://github.com/openstack/nova/blob/90777d790d7c268f50851ac3e5b4e02617f5ae1c/nova/virt/libvirt/driver.py#L418

  [2]
  https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L1405

  A simple fix would just be to make sure the dynamic options are registered
  within _get_supported_vgpu_types().
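
  A sketch of that fix direction using oslo.config (group and option names
  mirror the ones above; this is only an illustration, not the actual patch):

  from oslo_config import cfg

  CONF = cfg.CONF
  CONF.register_opts(
      [cfg.ListOpt('enabled_vgpu_types', default=[])], group='devices')

  def register_dynamic_vgpu_opts(conf):
      # Register a [vgpu_<type>] group per enabled type so that
      # _get_supported_vgpu_types() can read device_addresses right away,
      # without waiting for init_host() to run first.
      for vgpu_type in conf.devices.enabled_vgpu_types:
          conf.register_opts(
              [cfg.ListOpt('device_addresses', default=[])],
              group='vgpu_%s' % vgpu_type)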

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1900006/+subscriptions


[Yahoo-eng-team] [Bug 1896496] Re: Combination of 'hw_video_ram' image metadata prop, 'hw_video:ram_max_mb' extra spec raises error

2021-06-03 Thread Elod Illes
** Also affects: nova/train
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Fix Released

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Changed in: nova/ussuri
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896496

Title:
  Combination of 'hw_video_ram' image metadata prop,
  'hw_video:ram_max_mb' extra spec raises error

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  The 'hw_video_ram' image metadata property is used to configure the
  amount of memory allocated to VRAM. Using it requires specifying the
  'hw_video:ram_max_mb' extra spec or you'll get the following error:

nova.exception.RequestedVRamTooHigh: The requested amount of video
  memory 8 is higher than the maximum allowed by flavor 0.

  However, specifying these currently results in a libvirt failure.

ERROR nova.compute.manager [None ...] [instance: 
11a71ae4-e410-4856-aeab-eea6ca4784c5] Failed to build and run instance: 
libvirt.libvirtError: XML error: cannot parse video vram '8192.0'
ERROR nova.compute.manager [instance: ...] Traceback (most recent call 
last):
ERROR nova.compute.manager [instance: ...]   File 
"/opt/stack/nova/nova/compute/manager.py", line 2333, in _build_and_run_instance
ERROR nova.compute.manager [instance: ...] accel_info=accel_info)
ERROR nova.compute.manager [instance: ...]   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 3632, in spawn
ERROR nova.compute.manager [instance: ...] 
cleanup_instance_disks=created_disks)
ERROR nova.compute.manager [instance: ...]   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 6527, in 
_create_domain_and_network
ERROR nova.compute.manager [instance: ...] 
cleanup_instance_disks=cleanup_instance_disks)
ERROR nova.compute.manager [instance: ...]   File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in 
__exit__
ERROR nova.compute.manager [instance: ...] self.force_reraise()
ERROR nova.compute.manager [instance: ...]   File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
ERROR nova.compute.manager [instance: ...] six.reraise(self.type_, 
self.value, self.tb)
ERROR nova.compute.manager [instance: ...]   File 
"/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
ERROR nova.compute.manager [instance: ...] raise value
ERROR nova.compute.manager [instance: ...]   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 6496, in 
_create_domain_and_network
ERROR nova.compute.manager [instance: ...] 
post_xml_callback=post_xml_callback)
ERROR nova.compute.manager [instance: ...]   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 6425, in _create_domain
ERROR nova.compute.manager [instance: ...] guest = 
libvirt_guest.Guest.create(xml, self._host)
ERROR nova.compute.manager [instance: ...]   File 
"/opt/stack/nova/nova/virt/libvirt/guest.py", line 127, in create
ERROR nova.compute.manager [instance: ...] encodeutils.safe_decode(xml))
ERROR nova.compute.manager [instance: ...]   File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in 
__exit__
ERROR nova.compute.manager [instance: ...] self.force_reraise()
ERROR nova.compute.manager [instance: ...]   File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
ERROR nova.compute.manager [instance: ...] six.reraise(self.type_, 
self.value, self.tb)
ERROR nova.compute.manager [instance: ...]   File 
"/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
ERROR nova.compute.manager [instance: ...] raise value
ERROR nova.compute.manager [instance: ...]   File 
"/opt/stack/nova/nova/virt/libvirt/guest.py", line 123, in create
ERROR nova.compute.manager [instance: ...] guest = 
host.write_instance_config(xml)
ERROR nova.compute.manager [instance: ...]   File 
"/opt/stack/nova/nova/virt/libvirt/host.py", line 1135, in write_instance_config
ERROR nova.compute.manager [instance: ...] domain = 
self.get_connection().defineXML(xml)
ERROR nova.compute.manager [instance: ...]   File 
"/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit
ERROR nova.compute.manager [instance: ...] result = 
proxy_call(self._autowrap, f, *args, **kwargs)
ERROR nova.compute.manager [instance: ...]   File 
"/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in 
proxy_call
ERROR 
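
  The "cannot parse video vram '8192.0'" message above points at a float
  reaching libvirt where an integer string is expected; a minimal sketch of
  how that can happen under Python 3 (illustrative values, not the actual
  nova code path):

  from oslo_utils import units

  hw_video_ram_mb = 8                               # hw_video_ram image property
  vram_kib = hw_video_ram_mb * units.Mi / units.Ki  # true division -> 8192.0
  print(str(vram_kib))                              # '8192.0' (rejected by libvirt)
  print(str(int(vram_kib)))                         # '8192'   (parses fine)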

[Yahoo-eng-team] [Bug 1899541] Re: archive_deleted_rows archives pci_devices records as residue because of 'instance_uuid'

2021-06-03 Thread Elod Illes
** Changed in: nova/train
   Status: In Progress => Fix Released

** Changed in: nova/ussuri
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1899541

Title:
  archive_deleted_rows archives pci_devices records as residue because
  of 'instance_uuid'

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  This is based on a bug reported downstream [1] where after a random
  amount of time, update_available_resource began to fail with the
  following trace on nodes with PCI devices:

"traceback": [
  "Traceback (most recent call last):",
  "  File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", 
line 7447, in update_available_resource_for_node",
  "rt.update_available_resource(context, nodename)",
  "  File 
\"/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py\", line 
706, in update_available_resource",
  "self._update_available_resource(context, resources)",
  "  File 
\"/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py\", line 274, 
in inner",
  "return f(*args, **kwargs)",
  "  File 
\"/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py\", line 
782, in _update_available_resource",
  "self._update(context, cn)",
  "  File 
\"/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py\", line 
926, in _update",
  "self.pci_tracker.save(context)",
  "  File \"/usr/lib/python2.7/site-packages/nova/pci/manager.py\", line 
92, in save",
  "dev.save()",
  "  File 
\"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\", line 210, 
in wrapper",
  "ctxt, self, fn.__name__, args, kwargs)",
  "  File \"/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py\", 
line 245, in object_action",
  "objmethod=objmethod, args=args, kwargs=kwargs)",
  "  File 
\"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py\", line 174, in 
call",
  "retry=self.retry)",
  "  File \"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py\", 
line 131, in _send",
  "timeout=timeout, retry=retry)",
  "  File 
\"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py\", 
line 559, in send",
  "retry=retry)",
  "  File 
\"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py\", 
line 550, in _send",
  "raise result",
  "RemoteError: Remote error: DBError (pymysql.err.IntegrityError) (1048, 
u\"Column 'compute_node_id' cannot be null\") [SQL: u'INSERT INTO pci_devices 
(created_at, updated_at, deleted_at, deleted, uuid, compute_node_id, address, 
vendor_id, product_id, dev_type, dev_id, label, status, request_id, extra_info, 
instance_uuid, numa_node, parent_addr) VALUES (%(created_at)s, %(updated_at)s, 
%(deleted_at)s, %(deleted)s, %(uuid)s, %(compute_node_id)s, %(address)s, 
%(vendor_id)s, %(product_id)s, %(dev_type)s, %(dev_id)s, %(label)s, %(status)s, 
%(request_id)s, %(extra_info)s, %(instance_uuid)s, %(numa_node)s, 
%(parent_addr)s)'] [parameters: {'status': u'available', 'instance_uuid': None, 
'dev_type': None, 'uuid': None, 'dev_id': None, 'parent_addr': None, 
'numa_node': None, 'created_at': datetime.datetime(2020, 8, 7, 11, 51, 19, 
643044), 'vendor_id': None, 'updated_at': None, 'label': None, 'deleted': 0, 
'extra_info': '{}', 'compute_node_id': None, 'request_id': None, 'deleted_at': 
None, 'address': None, 'product_id': None}] (Background on this error at: 
http://sqlalche.me/e/gkpj)",


  Here ^ we see an attempt to insert a nearly empty (NULL fields) record
  into the pci_devices table. Inspection of the code shows that the way
  this can occur is if we fail to lookup the pci_devices record we want
  and then we try to create a new one [2]:


  @pick_context_manager_writer
  def pci_device_update(context, node_id, address, values):
  query = model_query(context, models.PciDevice, read_deleted="no").\
  filter_by(compute_node_id=node_id).\
  filter_by(address=address)
  if query.update(values) == 0:
  device = models.PciDevice()
  device.update(values)
  context.session.add(device)
  return query.one()


  Turns out what was happening was when a request came in to delete an
  instance that had allocated a PCI device, if the archive_deleted_rows
  cron job fired at just the right (wrong) moment, it would sweep 

[Yahoo-eng-team] [Bug 1885528] Re: snapshot delete fails on shutdown VM

2021-06-03 Thread Elod Illes
** Changed in: nova/ussuri
   Status: In Progress => Fix Released

** Changed in: nova/victoria
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1885528

Title:
  snapshot delete fails on shutdown VM

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  New
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  Description:
  When we try to delete the last snapshot of a VM in shutdown state, the
  snapshot delete will fail (and the snapshot will be stuck in state
  error-deleting). After setting state==available and deleting the snapshot
  again, the volume will be corrupted and the VM will never start again.
  Volumes are stored on NFS.
  (for root cause and fix, see the bottom of this post)

  To reproduce:
  - storage on NFS
  - create a VM and some snapshots
  - shut down the VM (i.e. the volume is still considered "attached" but the
    VM is no longer "active")
  - delete the last snapshot

  Expected Result:
  snapshot is deleted, vm still works

  Actual result:
  The snapshot is stuck in error-deleting. After setting the snapshot
  state==available and deleting the snapshot again, the volume will be
  corrupted and the VM will never start again (non-existent backing_file in
  the qcow2 on disk).
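
  A diagnostic sketch matching the DiskNotFound trace in the logs below (the
  overlay path is a placeholder): check whether the backing file recorded in
  the volume's qcow2 chain still exists on the NFS mount.

  import json
  import os
  import subprocess

  overlay = '/var/lib/nova/mnt/<share-hash>/volume-<uuid>.<snapshot-uuid>'
  info = json.loads(subprocess.check_output(
      ['qemu-img', 'info', '--output=json', overlay]))
  backing = info.get('backing-filename')
  # On an affected setup the backing file no longer exists (or is only
  # resolvable relative to the NFS mount), which is what qemu_img_info()
  # trips over in the traceback below.
  print(backing, backing and os.path.exists(backing))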

  Environment:
  - openstack version: stein, deployed via kolla-ansible. I suspect this
    downloads from git but I don't know the exact version.
  - hypervisor: Libvirt + KVM
  - storage: NFS
  - networking: Neutron with OpenVSwitch

  Nova debug Logs:
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver 
[req-d38b5ec8-afdb-4dfe-af12-0c47598c6a47 6dd1c995b2ea4ddfbeb0685bc52e5fbf 
6bebb564667d4a75b9281fd826b32ecf - d
  efault default] [instance: 711651a3-8440-42dd-a210-e7e550a8624e] Error 
occurred during volume_snapshot_delete, sending error status to Cinder.: 
DiskNotFound: No disk at
   
volume-86c06b12-699c-4b54-8bca-fb92c99a2bf0.63d1585e-eb76-4e8f-bc96-93960e9c9692
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 
711651a3-8440-42dd-a210-e7e550a8624e] Traceback (most recent call last):
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 
711651a3-8440-42dd-a210-e7e550a8624e]   File 
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/dri
  ver.py", line 2726, in volume_snapshot_delete
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 
711651a3-8440-42dd-a210-e7e550a8624e] snapshot_id, delete_info=delete_info)
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 
711651a3-8440-42dd-a210-e7e550a8624e]   File 
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/dri
  ver.py", line 2686, in _volume_snapshot_delete
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 
711651a3-8440-42dd-a210-e7e550a8624e] rebase_base)
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 
711651a3-8440-42dd-a210-e7e550a8624e]   File 
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/dri
  ver.py", line 2519, in _rebase_with_qemu_img
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 
711651a3-8440-42dd-a210-e7e550a8624e] b_file_fmt = 
images.qemu_img_info(backing_file).file_forma
  t
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 
711651a3-8440-42dd-a210-e7e550a8624e]   File 
"/usr/lib/python2.7/site-packages/nova/virt/images.py",
   line 58, in qemu_img_info
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 
711651a3-8440-42dd-a210-e7e550a8624e] raise 
exception.DiskNotFound(location=path)
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 
711651a3-8440-42dd-a210-e7e550a8624e] DiskNotFound: No disk at 
volume-86c06b12-699c-4b54-8bca-fb92c9
  9a2bf0.63d1585e-eb76-4e8f-bc96-93960e9c9692
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 
711651a3-8440-42dd-a210-e7e550a8624e] 
  2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server 
[req-d38b5ec8-afdb-4dfe-af12-0c47598c6a47 6dd1c995b2ea4ddfbeb0685bc52e5fbf 
6bebb564667d4a75b9281fd826b32ecf - 
  default default] Exception during message handling: DiskNotFound: No disk at 
volume-86c06b12-699c-4b54-8bca-fb92c99a2bf0.63d1585e-eb76-4e8f-bc96-93960e9c9692
  2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server Traceback (most 
recent call last):
  2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in 
_process_incoming
  2020-02-06 12:20:10.780 6 ERROR oslo_messaging.rpc.server res = 

[Yahoo-eng-team] [Bug 1905701] Re: Do not recreate libvirt secret when one already exists on the host during a host reboot

2021-06-03 Thread Elod Illes
** Changed in: nova/victoria
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1905701

Title:
  Do not recreate libvirt secret when one already exists on the host
  during a host reboot

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress
Status in OpenStack Compute (nova) train series:
  In Progress
Status in OpenStack Compute (nova) ussuri series:
  In Progress
Status in OpenStack Compute (nova) victoria series:
  Fix Released
Status in OpenStack Compute (nova) wallaby series:
  New
Status in OpenStack Compute (nova) xena series:
  In Progress

Bug description:
  Description
  ===

  When [compute]/resume_guests_state_on_host_boot is enabled the compute
  manager will attempt to restart instances on start up.

  When using the libvirt driver and instances with attached LUKSv1
  encrypted volumes, a call is made to _attach_encryptor that currently
  assumes that any volume libvirt secrets don't already exist on the
  host. As a result this call currently leads to an attempt to look up
  encryption metadata that fails, as the compute service is using a
  bare-bones, local-only admin context to drive the restart of the
  instances.

  The libvirt secrets associated with LUKSv1 encrypted volumes actually
  persist across a host reboot, and thus the calls to fetch encryption
  metadata, fetch the symmetric key etc. are not required. Removal of
  these calls in this context should allow the compute service to start
  instances with these volumes attached.
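
  A sketch of the idea in the previous paragraph (assumes libvirt-python; the
  usage id is a placeholder volume id): check whether the persisted libvirt
  secret already exists before going back to the key manager.

  import libvirt

  def volume_secret_exists(usage_id):
      """Return True if a libvirt secret for this volume already persists."""
      conn = libvirt.open('qemu:///system')
      try:
          conn.secretLookupByUsage(
              libvirt.VIR_SECRET_USAGE_TYPE_VOLUME, usage_id)
          return True
      except libvirt.libvirtError:
          return False
      finally:
          conn.close()

  # If this is True after a host reboot, the encryption-metadata / key-manager
  # round trip can be skipped when resuming the guest.
  print(volume_secret_exists('volume-<uuid>'))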

  Steps to reproduce
  ==
  * Enable [compute]/resume_guests_state_on_host_boot
  * Launch instances with encrypted LUKSv1 volumes attached
  * Reboot the underlying host

  Expected result
  ===
  * The instances are restarted successfully by Nova as no external calls are 
made and the existing libvirt secret for any encrypted LUKSv1 volumes are 
reused.

  Actual result
  =
  * The instances fail to restart as the initial calls made by the Nova service 
use an empty admin context without a service catalog etc.

  Environment
  ===
  1. Exact version of OpenStack you are running. See the following

 master

  2. Which hypervisor did you use?
 (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
 What's the version of that?

 libvirt + QEMU/KVM

  2. Which storage type did you use?
 (For example: Ceph, LVM, GPFS, ...)
 What's the version of that?

 N/A

  3. Which networking type did you use?
 (For example: nova-network, Neutron with OpenVSwitch, ...)

 N/A

  Logs & Configs
  ==

  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 
c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File 
"/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1641, in 
_connect_volume
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 
c5b3e7d4-99ea-409c-aba6-d32751f93ccf] self._attach_encryptor(context, 
connection_info, encryption)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 
c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File 
"/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1760, in 
_attach_encryptor
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 
c5b3e7d4-99ea-409c-aba6-d32751f93ccf] key = keymgr.get(context, 
encryption['encryption_key_id'])
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 
c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File 
"/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py",
 line 575, in get
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 
c5b3e7d4-99ea-409c-aba6-d32751f93ccf] secret = self._get_secret(context, 
managed_object_id)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 
c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File 
"/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py",
 line 545, in _ge
  t_secret
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 
c5b3e7d4-99ea-409c-aba6-d32751f93ccf] barbican_client = 
self._get_barbican_client(context)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 
c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File 
"/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py",
 line 142, in _ge
  t_barbican_client
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 
c5b3e7d4-99ea-409c-aba6-d32751f93ccf] self._barbican_endpoint)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 
c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File 

[Yahoo-eng-team] [Bug 1911924] Re: os-resetState not logged as an instance action

2021-06-03 Thread Elod Illes
** Changed in: nova/victoria
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1911924

Title:
  os-resetState not logged as an instance action

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) train series:
  New
Status in OpenStack Compute (nova) ussuri series:
  New
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  Description
  ===
  When called, os-resetState does not record an instance action.

  
  Steps to reproduce
  ==
  $ nova reset-state --active test
  $ openstack server event list test

  Expected result
  ===
  os-resetState listed as an instance action.

  Actual result
  =
  os-resetState not listed as an instance action.

  Environment
  ===
  1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/

 7aa7fb94fd3573f6006f7eb8bc92b870b1750721

  2. Which hypervisor did you use?
 (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
 What's the version of that?

 libvirt

  2. Which storage type did you use?
 (For example: Ceph, LVM, GPFS, ...)
 What's the version of that?

 N/A

  3. Which networking type did you use?
 (For example: nova-network, Neutron with OpenVSwitch, ...)

 N/A

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1911924/+subscriptions


[Yahoo-eng-team] [Bug 1882421] Re: inject_password fails with python3

2021-06-03 Thread Elod Illes
** Changed in: nova/victoria
   Status: Fix Committed => Fix Released

** Also affects: nova/wallaby
   Importance: Undecided
   Status: New

** Changed in: nova/wallaby
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882421

Title:
  inject_password fails with python3

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  New
Status in OpenStack Compute (nova) ussuri series:
  New
Status in OpenStack Compute (nova) victoria series:
  Fix Released
Status in OpenStack Compute (nova) wallaby series:
  Fix Released

Bug description:
  Originally reported in #openstack-nova:

  14:44 < lvdombrkr>  hello guys, trying to inject admin_password 
(inject_password=true ) into  image but when creating instance get this error 
in nova-compute.log
  14:45 < lvdombrkr>  2020-06-06 14:53:50.188 6 WARNING nova.virt.disk.api 
[req-94f485ca-944c-40e9-bf14-c8b8dbe09a7b 052d02306e6746a4a3e7e5449de49f8c 
 413a4cadf9734fca9ec3e5e6192a446f - default default] 
Ignoring error injecting admin_password into image (a bytes-like object is 
required, not 'str')
  14:45 < lvdombrkr> Train + Centos8
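
  The "a bytes-like object is required, not 'str'" error is the usual
  Python 3 bytes/str mismatch; a minimal standalone illustration (not nova or
  guestfs code):

  import os

  fd = os.open('/tmp/inject-demo', os.O_WRONLY | os.O_CREAT)
  try:
      os.write(fd, 'admin-password')              # TypeError under Python 3
  except TypeError as exc:
      print(exc)  # a bytes-like object is required, not 'str'
  os.write(fd, 'admin-password'.encode('utf-8'))  # works once encoded
  os.close(fd)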

  Can reproduce on master on devstack by installing python3-guestfs and
  setting

  [libvirt]
  inject_partition = -1
  inject_password = true

  in nova-cpu.conf. Backtrace after adding a hard "raise" into
  inject_data_into_fs():

  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.virt.libvirt.driver [None req-47214a25-b56a-4135-83bb-7c5ff4c86ca6 demo 
demo] [instance: 5604d60c-61c9-49b5-8786-ff5144817863] Error injecting data 
into image 4b3e63a6-b3c4-4de5-b515-cc286e7d5c48 (a bytes-like object is 
required, not 'str')
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [None req-47214a25-b56a-4135-83bb-7c5ff4c86ca6 demo demo] 
[instance: 5604d60c-61c9-49b5-8786-ff5144817863] Instance failed to spawn: 
TypeError: a bytes-like object is required, not 'str'
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863] Traceback 
(most recent call last):
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File 
"/opt/stack/nova/nova/compute/manager.py", line 2614, in _build_resources
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863] yield 
resources
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File 
"/opt/stack/nova/nova/compute/manager.py", line 2374, in _build_and_run_instance
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863] 
self.driver.spawn(context, instance, image_meta,
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 3604, in spawn
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863] 
created_instance_dir, created_disks = self._create_image(
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 3991, in _create_image
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863] 
created_disks = self._create_and_inject_local_root(
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 4119, in 
_create_and_inject_local_root
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863] 
self._inject_data(backend, instance, injection_info)
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 3894, in _inject_data
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863] 
LOG.error('Error injecting data into image '
  Jun 06 15:48:39 jh-devstack-focal-01a nova-compute[2983293]: ERROR 
nova.compute.manager [instance: 5604d60c-61c9-49b5-8786-ff5144817863]   File 

[Yahoo-eng-team] [Bug 1882608] Re: DELETE fails with HTTP 500, StaleDataError: UPDATE statement on table 'instance_mappings' expected to update 1 row(s); 0 were matched

2021-06-03 Thread Elod Illes
** Changed in: nova/ussuri
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882608

Title:
  DELETE fails with HTTP 500, StaleDataError: UPDATE statement on table
  'instance_mappings' expected to update 1 row(s); 0 were matched

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  In Progress
Status in OpenStack Compute (nova) ussuri series:
  Fix Released

Bug description:
  Noticed in a failed nova-grenade-multinode gate job where a resource
  cleanup (server delete) during a ServersNegativeTestJSON test results
  in a 500 error and the job fails with:

  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi [None 
req-ab8b5ad1-c168-4f7e-9bfc-42b202b9894f 
tempest-ServersNegativeTestJSON-1435542876 
tempest-ServersNegativeTestJSON-1435542876] Unexpected exception in API method: 
sqlalchemy.orm.exc.StaleDataError: UPDATE statement on table 
'instance_mappings' expected to update 1 row(s); 0 were matched.
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi Traceback (most 
recent call last):
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File 
"/opt/stack/new/nova/nova/api/openstack/wsgi.py", line 671, in wrapped
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi return 
f(*args, **kwargs)
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File 
"/opt/stack/new/nova/nova/api/openstack/compute/servers.py", line 990, in delete
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi 
self._delete(req.environ['nova.context'], req, id)
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File 
"/opt/stack/new/nova/nova/api/openstack/compute/servers.py", line 798, in 
_delete
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi 
self.compute_api.delete(context, instance)
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File 
"/opt/stack/new/nova/nova/compute/api.py", line 224, in inner
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi return 
function(self, context, instance, *args, **kwargs)
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File 
"/opt/stack/new/nova/nova/compute/api.py", line 151, in inner
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi return f(self, 
context, instance, *args, **kw)
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File 
"/opt/stack/new/nova/nova/compute/api.py", line 2479, in delete
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi 
self._delete_instance(context, instance)
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File 
"/opt/stack/new/nova/nova/compute/api.py", line 2471, in _delete_instance
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi 
task_state=task_states.DELETING)
  Jun 01 14:33:57.523020 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File 
"/opt/stack/new/nova/nova/compute/api.py", line 2158, in _delete
  Jun 01 14:33:57.524852 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi 
self._local_delete_cleanup(context, instance)
  Jun 01 14:33:57.524852 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File 
"/opt/stack/new/nova/nova/compute/api.py", line 2117, in _local_delete_cleanup
  Jun 01 14:33:57.524852 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi 
self._update_queued_for_deletion(context, instance, True)
  Jun 01 14:33:57.524852 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR nova.api.openstack.wsgi   File 
"/opt/stack/new/nova/nova/compute/api.py", line 2434, in 
_update_queued_for_deletion
  Jun 01 14:33:57.524852 ubuntu-bionic-rax-iad-0016890725 
devstack@n-api.service[13722]: ERROR 

[Yahoo-eng-team] [Bug 1882521] Re: Failing device detachments on Focal: "Unable to detach the device from the live config"

2021-06-03 Thread Elod Illes
** Changed in: nova/ussuri
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882521

Title:
  Failing device detachments on Focal: "Unable to detach the device from
  the live config"

Status in Cinder:
  Invalid
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  The following tests are failing consistently when deploying devstack
  on Focal in the CI, see https://review.opendev.org/734029 for detailed
  logs:

  
tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume
  
tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_resize_server_with_multiattached_volume
  
tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest.test_stable_device_rescue_disk_virtio_with_volume_attached
  tearDownClass 
(tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest)

  Sample extract from nova-compute log:

  Jun 08 08:48:24.384559 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
DEBUG oslo.service.loopingcall [-] Exception which is in the suggested list of 
exceptions occurred while invoking function: 
nova.virt.libvirt.guest.Guest.detach_device_with_retry.<locals>._do_wait_and_retry_detach.
 {{(pid=82495) _func 
/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:410}}
  Jun 08 08:48:24.384862 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
DEBUG oslo.service.loopingcall [-] Cannot retry 
nova.virt.libvirt.guest.Guest.detach_device_with_retry.<locals>._do_wait_and_retry_detach
 upon suggested exception since retry count (7) reached max retry count (7). 
{{(pid=82495) _func 
/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:416}}
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall [-] Dynamic interval looping call 
'oslo_service.loopingcall.RetryDecorator.__call__.<locals>._func' failed: 
nova.exception.DeviceDetachFailed: Device detach failed for vdb: Unable to 
detach the device from the live config.
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall Traceback (most recent call last):
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 150, 
in _run_loop
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 428, 
in _func
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall return self._sleep_time
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in 
__exit__
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall self.force_reraise()
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall six.reraise(self.type_, self.value, self.tb)
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall raise value
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 407, 
in _func
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall result = f(*args, **kwargs)
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/opt/stack/nova/nova/virt/libvirt/guest.py", line 453, in 
_do_wait_and_retry_detach
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall raise exception.DeviceDetachFailed(
  Jun 

[Yahoo-eng-team] [Bug 1917619] Re: Attempting to start or hard reboot a users instance as an admin with encrypted volumes leaves the instance unbootable when [workarounds]disable_native_luksv1 is enab

2021-06-03 Thread Elod Illes
** Also affects: nova/wallaby
   Importance: Undecided
   Status: New

** Changed in: nova/wallaby
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1917619

Title:
  Attempting to start or hard reboot a users instance as an admin with
  encrypted volumes leaves the instance unbootable when
  [workarounds]disable_native_luksv1 is enabled

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) wallaby series:
  Fix Released

Bug description:
  Description
  ===
  $subject, by default admins do not have access to user created barbican 
secrets. As a result admins cannot hard reboot or stop/start instances as this 
deletes local libvirt secrets, refetches secrets from Barbican and recreates 
the local secrets.

  However this initial attempt by an admin will destroy the local
  secrets *before* failing to access anything in Barbican.

  As a result any request by the owner of the instance to hard reboot or
  stop/start the instance can fail as the _detach_encryptor logic fails
  to find any local secret and assumes that native LUKSv1 encryption
  isn't being used. This causes the os-brick encryptors to be loaded,
  which can fail if the underlying volume type isn't supported, such as
  rbd.

  Steps to reproduce
  ==
  1. As an non-admin user create an instance with encrypted rbd volumes attached
  2. Attempt to hard reboot or stop/start the instance as an admin
  3. Attempt to hard reboot or stop/start the instance as the owner

  Expected result
  ===
  The request by the admin to hard reboot or stop/start the instance fails.
  The request by the owner to hard reboot or stop/start the instance fails due 
to os_brick.exception.VolumeEncryptionNotSupported being raised.

  Actual result
  =
  The request by the admin to hard reboot or stop/start the instance fails.
  The request by the owner to hard reboot or stop/start the instance succeeds.

  Environment
  ===
  1. Exact version of OpenStack you are running. See the following
    list for all releases: http://docs.openstack.org/releases/

     master

  2. Which hypervisor did you use?
     (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
     What's the version of that?

     libvirt

  2. Which storage type did you use?
     (For example: Ceph, LVM, GPFS, ...)
     What's the version of that?

     N/A

  3. Which networking type did you use?
     (For example: nova-network, Neutron with OpenVSwitch, ...)

     N/A

  Logs & Configs
  ==

  https://bugzilla.redhat.com/show_bug.cgi?id=1934513

  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server 
[req-fe304872-e35f-4cb3-8760-4fd1eed745bc 
fef8c04ca63ab77e9a37b9d79367fd49747d2016352759f6faa8475fbf6f63c1 
4127275f099844f28fde120064aa4753 - 1d485afd913b4c489730f79d83044080 
1d485afd913b4c489730f79d83044080] Exception during message handling: 
os_brick.exception.VolumeEncryptionNotSupported: Volume encryption is not 
supported for rbd volume d9817c6a-9c84-472a-8fc8-58ad73b389aa.
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server Traceback (most 
recent call last):
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in 
_process_incoming
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server res = 
self.dispatcher.dispatch(message)
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, 
in dispatch
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server return 
self._do_dispatch(endpoint, method, ctxt, args)
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, 
in _do_dispatch
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server result = 
func(ctxt, **new_args)
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in 
wrapped
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server function_name, 
call_dict, binary, tb)
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server 
self.force_reraise()
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server 
six.reraise(self.type_, self.value, self.tb)
  2021-02-23 17:07:50.453 7 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.6/site-packages/six.py", line 693, in 

[Yahoo-eng-team] [Bug 1913575] Re: Use auth_username when probing encrypted rbd volumes while extending them

2021-06-03 Thread Elod Illes
** Changed in: nova/ussuri
   Status: In Progress => Fix Released

** Changed in: nova/victoria
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1913575

Title:
  Use auth_username when probing encrypted rbd volumes while extending
  them

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  Description
  ===

  I0c3f14100a18107f7e416293f3d4fcc641ce5e55 introduced new logic around
  resizing encrypted LUKSv1 volumes that probed the volume using qemu-
  img to determine the LUKSv1 header size and to take this into account
  during the resize. The use of qemu-img however assumes access to the
  admin rbd keyring as a username isn't provided. This isn't always
  available in every environment, so the `id:$username` option needs to be
  appended to the rbd URI provided to qemu-img.
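
  For illustration, the kind of URI change described above (a sketch only;
  the `:id=` suffix is qemu's rbd URI option for selecting the cephx user,
  and the `cinder` user name below is just an example):

    def rbd_uri(pool, volume_name, auth_username=None):
        # Without an explicit user, qemu-img falls back to the
        # client.admin keyring.
        uri = 'rbd:%s/%s' % (pool, volume_name)
        if auth_username:
            # e.g. rbd:volumes/volume-d721825d-...:id=cinder
            uri += ':id=%s' % auth_username
        return uri

    print(rbd_uri('volumes', 'volume-d721825d-038a-42f6-8127-aaec171e5c39',
                  auth_username='cinder'))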

  Steps to reproduce
  ==

  Attempt to resize an encrypted LUKSv1 volume on a compute without
  access to the admin keyring.

  Expected result
  ===

  The URI provided to qemu-img includes the username (and thus local
  keyring) to use.

  Actual result
  =

  qemu-img fails as it can't find the default admin keyring.

  Environment
  ===
  1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/

master

  2. Which hypervisor did you use?
 (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
 What's the version of that?

 libvirt + KVM

  2. Which storage type did you use?
 (For example: Ceph, LVM, GPFS, ...)
 What's the version of that?

 c-vol ceph

  3. Which networking type did you use?
 (For example: nova-network, Neutron with OpenVSwitch, ...)

  
 N/A

  Logs & Configs
  ==

  3e004ad2953a4aa7a2f9022be3ffc7cd - default default] [instance: 
8d640d15-30dd-4e72-a9ba-d9f7cf11b1ec] Unknown error when attempting to find the 
payload_offset for LUKSv1 encrypted disk 
rbd:volumes/volume-d721825d-038a-42f6-8127-aaec171e5c39.: 
nova.exception.InvalidDiskInfo: Disk info file is invalid: qemu-img failed to 
execute on rbd:volumes/volume-d721825d-038a-42f6-8127-aaec171e5c39 : Unexpected 
error while running command.
  Command: /usr/libexec/platform-python -m oslo_concurrency.prlimit 
--as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info 
rbd:volumes/volume-d721825d-038a-42f6-8127-aaec171e5c39 --output=json 
--force-share
  Exit code: 1
  Stdout: ''
  Stderr: "qemu-img: Could not open 
'rbd:volumes/volume-d721825d-038a-42f6-8127-aaec171e5c39': error connecting: 
Permission denied\n"

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1913575/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1919357] Re: "Secure live migration with QEMU-native TLS in nova"-guide misses essential config option

2021-06-03 Thread Elod Illes
** Changed in: nova/ussuri
   Status: New => Fix Released

** Changed in: nova/victoria
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1919357

Title:
  "Secure live migration with QEMU-native TLS in nova"-guide misses
  essential config option

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released
Status in OpenStack Security Advisory:
  Won't Fix
Status in OpenStack Security Notes:
  In Progress

Bug description:
  - [x] This doc is inaccurate in this way: __

  I followed the guide to set up QEMU-native TLS for live migration.
  After confirming with tcpdump (listening on the TLS port) that libvirt
  is able to use TLS, I also wanted to check that it works when I live
  migrate an instance. Apparently it didn't: the migration still used the
  port for unencrypted TCP [1].

  After digging through documentation and code afterwards I found that
  in this code part:
  
https://github.com/openstack/nova/blob/stable/victoria/nova/virt/libvirt/driver.py#L1120

  @staticmethod
  def _live_migration_uri(dest):
      uris = {
          'kvm': 'qemu+%(scheme)s://%(dest)s/system',
          'qemu': 'qemu+%(scheme)s://%(dest)s/system',
          'xen': 'xenmigr://%(dest)s/system',
          'parallels': 'parallels+tcp://%(dest)s/system',
      }
      dest = oslo_netutils.escape_ipv6(dest)

      virt_type = CONF.libvirt.virt_type
      # TODO(pkoniszewski): Remove fetching live_migration_uri in Pike
      uri = CONF.libvirt.live_migration_uri
      if uri:
          return uri % dest

      uri = uris.get(virt_type)
      if uri is None:
          raise exception.LiveMigrationURINotAvailable(virt_type=virt_type)

      str_format = {
          'dest': dest,
          'scheme': CONF.libvirt.live_migration_scheme or 'tcp',
      }
      return uri % str_format

  the URI is calculated using the config parameter
  'live_migration_scheme', falling back to the hard-coded 'tcp' scheme.
  Coming from the guide for QEMU-native TLS, there was no hint that this
  config option needs to be set.

  In fact, without setting the 'live_migration_scheme' config option to
  tls, there is no way to see that the live migration still uses the
  unencrypted TCP connection - one has to use tcpdump and listen for TCP
  or TLS to recognize it. Neither the logs nor any debug output give any
  hint that it is still unencrypted!

  Thus I conclude there might be OpenStack deployments which are
  configured as the guide says but where these config changes have no effect!

  - [x] This is a doc addition request.

  To fix this, the config parameter 'live_migration_scheme' should be set
  to tls, and there should probably be a warning in the documentation that
  without doing this the traffic is still unencrypted.
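
  For example, something like the following on the compute nodes
  (live_migration_with_native_tls is what the guide already covers;
  live_migration_scheme is the missing piece):

  [libvirt]
  live_migration_with_native_tls = true
  live_migration_scheme = tls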

  - [ ] I have a fix to the document that I can paste below including
  example: input and output.

  [1] without setting 'live_migration_scheme' in the nova.conf
  $ tcpdump -i INTERFACE -n -X port 16509 and '(tcp[((tcp[12] & 0xf0) >> 2)] < 
0x14 || tcp[((tcp[12] & 0xf0) >> 2)] > 0x17)'
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on INTERFACE, link-type EN10MB (Ethernet), capture size 262144 bytes
  17:10:56.387407 IP 192.168.70.101.50900 > 192.168.70.100.16509: Flags [P.], 
seq 304:6488, ack 285, win 502, options [nop,nop,TS val 424149655 ecr 
1875309961], length 6184
   0x:  4500 185c ad05 4000 4006 677c c0a8 4665  E..\..@.@.g|..Fe
   0x0010:  c0a8 4664 c6d4 407d a407 70a6 15ad 0a5a  ..Fd..@}..pZ
   0x0020:  8018 01f6 2669  0101 080a 1948 0297  
   0x0030:  6fc6 f589  1828 2000 8086  0001  o..(
   0x0040:   012f    0009    .../
   0x0050:   0001  000f 6465 7374 696e 6174  destinat
   0x0060:  696f 6e5f 786d 6c00  0007  129b  ion_xml.
   0x0070:  3c64 6f6d 6169 6e20 7479 7065 3d27 6b76  ...inst
   0x0090:  616e 6365 2d30 3030 3032 6539 393c 2f6e  ance-2e99...7e2
   0x00b0:  6364 3839 352d 6263 3765 2d34 6634 352d  cd895-bc7e-4f45-
   0x00c0:  6166 6264 2d33 3732 3166 3735 6134 3064  afbd-3721f75a40d
   0x00d0:  383c 2f75 7569 643e 0a20 203c 6d65 7461  8</uuid>.  <meta
   [remainder of the hex dump truncated]

  [2] with setting 'live_migration_scheme' to tls in the nova.conf
  $ tcpdump -i INTERFACE -n -X port 16514 and '(tcp[((tcp[12] & 0xf0) >> 2)] > 
0x13 && tcp[((tcp[12] & 0xf0) >> 2)] < 0x18)'
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on INTERFACE, link-type EN10MB (Ethernet), capture size 262144 bytes
  16:55:47.746851 IP 192.168.70.100.35620 > 192.168.70.101.16514: Flags [P.], 
seq 1849334708:1849334914, ack 3121294199, win 502, options 

[Yahoo-eng-team] [Bug 1919487] Re: virDomainBlockCommit called when deleting an intermediary snapshot via os-assisted-volume-snapshots even when instance is shutoff

2021-06-03 Thread Elod Illes
** Also affects: nova/wallaby
   Importance: Undecided
   Status: New

** Changed in: nova/wallaby
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1919487

Title:
  virDomainBlockCommit called when deleting an intermediary snapshot via
  os-assisted-volume-snapshots even when instance is shutoff

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) wallaby series:
  Fix Released

Bug description:
  Description
  ===

  Attempting to delete an NFS volume snapshot (via c-api and the os-
  assisted-volume-snapshots n-api) of a volume attached to a SHUTOFF
  instance currently results in n-cpu attempting to fire off a
  virDomainBlockCommit command even though the instance isn't running.

  Steps to reproduce
  ==
  1. Create multiple volume snapshots against a volume.
  2. Attach the volume to an ACTIVE instance.
  3. Stop the instance and ensure it is SHUTOFF.
  4. Attempt to delete an intermediary snapshot.

  Expected result
  ===
  qemu-img commit or qemu-img rebase should be used to handle this offline.

  Actual result
  =
  virDomainBlockCommit is called even though the domain isn't running.

  Environment
  ===

  1. Exact version of OpenStack you are running. See the following
    list for all releases: http://docs.openstack.org/releases/

     master

  2. Which hypervisor did you use?
     (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
     What's the version of that?

     libvirt + KVM

  2. Which storage type did you use?
     (For example: Ceph, LVM, GPFS, ...)
     What's the version of that?

     NFS c-vol

  3. Which networking type did you use?
     (For example: nova-network, Neutron with OpenVSwitch, ...)

     N/A

  Logs & Configs
  ==

  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server [req-570281c6-566e-44a3-9953-eeb634513778 
req-0fbbe87f-fd1d-4861-9fb3-21b8eb011e55 service nova] Exception during message 
handling: libvirt.libvirtError: Requested operation is not valid: domain is not 
>
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server Traceback (most recent call last):
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/server.py", line 
165, in _process_incoming
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 
273, in dispatch
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, 
args)
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 
193, in _do_dispatch
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server result = func(ctxt, **new_args)
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/server.py", line 
241, in inner
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server return func(*args, **kwargs)
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server   File "/opt/stack/nova/nova/exception_wrapper.py", 
line 78, in wrapped
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server function_name, call_dict, binary)
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python3.7/site-packages/oslo_utils/excutils.py", line 220, in 
__exit__
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server self.force_reraise()
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python3.7/site-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
  Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 
oslo_messaging.rpc.server raise 

[Yahoo-eng-team] [Bug 1923206] Re: libvirt.libvirtError: internal error: unable to execute QEMU command 'device_del': Device $device is already in the process of unplug

2021-06-03 Thread Elod Illes
** Also affects: nova/wallaby
   Importance: Undecided
   Status: New

** Changed in: nova/wallaby
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1923206

Title:
  libvirt.libvirtError: internal error: unable to execute QEMU command
  'device_del': Device $device is already in the process of unplug

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) wallaby series:
  Fix Released

Bug description:
  Description
  ===
  This was initially reported downstream against QEMU in the following bug:

  Get libvirtError "Device XX is already in the process of unplug" when detach 
device in OSP env
  https://bugzilla.redhat.com/show_bug.cgi?id=1878659

  I first saw the error crop up while testing q35 in TripleO in the
  following job:

  
https://c6b36562677324bf8249-804f3f4695b3063292bbb3235f424ae0.ssl.cf1.rackcdn.com/785027/5/check
  /tripleo-ci-
  centos-8-standalone/6860050/logs/undercloud/var/log/containers/nova
  /nova-compute.log

  2021-04-09 11:09:53.702 8 DEBUG nova.virt.libvirt.guest 
[req-4d0b64d5-a2cf-4a6e-a2f7-f6cc7ced4df1 7e2b737ed8f04b3ca819841a41be66c1 
d4d933c7b10c462c8141820b0e70822b - default default] Attempting initial detach 
for device vdb detach_device_with_retry 
/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:455
  [..]
  2021-04-09 11:09:58.721 8 DEBUG nova.virt.libvirt.guest 
[req-4d0b64d5-a2cf-4a6e-a2f7-f6cc7ced4df1 7e2b737ed8f04b3ca819841a41be66c1 
d4d933c7b10c462c8141820b0e70822b - default default] Start retrying detach until 
device vdb is gone. detach_device_with_retry 
/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:471
  [..]
  2021-04-09 11:09:58.729 8 ERROR oslo.service.loopingcall 
libvirt.libvirtError: internal error: unable to execute QEMU command 
'device_del': Device virtio-disk1 is already in the process of unplug

  
  Steps to reproduce
  ==
  Unclear at present; it looks like a genuine QEMU bug that causes it to fail 
when a repeat request to device_del a device comes in, instead of ignoring the 
request as would previously happen. I've asked for clarification in the 
downstream QEMU bug.

  Expected result
  ===
  Repeat calls to device_del are ignored or the failure while raised is ignored 
by Nova.

  Actual result
  =
  Repeat calls to device_del lead to an error being raised to Nova via libvirt 
that causes the detach to fail while it still succeeds asynchronously within 
QEMU. 
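
  A rough sketch of the second option above (ignoring the repeat failure);
  this is illustrative only and not necessarily how nova addresses it:

    import libvirt

    def detach_device_ignoring_pending_unplug(domain, device_xml):
        try:
            domain.detachDeviceFlags(device_xml,
                                     libvirt.VIR_DOMAIN_AFFECT_LIVE)
        except libvirt.libvirtError as exc:
            if 'already in the process of unplug' in str(exc):
                # QEMU is still completing the earlier device_del; treat
                # the repeat request as a no-op and let the retry loop
                # keep polling for the device to disappear.
                return
            raise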

  Environment
  ===
  1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/

 master

  2. Which hypervisor did you use?
 (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
 What's the version of that?

 libvirt + QEMU/KVM

  2. Which storage type did you use?
 (For example: Ceph, LVM, GPFS, ...)
 What's the version of that?

 N/A

  3. Which networking type did you use?
 (For example: nova-network, Neutron with OpenVSwitch, ...)

 N/A

  Logs & Configs
  ==
  See above.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1923206/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799298] Re: Metadata API cross joining instance_metadata and instance_system_metadata

2021-06-03 Thread Elod Illes
** Changed in: nova/train
   Status: In Progress => Fix Released

** Changed in: nova/stein
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1799298

Title:
  Metadata API cross joining instance_metadata and
  instance_system_metadata

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Fix Committed
Status in OpenStack Compute (nova) rocky series:
  Fix Committed
Status in OpenStack Compute (nova) stein series:
  Fix Committed
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released
Status in OpenStack Security Advisory:
  Won't Fix

Bug description:
  
  Description
  ===

  While troubleshooting a production issue we identified that the Nova
  metadata API is fetching a lot more raw data from the database than
  seems necessary. The problem appears to be caused by the SQL query
  used to fetch instance data, which joins the "instance" table with,
  among others, two metadata tables: "instance_metadata" and
  "instance_system_metadata". Below is a simplified version of this
  query which was captured by adding extra logging (the full query is
  listed at the end of this bug report):

  SELECT ...
FROM (SELECT ...
FROM `instances`
   WHERE `instances` . `deleted` = ?
 AND `instances` . `uuid` = ?
   LIMIT ?) AS `anon_1`
LEFT OUTER JOIN `instance_system_metadata` AS `instance_system_metadata_1`
  ON `anon_1` . `instances_uuid` = `instance_system_metadata_1` . 
`instance_uuid`
LEFT OUTER JOIN (`security_group_instance_association` AS 
`security_group_instance_association_1`
 INNER JOIN `security_groups` AS `security_groups_1`
 ON `security_groups_1` . `id` = 
`security_group_instance_association_1` . `security_group_id`
 AND `security_group_instance_association_1` . `deleted` = ?
 AND `security_groups_1` . `deleted` = ? )
  ON `security_group_instance_association_1` . `instance_uuid` = `anon_1` . 
`instances_uuid`
 AND `anon_1` . `instances_deleted` = ?
LEFT OUTER JOIN `security_group_rules` AS `security_group_rules_1`
  ON `security_group_rules_1` . `parent_group_id` = `security_groups_1` . 
`id`
 AND `security_group_rules_1` . `deleted` = ?
LEFT OUTER JOIN `instance_info_caches` AS `instance_info_caches_1`
  ON `instance_info_caches_1` . `instance_uuid` = `anon_1` . 
`instances_uuid`
LEFT OUTER JOIN `instance_extra` AS `instance_extra_1`
  ON `instance_extra_1` . `instance_uuid` = `anon_1` . `instances_uuid`
LEFT OUTER JOIN `instance_metadata` AS `instance_metadata_1`
  ON `instance_metadata_1` . `instance_uuid` = `anon_1` . `instances_uuid`
 AND `instance_metadata_1` . `deleted` = ?

  The instance table has a 1-to-many relationship to both
  "instance_metadata" and "instance_system_metadata" tables, so the
  query is effectively producing a cross join of both metadata tables.
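
  For example, an instance with 2 rows in "instance_metadata" and 5 rows in
  "instance_system_metadata" comes back as 2 x 5 = 10 joined rows, and each
  additional one-to-many join (such as security_group_rules) multiplies the
  result set again, so the raw data fetched grows much faster than the data
  it actually represents.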

  
  Steps to reproduce
  ==

  To illustrate the impact of this query, add 2 properties to a running
  instance and verify that it has 2 records in "instance_metadata", as
  well as other records in "instance_system_metadata" such as base image
  properties:

  > select instance_uuid,`key`,value from instance_metadata where instance_uuid 
= 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
  +--------------------------------------+-----------+--------+
  | instance_uuid                        | key       | value  |
  +--------------------------------------+-----------+--------+
  | a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 |
  | a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value  |
  +--------------------------------------+-----------+--------+
  2 rows in set (0.61 sec)

  > select `key`,value from instance_system_metadata where instance_uuid = 
'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
  +-------------------------+--------------------------------------+
  | key                     | value                                |
  +-------------------------+--------------------------------------+
  | image_disk_format       | qcow2                                |
  | image_min_ram           | 0                                    |
  | image_min_disk          | 20                                   |
  | image_base_image_ref    | 39cd564f-6a29-43e2-815b-62097968486a |
  | image_container_format  | bare                                 |
  +-------------------------+--------------------------------------+
  5 rows in set (0.00 sec)

  For this particular instance, 

[Yahoo-eng-team] [Bug 1841932] Re: hide_hypervisor_id extra_specs in nova flavor cannot pass AggregateInstanceExtraSpecsFilter

2021-06-03 Thread Elod Illes
** Also affects: nova/train
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841932

Title:
  hide_hypervisor_id extra_specs in nova flavor cannot pass
  AggregateInstanceExtraSpecsFilter

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released

Bug description:
  Description
  ===
  When we enable the nova AggregateInstanceExtraSpecsFilter and also need to 
pass through an NVIDIA GPU, we need to set hide_hypervisor_id in the nova 
flavor extra specs. hide_hypervisor_id cannot pass the 
AggregateInstanceExtraSpecsFilter because unscoped keys ("Either not scope 
format, or aggregate_instance_extra_specs scope") are not skipped by the 
filter.

  See the code below:
  # Either not scope format, or aggregate_instance_extra_specs scope
  scope = key.split(':', 1)
  if len(scope) > 1:
      if scope[0] != _SCOPE:
          continue
      else:
          del scope[0]
  key = scope[0]
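
  To illustrate with the key from this report (plain Python, not the filter
  code itself):

    key = 'hide_hypervisor_id'
    scope = key.split(':', 1)   # -> ['hide_hypervisor_id'], len(scope) == 1
    # The "skip keys scoped to another namespace" branch is never taken, so
    # the unscoped key is kept and later compared against the aggregate
    # metadata, where it does not exist, and the filter returns 0 hosts.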

  
  Steps to reproduce
  ==
  in nova.conf
  [filter_scheduler]
  enabled_filters = ,AggregateInstanceExtraSpecsFilter,...

  create a flavor like "g3.8xlarge" and set the extra spec
  "hide_hypervisor_id":

  nova flavor-key g3.8xlarge set hide_hypervisor_id=true

  then create an instance with flavor g3.8xlarge; it will report "Filter
  AggregateInstanceExtraSpecsFilter returned 0 hosts" in the nova scheduler
  log.

  Environment
  ===
  (nova-scheduler)[nova@control1 /]$ rpm -qa | grep nova
  openstack-nova-common-18.2.1-0.1.el7.noarch
  openstack-nova-scheduler-18.2.1-0.1.el7.noarch
  python-nova-18.2.1-0.1.el7.noarch
  python2-novaclient-11.0.0-1.el7.noarch

  
  I think this is a BUG in AggregateInstanceExtraSpecsFilter. May I suggest 
removing the "not scope format" support in AggregateInstanceExtraSpecsFilter, 
or adding an explicit scope for "hide_hypervisor_id"? Otherwise, I cannot use 
AggregateInstanceExtraSpecsFilter and hide_hypervisor_id at the same time.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1841932/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1893618] Re: periodic-tripleo-ci-centos-8-standalone-full-tempest-api-compute-master tempest test_shelve_unshelve_server failing in component-pipeline

2021-06-03 Thread Elod Illes
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1893618

Title:
  periodic-tripleo-ci-centos-8-standalone-full-tempest-api-compute-
  master tempest test_shelve_unshelve_server failing in component-
  pipeline

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  Fix Committed
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in tripleo:
  Fix Released

Bug description:
  https://logserver.rdoproject.org/openstack-component-
  compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-
  centos-8-standalone-full-tempest-api-compute-
  master/b346467/logs/undercloud/var/log/tempest/stestr_results.html.gz

  traceback-1: {{{
  Traceback (most recent call last):
    File 
"/usr/lib/python3.6/site-packages/tempest/api/compute/servers/test_server_actions.py",
 line 66, in tearDown
  self.server_check_teardown()
    File "/usr/lib/python3.6/site-packages/tempest/api/compute/base.py", line 
220, in server_check_teardown
  cls.server_id, 'ACTIVE')
    File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 96, 
in wait_for_server_status
  raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: (ServerActionsTestJSON:tearDown) Server 
41f15309-34bb-430d-8dad-7b9c8362a851 failed to reach ACTIVE status and task 
state "None" within the required time (300 s). Current status: 
SHELVED_OFFLOADED. Current task state: None.
  }}}

  traceback-2: {{{
  Traceback (most recent call last):
    File 
"/usr/lib/python3.6/site-packages/tempest/api/compute/servers/test_server_actions.py",
 line 649, in _unshelve_server
  server_info = self.client.show_server(self.server_id)['server']
    File 
"/usr/lib/python3.6/site-packages/tempest/lib/services/compute/servers_client.py",
 line 145, in show_server
  resp, body = self.get("servers/%s" % server_id)
    File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", 
line 313, in get
  return self.request('GET', url, extra_headers, headers)
    File 
"/usr/lib/python3.6/site-packages/tempest/lib/services/compute/base_compute_client.py",
 line 48, in request
  method, url, extra_headers, headers, body, chunked)
    File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", 
line 702, in request
  self._error_checker(resp, resp_body)
    File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", 
line 808, in _error_checker
  raise exceptions.NotFound(resp_body, resp=resp)
  tempest.lib.exceptions.NotFound: Object not found
  Details: {'code': 404, 'message': 'Instance None could not be found.'}
  }}}

  Traceback (most recent call last):
    File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", 
line 89, in wrapper
  return f(*func_args, **func_kwargs)
    File 
"/usr/lib/python3.6/site-packages/tempest/api/compute/servers/test_server_actions.py",
 line 666, in test_shelve_unshelve_server
  waiters.wait_for_server_status(self.client, self.server_id, 'ACTIVE')
    File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 96, 
in wait_for_server_status
  raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: (ServerActionsTestJSON:test_shelve_unshelve_server) Server 
41f15309-34bb-430d-8dad-7b9c8362a851 failed to reach ACTIVE status and task 
state "None" within the required time (300 s). Current status: 
SHELVED_OFFLOADED. Current task state: None.


  Traceback in nova-compute logs https://logserver.rdoproject.org
  /openstack-component-compute/opendev.org/openstack/tripleo-ci/master
  /periodic-tripleo-ci-centos-8-standalone-full-tempest-api-compute-
  master/b346467/logs/undercloud/var/log/containers/nova/nova-
  compute.log.1.gz:-

  2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server 
[req-9280bac1-da23-4f45-b01c-b6012198d97e 10fe2caa6924408485c181adfc7377e8 
df52aad2e4da4f07b4b7b4ff6644e121 - default default] Exception during message 
handling: AttributeError: 'NoneType' object has no attribute 'encode'
  2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server Traceback (most 
recent call last):
  2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in 
_process_incoming
  2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server res = 
self.dispatcher.dispatch(message)
  2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 273, 
in dispatch
  2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server return 
self._do_dispatch(endpoint, method, ctxt, args)
  

[Yahoo-eng-team] [Bug 1613770] Re: Improve error log when instance snapshot fails

2021-06-03 Thread Elod Illes
** Changed in: nova/train
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1613770

Title:
  Improve error log when instance snapshot fails

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  Fix Released

Bug description:
  If the glance backend store is set to use filesystem storage and this
  storage runs out of space while glance is trying to create an
  instance snapshot, then the following message is displayed in the
  nova-compute log:

  2016-08-08 22:24:31.644 TRACE oslo_messaging.rpc.server HTTPOverLimit: 413 
Request Entity Too Large
  2016-08-08 22:24:31.644 TRACE oslo_messaging.rpc.server Image storage media 
is full: There is not enough disk space on the image storage media.
  2016-08-08 22:24:31.644 TRACE oslo_messaging.rpc.server (HTTP 413)

  It's a little bit annoying that we're logging the HTTP error from
  glance and that we don't specify the image uuid.

  Steps to reproduce:
  * set glance's filesystem_store_datadir config option to a small filesystem
  * start nova instance
  * keep invoking "nova image-create" to create instance image snapshot, 
eventually the backend filesystem storage would run out of space
  * on nova-compute log see the HTTP error message above.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1613770/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1882521] Re: Failing device detachments on Focal: "Unable to detach the device from the live config"

2021-06-03 Thread Elod Illes
** Changed in: nova/train
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882521

Title:
  Failing device detachments on Focal: "Unable to detach the device from
  the live config"

Status in Cinder:
  Invalid
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  In Progress
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  The following tests are failing consistently when deploying devstack
  on Focal in the CI, see https://review.opendev.org/734029 for detailed
  logs:

  
tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume
  
tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_resize_server_with_multiattached_volume
  
tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest.test_stable_device_rescue_disk_virtio_with_volume_attached
  tearDownClass 
(tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest)

  Sample extract from nova-compute log:

  Jun 08 08:48:24.384559 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
DEBUG oslo.service.loopingcall [-] Exception which is in the suggested list of 
exceptions occurred while invoking function: 
nova.virt.libvirt.guest.Guest.detach_device_with_retry.._do_wait_and_retry_detach.
 {{(pid=82495) _func 
/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:410}}
  Jun 08 08:48:24.384862 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
DEBUG oslo.service.loopingcall [-] Cannot retry 
nova.virt.libvirt.guest.Guest.detach_device_with_retry.._do_wait_and_retry_detach
 upon suggested exception since retry count (7) reached max retry count (7). 
{{(pid=82495) _func 
/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:416}}
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall [-] Dynamic interval looping call 
'oslo_service.loopingcall.RetryDecorator.__call__.._func' failed: 
nova.exception.DeviceDetachFailed: Device detach failed for vdb: Unable to 
detach the device from the live config.
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall Traceback (most recent call last):
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 150, 
in _run_loop
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 428, 
in _func
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall return self._sleep_time
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in 
__exit__
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall self.force_reraise()
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall six.reraise(self.type_, self.value, self.tb)
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall raise value
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 407, 
in _func
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall result = f(*args, **kwargs)
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall   File 
"/opt/stack/nova/nova/virt/libvirt/guest.py", line 453, in 
_do_wait_and_retry_detach
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: 
ERROR oslo.service.loopingcall raise exception.DeviceDetachFailed(
  Jun 

[Yahoo-eng-team] [Bug 1896621] Re: instance corrupted after volume retype

2021-06-03 Thread Elod Illes
** Changed in: nova/train
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896621

Title:
  instance corrupted after volume retype

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  Description
  ===

  Following a cinder volume retype on a volume attached to a running
  instance, the instance became corrupt and can no longer boot into the
  guest operating system.

  Upon further investigation it seems the retype operation failed.  The
  nova-compute logs registered the following error:

  Exception during message handling: libvirtError: block copy still
  active: domain has active block job

  see log extract: http://paste.openstack.org/show/798201/

  Steps to reproduce
  ==

  I'm not sure how easy it would be to replicate the exact problem.

  As an admin user within the project, in Horizon go to Project | Volume
  | Volume, then from the context menu of the required volume select
  "change volume type".

  Select the new type and migration policy 'on-demand'.

  Following this it was reported that the instance was non-responsive;
  when checking in the console, the instance was unable to boot from the
  volume.

  
  Environment
  ===
  DISTRIB_ID="OSA"
  DISTRIB_RELEASE="18.1.5"
  DISTRIB_CODENAME="Rocky"
  DISTRIB_DESCRIPTION="OpenStack-Ansible"

  # nova-manage --version
  18.1.1

  # virsh version
  Compiled against library: libvirt 4.0.0
  Using library: libvirt 4.0.0
  Using API: QEMU 4.0.0
  Running hypervisor: QEMU 2.11.1

  
  Cinder v13.0.3 backed volumes using Zadara VPSA driver

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1896621/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1919357] Re: "Secure live migration with QEMU-native TLS in nova"-guide misses essential config option

2021-06-03 Thread Elod Illes
** Changed in: nova/train
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1919357

Title:
  "Secure live migration with QEMU-native TLS in nova"-guide misses
  essential config option

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  New
Status in OpenStack Compute (nova) victoria series:
  New
Status in OpenStack Security Advisory:
  Won't Fix
Status in OpenStack Security Notes:
  In Progress

Bug description:
  - [x] This doc is inaccurate in this way: __

  I followed the guide to set up QEMU-native TLS for live migration.
  After confirming with tcpdump (listening on the TLS port) that libvirt
  is able to use TLS, I also wanted to check that it works when I live
  migrate an instance. Apparently it didn't: the migration still used the
  port for unencrypted TCP [1].

  After digging through documentation and code afterwards I found that
  in this code part:
  
https://github.com/openstack/nova/blob/stable/victoria/nova/virt/libvirt/driver.py#L1120

  @staticmethod
  def _live_migration_uri(dest):
      uris = {
          'kvm': 'qemu+%(scheme)s://%(dest)s/system',
          'qemu': 'qemu+%(scheme)s://%(dest)s/system',
          'xen': 'xenmigr://%(dest)s/system',
          'parallels': 'parallels+tcp://%(dest)s/system',
      }
      dest = oslo_netutils.escape_ipv6(dest)

      virt_type = CONF.libvirt.virt_type
      # TODO(pkoniszewski): Remove fetching live_migration_uri in Pike
      uri = CONF.libvirt.live_migration_uri
      if uri:
          return uri % dest

      uri = uris.get(virt_type)
      if uri is None:
          raise exception.LiveMigrationURINotAvailable(virt_type=virt_type)

      str_format = {
          'dest': dest,
          'scheme': CONF.libvirt.live_migration_scheme or 'tcp',
      }
      return uri % str_format

  the URI is calculated using the config parameter
  'live_migration_scheme', falling back to the hard-coded 'tcp' scheme.
  Coming from the guide for QEMU-native TLS, there was no hint that this
  config option needs to be set.

  In fact, without setting the 'live_migration_scheme' config option to
  tls, there is no way to see that the live migration still uses the
  unencrypted TCP connection - one has to use tcpdump and listen for TCP
  or TLS to recognize it. Neither the logs nor any debug output give any
  hint that it is still unencrypted!

  Thus I conclude there might be OpenStack deployments which are
  configured as the guide says but where these config changes have no effect!

  - [x] This is a doc addition request.

  To fix this, the config parameter 'live_migration_scheme' should be set
  to tls, and there should probably be a warning in the documentation that
  without doing this the traffic is still unencrypted.

  - [ ] I have a fix to the document that I can paste below including
  example: input and output.

  [1] without setting 'live_migration_scheme' in the nova.conf
  $ tcpdump -i INTERFACE -n -X port 16509 and '(tcp[((tcp[12] & 0xf0) >> 2)] < 
0x14 || tcp[((tcp[12] & 0xf0) >> 2)] > 0x17)'
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on INTERFACE, link-type EN10MB (Ethernet), capture size 262144 bytes
  17:10:56.387407 IP 192.168.70.101.50900 > 192.168.70.100.16509: Flags [P.], 
seq 304:6488, ack 285, win 502, options [nop,nop,TS val 424149655 ecr 
1875309961], length 6184
   0x:  4500 185c ad05 4000 4006 677c c0a8 4665  E..\..@.@.g|..Fe
   0x0010:  c0a8 4664 c6d4 407d a407 70a6 15ad 0a5a  ..Fd..@}..pZ
   0x0020:  8018 01f6 2669  0101 080a 1948 0297  
   0x0030:  6fc6 f589  1828 2000 8086  0001  o..(
   0x0040:   012f    0009    .../
   0x0050:   0001  000f 6465 7374 696e 6174  destinat
   0x0060:  696f 6e5f 786d 6c00  0007  129b  ion_xml.
   0x0070:  3c64 6f6d 6169 6e20 7479 7065 3d27 6b76  ...inst
   0x0090:  616e 6365 2d30 3030 3032 6539 393c 2f6e  ance-2e99...7e2
   0x00b0:  6364 3839 352d 6263 3765 2d34 6634 352d  cd895-bc7e-4f45-
   0x00c0:  6166 6264 2d33 3732 3166 3735 6134 3064  afbd-3721f75a40d
   0x00d0:  383c 2f75 7569 643e 0a20 203c 6d65 7461  8</uuid>.  <meta
   [remainder of the hex dump truncated]

  [2] with setting 'live_migration_scheme' to tls in the nova.conf
  $ tcpdump -i INTERFACE -n -X port 16514 and '(tcp[((tcp[12] & 0xf0) >> 2)] > 
0x13 && tcp[((tcp[12] & 0xf0) >> 2)] < 0x18)'
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on INTERFACE, link-type EN10MB (Ethernet), capture size 262144 bytes
  16:55:47.746851 IP 192.168.70.100.35620 > 192.168.70.101.16514: Flags [P.], 
seq 1849334708:1849334914, ack 3121294199, win 502, options [nop,nop,TS val 
1874401351 ecr 423241020], length 206
   0x:  4500 0102 a605 4000 

[Yahoo-eng-team] [Bug 1930734] [NEW] Volumes and vNICs are being hot plugged into SEV based instances without iommu='on' causing failures to attach and later detach within the guest OS

2021-06-03 Thread Lee Yarwood
Public bug reported:

Description
===
After successfully attaching a disk to a SEV enabled instance the request to 
detach the disk never completes with the following trace eventually logged 
regarding the initial attach:

[7.773877] pcieport :00:02.5: Slot(0-5): Attention button pressed
[7.774743] pcieport :00:02.5: Slot(0-5) Powering on due to button press
[7.775714] pcieport :00:02.5: Slot(0-5): Card present
[7.776403] pcieport :00:02.5: Slot(0-5): Link Up
[7.903183] pci :06:00.0: [1af4:1042] type 00 class 0x01
[7.904095] pci :06:00.0: reg 0x14: [mem 0x-0x0fff]
[7.905024] pci :06:00.0: reg 0x20: [mem 0x-0x3fff 64bit 
pref]
[7.906977] pcieport :00:02.5: bridge window [io  0x1000-0x0fff] to [bus 
06] add_size 1000
[7.908069] pcieport :00:02.5: BAR 13: no space for [io  size 0x1000]
[7.908917] pcieport :00:02.5: BAR 13: failed to assign [io  size 0x1000]
[7.909832] pcieport :00:02.5: BAR 13: no space for [io  size 0x1000]
[7.910667] pcieport :00:02.5: BAR 13: failed to assign [io  size 0x1000]
[7.911586] pci :06:00.0: BAR 4: assigned [mem 0x80060-0x800603fff 
64bit pref]
[7.912616] pci :06:00.0: BAR 1: assigned [mem 0x8040-0x80400fff]
[7.913472] pcieport :00:02.5: PCI bridge to [bus 06]
[7.915762] pcieport :00:02.5:   bridge window [mem 
0x8040-0x805f]
[7.917525] pcieport :00:02.5:   bridge window [mem 
0x80060-0x8007f 64bit pref]
[7.920252] virtio-pci :06:00.0: enabling device ( -> 0002)
[7.924487] virtio_blk virtio4: [vdb] 2097152 512-byte logical blocks (1.07 
GB/1.00 GiB)
[7.926616] vdb: detected capacity change from 0 to 1073741824
[ .. ]
[  246.751028] INFO: task irq/29-pciehp:173 blocked for more than 120 seconds.
[  246.752801]   Not tainted 4.18.0-305.el8.x86_64 #1
[  246.753902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  246.755457] irq/29-pciehp   D0   173  2 0x80004000
[  246.756616] Call Trace:
[  246.757328]  __schedule+0x2c4/0x700
[  246.758185]  schedule+0x38/0xa0
[  246.758966]  io_schedule+0x12/0x40
[  246.759801]  do_read_cache_page+0x513/0x770
[  246.760761]  ? blkdev_writepages+0x10/0x10
[  246.761692]  ? file_fdatawait_range+0x20/0x20
[  246.762659]  read_part_sector+0x38/0xda
[  246.763554]  read_lba+0x10f/0x220
[  246.764367]  efi_partition+0x1e4/0x6de
[  246.765245]  ? snprintf+0x49/0x60
[  246.766046]  ? is_gpt_valid.part.5+0x430/0x430
[  246.766991]  blk_add_partitions+0x164/0x3f0
[  246.767915]  ? blk_drop_partitions+0x91/0xc0
[  246.768863]  bdev_disk_changed+0x65/0xd0
[  246.769748]  __blkdev_get+0x3c4/0x510
[  246.770595]  blkdev_get+0xaf/0x180
[  246.771394]  __device_add_disk+0x3de/0x4b0
[  246.772302]  virtblk_probe+0x4ba/0x8a0 [virtio_blk]
[  246.773313]  virtio_dev_probe+0x158/0x1f0
[  246.774208]  really_probe+0x255/0x4a0
[  246.775046]  ? __driver_attach_async_helper+0x90/0x90
[  246.776091]  driver_probe_device+0x49/0xc0
[  246.776965]  bus_for_each_drv+0x79/0xc0
[  246.777813]  __device_attach+0xdc/0x160
[  246.778669]  bus_probe_device+0x9d/0xb0
[  246.779523]  device_add+0x418/0x780
[  246.780321]  register_virtio_device+0x9e/0xe0
[  246.781254]  virtio_pci_probe+0xb3/0x140
[  246.782124]  local_pci_probe+0x41/0x90
[  246.782937]  pci_device_probe+0x105/0x1c0
[  246.783807]  really_probe+0x255/0x4a0
[  246.784623]  ? __driver_attach_async_helper+0x90/0x90
[  246.785647]  driver_probe_device+0x49/0xc0
[  246.786526]  bus_for_each_drv+0x79/0xc0
[  246.787364]  __device_attach+0xdc/0x160
[  246.788205]  pci_bus_add_device+0x4a/0x90
[  246.789063]  pci_bus_add_devices+0x2c/0x70
[  246.789916]  pciehp_configure_device+0x91/0x130
[  246.790855]  pciehp_handle_presence_or_link_change+0x334/0x460
[  246.791985]  pciehp_ist+0x1a2/0x1b0
[  246.792768]  ? irq_finalize_oneshot.part.47+0xf0/0xf0
[  246.793768]  irq_thread_fn+0x1f/0x50
[  246.794550]  irq_thread+0xe7/0x170
[  246.795299]  ? irq_forced_thread_fn+0x70/0x70
[  246.796190]  ? irq_thread_check_affinity+0xe0/0xe0
[  246.797147]  kthread+0x116/0x130
[  246.797841]  ? kthread_flush_work_fn+0x10/0x10
[  246.798735]  ret_from_fork+0x22/0x40
[  246.799523] INFO: task sfdisk:1129 blocked for more than 120 seconds.
[  246.800717]   Not tainted 4.18.0-305.el8.x86_64 #1
[  246.801733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  246.803155] sfdisk  D0  1129   1107 0x4080
[  246.804225] Call Trace:
[  246.804827]  __schedule+0x2c4/0x700
[  246.805590]  ? submit_bio+0x3c/0x160
[  246.806373]  schedule+0x38/0xa0
[  246.807089]  schedule_preempt_disabled+0xa/0x10
[  246.807990]  __mutex_lock.isra.6+0x2d0/0x4a0
[  246.808876]  ? wake_up_q+0x80/0x80
[  246.809636]  ? fdatawait_one_bdev+0x20/0x20
[  246.810508]  iterate_bdevs+0x98/0x142
[  246.811304]  ksys_sync+0x6e/0xb0
[  246.812041]  __ia32_sys_sync+0xa/0x10
[  

[Yahoo-eng-team] [Bug 1907686] Re: ovn: instance unable to retrieve metadata

2021-06-03 Thread Chris MacNaughton
This bug was fixed in the package openvswitch - 2.13.3-0ubuntu0.20.04.1~cloud0
---

 openvswitch (2.13.3-0ubuntu0.20.04.1~cloud0) bionic-ussuri; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 openvswitch (2.13.3-0ubuntu0.20.04.1) focal; urgency=medium
 .
   [ James Page ]
   * New upstream point release (LP: #1920141, LP: #1907686).
   * Dropped security patches, included in release:
 - CVE-2015-8011.patch
 - CVE-2020-27827.patch
 - CVE-2020-35498.patch
   * Add BD on libdbus-1-dev to resolve linking issues for DPDK builds due
 to changes in DPDK.
   * d/control: Set minimum version of libdpdk-dev to avoid build
 failures with 19.11.6-0ubuntu0.20.04.1.
 .
   [ Frode Nordahl ]
   * Fix recording of FQDN/hostname on startup (LP: #1915829):
 - d/p/ovs-dev-ovs-ctl-Allow-recording-hostname-separately.patch: Cherry
   pick of committed upstream fix to support skip of hostname
   configuration on ovs-vswitchd/ovsdb-server startup.
 - d/openvswitch-switch.ovs-record-hostname.service: Record hostname in
   Open vSwitch after network-online.target using new systemd unit.
 - d/openvswitch-switch.ovs-vswitchd.service: Pass `--no-record-hostname`
   option to `ovs-ctl` to delegate recording of hostname to the separate
   service.
 - d/openvswitch-switch.ovsdb-server.service: Pass `--no-record-hostname`
   option to `ovs-ctl` to delegate recording of hostname to the separate
   service.
 - d/openvswitch-switch.service: Add `Also` reference to
   ovs-record-hostname.service so that the service is enabled on install.
 - d/rules: Add `ovs-record-hostname.service` to package build.


** Changed in: cloud-archive/ussuri
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1907686

Title:
  ovn: instance unable to retrieve metadata

Status in charm-ovn-chassis:
  Invalid
Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Won't Fix
Status in Ubuntu Cloud Archive wallaby series:
  Fix Released
Status in neutron:
  Invalid
Status in openvswitch package in Ubuntu:
  Fix Released
Status in openvswitch source package in Focal:
  Fix Released
Status in openvswitch source package in Groovy:
  Fix Released
Status in openvswitch source package in Hirsute:
  Fix Released

Bug description:
  [Impact]
  Cloud instances are unable to retrieve metadata on startup.

  [Test Case]
  Deploy OpenStack with OVN/OVS
  Restart OVN central controllers
  Create a new instance
  Instance will fail to retrieve metadata with the message from the original 
bug report displayed in the metadata agent log on the local hypervisor

  [Regression Potential]
  The fix for this issue is included in the upstream 2.13.3 release of OVS.
  The fix ensures that SSL related connection issues are correctly handled in 
python3-ovs, avoiding an issue where the connection to the OVN SB IDL is reset 
and never recreated.
  The OVN drivers use python3-ovsdbapp which in turn bases off code provided by 
python3-ovs.

  
  [Original Bug Report]
  Ubuntu:focal
  OpenStack: ussuri
  Instance port: hardware offloaded

  instance created, attempts to access metadata - metadata agent can't
  resolve the port/network combination:

  2020-12-10 15:00:18.258 4732 INFO neutron.agent.ovn.metadata.agent [-] Port 
d65418a6-d0e9-47e6-84ba-3d02fe75131a in datapath 
37706e4d-ce2a-4d81-8c61-3fd12437a0a7 bound to our ch
  assis
  2020-12-10 15:00:31.672 8062 ERROR neutron.agent.ovn.metadata.server [-] No 
port found in network 37706e4d-ce2a-4d81-8c61-3fd12437a0a7 with IP address 
10.5.1.155
  2020-12-10 15:00:31.673 8062 INFO eventlet.wsgi.server [-] 10.5.1.155, 
"GET /openstack HTTP/1.1" status: 404  len: 297 time: 0.0043790
  2020-12-10 15:00:34.639 8062 ERROR neutron.agent.ovn.metadata.server [-] No 
port found in network 37706e4d-ce2a-4d81-8c61-3fd12437a0a7 with IP address 
10.5.1.155
  2020-12-10 15:00:34.639 8062 INFO eventlet.wsgi.server [-] 10.5.1.155, 
"GET /openstack HTTP/1.1" status: 404  len: 297 time: 0.0040138

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ovn-chassis/+bug/1907686/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1832021] Please test proposed package

2021-06-03 Thread Corey Bryant
Hello David, or anyone else affected,

Accepted neutron into rocky-proposed. The package will build now and be
available in the Ubuntu Cloud Archive in a few hours, and then in the
-proposed repository.

Please help us by testing this new package. To enable the -proposed
repository:

  sudo add-apt-repository cloud-archive:rocky-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, and change the tag
from verification-rocky-needed to verification-rocky-done. If it does
not fix the bug for you, please add a comment stating that, and change
the tag to verification-rocky-failed. In either case, details of your
testing will help us make a better decision.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance!

** Changed in: cloud-archive/rocky
   Status: New => Fix Committed

** Tags added: verification-rocky-needed

** Changed in: cloud-archive/queens
   Status: New => Fix Committed

** Changed in: cloud-archive
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1832021

Title:
  Checksum drop of metadata traffic on isolated networks with DPDK

Status in OpenStack neutron-openvswitch charm:
  Fix Released
Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive queens series:
  Fix Committed
Status in Ubuntu Cloud Archive rocky series:
  Fix Committed
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Committed
Status in neutron source package in Focal:
  Fix Released

Bug description:
  [Impact]

  When using an isolated network with provider networks for tenants
  (meaning without virtual routers: DVR or network node), metadata access
  occurs in the qdhcp ip netns rather than the qrouter netns.

  The following options are set in the dhcp_agent.ini file:
  force_metadata = True
  enable_isolated_metadata = True

  VMs on the provider tenant network are unable to access metadata as
  packets are dropped due to checksum.

  [Test Plan]

  1. Create an OpenStack deployment with DPDK options enabled and
  'enable-local-dhcp-and-metadata: true' in neutron-openvswitch. A
  sample, simple 3 node bundle can be found here[1].

  2. Create an external flat network and subnet:

  openstack network show dpdk_net || \
openstack network create --provider-network-type flat \
 --provider-physical-network physnet1 dpdk_net \
 --external

  openstack subnet show dpdk_net || \
  openstack subnet create \
  --allocation-pool start=10.230.58.100,end=10.230.58.200 \
  --subnet-range 10.230.56.0/21 --dhcp --gateway 10.230.56.1 \
  --dns-nameserver 10.230.56.2 \
  --ip-version 4 --network dpdk_net dpdk_subnet

  
  3. Create an instance attached to that network. The instance must have a 
flavor that uses huge pages.

  openstack flavor create --ram 8192 --disk 50 --vcpus 4 m1.dpdk
  openstack flavor set m1.dpdk --property hw:mem_page_size=large

  openstack server create --wait --image xenial --flavor m1.dpdk --key-
  name testkey --network dpdk_net i1

  4. Log into the instance host and check the instance console. The
  instance will hang during boot and show the following message:

  2020-11-20 09:43:26,790 - openstack.py[DEBUG]: Failed reading optional
  path http://169.254.169.254/openstack/2015-10-15/user_data due to:
  HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out.
  (read timeout=10.0)

  5. Apply the fix in all computes, restart the DHCP agents in all
  computes and create the instance again.

  6. No errors should be shown and the instance quickly boots.

  
  [Where problems could occur]

  * This change only takes effect when datapath_type and ovs_use_veth are
  set; those settings are mostly used for DPDK environments. The core of the
  fix is to toggle off checksum offload on the DHCP namespace interfaces (a
  quick way to verify this is sketched below). This adds some overhead to
  packet processing for DHCP traffic, but given that DHCP does not move much
  data, this should be a minor problem.

  * Future changes on the syntax of the ethtool command could cause
  regressions
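
  A hedged verification sketch (the network UUID and tap device are
  placeholders; list them with "ip netns" and "ip netns exec <ns> ip link"
  on the compute host running the DHCP agent):

    sudo ip netns exec qdhcp-<network-uuid> ethtool -k <tap-device> | grep checksumming
    # with the fix applied, tx-checksumming should report "off"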


  [Other Info]

   * None


  [1] https://gist.github.com/sombrafam/e0741138773e444960eb4aeace6e3e79

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1832021/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net

[Yahoo-eng-team] [Bug 1930706] [NEW] nova allows suboptimal emulator thread pinning for realtime guests

2021-06-03 Thread sean mooney
Public bug reported:

Today, whenever you use a realtime guest you are required to enable CPU
pinning and other features, such as specifying a realtime core mask via
hw:cpu_realtime_mask or hw_cpu_realtime_mask.

In the Victoria release this requirement was relaxed somewhat with the
introduction of mixed CPU policy guests, which are assigned both pinned and
floating cores.

https://github.com/openstack/nova/commit/9fc63c764429c10f9041e6b53659e0cbd595bf6b


It is now possible to allocate all cores in an instance to realtime and
omit the ``hw:cpu_realtime_mask`` extra spec. This requires specifying the
``hw:emulator_threads_policy`` extra spec.

https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/releasenotes/notes/bug-1884231-16acf297d88b122e.yaml

However, while that works well, it is also possible to set
hw:cpu_realtime_mask but not specify hw:emulator_threads_policy, which leads
to suboptimal XML generation for the libvirt driver. (A flavor reproducing
this combination is sketched below.)
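
A minimal sketch of a flavor that reproduces this combination (the flavor
name and sizes are placeholders; the extra specs are the ones discussed
above, with hw:emulator_threads_policy deliberately left unset):

  openstack flavor create --ram 4096 --disk 20 --vcpus 2 rt.example
  openstack flavor set rt.example \
    --property hw:cpu_policy=dedicated \
    --property hw:cpu_realtime=True \
    --property hw:cpu_realtime_mask=^0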

This is reported downstream as
https://bugzilla.redhat.com/show_bug.cgi?id=1700390 for older releases
that predate the changes referenced above.

Though, on re-evaluation of this, a possible improvement can be made, as
detailed in https://bugzilla.redhat.com/show_bug.cgi?id=1700390#c11


Today, if we have a 2-core VM where guest CPU 0 is non-realtime and guest
CPU 1 is realtime,
e.g. hw:cpu_policy=dedicated hw:cpu_realtime=True hw:cpu_realtime_mask=^0,
we would generate the XML as follows:

  [the libvirt <cputune> snippet was stripped by the mailing-list archive; a
  hedged reconstruction is given after the next paragraph]

This is because the default behavior, when no emulator_threads_policy is
specified, is for the emulator thread to float over all the VM's cores.
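
A hedged reconstruction of the kind of <cputune> children generated here (the
actual snippet was stripped by the archive; host CPUs 4 and 5 are placeholder
pinning targets, not values taken from the original report):

  <vcpupin vcpu='0' cpuset='4'/>
  <vcpupin vcpu='1' cpuset='5'/>
  <emulatorpin cpuset='4-5'/>
  <vcpusched vcpus='1' scheduler='fifo' priority='1'/>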

But a slight modification to the XML could be made to provide a more optimal
default in this case: using the cpu_realtime_mask we can instead restrict the
emulator thread to float over the non-realtime cores, with realtime priority.

  [this modified <cputune> snippet was likewise stripped by the archive; a
  hedged sketch follows after the next paragraph]

This will ensure that if QEMU needs to process a request, for example a
device attach, the emulator thread has higher priority than the guest vCPUs
that deal with guest housekeeping tasks, but will not interrupt the realtime
cores.
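
A hedged sketch of that proposed, more optimal default (same placeholder host
CPUs; the exact priority value and the use of <emulatorsched> are assumptions
for illustration, not the final patch):

  <vcpupin vcpu='0' cpuset='4'/>
  <vcpupin vcpu='1' cpuset='5'/>
  <emulatorpin cpuset='4'/>
  <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
  <emulatorsched scheduler='fifo' priority='1'/>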

This would give many of the benefits of emulator_threads_policy=share or
emulator_threads_policy=isolate without increasing resource usage or
requiring any config, flavor or image changes. This should also be a
backportable solution to this problem.

This is especially important given that realtime hosts are often deployed
with the kernel isolcpus parameter, which means that the kernel will not load
balance the emulator thread across the range and will instead leave it on the
core it initially spawned on. Today you could get lucky and it could be
spawned on core 0, in which case the new behavior would be the same, or it
could get spawned on core 1. When the emulator thread is spawned on core 1,
since it has less priority than the vCPU thread, it will only run if the
guest vCPU idles, resulting in the inability of QEMU to process device attach
and other QEMU monitor commands from libvirt or the user.

** Affects: nova
 Importance: Wishlist
 Status: Triaged


** Tags: libvirt numa

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1930706

Title:
  nova allows suboptimal emulator thread pinning for realtime guests

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  Today, whenever you use a realtime guest you are required to enable
  CPU pinning and other features, such as specifying a realtime core mask
  via hw:cpu_realtime_mask or hw_cpu_realtime_mask.

  In the Victoria release this requirement was relaxed somewhat with the
  introduction of mixed CPU policy guests, which are assigned both pinned
  and floating cores.

  
https://github.com/openstack/nova/commit/9fc63c764429c10f9041e6b53659e0cbd595bf6b

  
  It is now possible to allocate all cores in an instance to realtime and
  omit the ``hw:cpu_realtime_mask`` extra spec. This requires specifying the
  ``hw:emulator_threads_policy`` extra spec.

  
https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/releasenotes/notes/bug-1884231-16acf297d88b122e.yaml

  However, while that works well, it is also possible to set
  hw:cpu_realtime_mask but not specify hw:emulator_threads_policy, which
  leads to suboptimal XML generation for the libvirt driver.

  This is reported downstream as
  https://bugzilla.redhat.com/show_bug.cgi?id=1700390 for older releases
  that predate the changes referenced above.

  Though, on re-evaluation of this, a possible improvement can be made, as
  detailed in https://bugzilla.redhat.com/show_bug.cgi?id=1700390#c11

  
  Today, if we have a 2-core VM where guest CPU 0 is non-realtime and guest
  CPU 1 is realtime,
  e.g. hw:cpu_policy=dedicated hw:cpu_realtime=True hw:cpu_realtime_mask=^0,
  we would generate the XML as follows:

  [the libvirt <cputune> snippet was stripped by the mailing-list archive;
  see the hedged reconstruction earlier in this message]
  This is because the default behavior, when no emulator_threads_policy is
  specified, is for the emulator thread to float