[Yahoo-eng-team] [Bug 1839920] Re: Macvtap CI fails on Train

2019-10-22 Thread Matt Riedemann
** Changed in: nova
   Importance: Undecided => High

** Tags added: train-rc-potential

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Confirmed

** Changed in: nova/train
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839920

Title:
  Macvtap CI fails on Train

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) train series:
  Confirmed

Bug description:
  
  MacVtap CI[1] started to fail after merging commit[2]

  
  We think it is related to this libvirt commit:
  https://github.com/libvirt/libvirt/commit/b91a33638476cf57d910b6056a8fc11921edd029#diff-28bc83a0c3470bba712dfa6824a79c9d
  It changes libvirt from reporting the admin MAC to reporting the effective
  MAC. The problem is that the sriov-nic agent relies on the admin MAC when
  sending RPC updates to the neutron server. If the MAC and the PCI slot don't
  match, the agent ignores the device and the VM stays stuck in spawn until it
  times out.

  
  [1] https://wiki.openstack.org/wiki/ThirdPartySystems/Mellanox_CI
  [2] https://review.opendev.org/#/c/31/
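
  A purely illustrative sketch of the mismatch (the function and data layout
  are hypothetical, not the actual sriov-nic agent code): the agent keys its
  devices by the admin MAC it expects for a PCI slot, so a report carrying the
  effective MAC no longer matches and the update is silently dropped.

    # Hypothetical illustration only -- not the real agent code.
    # device_map is keyed by the (admin_mac, pci_slot) pairs the agent knows.
    def find_device(device_map, reported_mac, pci_slot):
        device = device_map.get((reported_mac, pci_slot))
        if device is None:
            # libvirt now reports the effective MAC, so the lookup misses:
            # no RPC reaches the neutron server and the VM stays in spawn
            # until the timeout hits.
            return None
        return device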

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839920/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1849165] Re: _populate_assigned_resources raises TypeError: argument of type 'NoneType' is not iterable

2019-10-21 Thread Matt Riedemann
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22if%20mig.dest_compute%20%3D%3D%20self.host%20and%20'new_resources'%20in%20mig_ctx%3A%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22=7d

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Summary changed:

- _populate_assigned_resources raises TypeError: argument of type 'NoneType' is 
not iterable
+ _populate_assigned_resources raises "TypeError: argument of type 'NoneType' 
is not iterable" during active migration

** Changed in: nova/train
   Importance: Undecided => High

** Changed in: nova
   Status: New => Confirmed

** Changed in: nova/train
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1849165

Title:
  _populate_assigned_resources raises "TypeError: argument of type
  'NoneType' is not iterable" during active migration

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) train series:
  Confirmed

Bug description:
  Seen here:

  
https://zuul.opendev.org/t/openstack/build/2b10b4a240b84245bcee3366db93951d/log/logs/screen-n-cpu.txt.gz?severity=4#2675

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager [None req-
  dd5ddbad-4234-4288-bbab-2c3d20b7f4ad None None] Error updating
  resources for node ubuntu-bionic-rax-iad-0012404623.: TypeError:
  argument of type 'NoneType' is not iterable

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager Traceback (most recent call
  last):

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager   File
  "/opt/stack/new/nova/nova/compute/manager.py", line 8925, in
  _update_available_resource_for_node

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager startup=startup)

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager   File
  "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 883, in
  update_available_resource

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager
  self._update_available_resource(context, resources, startup=startup)

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager   File
  "/usr/local/lib/python2.7/dist-
  packages/oslo_concurrency/lockutils.py", line 328, in inner

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager return f(*args,
  **kwargs)

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager   File
  "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 965, in
  _update_available_resource

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager
  self._populate_assigned_resources(context, instance_by_uuid)

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager   File
  "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 482, in
  _populate_assigned_resources

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager if mig.dest_compute ==
  self.host and 'new_resources' in mig_ctx:

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager TypeError: argument of type
  'NoneType' is not iterable

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager

  This was added late in Train:

  https://review.opendev.org/#/c/678452/
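
  A minimal standalone sketch of the failure mode (not the actual nova fix):
  using the 'in' operator on None raises TypeError, so the membership test
  needs a guard for migrations that have no migration context attached.

    mig_ctx = None  # e.g. a migration record without a migration context

    try:
        if 'new_resources' in mig_ctx:   # raises the TypeError seen above
            pass
    except TypeError as exc:
        print(exc)  # argument of type 'NoneType' is not iterable

    # guarded variant: only test membership when a context is present
    if mig_ctx and 'new_resources' in mig_ctx:
        pass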

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1849165/+subscriptions



[Yahoo-eng-team] [Bug 1849165] [NEW] _populate_assigned_resources raises "TypeError: argument of type 'NoneType' is not iterable" during active migration

2019-10-21 Thread Matt Riedemann
Public bug reported:

Seen here:

https://zuul.opendev.org/t/openstack/build/2b10b4a240b84245bcee3366db93951d/log/logs/screen-n-cpu.txt.gz?severity=4#2675

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager [None req-dd5ddbad-4234-4288
-bbab-2c3d20b7f4ad None None] Error updating resources for node ubuntu-
bionic-rax-iad-0012404623.: TypeError: argument of type 'NoneType' is
not iterable

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager Traceback (most recent call
last):

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager   File
"/opt/stack/new/nova/nova/compute/manager.py", line 8925, in
_update_available_resource_for_node

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager startup=startup)

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager   File
"/opt/stack/new/nova/nova/compute/resource_tracker.py", line 883, in
update_available_resource

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager
self._update_available_resource(context, resources, startup=startup)

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager   File
"/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py",
line 328, in inner

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager return f(*args, **kwargs)

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager   File
"/opt/stack/new/nova/nova/compute/resource_tracker.py", line 965, in
_update_available_resource

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager
self._populate_assigned_resources(context, instance_by_uuid)

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager   File
"/opt/stack/new/nova/nova/compute/resource_tracker.py", line 482, in
_populate_assigned_resources

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager if mig.dest_compute ==
self.host and 'new_resources' in mig_ctx:

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager TypeError: argument of type
'NoneType' is not iterable

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
compute[26938]: ERROR nova.compute.manager

This was added late in Train:

https://review.opendev.org/#/c/678452/

** Affects: nova
 Importance: High
 Status: Confirmed

** Affects: nova/train
 Importance: High
 Status: Confirmed


** Tags: resource-tracker

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1849165

Title:
  _populate_assigned_resources raises "TypeError: argument of type
  'NoneType' is not iterable" during active migration

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) train series:
  Confirmed

Bug description:
  Seen here:

  
https://zuul.opendev.org/t/openstack/build/2b10b4a240b84245bcee3366db93951d/log/logs/screen-n-cpu.txt.gz?severity=4#2675

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager [None req-
  dd5ddbad-4234-4288-bbab-2c3d20b7f4ad None None] Error updating
  resources for node ubuntu-bionic-rax-iad-0012404623.: TypeError:
  argument of type 'NoneType' is not iterable

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager Traceback (most recent call
  last):

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager   File
  "/opt/stack/new/nova/nova/compute/manager.py", line 8925, in
  _update_available_resource_for_node

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager startup=startup)

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager   File
  "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 883, in
  update_available_resource

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager
  self._update_available_resource(context, resources, startup=startup)

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR nova.compute.manager   File
  "/usr/local/lib/python2.7/dist-
  packages/oslo_concurrency/lockutils.py", line 328, in inner

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-
  compute[26938]: ERROR 

[Yahoo-eng-team] [Bug 1848514] Re: Booting from volume providing an image fails

2019-10-21 Thread Matt Riedemann
Hmm, did something change in Stein on the Cinder side to enforce the
update_volume_admin_metadata policy rule on the os-attach API? I'm not
aware of anything that has changed on the nova side in stein that would
be related to this.

** Also affects: cinder
   Importance: Undecided
   Status: New

** Tags added: policy volumes

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848514

Title:
  Booting from volume providing an image fails

Status in Cinder:
  New
Status in OpenStack Compute (nova):
  New

Bug description:
  Trying to create an instance (booting from volume when specifying an image) 
fails.
  Running Stein (19.0.1)

  ###
  When using:
  ###
  nova boot --flavor FLAVOR_ID --block-device 
source=image,id=IMAGE_ID,dest=volume,size=10,shutdown=preserve,bootindex=0 
INSTANCE_NAME

  ###
  nova-compute logs:
  ###

  Instance failed block device setup Forbidden: Policy doesn't allow
  volume:update_volume_admin_metadata to be performed. (HTTP 403)
  (Request-ID: req-875cc6e1-ffe1-45dd-b942-944166c6040a)

  The full trace:
  http://paste.openstack.org/raw/784535/

  
  This is definitely a policy issue!
  Our cinder policy uses the default:
  "volume:update_volume_admin_metadata": "rule:admin_api"
  Using a user with admin credentials works as expected!

  Is this expected? We didn't see this behaviour previously (before
  Stein) using the same policy for "update_volume_admin_metadata".

  Found an old similar report:
  https://bugs.launchpad.net/nova/+bug/1661189

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1848514/+subscriptions



[Yahoo-eng-team] [Bug 1848499] [NEW] powervm driver tests fail with networkx 2.4: "AttributeError: 'DiGraph' object has no attribute 'node'"

2019-10-17 Thread Matt Riedemann
Public bug reported:

https://c6fecb2db5c55fa0effa-
6229cc6450d9b491384804026d2fbd81.ssl.cf5.rackcdn.com/688980/1/gate
/openstack-tox-py36/71a8bdd/testr_results.html.gz

ft1.2: 
nova.tests.unit.virt.powervm.tasks.test_vm.TestVMTasks.test_power_on_revert_StringException:
 Traceback (most recent call last):
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/mock/mock.py",
 line 1330, in patched
return func(*args, **keywargs)
  File 
"/home/zuul/src/opendev.org/openstack/nova/nova/tests/unit/virt/powervm/tasks/test_vm.py",
 line 90, in test_power_on_revert
self.assertRaises(ValueError, tf_eng.run, flow)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/testcase.py",
 line 485, in assertRaises
self.assertThat(our_callable, matcher)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/testcase.py",
 line 496, in assertThat
mismatch_error = self._matchHelper(matchee, matcher, message, verbose)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/testcase.py",
 line 547, in _matchHelper
mismatch = matcher.match(matchee)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/matchers/_exception.py",
 line 108, in match
mismatch = self.exception_matcher.match(exc_info)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/matchers/_higherorder.py",
 line 62, in match
mismatch = matcher.match(matchee)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/testcase.py",
 line 475, in match
reraise(*matchee)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/_compat3x.py",
 line 16, in reraise
raise exc_obj.with_traceback(exc_tb)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/matchers/_exception.py",
 line 101, in match
result = matchee()
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/testcase.py",
 line 1049, in __call__
return self._callable_object(*self._args, **self._kwargs)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/helpers.py",
 line 162, in run
engine.run()
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/action_engine/engine.py",
 line 247, in run
for _state in self.run_iter(timeout=timeout):
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/action_engine/engine.py",
 line 271, in run_iter
self.compile()
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/fasteners/lock.py",
 line 306, in wrapper
return f(self, *args, **kwargs)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/action_engine/engine.py",
 line 470, in compile
self._runtime.compile()
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/action_engine/runtime.py",
 line 143, in compile
metadata['edge_deciders'] = tuple(deciders_it)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/action_engine/runtime.py",
 line 75, in _walk_edge_deciders
u_node_kind = graph.node[u_node]['kind']
AttributeError: 'DiGraph' object has no attribute 'node'

Seems this is since networkx 2.4 was released 11 hours ago:

https://pypi.org/project/networkx/2.4/

And upper-constraints aren't being honored for some reason:

networkx===2.2;python_version=='2.7'
networkx===2.3;python_version=='3.4'
networkx===2.3;python_version=='3.5'
networkx===2.3;python_version=='3.6'
networkx===2.3;python_version=='3.7'

I guess maybe that's because networkx is only a transitive dependency,
pulled in via taskflow, which the powervm driver depends on?
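
A minimal sketch of the API change that bites taskflow here: networkx 2.4
removed the Graph.node attribute in favour of Graph.nodes, so code indexing
graph.node[...] now raises AttributeError.

    import networkx as nx

    graph = nx.DiGraph()
    graph.add_node('task-a', kind='task')

    print(graph.nodes['task-a']['kind'])   # works on networkx 2.x, incl. 2.4
    # graph.node['task-a']['kind']         # AttributeError on networkx >= 2.4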

** Affects: nova
 Importance: Critical
 Status: Confirmed


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848499

Title:
  powervm driver tests fail with networkx 2.4: "AttributeError:
  'DiGraph' object has no attribute 'node'"

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  https://c6fecb2db5c55fa0effa-
  6229cc6450d9b491384804026d2fbd81.ssl.cf5.rackcdn.com/688980/1/gate
  /openstack-tox-py36/71a8bdd/testr_results.html.gz

  ft1.2: 
nova.tests.unit.virt.powervm.tasks.test_vm.TestVMTasks.test_power_on_revert_StringException:
 Traceback (most recent call last):
File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/mock/mock.py",
 line 1330, in patched
  

[Yahoo-eng-team] [Bug 1848442] Re: The request method of "os-floating-ips" should be DELETE

2019-10-17 Thread Matt Riedemann
That API is only for nova-network, which we are removing, so eventually
it is just going to return a 410 response and won't be used anyway.

** Changed in: nova
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848442

Title:
  The  request method of "os-floating-ips"  should be DELETE

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  When bulk-deleting floating IPs, the request method of
  /os-floating-ips-bulk/delete should be DELETE.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848442/+subscriptions



[Yahoo-eng-team] [Bug 1848373] Re: Instance.save(expected_task_state=) is passed string in many locations

2019-10-16 Thread Matt Riedemann
Looks like expected_task_state is pulled from the values dict here:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/db/sqlalchemy/api.py#L2850

and if not None converted to a list here:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/db/sqlalchemy/api.py#L2857

So I guess that's why things work and I can close this bug - there are
incorrect uses of expected_task_state for Instance.save(), but the DB API
handles them.
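
A rough sketch of that DB-layer handling (paraphrased from the links above,
not copied verbatim), which is why a bare string still behaves like a
one-element list:

    def _extract_expected_task_state(values):
        # rough paraphrase of the DB API handling linked above
        expected = values.pop('expected_task_state', None)
        if expected is not None and not isinstance(expected, (list, tuple)):
            # a bare string such as 'resizing' becomes ['resizing'] here,
            # so a later "task_state in expected" check still matches
            expected = [expected]
        return expected

    print(_extract_expected_task_state({'expected_task_state': 'resizing'}))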

** Changed in: nova
   Status: Triaged => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848373

Title:
  Instance.save(expected_task_state=) is passed string in many locations

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  I noticed this in some code I was writing when it didn't behave like I
  expected:

  
https://review.opendev.org/#/c/627891/63/nova/conductor/tasks/cross_cell_migrate.py@423

  
https://review.opendev.org/#/c/688832/2/nova/conductor/tasks/cross_cell_migrate.py@781

  That "works" because strings are iterable but it's not the intended
  use of that kwarg which should be None or a list or tuple:

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/objects/instance.py#L758

  We have several places that incorrectly pass a string though, here are
  a couple:

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/api.py#L3228

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L2554

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L3103

  The Instance.save() method should probably assert that if the value is
  not None that it's not a string type since the latter is a coding
  error.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848373/+subscriptions



[Yahoo-eng-team] [Bug 1848373] [NEW] Instance.save(expected_task_state=) is passed string in many locations

2019-10-16 Thread Matt Riedemann
Public bug reported:

I noticed this in some code I was writing when it didn't behave like I
expected:

https://review.opendev.org/#/c/627891/63/nova/conductor/tasks/cross_cell_migrate.py@423

https://review.opendev.org/#/c/688832/2/nova/conductor/tasks/cross_cell_migrate.py@781

That "works" because strings are iterable but it's not the intended use
of that kwarg which should be None or a list or tuple:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/objects/instance.py#L758

We have several places that incorrectly pass a string though, here are a
couple:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/api.py#L3228

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L2554

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L3103

The Instance.save() method should probably assert that if the value is
not None that it's not a string type since the latter is a coding error.
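
A hedged sketch of the guard suggested above (illustrative only, not a
proposed patch):

    def _check_expected_task_state(expected_task_state):
        # strings are iterable, which is why the wrong usage "works" today:
        # 'g' in 'migrating' is True, so no error is raised.
        if expected_task_state is None:
            return None
        if isinstance(expected_task_state, str):
            raise TypeError('expected_task_state must be None, a list or a '
                            'tuple, not a bare string')
        return expected_task_state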

** Affects: nova
 Importance: Medium
 Status: Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848373

Title:
  Instance.save(expected_task_state=) is passed string in many locations

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  I noticed this in some code I was writing when it didn't behave like I
  expected:

  
https://review.opendev.org/#/c/627891/63/nova/conductor/tasks/cross_cell_migrate.py@423

  
https://review.opendev.org/#/c/688832/2/nova/conductor/tasks/cross_cell_migrate.py@781

  That "works" because strings are iterable but it's not the intended
  use of that kwarg which should be None or a list or tuple:

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/objects/instance.py#L758

  We have several places that incorrectly pass a string though, here are
  a couple:

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/api.py#L3228

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L2554

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L3103

  The Instance.save() method should probably assert that if the value is
  not None that it's not a string type since the latter is a coding
  error.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848373/+subscriptions



[Yahoo-eng-team] [Bug 1848343] Re: MigrationTask rollback can leak allocations for a deleted server

2019-10-16 Thread Matt Riedemann
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848343

Title:
  MigrationTask rollback can leak allocations for a deleted server

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  New

Bug description:
  This came up in the cross-cell resize review:

  
https://review.opendev.org/#/c/627890/60/nova/conductor/tasks/cross_cell_migrate.py@495

  And I was able to recreate with a functional test here:

  https://review.opendev.org/#/c/688832/

  That test is doing a cross-cell cold migration but looking at the
  code:

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L461

  We can hit an issue for same-cell resize/cold migrate if we have
  swapped the allocations so the source node allocations are held by the
  migration consumer and the instance holds allocations on the target
  node (created by the scheduler):

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L328

  If something fails between ^ and the cast to prep_resize, the task
  will rollback and revert the allocations so the target node
  allocations are dropped and the source node allocations are moved back
  to the instance:

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L91

  Furthermore, if the instance was deleted when we perform that swap,
  the move_allocations method will recreate the allocations on the
  source node for the now-deleted instance since we don't assert
  consumer generations during the swap:

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/scheduler/client/report.py#L1886

  This results in leaking allocations for the source node since the
  instance is deleted.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848343/+subscriptions



[Yahoo-eng-team] [Bug 1848343] [NEW] MigrationTask rollback can leak allocations for a deleted server

2019-10-16 Thread Matt Riedemann
Public bug reported:

This came up in the cross-cell resize review:

https://review.opendev.org/#/c/627890/60/nova/conductor/tasks/cross_cell_migrate.py@495

And I was able to recreate with a functional test here:

https://review.opendev.org/#/c/688832/

That test is doing a cross-cell cold migration but looking at the code:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L461

We can hit an issue for same-cell resize/cold migrate if we have swapped
the allocations so the source node allocations are held by the migration
consumer and the instance holds allocations on the target node (created
by the scheduler):

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L328

If something fails between ^ and the cast to prep_resize, the task will
rollback and revert the allocations so the target node allocations are
dropped and the source node allocations are moved back to the instance:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L91

Furthermore, if the instance was deleted when we perform that swap, the
move_allocations method will recreate the allocations on the source node
for the now-deleted instance since we don't assert consumer generations
during the swap:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/scheduler/client/report.py#L1886

This results in leaking allocations for the source node since the
instance is deleted.
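
A hedged sketch of the kind of guard this implies (the client helper and
variable names are hypothetical; this is not the actual nova fix): pass the
consumer generation when moving the allocations back, so a concurrently
deleted instance produces a 409 conflict instead of silently recreated
allocations.

    # Sketch only. 'client' stands in for a placement client exposing a
    # put(url, body, version=...) helper; placement microversion 1.28 added
    # consumer generations to PUT /allocations/{consumer_uuid}.
    def move_allocations_back(client, instance_uuid, source_allocations,
                              consumer_generation, project_id, user_id):
        payload = {
            'allocations': source_allocations,
            'consumer_generation': consumer_generation,
            'project_id': project_id,
            'user_id': user_id,
        }
        resp = client.put('/allocations/%s' % instance_uuid, payload,
                          version='1.28')
        if resp.status_code == 409:
            # the consumer changed underneath us (e.g. the instance was
            # deleted), so do not recreate allocations for it
            return False
        return resp.status_code == 204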

** Affects: nova
 Importance: Undecided
 Status: Triaged


** Tags: cold-migrate placement resize

** Changed in: nova
   Status: New => Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848343

Title:
  MigrationTask rollback can leak allocations for a deleted server

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  This came up in the cross-cell resize review:

  
https://review.opendev.org/#/c/627890/60/nova/conductor/tasks/cross_cell_migrate.py@495

  And I was able to recreate with a functional test here:

  https://review.opendev.org/#/c/688832/

  That test is doing a cross-cell cold migration but looking at the
  code:

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L461

  We can hit an issue for same-cell resize/cold migrate if we have
  swapped the allocations so the source node allocations are held by the
  migration consumer and the instance holds allocations on the target
  node (created by the scheduler):

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L328

  If something fails between ^ and the cast to prep_resize, the task
  will rollback and revert the allocations so the target node
  allocations are dropped and the source node allocations are moved back
  to the instance:

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L91

  Furthermore, if the instance was deleted when we perform that swap,
  the move_allocations method will recreate the allocations on the
  source node for the now-deleted instance since we don't assert
  consumer generations during the swap:

  
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/scheduler/client/report.py#L1886

  This results in leaking allocations for the source node since the
  instance is deleted.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848343/+subscriptions



[Yahoo-eng-team] [Bug 1841481] Re: Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache

2019-10-15 Thread Matt Riedemann
Hits in ironic multinode jobs:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Skipping%20removal%20of%20allocations%20for%20deleted%20instances%3A%20Failed%20to%20retrieve%20allocations%20for%20resource%20provider%5C%22%20AND%20message%3A%5C%22No%20resource%20provider%20with%20uuid%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22%20AND%20project%3A%5C%22openstack%2Fironic%5C%22=7d

We don't have an elastic-recheck query for that since none of the jobs
it hits on are voting.

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/pike
   Importance: Undecided
   Status: New

** Also affects: nova/ocata
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841481

Title:
  Race during ironic re-balance corrupts local RT ProviderTree and
  compute_nodes cache

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) ocata series:
  New
Status in OpenStack Compute (nova) pike series:
  New
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  New

Bug description:
  Seen with an ironic re-balance in this job:

  
https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check
  /ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode/92c65ac/

  On the subnode we see the RT detect that the node is moving hosts:

  Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova-
  compute[747]: INFO nova.compute.resource_tracker [None req-a894abee-
  a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42
  -b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to
  ubuntu-bionic-rax-ord-0010443319

  On that new host, the ProviderTree cache is getting updated with
  refreshed associations for inventory:

  Aug 26 18:41:38.881026 ubuntu-bionic-rax-ord-0010443319 nova-
  compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee-
  a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing inventories for
  resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f {{(pid=747)
  _refresh_associations
  /opt/stack/nova/nova/scheduler/client/report.py:761}}

  aggregates:

  Aug 26 18:41:38.953685 ubuntu-bionic-rax-ord-0010443319 nova-
  compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee-
  a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing aggregate
  associations for resource provider 61dbc9c7-828b-4c42-b19c-
  a3716037965f, aggregates: None {{(pid=747) _refresh_associations
  /opt/stack/nova/nova/scheduler/client/report.py:770}}

  and traits - but when we get traits the provider is gone:

  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None 
None] Error updating resources for node 61dbc9c7-828b-4c42-b19c-a3716037965f.: 
ResourceProviderTraitRetrievalFailed: Failed to get traits for resource 
provider with UUID 61dbc9c7-828b-4c42-b19c-a3716037965f
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager Traceback (most recent call last):
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", 
line 8250, in _update_available_resource_for_node
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager startup=startup)
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/opt/stack/nova/nova/compute/resource_tracker.py", line 715, in 
update_available_resource
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager self._update_available_resource(context, 
resources, startup=startup)
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 
328, in inner
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager return f(*args, **kwargs)
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/opt/stack/nova/nova/compute/resource_tracker.py", line 738, in 
_update_available_resource
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager is_new_compute_node = 
self._init_compute_node(context, resources)
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 

[Yahoo-eng-team] [Bug 1836754] Re: Conflict when deleting allocations for an instance that hasn't finished building

2019-10-15 Thread Matt Riedemann
This goes back to Stein because https://review.opendev.org/#/c/591597/
changed the method from using DELETE /allocations/{consumer_id} to the
GET/PUT dance.
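
A hedged sketch of the retry pattern the GET/PUT dance needs (the client
helper names are illustrative, not nova's code): on a
placement.concurrent_update 409, re-read the allocations and consumer
generation and try again.

    def put_allocations_with_retry(client, consumer_uuid, new_allocations,
                                   attempts=4):
        # 'client' stands in for a placement client with get()/put() helpers.
        for _ in range(attempts):
            current = client.get('/allocations/%s' % consumer_uuid,
                                 version='1.28').json()
            body = dict(new_allocations,
                        consumer_generation=current.get('consumer_generation'))
            resp = client.put('/allocations/%s' % consumer_uuid, body,
                              version='1.28')
            if resp.status_code != 409:
                return resp.status_code == 204
            # placement.concurrent_update: another writer bumped the consumer
            # generation between our GET and PUT; re-read and retry
        return False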

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/train
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1836754

Title:
  Conflict when deleting allocations for an instance that hasn't
  finished building

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  New

Bug description:
  Description
  ===

  When deleting an instance that hasn't finished building, we'll
  sometimes get a 409 from placement as such:

  Failed to delete allocations for consumer 6494d4d3-013e-478f-
  9ac1-37ca7a67b776. Error: {"errors": [{"status": 409, "title":
  "Conflict", "detail": "There was a conflict when trying to complete
  your request.\n\n Inventory and/or allocations changed while
  attempting to allocate: Another thread concurrently updated the data.
  Please retry your update  ", "code": "placement.concurrent_update",
  "request_id": "req-6dcd766b-f5d3-49fa-89f3-02e64079046a"}]}

  Steps to reproduce
  ==

  1. Boot an instance
  2. Don't wait for it to become active
  3. Delete it immediately

  Expected result
  ===

  The instance deletes successfully.

  Actual result
  =

  Nova bubbles up that error from Placement.

  Logs & Configs
  ==

  This is being hit at a low rate in various CI tests, logstash query is
  here:

  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Inventory%20and%2For%20allocations%20changed%20while%20attempting%20to%20allocate%3A%20Another%20thread%20concurrently%20updated%20the%20data%5C%22%20AND%20filename%3A%5C
  %22job-output.txt%5C%22

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1836754/+subscriptions



[Yahoo-eng-team] [Bug 1737131] Re: Superfluous re-mount attempts with the Quobyte Nova driver and multi-registry volume URLs

2019-10-14 Thread Matt Riedemann
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Changed in: nova/queens
   Status: New => In Progress

** Changed in: nova/queens
   Importance: Undecided => Low

** Changed in: nova/queens
 Assignee: (unassigned) => Silvan Kaiser (2-silvan)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1737131

Title:
  Superfluous re-mount attempts with the Quobyte Nova driver and multi-
  registry volume URLs

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  Fix Committed
Status in OpenStack Compute (nova) stein series:
  Fix Committed

Bug description:
  When using a multi-registry volume URL in the Cinder Quobyte driver
  the Nova Quobyte driver does not detect existing mounts correctly.
  Upon trying to mount the given volume the driver fails because the
  mount already exists:

  [..]
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server 
six.reraise(self.type_, self.value, self.tb)
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server   File 
"/opt/stack/nova/nova/virt/block_device.py", line 389, in attach
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server 
device_type=self['device_type'], encryption=encryption)
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 1248, in attach_volume
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server 
self._connect_volume(connection_info, disk_info, instance)
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 1181, in _connect_volume
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server 
vol_driver.connect_volume(connection_info, disk_info, instance)
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 
274, in inner
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server return 
f(*args, **kwargs)
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server   File 
"/opt/stack/nova/nova/virt/libvirt/volume/quobyte.py", line 147, in 
connect_volume
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server 
CONF.libvirt.quobyte_client_cfg)
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server   File 
"/opt/stack/nova/nova/virt/libvirt/volume/quobyte.py", line 61, in mount_volume
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server 
utils.execute(*command)
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server   File 
"/opt/stack/nova/nova/utils.py", line 229, in execute
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server return 
processutils.execute(*cmd, **kwargs)
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 
419, in execute
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server 
cmd=sanitized_cmd)
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server 
ProcessExecutionError: Unexpected error while running command.
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server Command: 
mount.quobyte --disable-xattrs 
78.46.57.153:7861,78.46.57.153:7861,78.46.57.153:7861/82000e41-c6ac-4be2-b31a-0543db93767c
 /mnt/quobyte-volume/531b7439e360bdea0a79870354906cab
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server Exit code: 4
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server Stdout: 
u'mount.quobyte failed: Unable to initialize mount point\n'
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server Stderr: 
u"Logging to file /opt/stack/logs/quobyte_client.log.\nfuse: mountpoint is not 
empty\nfuse: if you are sure this is safe, use the 'nonempty' mount option\n"
  2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1737131/+subscriptions



[Yahoo-eng-team] [Bug 1835400] Re: Issues booting with os_distro=centos7.0

2019-10-14 Thread Matt Riedemann
** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/queens
   Status: New => In Progress

** Changed in: nova/rocky
   Status: New => In Progress

** Tags added: libvirt

** Changed in: nova/rocky
   Importance: Undecided => Medium

** Changed in: nova/stein
   Importance: Undecided => Medium

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova/stein
   Status: New => In Progress

** Changed in: nova/queens
 Assignee: (unassigned) => Lee Yarwood (lyarwood)

** Changed in: nova/rocky
 Assignee: (unassigned) => Lee Yarwood (lyarwood)

** Changed in: nova/stein
 Assignee: (unassigned) => Lee Yarwood (lyarwood)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1835400

Title:
  Issues booting with os_distro=centos7.0

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress

Bug description:
  If we have os_distro=centos this isn't known by os-info, so we get:

  Cannot find OS information - Reason: (No configuration information
  found for operating system centos7): OsInfoNotFound: No configuration
  information found for operating system centos7

  If we "fix" it to os_distro=centos7.0 we get:

  Instance failed to spawn: UnsupportedHardware: Requested hardware
  'virtio1.0-net' is not supported by the 'kvm' virt driver

  This is with Rocky, but was also happening with Queens, I believe.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1835400/+subscriptions



[Yahoo-eng-team] [Bug 1847302] [NEW] doc: need admin guide for the image cache

2019-10-08 Thread Matt Riedemann
Public bug reported:

There is no documentation for the image cache, so we should add one to
the admin guide.

I think a relatively simple beginning would include:

- A high level description of what an image cache is, where it lives,
and the benefits.

- Which compute drivers support image cache (that's not detailed here
either: https://docs.openstack.org/nova/latest/user/support-
matrix.html), this is any driver that supports the "has_imagecache"
driver capability (currently libvirt, hyperv and vmware).

- The related configuration options. Since the options are not in a
particular config option group, they are all spread across DEFAULT
(moving those to an [imagecache] group would probably be useful as well,
outside of the docs change).

More advanced topics could be things like known issues/limitations
(maybe mdbooth can help here), some of which is probably covered in this
spec:

https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved
/image-precache-support.html

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: doc image-cache

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1847302

Title:
  doc: need admin guide for the image cache

Status in OpenStack Compute (nova):
  New

Bug description:
  There is no documentation for the image cache, so we should add one to
  the admin guide.

  I think a relatively simple beginning would include:

  - A high level description of what an image cache is, where it lives,
  and the benefits.

  - Which compute drivers support image cache (that's not detailed here
  either: https://docs.openstack.org/nova/latest/user/support-
  matrix.html), this is any driver that supports the "has_imagecache"
  driver capability (currently libvirt, hyperv and vmware).

  - The related configuration options. Since the options are not in a
  particular config option group, they are all spread across DEFAULT
  (moving those to an [imagecache] group would probably be useful as
  well, outside of the docs change).

  More advanced topics could be things like known issues/limitations
  (maybe mdbooth can help here), some of which is probably covered in
  this spec:

  https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved
  /image-precache-support.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1847302/+subscriptions



[Yahoo-eng-team] [Bug 1833581] Re: instance stuck in BUILD state if nova-compute is restarted

2019-10-07 Thread Matt Riedemann
This is extremely latent but I've marked it going back to at least
queens since that's currently our oldest non-extended maintenance
branch.

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/queens
   Status: New => Confirmed

** Changed in: nova/rocky
   Status: New => Confirmed

** Changed in: nova/train
   Status: New => Confirmed

** Changed in: nova/stein
   Status: New => Confirmed

** Changed in: nova/train
   Importance: Undecided => Critical

** Changed in: nova/stein
   Importance: Undecided => Low

** Changed in: nova/rocky
   Importance: Undecided => Low

** Changed in: nova/queens
   Importance: Undecided => Low

** Changed in: nova/train
   Importance: Critical => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1833581

Title:
  instance stuck in BUILD state if nova-compute is restarted

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed
Status in OpenStack Compute (nova) train series:
  Confirmed

Bug description:
  Description
  ===
  An instance gets stuck in the BUILD state indefinitely if the nova-compute
  service is restarted in the meantime. Even after the instance_build_timeout
  the instance is not put into the ERROR state.

  Steps to reproduce
  ==

  1) Start 10 VMs in parallel to increase the chance of hitting the bug

  $ for NUM in `seq 1 1 10`; do openstack  server create --flavor c1
  --image cirros-0.4.0-x86_64-disk --availability-zone nova:ubuntu
  vm$NUM &  done

  2) When the first instance reaches the BUILD state, restart the nova-compute
  service:
  $ sudo systemctl restart devstack@n-cpu.service

  3) Observe the instance states after the compute service is up again.

  Expected result
  ===

  Instances either in ACTIVE or in ERROR state.

  Actual result
  =
  Some instances are stuck in the BUILD state.

  
  Environment
  ===

  all in one devstack build from recent nova master
  61558f274842b149044a14bbe7537b9f278035fd

  
  Logs & Configs
  ==

  stack@ubuntu:~$ openstack server list
  
+--+--+++--+---+
  | ID   | Name | Status | Networks 
  | Image| Flavor|
  
+--+--+++--+---+
  | 9ee76601-4a61-4682-86f1-743dac2b05e6 | vm3  | BUILD  |  
  | cirros-0.4.0-x86_64-disk | cirros256 |
  | e459beae-ccb5-4781-b938-2dff68e33bf7 | vm9  | ACTIVE | 
public=2001:db8::181, 172.24.4.44  | cirros-0.4.0-x86_64-disk | cirros256 |
  | 562f44db-cd51-4516-bce9-598bd29c6310 | vm10 | ERROR  | 
public=2001:db8::3a1, 172.24.4.196 | cirros-0.4.0-x86_64-disk | cirros256 |
  | 73f1e2c6-78a1-44c5-b178-7adcf9bf58a0 | vm5  | ERROR  | public=2001:db8::21, 
172.24.4.177  | cirros-0.4.0-x86_64-disk | cirros256 |
  | 1b01acfc-b798-48f9-b808-6cfd0d5cd3fb | vm6  | ERROR  | 
public=2001:db8::3e1, 172.24.4.20  | cirros-0.4.0-x86_64-disk | cirros256 |
  | c709e3bf-9c71-4f64-bad3-e9e07e911f62 | vm7  | ERROR  | 
public=2001:db8::231, 172.24.4.46  | cirros-0.4.0-x86_64-disk | cirros256 |
  | 538d2534-98f1-4e11-9bbb-b4e74bab8c65 | vm4  | ERROR  | 
public=2001:db8::3e9, 172.24.4.157 | cirros-0.4.0-x86_64-disk | cirros256 |
  | ed74eb32-00fe-4f24-9379-c57c04ce9af1 | vm2  | ERROR  | public=2001:db8::f5, 
172.24.4.53   | cirros-0.4.0-x86_64-disk | cirros256 |
  | 582b5356-4f3d-42ed-937e-966580303af0 | vm8  | ERROR  | public=2001:db8::92, 
172.24.4.16   | cirros-0.4.0-x86_64-disk | cirros256 |
  | ae36ffca-e4d6-4353-8e7e-41db500a5e0d | vm1  | ERROR  | 
public=2001:db8::1cf, 172.24.4.203 | cirros-0.4.0-x86_64-disk | cirros256 |
  
+--+--+++--+---+

  
  stack@ubuntu:~$ openstack server show 9ee76601-4a61-4682-86f1-743dac2b05e6
  
+-+-+
  | Field   | Value 
  |
  
+-+-+
  | OS-DCF:diskConfig   | MANUAL
  

[Yahoo-eng-team] [Bug 1847131] [NEW] UnboundLocalError: local variable 'cell_uuid' referenced before assignment

2019-10-07 Thread Matt Riedemann
Public bug reported:

https://review.opendev.org/#/c/684118/ recently merged and is causing an
issue because a variable used in the log message isn't in scope:

Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server [None req-72524ba6-86bf-479d-a09f-9a9d302f7d2f 
demo demo] Exception during message handling: UnboundLocalError: local variable 
'cell_uuid' referenced before assignment
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 
165, in _process_incoming
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
274, in dispatch
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, 
ctxt, args)
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
194, in _do_dispatch
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 
235, in inner
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server return func(*args, **kwargs)
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server   File 
"/opt/stack/new/nova/nova/scheduler/manager.py", line 214, in 
select_destinations
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server allocation_request_version, 
return_alternates)
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server   File 
"/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 96, in 
select_destinations
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server allocation_request_version, 
return_alternates)
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server   File 
"/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 152, in _schedule
Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server provider_summaries)
Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server   File 
"/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 494, in 
_get_all_host_states
Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server spec_obj)
Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server   File 
"/opt/stack/new/nova/nova/scheduler/host_manager.py", line 774, in 
get_host_states_by_uuids
Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server context, cells, compute_uuids=compute_uuids)
Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server   File 
"/opt/stack/new/nova/nova/scheduler/host_manager.py", line 640, in 
_get_computes_for_cells
Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server targeted_operation)
Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/context.py", 
line 449, in scatter_gather_cells
Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server cell_uuid, exc_info=True)
Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server UnboundLocalError: local variable 'cell_uuid' 
referenced before assignment
Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: 
ERROR oslo_messaging.rpc.server 

The fix is here: https://review.opendev.org/#/c/686996/

Apparently we don't have test coverage for that code.
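
For reference, the failure mode is the classic pattern below (a minimal,
runnable sketch for illustration only; this is not the actual
scatter_gather_cells code and the names are made up):

  import logging

  LOG = logging.getLogger(__name__)

  def gather(cells):
      try:
          for cell_uuid in cells:   # cell_uuid is only bound inside the loop
              LOG.info('processing cell %s', cell_uuid)
      except Exception:
          # If the failure happens before cell_uuid is ever assigned (e.g.
          # 'cells' is not iterable), evaluating cell_uuid here raises
          # UnboundLocalError and masks the original exception.
          LOG.exception('Error processing cell %s', cell_uuid)

  gather(None)  # TypeError -> UnboundLocalError from the except block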

[Yahoo-eng-team] [Bug 1552071] Re: Deleted instances didn't show when calling "nova list --deleted" by non-admin users

2019-10-07 Thread Matt Riedemann
To capture what I said in the now abandoned patch:

"This would change something that's not an error to an error, regardless
of the weird latent behavior. Because of that, I think this would
require a microversion which means we'd need a spec if we wanted to
change this. gmann was compiling a list of random cleanup items for the
compute API in an etherpad I believe, and this is something that could
probably go in that list as a candidate for something to cleanup in a
mass cleanup microversion."

** Changed in: nova
   Importance: Undecided => Wishlist

** Changed in: nova
   Status: In Progress => Opinion

** Changed in: nova
 Assignee: huanhongda (hongda) => (unassigned)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1552071

Title:
  Deleted instances didn't show when calling "nova list --deleted" by
  non-admin users

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  When calling "nova list --deleted" using non-admin context, no
  instance in "DELETED" will return:

  root@SZX158625:/opt/devstack# nova list --deleted
  
  +--------------------------------------+-------------+--------+------------+-------------+---------------------------------------------------------+
  | ID                                   | Name        | Status | Task State | Power State | Networks                                                |
  +--------------------------------------+-------------+--------+------------+-------------+---------------------------------------------------------+
  | 40bab05f-0692-43df-a8a9-e7c0d58a73bd | test_inject | ACTIVE | -          | Running     | private=10.0.0.13, fdb7:5d7b:6dcd:0:f816:3eff:fe63:b012 |
  | ee8907c7-0730-4051-8426-64be44300e70 | test_inject | ACTIVE | -          | Running     | private=10.0.0.14, fdb7:5d7b:6dcd:0:f816:3eff:fe4f:1b32 |
  +--------------------------------------+-------------+--------+------------+-------------+---------------------------------------------------------+

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1552071/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1783565] Re: ServerGroupTestV21.test_evacuate_with_anti_affinity_no_valid_host intermittently fails with "Instance compute service state on host2 expected to be down, but it was

2019-10-07 Thread Matt Riedemann
We don't seem to be hitting this in the gate anymore so I'm not sure if
it's just rare now or if it's resolved some other way:

http://status.openstack.org/elastic-recheck/#1783565

I'm marking invalid for now though. We can re-open if necessary.

** Changed in: nova
 Assignee: Zhenyu Zheng (zhengzhenyu) => (unassigned)

** Changed in: nova
   Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1783565

Title:
  ServerGroupTestV21.test_evacuate_with_anti_affinity_no_valid_host
  intermittently fails with "Instance compute service state on host2
  expected to be down, but it was up."

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  http://logs.openstack.org/32/584032/5/check/nova-tox-functional-
  py35/7061ec1/job-output.txt.gz#_2018-07-25_03_16_46_462415

  18-07-25 03:16:46.418499 | ubuntu-xenial | {5} 
nova.tests.functional.test_server_group.ServerGroupTestV21.test_evacuate_with_anti_affinity_no_valid_host
 [14.070214s] ... FAILED
  2018-07-25 03:16:46.418582 | ubuntu-xenial |
  2018-07-25 03:16:46.418645 | ubuntu-xenial | Captured traceback:
  2018-07-25 03:16:46.418705 | ubuntu-xenial | ~~~
  2018-07-25 03:16:46.418798 | ubuntu-xenial | b'Traceback (most recent 
call last):'
  2018-07-25 03:16:46.419095 | ubuntu-xenial | b'  File 
"/home/zuul/src/git.openstack.org/openstack/nova/nova/tests/functional/test_server_group.py",
 line 456, in test_evacuate_with_anti_affinity_no_valid_host'
  2018-07-25 03:16:46.419232 | ubuntu-xenial | b"
self.admin_api.post_server_action(servers[1]['id'], post)"
  2018-07-25 03:16:46.419471 | ubuntu-xenial | b'  File 
"/home/zuul/src/git.openstack.org/openstack/nova/nova/tests/functional/api/client.py",
 line 294, in post_server_action'
  2018-07-25 03:16:46.419602 | ubuntu-xenial | b"'/servers/%s/action' % 
server_id, data, **kwargs).body"
  2018-07-25 03:16:46.419841 | ubuntu-xenial | b'  File 
"/home/zuul/src/git.openstack.org/openstack/nova/nova/tests/functional/api/client.py",
 line 235, in api_post'
  2018-07-25 03:16:46.419975 | ubuntu-xenial | b'return 
APIResponse(self.api_request(relative_uri, **kwargs))'
  2018-07-25 03:16:46.420187 | ubuntu-xenial | b'  File 
"/home/zuul/src/git.openstack.org/openstack/nova/nova/tests/functional/api/client.py",
 line 213, in api_request'
  2018-07-25 03:16:46.420263 | ubuntu-xenial | b'response=response)'
  2018-07-25 03:16:46.420545 | ubuntu-xenial | 
b'nova.tests.functional.api.client.OpenStackApiException: Unexpected status 
code: {"badRequest": {"message": "Compute service of host2 is still in use.", 
"code": 400}}'
  2018-07-25 03:16:46.420581 | ubuntu-xenial | b''
  2018-07-25 03:16:46.420606 | ubuntu-xenial |
  2018-07-25 03:16:46.420654 | ubuntu-xenial | Captured stderr:
  2018-07-25 03:16:46.420702 | ubuntu-xenial | 
  2018-07-25 03:16:46.421102 | ubuntu-xenial | 
b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional-py35/lib/python3.5/site-packages/oslo_db/sqlalchemy/enginefacade.py:350:
 OsloDBDeprecationWarning: EngineFacade is deprecated; please use 
oslo_db.sqlalchemy.enginefacade'
  2018-07-25 03:16:46.421240 | ubuntu-xenial | b'  self._legacy_facade = 
LegacyEngineFacade(None, _factory=self)'
  2018-07-25 03:16:46.421623 | ubuntu-xenial | 
b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional-py35/lib/python3.5/site-packages/oslo_db/sqlalchemy/enginefacade.py:350:
 OsloDBDeprecationWarning: EngineFacade is deprecated; please use 
oslo_db.sqlalchemy.enginefacade'
  2018-07-25 03:16:46.421751 | ubuntu-xenial | b'  self._legacy_facade = 
LegacyEngineFacade(None, _factory=self)'
  2018-07-25 03:16:46.422054 | ubuntu-xenial | 
b"/home/zuul/src/git.openstack.org/openstack/nova/nova/test.py:323: 
DeprecationWarning: Using class 'MoxStubout' (either directly or via 
inheritance) is deprecated in version '3.5.0'"
  2018-07-25 03:16:46.422174 | ubuntu-xenial | b'  mox_fixture = 
self.useFixture(moxstubout.MoxStubout())'
  2018-07-25 03:16:46.422537 | ubuntu-xenial | 
b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional-py35/lib/python3.5/site-packages/paste/deploy/loadwsgi.py:22:
 DeprecationWarning: Parameters to load are deprecated.  Call .resolve and 
.require separately.'
  2018-07-25 03:16:46.422664 | ubuntu-xenial | b'  return 
pkg_resources.EntryPoint.parse("x=" + s).load(False)'
  2018-07-25 03:16:46.422928 | ubuntu-xenial | 
b"/home/zuul/src/git.openstack.org/openstack/nova/nova/db/sqlalchemy/api.py:205:
 DeprecationWarning: Property 'async_compat' has moved to 'function.async_'"
  2018-07-25 03:16:46.423038 | ubuntu-xenial | b'  reader_mode = 
get_context_manager(context).async'
  2018-07-25 03:16:46.423301 | ubuntu-xenial | 

[Yahoo-eng-team] [Bug 1846777] [NEW] Inefficient/redundant image GETs during large boot from volume server create requests with the same image

2019-10-04 Thread Matt Riedemann
Public bug reported:

This is demonstrated by this functional test patch:

https://review.opendev.org/#/c/686734/

That adds a test which makes a single server create request for 10 servers,
each with 255 BDMs using the same image, and asserts that the API calls GET
/v2/images/{image_id} on that same image 2551 times, which is pretty
inefficient.

For the lifetime of the server create request we should be smarter and
cache the results of each image we get so we don't make the same
redundant calls to the image service.
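
A minimal sketch of the kind of per-request memoization that would avoid the
repeated calls (illustrative only, not the actual nova code; the helper and
client names are assumptions):

  # Fetch each unique image at most once for the lifetime of the request.
  def get_image_cached(context, image_api, image_id, cache):
      if image_id not in cache:
          cache[image_id] = image_api.get(context, image_id)
      return cache[image_id]

  # cache = {} would be created once per server create request and passed to
  # every BDM/image lookup made while handling that request.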

** Affects: nova
 Importance: Low
 Status: Confirmed


** Tags: api performance

** Summary changed:

- Inefficient image GET during large boot from volume server create requests
+ Inefficient/redundant image GETs during large boot from volume server create 
requests with the same image

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1846777

Title:
  Inefficient/redundant image GETs during large boot from volume server
  create requests with the same image

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  This is demonstrated by this functional test patch:

  https://review.opendev.org/#/c/686734/

  That adds a test which makes a single server create request for 10
  servers, each with 255 BDMs using the same image, and asserts that the
  API calls GET /v2/images/{image_id} on that same image 2551 times, which
  is pretty inefficient.

  For the lifetime of the server create request we should be smarter and
  cache the results of each image we get so we don't make the same
  redundant calls to the image service.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1846777/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1846656] [NEW] Compute API in nova - show/list servers with details says security_groups is required in response but it's optional

2019-10-03 Thread Matt Riedemann
Public bug reported:

- [x] This doc is inaccurate in this way:

This came up in review:

https://review.opendev.org/#/c/685927/2//COMMIT_MSG@9

https://docs.openstack.org/api-ref/compute/#show-server-details

and

https://docs.openstack.org/api-ref/compute/#list-servers-detailed

response parameter tables both say that "security_groups" is a required
field in the response, but that's not true if the server does not have
any attached ports, which is possible. This is the server view builder
code:

https://github.com/openstack/nova/blob/867401e575d2b27b9bc63ceda41cd85233545cd5/nova/api/openstack/compute/views/servers.py#L627

Note the key is not in the GET response if the server is not attached to
any ports that have security groups.

I recreated in devstack by creating a server with no network:

$ openstack --os-compute-api-version 2.37 server create --flavor m1.tiny
--image cirros-0.4.0-x86_64-disk --nic none --wait vm-no-net

And the security_groups key is not in the GET /servers/detail response:

$ curl -H "X-Auth-Token: $token" http://10.128.0.6/compute/v2.1/servers/detail 
| python -m json.tool | grep security_groups
  % Total% Received % Xferd  Average Speed   TimeTime Time  Current
 Dload  Upload   Total   SpentLeft  Speed
100  1388  100  13880 0   8213  0 --:--:-- --:--:-- --:--:--  8213
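
Until the docs are fixed, client code has to treat the key as optional, e.g.
(an illustrative snippet, not part of nova):

  # 'security_groups' may be missing entirely, so default to an empty list
  # instead of indexing the key directly.
  def security_group_names(server_dict):
      return [sg['name'] for sg in server_dict.get('security_groups', [])]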


---
Release:  on 2019-09-19 17:55:19
SHA: 9ca14e081860b1abcc0d676f253a472028690e29
Source: https://opendev.org/openstack/nova/src/api-ref/source/index.rst
URL: https://docs.openstack.org/api-ref/compute/

** Affects: nova
 Importance: Low
 Status: Triaged


** Tags: api-ref doc low-hanging-fruit

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1846656

Title:
  Compute API in nova - show/list servers with details says
  security_groups is required in response but it's optional

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  - [x] This doc is inaccurate in this way:

  This came up in review:

  https://review.opendev.org/#/c/685927/2//COMMIT_MSG@9

  https://docs.openstack.org/api-ref/compute/#show-server-details

  and

  https://docs.openstack.org/api-ref/compute/#list-servers-detailed

  response parameter tables both say that "security_groups" is a
  required field in the response, but that's not true if the server does
  not have any attached ports, which is possible. This is the server view
  builder code:

  
https://github.com/openstack/nova/blob/867401e575d2b27b9bc63ceda41cd85233545cd5/nova/api/openstack/compute/views/servers.py#L627

  Note the key is not in the GET response if the server is not attached
  to any ports that have security groups.

  I recreated in devstack by creating a server with no network:

  $ openstack --os-compute-api-version 2.37 server create --flavor
  m1.tiny --image cirros-0.4.0-x86_64-disk --nic none --wait vm-no-net

  And the security_groups key is not in the GET /servers/detail
  response:

  $ curl -H "X-Auth-Token: $token" 
http://10.128.0.6/compute/v2.1/servers/detail | python -m json.tool | grep 
security_groups
% Total% Received % Xferd  Average Speed   TimeTime Time  
Current
   Dload  Upload   Total   SpentLeft  Speed
  100  1388  100  13880 0   8213  0 --:--:-- --:--:-- --:--:--  8213

  
  ---
  Release:  on 2019-09-19 17:55:19
  SHA: 9ca14e081860b1abcc0d676f253a472028690e29
  Source: https://opendev.org/openstack/nova/src/api-ref/source/index.rst
  URL: https://docs.openstack.org/api-ref/compute/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1846656/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1846559] [NEW] Handling Down Cells in nova - security_groups can be in the response for GET /servers/detail

2019-10-03 Thread Matt Riedemann
Public bug reported:

- [x] This doc is inaccurate in this way:

This came up during a review to remove nova-net usage from functional
tests and enhance the neutron fixture used in those tests:

https://review.opendev.org/#/c/685927/2/nova/tests/functional/test_servers.py@1264

In summary, GET /servers/detail responses for servers in a down cell may
include a "security_groups" key because the API proxies that information
from neutron only using the server id (the neutron security group driver
finds the ports from that server id and the security groups from the
ports). None of the security group information about a server, when
using neutron, is cached with the server in the cell database unlike the
port information (VIFs i.e. instance.info_cache.network_info).

As a result, the doc is wrong for the keys it says can be returned from
a GET /servers/detail response in a down cell scenario since it doesn't
include 'security_groups'. The linked patch above shows that with the
changed sample:

https://review.opendev.org/#/c/685927/2/doc/api_samples/servers/v2.69
/servers-details-resp.json

Also note that this is not the same for the GET /servers/{server_id}
(show) case because that returns from the view builder here:

https://github.com/openstack/nova/blob/867401e575d2b27b9bc63ceda41cd85233545cd5/nova/api/openstack/compute/views/servers.py#L210

without including any security group information.

Note that fixing the API to be consistent between show and detail would
require a microversion and is likely not worth it; instead, a user can get
security group information from the networking API directly with something
like this:

  GET /v2.0/ports?device_id=<server_id>&fields=security_groups

And from the ports response the client can get the security groups by
id.
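
Roughly, a client could do that lookup like this (an illustrative sketch
using plain HTTP calls; the endpoint and token handling are assumptions):

  import requests

  def security_group_ids_for_server(neutron_url, token, server_id):
      # List the server's ports, asking only for their security groups.
      resp = requests.get(
          neutron_url + '/v2.0/ports',
          params={'device_id': server_id, 'fields': 'security_groups'},
          headers={'X-Auth-Token': token})
      resp.raise_for_status()
      sg_ids = set()
      for port in resp.json().get('ports', []):
          sg_ids.update(port.get('security_groups', []))
      return sg_ids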

This bug is just to update the down cell API guide docs.

---
Release: 19.1.0.dev1588 on 2019-09-24 00:12:44
SHA: 2b15e162546ff5aa6458b2d1b2422a775e92b785
Source: https://opendev.org/openstack/nova/src/api-guide/source/down_cells.rst
URL: https://docs.openstack.org/api-guide/compute/down_cells.html

** Affects: nova
 Importance: Medium
 Status: Confirmed


** Tags: api-guide cells doc

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1846559

Title:
  Handling Down Cells in nova - security_groups can be in the response
  for GET /servers/detail

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  - [x] This doc is inaccurate in this way:

  This came up during a review to remove nova-net usage from functional
  tests and enhance the neutron fixture used in those tests:

  
https://review.opendev.org/#/c/685927/2/nova/tests/functional/test_servers.py@1264

  In summary, GET /servers/detail responses for servers in a down cell
  may include a "security_groups" key because the API proxies that
  information from neutron only using the server id (the neutron
  security group driver finds the ports from that server id and the
  security groups from the ports). None of the security group
  information about a server, when using neutron, is cached with the
  server in the cell database unlike the port information (VIFs i.e.
  instance.info_cache.network_info).

  As a result, the doc is wrong for the keys it says can be returned
  from a GET /servers/detail response in a down cell scenario since it
  doesn't include 'security_groups'. The linked patch above shows that
  with the changed sample:

  https://review.opendev.org/#/c/685927/2/doc/api_samples/servers/v2.69
  /servers-details-resp.json

  Also note that this is not the same for the GET /servers/{server_id}
  (show) case because that returns from the view builder here:

  
https://github.com/openstack/nova/blob/867401e575d2b27b9bc63ceda41cd85233545cd5/nova/api/openstack/compute/views/servers.py#L210

  without including any security group information.

  Note that fixing the API to be consistent between show and detail
  would require a microversion and is likely not worth it; instead, a
  user can get security group information from the networking API
  directly with something like this:

    GET /v2.0/ports?device_id=<server_id>&fields=security_groups

  And from the ports response the client can get the security groups by
  id.

  This bug is just to update the down cell API guide docs.

  ---
  Release: 19.1.0.dev1588 on 2019-09-24 00:12:44
  SHA: 2b15e162546ff5aa6458b2d1b2422a775e92b785
  Source: https://opendev.org/openstack/nova/src/api-guide/source/down_cells.rst
  URL: https://docs.openstack.org/api-guide/compute/down_cells.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1846559/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team

[Yahoo-eng-team] [Bug 1846532] [NEW] Confusing error message when volume create fails

2019-10-03 Thread Matt Riedemann
Public bug reported:

Method `nova.volume.cinder.API#create` accepts `size` as the third argument,
but in the wrapper `nova.volume.cinder.translate_volume_exception` the third
parameter is volume_id. If we hit a cinder exception when creating volumes,
like the response body below:
```
{"itemNotFound": {"message": "Volume type with name xxx could not be found.",
"code": 404}}
```
we may get an exception in the nova-compute log like this:
```
BuildAbortException: Build of instance xxx aborted: Volume 40 could not be
found.
```
Actually, `40` is the volume size, not the volume id.

This could be a little misleading.
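
The mismatch looks roughly like this (a simplified, self-contained sketch of
the pattern, not the actual nova code):

  import functools

  def translate_volume_exception(method):
      # Assumes the second positional argument is always a volume id.
      @functools.wraps(method)
      def wrapper(self, context, volume_id, *args, **kwargs):
          try:
              return method(self, context, volume_id, *args, **kwargs)
          except Exception:
              raise RuntimeError('Volume %s could not be found.' % volume_id)
      return wrapper

  class API(object):
      @translate_volume_exception
      def create(self, context, size, name=None):
          # 'size' lands in the decorator's 'volume_id' slot, so a failure
          # here is reported as "Volume 40 could not be found." when 40 is
          # really the requested size in GB.
          raise ValueError('Volume type could not be found.')

  API().create(None, 40)  # -> RuntimeError: Volume 40 could not be found.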

** Affects: nova
 Importance: Medium
 Assignee: Fan Zhang (fanzhang)
 Status: In Progress

** Affects: nova/queens
 Importance: Low
 Status: Confirmed

** Affects: nova/rocky
 Importance: Low
 Status: Confirmed

** Affects: nova/stein
 Importance: Low
 Status: Confirmed

** Affects: nova/train
 Importance: Low
 Status: Confirmed


** Tags: serviceability volumes

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1846532

Title:
  Confusing error message when volume create fails

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed
Status in OpenStack Compute (nova) train series:
  Confirmed

Bug description:
  Method `nova.volume.cinder.API#create` accepts `size` as the third argument,
  but in the wrapper `nova.volume.cinder.translate_volume_exception` the third
  parameter is volume_id. If we hit a cinder exception when creating volumes,
  like the response body below:
  ```
  {"itemNotFound": {"message": "Volume type with name xxx could not be found.",
  "code": 404}}
  ```
  we may get an exception in the nova-compute log like this:
  ```
  BuildAbortException: Build of instance xxx aborted: Volume 40 could not be
  found.
  ```
  Actually, `40` is the volume size, not the volume id.

  This could be a little misleading.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1846532/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1846527] [NEW] online_data_migrations docs don't mention using --config-file to run the migrations per cell db

2019-10-03 Thread Matt Riedemann
Public bug reported:

This came up in the mailing list while answering some questions about
when/how various cells v2 and database related commands get run:

http://lists.openstack.org/pipermail/openstack-
discuss/2019-October/009937.html

Recent change https://review.opendev.org/#/c/671298/ was added to the
upgrade guide to mention that you can use the --config-file option with
the nova-manage db sync command to migrate the cell database schema per
cell database, in most cases that being cell0 and cell1. The same is
true for the online_data_migrations command since that does data
migrations for both the API DB and cell DB, and you would need to run it
per cell DB using the --config-file option with a config file whose
[database]/connection is configured for a given cell, e.g. cell0 or
cell1.
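
For example, something like this per cell (the config file paths here are
only illustrative; each file needs its [database]/connection pointing at the
right cell database):

  $ nova-manage --config-file /etc/nova/cell0.conf db online_data_migrations
  $ nova-manage --config-file /etc/nova/cell1.conf db online_data_migrations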

So I think the CLI guide should probably be updated for nova-manage and
the upgrades guide like in https://review.opendev.org/#/c/671298/. For
the CLI guide, it might be useful to just have a generic section about
using --config-file per cell database for commands that require a cell
database but don't have a kind of --all-cells option like the
archive_deleted_rows and purge commands.

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: cells doc nova-manage upgrade

** Tags added: upgrade

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1846527

Title:
  online_data_migrations docs don't mention using --config-file to run
  the migrations per cell db

Status in OpenStack Compute (nova):
  New

Bug description:
  This came up in the mailing list while answering some questions about
  when/how various cells v2 and database related commands get run:

  http://lists.openstack.org/pipermail/openstack-
  discuss/2019-October/009937.html

  Recent change https://review.opendev.org/#/c/671298/ was added to the
  upgrade guide to mention that you can use the --config-file option
  with the nova-manage db sync command to migrate the cell database
  schema per cell database, in most cases that being cell0 and cell1.
  The same is true for the online_data_migrations command since that
  does data migrations for both the API DB and cell DB, and you would
  need to run it per cell DB using the --config-file option with a
  config file whose [database]/connection is configured for a given
  cell, e.g. cell0 or cell1.

  So I think the CLI guide should probably be updated for nova-manage
  and the upgrades guide like in https://review.opendev.org/#/c/671298/.
  For the CLI guide, it might be useful to just have a generic section
  about using --config-file per cell database for commands that require
  a cell database but don't have a kind of --all-cells option like the
  archive_deleted_rows and purge commands.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1846527/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1846401] Re: console proxy deployment info was removed from cells v2 layout doc

2019-10-02 Thread Matt Riedemann
** Also affects: nova/train
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => In Progress

** Changed in: nova/train
   Importance: Undecided => Low

** Changed in: nova/train
 Assignee: (unassigned) => Matt Riedemann (mriedem)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1846401

Title:
  console proxy deployment info was removed from cells v2 layout doc

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  In Progress

Bug description:
  The information about how console proxies need to be deployed in a
  multi-cell deployment was mistakenly removed in the following commit
  as part of nova-consoleauth service docs removal:

  
https://github.com/openstack/nova/commit/009fd0f35bcb88acc80f12e69d5fb72c0ee5391f
  #diff-236824986276093f57fa8ba4d3639e68L322

  We need to restore the general information for console proxies using
  the database for storing token authorizations.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1846401/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1846262] [NEW] Failed resize claim leaves otherwise active instance in ERROR state

2019-10-01 Thread Matt Riedemann
Public bug reported:

I noticed this while working on a functional test to recreate a bug
during resize reschedule:

https://review.opendev.org/#/c/686017/

And discussed a bit in IRC:

http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-
nova.2019-10-01.log.html#t2019-10-01T16:33:27

The issue is that we can start a resize (or cold migration) of a stopped
or active (normally active) server and fail a resize claim in the
compute service due to some race issue or for resource claims that are
not handled by placement yet, like NUMA and PCI devices:

https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4527

That ResourceTracker.resize_claim can raise ComputeResourcesUnavailable
which is handled here:

https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4610

We may try to reschedule but if rescheduling fails, or we don't
reschedule, the instance is set to error state by this context manager:

https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4592

That will set the instance vm_state to error:

https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L8809

If we failed a resize claim, there is actually no change in the guest,
same like if we failed a cold migration because the scheduler selected
the same host and the virt driver does not support that, see:

https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4489

If _prep_resize raises InstanceFaultRollback the
_error_out_instance_on_exception will handle it differently since
https://review.opendev.org/#/c/633212/ and not put the instance into
ERROR state but revert the vm_state to its previous value (active or
stopped).
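
In other words, the handling behaves roughly like the sketch below (a
simplified, illustrative version, not the actual context manager in the
compute manager):

  import contextlib

  class InstanceFaultRollback(Exception):
      def __init__(self, inner_exception):
          super(InstanceFaultRollback, self).__init__(str(inner_exception))
          self.inner_exception = inner_exception

  @contextlib.contextmanager
  def error_out_instance_on_exception(instance, original_vm_state):
      try:
          yield
      except InstanceFaultRollback as error:
          # The guest was not touched, so roll the vm_state back instead of
          # putting the instance into ERROR.
          instance['vm_state'] = original_vm_state
          raise error.inner_exception
      except Exception:
          instance['vm_state'] = 'error'
          raise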

If the guest is not changed I don't think the instance should be in
ERROR status because of a resize claim failure, but opinions on that
differ, e.g.:

(11:40:45 AM) mriedem: dansmith: ok, but still, the user shouldn't have to stop 
and then start to get out of that, or hard reboot, when the thing that failed 
is a resize claim race
(11:41:03 AM) dansmith: mriedem: so maybe it's just stop I'm thinking of.. 
anyway, I dunno.. it's very annoying as a user to do something, come back later 
and have it not obvious that the thing has happened, or failed or whatever
(11:41:52 AM) dansmith: mriedem: if you're going to retry the operation for 
them, I agree. if you're not, then being super obvious about what has happened 
is best, IMHO

If we aren't going to automatically handle the resize claim failure and
not set the instance to error state, then we should at least have
something in the API reference documentation about post-conditions for
resize and cold migrate actions such that if the instance is in ERROR
state and there is a fault for the resize claim failure, the user can
stop/start or hard reboot the server to reset its status. I do think we
have some precedence in handling non-error conditions like this though
since https://review.opendev.org/#/c/633227/.

This is latent behavior so I'm going to mark it low priority but I
wanted to make sure we have a bug reported for it.

** Affects: nova
 Importance: Low
 Status: Triaged


** Tags: resize

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1846262

Title:
  Failed resize claim leaves otherwise active instance in ERROR state

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  I noticed this while working on a functional test to recreate a bug
  during resize reschedule:

  https://review.opendev.org/#/c/686017/

  And discussed a bit in IRC:

  http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-
  nova.2019-10-01.log.html#t2019-10-01T16:33:27

  The issue is that we can start a resize (or cold migration) of a
  stopped or active (normally active) server and fail a resize claim in
  the compute service due to some race issue or for resource claims that
  are not handled by placement yet, like NUMA and PCI devices:

  
https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4527

  That ResourceTracker.resize_claim can raise
  ComputeResourcesUnavailable which is handled here:

  
https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4610

  We may try to reschedule but if rescheduling fails, or we don't
  reschedule, the instance is set to error state by this context
  manager:

  
https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4592

  That will set the instance vm_state to error:

  
https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L8809

  

[Yahoo-eng-team] [Bug 1781286] Re: CantStartEngineError in cell conductor during reschedule - get_host_availability_zone up-call

2019-09-30 Thread Matt Riedemann
Note for backports: this problem goes back to Pike but we won't be able
to backport the fix since it's going to require RPC API version changes.

** No longer affects: nova/pike

** No longer affects: nova/queens

** Changed in: nova
 Assignee: (unassigned) => Matt Riedemann (mriedem)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1781286

Title:
  CantStartEngineError in cell conductor during reschedule -
  get_host_availability_zone up-call

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  In a stable/queens devstack environment with multiple PowerVM compute
  nodes, every time I see this in the devstack@n-cond-cell1.service logs:

  Jul 11 15:48:57 myhostname nova-conductor[3796]: DEBUG
  nova.conductor.manager [None req-af22375c-f920-4747-bd2f-0de80ee69465
  admin admin] Rescheduling: True {{(pid=4108) build_instances
  /opt/stack/nova/nova/conductor/manager.py:571}}

  it is shortly thereafter followed by:

  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server [None req-af22375c-f920-4747-bd2f-0de80ee69465 admin 
admin] Exception during message handling: CantStartEngineError: No 
sql_connection parameter is established
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server Traceback (most recent call last):
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 
163, in _process_incoming
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
220, in dispatch
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, 
args)
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
190, in _do_dispatch
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server result = func(ctxt, **new_args)
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File "/opt/stack/nova/nova/conductor/manager.py", 
line 652, in build_instances
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server host.service_host))
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File "/opt/stack/nova/nova/availability_zones.py", 
line 95, in get_host_availability_zone
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server key='availability_zone')
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 
184, in wrapper
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server result = fn(cls, context, *args, **kwargs)
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File "/opt/stack/nova/nova/objects/aggregate.py", 
line 541, in get_by_host
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server _get_by_host_from_db(context, host, key=key)]
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", 
line 987, in wrapper
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server with self._transaction_scope(context):
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File "/usr/lib/python2.7/contextlib.py", line 17, 
in __enter__
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server return self.gen.next()
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", 
line 1037, in _transaction_scope
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server context=context) as resource:
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File "/usr/lib/python2.7/contextlib.py", line 17, 
in __enter__
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server return self.gen.next()
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR 
oslo_messaging.rpc.server   File 
"/usr/local/lib/python2.7/dist-package

[Yahoo-eng-team] [Bug 1846045] [NEW] Docs don't mention running console proxies per cell

2019-09-30 Thread Matt Riedemann
Public bug reported:

This came up in the mailing list today:

http://lists.openstack.org/pipermail/openstack-
discuss/2019-September/009827.html

It's not immediately obvious that console proxy services should be run
per-cell rather than globally.

One would expect to see something about that here:

https://docs.openstack.org/nova/latest/user/cellsv2-layout.html

and/or here:

https://docs.openstack.org/nova/latest/admin/remote-console-access.html

or even in the cells FAQs page:

https://docs.openstack.org/nova/latest/user/cells.html#faqs

There was a lot of confusion over the deprecation of the nova-
consoleauth service in Rocky and several release notes and workarounds
for that:

https://docs.openstack.org/nova/stein/configuration/config.html#workarounds.enable_consoleauth

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: cells console doc

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1846045

Title:
  Docs don't mention running console proxies per cell

Status in OpenStack Compute (nova):
  New

Bug description:
  This came up in the mailing list today:

  http://lists.openstack.org/pipermail/openstack-
  discuss/2019-September/009827.html

  It's not immediately obvious that console proxy services should be run
  per-cell rather than globally.

  One would expect to see something about that here:

  https://docs.openstack.org/nova/latest/user/cellsv2-layout.html

  and/or here:

  https://docs.openstack.org/nova/latest/admin/remote-console-
  access.html

  or even in the cells FAQs page:

  https://docs.openstack.org/nova/latest/user/cells.html#faqs

  There was a lot of confusion over the deprecation of the nova-
  consoleauth service in Rocky and several release notes and workarounds
  for that:

  
https://docs.openstack.org/nova/stein/configuration/config.html#workarounds.enable_consoleauth

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1846045/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1845986] Re: SEV does not enable IOMMU on SCSI controller

2019-09-30 Thread Matt Riedemann
** Also affects: nova/train
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => In Progress

** Changed in: nova/train
 Assignee: (unassigned) => Boris Bobrov (bbobrov)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1845986

Title:
  SEV does not enable IOMMU on SCSI controller

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) train series:
  In Progress

Bug description:
  https://review.opendev.org/#/c/644565/ added logic to
  libvirt/designer.py for enabling iommu for certain devices where
  virtio is used.  This is required for AMD SEV[0].  However it missed
  the case of a SCSI controller where the model is virtio-scsi, e.g.:

    <controller type='scsi' model='virtio-scsi'/>

  As with other virtio devices, here a child element needs to be added
  to the config when SEV is enabled:

    <controller type='scsi' model='virtio-scsi'>
      <driver iommu='on'/>
    </controller>

  [0] http://specs.openstack.org/openstack/nova-
  specs/specs/train/approved/amd-sev-libvirt-support.html#proposed-
  change

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1845986/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1845905] Re: vpmem - libvirt.libvirtError: XML error: Invalid value for element or attribute 'maxMemory'

2019-09-30 Thread Matt Riedemann
** Also affects: nova/train
   Importance: Undecided
   Status: New

** Changed in: nova/train
 Assignee: (unassigned) => Dan Smith (danms)

** Changed in: nova/train
   Status: New => In Progress

** Changed in: nova/train
   Importance: Undecided => High

** Changed in: nova
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1845905

Title:
  vpmem - libvirt.libvirtError: XML error: Invalid value for element or
  attribute 'maxMemory'

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) train series:
  In Progress

Bug description:
  The result of a Python 3 division operation is a floating point number.
  This resulted in an invalid value for the 'maxMemory' entry of the libvirt
  domain XML, which expects an integer.
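
  For example (a quick illustration of the Python 3 behavior, not the nova
  code itself):

    total_kb = 4 * 1024 * 1024

    print(total_kb / 1024)    # 4096.0 -> serialized as "4096.0", rejected by libvirt
    print(total_kb // 1024)   # 4096   -> valid integer for maxMemory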

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1845905/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1845146] Re: NUMA aware live migration failed when vCPU pin set

2019-09-27 Thread Matt Riedemann
** Also affects: nova/train
   Importance: High
 Assignee: Artom Lifshitz (notartom)
   Status: In Progress

** No longer affects: nova/train

** Also affects: nova/train
   Importance: High
 Assignee: Artom Lifshitz (notartom)
   Status: In Progress

** Changed in: nova/train
 Assignee: Artom Lifshitz (notartom) => Dan Smith (danms)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1845146

Title:
  NUMA aware live migration failed when vCPU pin set

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) train series:
  In Progress

Bug description:
  Description
  ===

  When the vCPU pin policy is dedicated, a NUMA-aware live migration may
  fail.

  
  Steps to reproduce
  ==

  1. Create two flavors: 2c2g.numa and 4c.4g.numa
 (venv) [root@t1 ~]# openstack flavor show 2c2g.numa
  +----------------------------+------------------------------------------------+
  | Field                      | Value                                          |
  +----------------------------+------------------------------------------------+
  | OS-FLV-DISABLED:disabled   | False                                          |
  | OS-FLV-EXT-DATA:ephemeral  | 0                                              |
  | access_project_ids         | None                                           |
  | disk                       | 1                                              |
  | id                         | b4a2df98-82c5-4a53-8ba5-4372f20a98bd           |
  | name                       | 2c2g.numa                                      |
  | os-flavor-access:is_public | True                                           |
  | properties                 | hw:cpu_policy='dedicated', hw:numa_cpus.0='0', |
  |                            | hw:numa_cpus.1='1', hw:numa_mem.0='1024',      |
  |                            | hw:numa_mem.1='1024', hw:numa_nodes='2'        |
  | ram                        | 2048                                           |
  | rxtx_factor                | 1.0                                            |
  | swap                       |                                                |
  | vcpus                      | 2                                              |
  +----------------------------+------------------------------------------------+
 (venv) [root@t1 ~]# openstack flavor show 4c.4g.numa
  +----------------------------+------------------------------------------------+
  | Field                      | Value                                          |
  +----------------------------+------------------------------------------------+
  | OS-FLV-DISABLED:disabled   | False                                          |
  | OS-FLV-EXT-DATA:ephemeral  | 0                                              |
  | access_project_ids         | None                                           |
  | disk                       | 1                                              |
  | id                         | cf53f5ea-c036-4a79-8183-6a2389212d02           |
 

[Yahoo-eng-team] [Bug 1845243] Re: Nested 'path' query param in console URL breaks serialproxy

2019-09-27 Thread Matt Riedemann
** Also affects: nova/train
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Confirmed

** Changed in: nova/train
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1845243

Title:
  Nested 'path' query param in console URL breaks serialproxy

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed
Status in OpenStack Compute (nova) train series:
  Confirmed

Bug description:
  Description
  ===

  Change I2ddf0f4d768b698e980594dd67206464a9cea37b changed all console
  URLs to have the token attached as a nested query parameter inside an
  outer "path" query parameter, e.g. "?path=?token=***".

  While this was necessary for NoVNC support, it appears to have broken
  Ironic serial consoles, which use the nova-serialproxy service, which
  apparently is not aware that it needs to parse the token in this
  manner. It uses websockify.
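
For reference, pulling the token out of the new nested form requires parsing
the inner query string as well, roughly like this (an illustrative sketch,
not the websockify/nova-serialproxy code):

  from urllib.parse import parse_qs, urlparse

  def extract_token(url):
      outer = parse_qs(urlparse(url).query)
      if 'token' in outer:                    # old style: ?token=xyz
          return outer['token'][0]
      path = outer.get('path', [''])[0]       # new style: ?path=?token=xyz
      return parse_qs(path.lstrip('?')).get('token', [None])[0]

  print(extract_token('ws://proxy:6083/?path=%3Ftoken%3Dabc123'))  # abc123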

  To test, I enabled debug mode and added some extra logging in the
  nova-serialproxy to prove that "token" was empty in this function:
  
https://github.com/openstack/nova/blob/stable/rocky/nova/objects/console_auth_token.py#L143

  Steps to reproduce
  ==

  1. Have Ironic set up to allow web/serial consoles 
(https://docs.openstack.org/ironic/pike/admin/console.html). I believe this 
also requires having nova-serialproxy deployed.
  2. Launch an Ironic instance and attempt to access the console via Horizon.

  
  Expected result
  ===

  The serial console loads in the web interface; "Status: Opened" is
  displayed in the bottom. Console is interactive assuming the node has
  booted properly.

  
  Actual result
  =

  The serial console loads, but is blank; "Status: Closed" is displayed
  in the bottom. nova-serialproxy logs indicate the token was expired or
  invalid. The console never becomes interactive, but does not indicate
  there is an error in Horizon (at least on my deployment.)

  Environment
  ===

  OpenStack Rocky release, deployed with Kolla-Ansible.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1845243/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841481] Re: Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache

2019-09-25 Thread Matt Riedemann
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841481

Title:
  Race during ironic re-balance corrupts local RT ProviderTree and
  compute_nodes cache

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  New

Bug description:
  Seen with an ironic re-balance in this job:

  
https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check
  /ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode/92c65ac/

  On the subnode we see the RT detect that the node is moving hosts:

  Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova-
  compute[747]: INFO nova.compute.resource_tracker [None req-a894abee-
  a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42
  -b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to
  ubuntu-bionic-rax-ord-0010443319

  On that new host, the ProviderTree cache is getting updated with
  refreshed associations for inventory:

  Aug 26 18:41:38.881026 ubuntu-bionic-rax-ord-0010443319 nova-
  compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee-
  a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing inventories for
  resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f {{(pid=747)
  _refresh_associations
  /opt/stack/nova/nova/scheduler/client/report.py:761}}

  aggregates:

  Aug 26 18:41:38.953685 ubuntu-bionic-rax-ord-0010443319 nova-
  compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee-
  a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing aggregate
  associations for resource provider 61dbc9c7-828b-4c42-b19c-
  a3716037965f, aggregates: None {{(pid=747) _refresh_associations
  /opt/stack/nova/nova/scheduler/client/report.py:770}}

  and traits - but when we get traits the provider is gone:

  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None 
None] Error updating resources for node 61dbc9c7-828b-4c42-b19c-a3716037965f.: 
ResourceProviderTraitRetrievalFailed: Failed to get traits for resource 
provider with UUID 61dbc9c7-828b-4c42-b19c-a3716037965f
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager Traceback (most recent call last):
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", 
line 8250, in _update_available_resource_for_node
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager startup=startup)
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/opt/stack/nova/nova/compute/resource_tracker.py", line 715, in 
update_available_resource
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager self._update_available_resource(context, 
resources, startup=startup)
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 
328, in inner
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager return f(*args, **kwargs)
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/opt/stack/nova/nova/compute/resource_tracker.py", line 738, in 
_update_available_resource
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager is_new_compute_node = 
self._init_compute_node(context, resources)
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/opt/stack/nova/nova/compute/resource_tracker.py", line 561, in 
_init_compute_node
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager if self._check_for_nodes_rebalance(context, 
resources, nodename):
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/opt/stack/nova/nova/compute/resource_tracker.py", line 516, in 
_check_for_nodes_rebalance
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager self._update(context, cn)
  Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR 

[Yahoo-eng-team] [Bug 1845243] Re: Nested 'path' query param in console URL breaks serialproxy

2019-09-25 Thread Matt Riedemann
I know tempest has a novnc console test, I wonder if the same is
possible for ironic serial consoles in ironic CI testing so we could
avoid these types of regressions in the future?

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Tags added: console ironic

** Changed in: nova
   Status: New => Confirmed

** Changed in: nova/rocky
   Importance: Undecided => High

** Changed in: nova/stein
   Status: New => Confirmed

** Changed in: nova
   Importance: Undecided => High

** Tags added: regression

** Changed in: nova/rocky
   Status: New => Confirmed

** Changed in: nova/stein
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1845243

Title:
  Nested 'path' query param in console URL breaks serialproxy

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  Description
  ===

  Change I2ddf0f4d768b698e980594dd67206464a9cea37b changed all console
  URLs to have the token attached as a nested query parameter inside an
  outer "path" query parameter, e.g. "?path=?token=***".

  While this was necessary for NoVNC support, it appears to have broken
  Ironic serial consoles, which use the nova-serialproxy service, which
  apparently is not aware that it needs to parse the token in this
  manner. It uses websockify.

  To test, I enabled debug mode and added some extra logging in the
  nova-serialproxy to prove that "token" was empty in this function:
  
https://github.com/openstack/nova/blob/stable/rocky/nova/objects/console_auth_token.py#L143

  Steps to reproduce
  ==

  1. Have Ironic set up to allow web/serial consoles 
(https://docs.openstack.org/ironic/pike/admin/console.html). I believe this 
also requires having nova-serialproxy deployed.
  2. Launch an Ironic instance and attempt to access the console via Horizon.

  
  Expected result
  ===

  The serial console loads in the web interface; "Status: Opened" is
  displayed in the bottom. Console is interactive assuming the node has
  booted properly.

  
  Actual result
  =

  The serial console loads, but is blank; "Status: Closed" is displayed
  in the bottom. nova-serialproxy logs indicate the token was expired or
  invalid. The console never becomes interactive, but does not indicate
  there is an error in Horizon (at least on my deployment.)

  Environment
  ===

  OpenStack Rocky release, deployed with Kolla-Ansible.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1845243/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1845291] Re: migration is not rescheduled if the server originally booted with --availability-zone <az>:<host>

2019-09-25 Thread Matt Riedemann
This goes back to Newton:

https://github.com/openstack/nova/commit/76dfb4ba9fa0fed1350021591956c4e8143b1ce9

** Changed in: nova
   Status: New => In Progress

** Changed in: nova
   Importance: Undecided => Medium

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/ocata
   Importance: Undecided
   Status: New

** Also affects: nova/pike
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1845291

Title:
  migration is not rescheduled if the server originally booted with
  --availability-zone <az>:<host>

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) ocata series:
  New
Status in OpenStack Compute (nova) pike series:
  New
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  New

Bug description:
  Steps to reproduce
  ==
  1) Boot a server with --availability-zone <az>:<host>. This will force nova
     to boot the server on the given host (see the sketch below).
  2) Try to migrate the server in a situation where the first destination host
     of the migration selected by the scheduler will fail (e.g. move_claim
     fails) but there are alternate hosts that could support the migration.
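
  A minimal reproduction sketch using python-novaclient (all names and IDs
  are placeholders):

    # 1) Force the server onto a specific host via the <az>:<host> syntax.
    server = nova_client.servers.create(
        name='forced-host-vm',
        image=image_id,
        flavor=flavor_id,
        availability_zone='nova:compute-1')

    # 2) Migrate it. If the first destination chosen by the scheduler fails
    #    (e.g. its move_claim), nova does not retry on the alternate hosts
    #    and the server goes to ERROR.
    server.migrate()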

  Expected result
  ===

  Migration is re-scheduled after the first failure and can succeed on
  an alternate destination.

  Actual result
  =
  Nova does not try to re-schedule the migration after the first failure. 
Server goes to ERROR state.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1845291/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1845148] Re: OpenStack Compute (nova) in nova

2019-09-24 Thread Matt Riedemann
Do you have the logs? Are there specific errors in the scheduler or
conductor logs about NoValidHost? You can trace a request through the
logs by the request ID which is something like "req-" so trace a
request and see why the scheduler is filtering out all hosts. I'm
closing this as invalid since it's a support request.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1845148

Title:
  OpenStack Compute (nova) in nova

Status in OpenStack Compute (nova):
  Invalid

Bug description:

  
  In OpenStack Stein, I create a new instance but it gives me an error:
  "Exhausted all hosts available for retrying build failures for instance
  07e367e0-0a9c-4e6e-b08c-e03cdc54cec4".
  How could I fix it?

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1845148/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1694844] Re: Boot from volume fails when cross_az_attach=False and volume is provided to nova without an AZ for the instance

2019-09-23 Thread Matt Riedemann
** No longer affects: nova/ocata

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1694844

Title:
  Boot from volume fails when cross_az_attach=False and volume is
  provided to nova without an AZ for the instance

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  This was recreated with a devstack change:

  http://logs.openstack.org/74/467674/4/check/gate-tempest-dsvm-neutron-
  full-ubuntu-
  xenial/3dbd6e9/logs/screen-n-api.txt.gz#_May_26_02_41_54_584798

  In this failing test, Tempest creates a volume:

  {"volume": {"status": "creating", "user_id":
  "2256bb66db8741aab58a20367b00bfa2", "attachments": [], "links":
  [{"href":
  "https://10.39.38.35:8776/v2/272882ba896341d483982dbcb1fde0f4/volumes
  /55a7c64a-f7b2-4b77-8f60-c1ccda8e0c30", "rel": "self"}, {"href":
  "https://10.39.38.35:8776/272882ba896341d483982dbcb1fde0f4/volumes
  /55a7c64a-f7b2-4b77-8f60-c1ccda8e0c30", "rel": "bookmark"}],
  "availability_zone": "nova", "bootable": "false", "encrypted": false,
  "created_at": "2017-05-26T02:41:45.617286", "description": null,
  "updated_at": null, "volume_type": "lvmdriver-1", "name": "tempest-
  TestVolumeBootPattern-volume-origin-1984626538", "replication_status":
  null, "consistencygroup_id": null, "source_volid": null,
  "snapshot_id": null, "multiattach": false, "metadata": {}, "id":
  "55a7c64a-f7b2-4b77-8f60-c1ccda8e0c30", "size": 1}}

  And the AZ on the volume defaults to 'nova' because that's the default
  AZ in cinder.conf.

  That volume ID is then passed to create the server:

  {"server": {"block_device_mapping_v2": [{"source_type": "volume",
  "boot_index": 0, "destination_type": "volume", "uuid": "55a7c64a-
  f7b2-4b77-8f60-c1ccda8e0c30", "delete_on_termination": true}],
  "networks": [{"uuid": "da48954d-1f66-427b-892c-a7f2eb1b54a3"}],
  "imageRef": "", "name": "tempest-TestVolumeBootPattern-
  server-1371698056", "flavorRef": "42"}}

  Which fails with the 400 InvalidVolume error because of this check in
  the API:

  
https://github.com/openstack/nova/blob/f112dc686dadd643410575cc3487cf1632e4f689/nova/volume/cinder.py#L286

  The instance is not associated with a host yet so it's not in an
  aggregate, and since an AZ wasn't specified when creating an instance
  (and I don't think we want people passing 'nova' as the AZ), it fails
  when comparing None to 'nova'.
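
  A simplified sketch of that check (a paraphrase, not a verbatim copy of the
  linked code) shows why the comparison fails when the instance has no AZ yet:

    # nova/volume/cinder.py check_availability_zone(), roughly:
    if instance and not CONF.cinder.cross_az_attach:
        instance_az = az.get_instance_availability_zone(context, instance)
        # At create time the instance has no host/aggregate yet, so
        # instance_az is None here, and None != 'nova' (the volume's
        # default AZ), which surfaces as the 400 InvalidVolume error.
        if instance_az != volume['availability_zone']:
            raise exception.InvalidVolume(reason=...)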

  This is separate from bug 1497253 and change
  https://review.openstack.org/#/c/366724/ because in that case Nova is
  creating the volume during boot from volume and can specify the AZ for
  the volume. In this bug, the volume already exists and is provided to
  Nova.

  We might need to be able to distinguish if the API or compute service
  is calling check_availability_zone and if so, pass a default AZ in the
  case of the API if one isn't defined.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1694844/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1844929] [NEW] grenade jobs failing due to "Timed out waiting for response from cell" in scheduler

2019-09-22 Thread Matt Riedemann
Public bug reported:

Seen here:

https://zuul.opendev.org/t/openstack/build/d53346210978403f888b85b82b2fe0c7/log/logs/screen-n-sch.txt.gz?severity=3#2368

Sep 22 00:50:54.174385 ubuntu-bionic-ovh-gra1-0011664420 nova-
scheduler[18043]: WARNING nova.context [None req-
1929039e-1517-4326-9700-738d4b570ba6 tempest-
AttachInterfacesUnderV243Test-2009753731 tempest-
AttachInterfacesUnderV243Test-2009753731] Timed out waiting for response
from cell 8acfb79b-2e40-4e1c-bc3d-d404dac6db90

Looks like something is causing timeouts reaching cell1 during grenade
runs. The only errors I see in the rabbit logs are these for the uwsgi
(API) servers:

=ERROR REPORT 22-Sep-2019::00:35:30 ===

closing AMQP connection <0.1511.0> (217.182.141.188:48492 ->
217.182.141.188:5672 - uwsgi:19453:72e08501-61ca-4ade-865e-
f0605979ed7d):

missed heartbeats from client, timeout: 60s

--

It looks like we don't have mysql logs in this grenade run; maybe we
need a fix like this somewhere for grenade:

https://github.com/openstack/devstack/commit/f92c346131db2c89b930b1a23f8489419a2217dc

logstash shows 1101 hits in the last 7 days, since Sept 17 actually:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timed%20out%20waiting%20for%20response%20from%20cell%5C%22%20AND%20tags%3A%5C%22screen-n-sch.txt%5C%22=7d

check and gate queues, all failures. It also appears to only show up on
fortnebula and OVH nodes, primarily fortnebula. I wonder if there is a
performance/timing issue if those nodes are slower and we aren't waiting
for something during the grenade upgrade before proceeding.

** Affects: nova
 Importance: High
 Status: Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1844929

Title:
  grenade jobs failing due to "Timed out waiting for response from cell"
  in scheduler

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Seen here:

  
https://zuul.opendev.org/t/openstack/build/d53346210978403f888b85b82b2fe0c7/log/logs/screen-n-sch.txt.gz?severity=3#2368

  Sep 22 00:50:54.174385 ubuntu-bionic-ovh-gra1-0011664420 nova-
  scheduler[18043]: WARNING nova.context [None req-
  1929039e-1517-4326-9700-738d4b570ba6 tempest-
  AttachInterfacesUnderV243Test-2009753731 tempest-
  AttachInterfacesUnderV243Test-2009753731] Timed out waiting for
  response from cell 8acfb79b-2e40-4e1c-bc3d-d404dac6db90

  Looks like something is causing timeouts reaching cell1 during grenade
  runs. The only errors I see in the rabbit logs are these for the uwsgi
  (API) servers:

  =ERROR REPORT 22-Sep-2019::00:35:30 ===

  closing AMQP connection <0.1511.0> (217.182.141.188:48492 ->
  217.182.141.188:5672 - uwsgi:19453:72e08501-61ca-4ade-865e-
  f0605979ed7d):

  missed heartbeats from client, timeout: 60s

  --

  It looks like we don't have mysql logs in this grenade run; maybe we
  need a fix like this somewhere for grenade:

  
https://github.com/openstack/devstack/commit/f92c346131db2c89b930b1a23f8489419a2217dc

  logstash shows 1101 hits in the last 7 days, since Sept 17 actually:

  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timed%20out%20waiting%20for%20response%20from%20cell%5C%22%20AND%20tags%3A%5C%22screen-n-sch.txt%5C%22=7d

  check and gate queues, all failures. It also appears to only show up
  on fortnebula and OVH nodes, primarily fortnebula. I wonder if there
  is a performance/timing issue if those nodes are slower and we aren't
  waiting for something during the grenade upgrade before proceeding.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1844929/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1763761] Re: CPU topologies in nova - doesn't mention numa specific image properties

2019-09-20 Thread Matt Riedemann
** Tags added: low-hanging-fruit

** No longer affects: python-glanceclient

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1763761

Title:
  CPU topologies in nova - doesn't mention numa specific image
  properties

Status in Glance:
  Triaged
Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  - [x] This is a doc addition request.

  This doc only talks about flavor extra specs for specifying numa nodes
  using the "hw:numa_nodes" flavor extra spec, but it's also possible to
  define numa nodes using the hw_numa_nodes image property, which
  coincidentally is also missing from the glance image properties doc:

  https://docs.openstack.org/python-glanceclient/latest/cli/property-
  keys.html
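
  For illustration, the two equivalent ways of requesting a NUMA topology
  could both be documented; a sketch using the standard Python clients
  (client setup omitted, IDs are placeholders, exact calls may differ by
  release):

    # Flavor extra spec (already documented):
    flavor = nova_client.flavors.get(flavor_id)
    flavor.set_keys({'hw:numa_nodes': '2'})

    # Image property (the part missing from both docs):
    glance_client.images.update(image_id, hw_numa_nodes='2')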

  ---
  Release: 17.0.0.0rc2.dev694 on 2018-04-13 15:32
  SHA: e93be2690754bcba4cb346d4376ce87f94f03303
  Source: 
https://git.openstack.org/cgit/openstack/nova/tree/doc/source/admin/cpu-topologies.rst
  URL: https://docs.openstack.org/nova/latest/admin/cpu-topologies.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1763761/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1636338] Re: Numa topology not calculated for instance with numa_topology after upgrading to Mitaka

2019-09-20 Thread Matt Riedemann
Is this still a problem we need to track? Mitaka is long end of life
upstream at this point so I'm not even sure this is a problem on
upstream stable branches for which we could backport a fix.

** Changed in: nova
 Assignee: Stephen Finucane (stephenfinucane) => (unassigned)

** Changed in: nova
   Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1636338

Title:
  Numa topology not calculated for instance with numa_topology after
  upgrading to Mitaka

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  This is related to this bug
  https://bugs.launchpad.net/nova/+bug/1596119

  After upgrading to Mitaka with the above patch, a new bug surfaced. The bug 
is related to InstanceNUMACell having cpu_policy set to None. This causes 
cpu_pinning_requested to always return False.
  
https://github.com/openstack/nova/blob/master/nova/objects/instance_numa_topology.py#L112

  This will then trick computes with old NUMA instances into thinking
  that nothing is pinned, causing new instances with cpu_policy set to
  CPUAllocationPolicy.DEDICATED to potentially get scheduled on the same
  NUMA zone.
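
  A minimal sketch of the property at the linked line (simplified) shows why
  a None cpu_policy always reads as "not pinned":

    # InstanceNUMACell, simplified:
    @property
    def cpu_pinning_requested(self):
        # Old instances have cpu_policy == None after the upgrade, so this
        # is always False even though the cell actually has pinned CPUs.
        return self.cpu_policy == fields.CPUAllocationPolicy.DEDICATED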

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1636338/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1840424] Re: glance manpage building in rocky is broken due to missing glance-cache-manage

2019-09-20 Thread Matt Riedemann
** Also affects: glance/rocky
   Importance: Undecided
   Status: New

** Also affects: glance/stein
   Importance: Undecided
 Assignee: Thomas Bechtold (toabctl)
   Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1840424

Title:
  glance manpage building in rocky is broken due to missing glance-
  cache-manage

Status in Glance:
  In Progress
Status in Glance rocky series:
  New
Status in Glance stein series:
  In Progress

Bug description:
  Using the latest commit (b3ff79ffa45f2439d769006fe9eb84ccf5690759)
  from stable/rocky branch.

  When trying to build the man pages with:

  sphinx-build -W -b man doc/source doc/build/man

  I get:

  [snipped]
  looking for now-outdated files... none found
  pickling environment... done
  checking consistency... done
  writing... glance-api.1 { } glance-cache-cleaner.1 { } glance-cache-manage.1 
{ 
  Exception occurred:
File 
"/home/tom/devel/openstack/glance/.tox/docs/lib/python2.7/site-packages/sphinx/environment/__init__.py",
 line 782, in get_doctree
  with open(doctree_filename, 'rb') as f:
  IOError: [Errno 2] No such file or directory: 
u'/home/tom/devel/openstack/glance/doc/build/man/.doctrees/cli/glancecachemanage.doctree'
  The full traceback has been saved in /tmp/sphinx-err-YA1GQ3.log, if you want 
to report the issue to the developers.

  
  This is because commit f126d3b8cc6ea5b8dc45bba52402cadfb4beb041 removed 
glancecachemanage.rst and the man page building is not tested in CI.

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1840424/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1814245] Re: _disconnect_volume incorrectly called for multiattach volumes during post_live_migration

2019-09-18 Thread Matt Riedemann
** Also affects: nova/pike
   Importance: Undecided
   Status: New

** Changed in: nova/pike
   Status: New => In Progress

** Changed in: nova/pike
 Assignee: (unassigned) => Matt Riedemann (mriedem)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1814245

Title:
  _disconnect_volume incorrectly called for multiattach volumes  during
  post_live_migration

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in OpenStack Compute (nova) queens series:
  Fix Committed
Status in OpenStack Compute (nova) rocky series:
  Fix Committed

Bug description:
  Description
  ===

  Idc5cecffa9129d600c36e332c97f01f1e5ff1f9f introduced a simple check to
  ensure disconnect_volume is only called when detaching a multi-attach
  volume from the final instance using it on a given host.

  That change however doesn't take live migration into account, more
  specifically the call to _disconnect_volume during post_live_migration at
  the end of the migration on the source. At this point the original instance
  has already moved, so the call to objects.InstanceList.get_uuids_by_host
  will only return one local instance that is using the volume instead
  of two, allowing disconnect_volume to be called.
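
  A rough sketch of the guard described above (an illustration of the logic
  only; helper and variable names are hypothetical, not the actual driver
  code):

    def should_disconnect_volume(context, host, volume_id):
        # Only disconnect when this is the last instance on the host that
        # is still attached to the multi-attach volume.
        local_uuids = objects.InstanceList.get_uuids_by_host(context, host)
        attachments = volume_api.get(context, volume_id)['attachments']
        local_attachments = [a for a in attachments
                             if a['server_id'] in local_uuids]
        return len(local_attachments) < 2

    # During post_live_migration the migrating instance is already recorded
    # on the destination host, so get_uuids_by_host() on the source only
    # sees the one remaining instance and the guard wrongly allows the
    # disconnect.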

  Depending on the backend being used this call can succeed removing the
  connection to the volume for the remaining instance or os-brick can
  fail in situations where it needs to flush I/O etc from the in-use
  connection.

  
  Steps to reproduce
  ==

  * Launch two instances attached to the same multiattach volume on the same 
host.
  * LM one of these instances to another host.

  Expected result
  ===

  No calls to disconnect_volume are made and the remaining instance on
  the host is still able to access the multi-attach volume.

  Actual result
  =

  A call to disconnect_volume is made and the remaining instance is
  unable to access the volume *or* the LM fails due to os-brick failures
  to disconnect the in-use volume on the host.

  Environment
  ===
  1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/

 master

  2. Which hypervisor did you use?
 (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)

 Libvirt + KVM

  
  2. Which storage type did you use?
 (For example: Ceph, LVM, GPFS, ...)
 What's the version of that?

 LVM/iSCSI with multipath enabled reproduces the os-brick failure.

  3. Which networking type did you use?
 (For example: nova-network, Neutron with OpenVSwitch, ...)

 N/A

  Logs & Configs
  ==

  # nova show testvm2
  [..]
  | fault| {"message": "Unexpected error 
while running command.  
|
  |  | Command: multipath -f 
360014054a424982306a4a659007f73b2   
|
  |  | Exit code: 1 

 |
  |  | Stdout: u'Jan 28 16:09:29 | 
360014054a424982306a4a659007f73b2: map in use\  
  |
  |  | Jan 28 16:09:29 | failed to 
remove multipath map 360014054a424982306a4a", "code": 500, "details": " 
  |
  |  |   File 
\"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 202, in 
decorated_function  |
  |  | return function(self, 
context, *args, **kwargs)   
|
  |  |   File 
\"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 6299, in 
_post_live_migration   |
  |  | migrate_data)

 |
  |  |   File 
\"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py\", line 7744, in 
post_live_migration|
  |  | 
self._disconnect_volume(context, connection_info, instance) 
  |
  |  |   File 
\"/usr/li

[Yahoo-eng-team] [Bug 1844583] [NEW] tox -e docs fails with "WARNING: RSVG converter command 'rsvg-convert' cannot be run. Check the rsvg_converter_bin setting"

2019-09-18 Thread Matt Riedemann
Public bug reported:

Since this change:

https://github.com/openstack/nova/commit/16b9486bf7e91bfd5dc48297cee9f54b49156c93

Local docs builds fail if you don't have librsvg2-bin installed for the
sphinxcontrib-svg2pdfconverter dependency (I'm on Ubuntu 18.04). We
should include that in bindep.txt.
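
The fix is presumably an entry along these lines in bindep.txt (the exact
profile/platform tags are a guess):

  # bindep.txt: native dependency needed by sphinxcontrib-svg2pdfconverter
  librsvg2-bin [doc platform:dpkg]
  librsvg2-tools [doc platform:rpm]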

** Affects: nova
 Importance: Low
 Assignee: Matt Riedemann (mriedem)
 Status: Confirmed


** Tags: doc

** Changed in: nova
 Assignee: (unassigned) => Matt Riedemann (mriedem)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1844583

Title:
  tox -e docs fails with "WARNING: RSVG converter command 'rsvg-convert'
  cannot be run. Check the rsvg_converter_bin setting"

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Since this change:

  
https://github.com/openstack/nova/commit/16b9486bf7e91bfd5dc48297cee9f54b49156c93

  Local docs builds fail if you don't have librsvg2-bin installed for
  the sphinxcontrib-svg2pdfconverter dependency (I'm on Ubuntu 18.04).
  We should include that in bindep.txt.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1844583/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1844510] Re: openstackNova(Rocky)-launch instance-NeutronAdminCredentialConfigurationInvalid

2019-09-18 Thread Matt Riedemann
Double check the configuration for the [neutron] section in nova.conf
against this:

https://docs.openstack.org/neutron/rocky/install/controller-install-
ubuntu.html#configure-the-compute-service-to-use-the-networking-service

Note that the install guide is just a reference; the actual URLs have to
make sense for your deployment. For example, I'm guessing the URL hostname
for auth isn't actually "controller". It also looks like you can drop the
/v3 suffix from the auth_url.
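
For reference, a typical [neutron] section from the Rocky install guide looks
roughly like this (all values are placeholders for your deployment):

  [neutron]
  url = http://controller:9696
  auth_url = http://controller:5000
  auth_type = password
  project_domain_name = default
  user_domain_name = default
  region_name = RegionOne
  project_name = service
  username = neutron
  password = NEUTRON_PASS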

** Tags added: config neutron

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1844510

Title:
  openstackNova(Rocky)-launch instance-
  NeutronAdminCredentialConfigurationInvalid

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  I try to launch a new instance (from Horizon as 'admin'):
  from the empty list of instances, click 'launch instance'.

  [root@controller1 ~]# uname -a
  Linux controller1 3.10.84-21.fc21.loongson.18.mips64el #1 SMP PREEMPT Tue Apr 
16 18:41:34 CST 2019 mips64 mips64 mips64 GNU/Linux

  [root@controller1 ~]# less /var/log/nova/nova-api.log
  2019-09-18 16:42:27.566 2320 ERROR nova.network.neutronv2.api 
[req-4ff4645f-47be-4c66-bff1-2c8dbb4cca99 5af84d7c91ce4def8dad829fdd707e00 
0c71a300399e4d759ef8b9dc6b00accf - default default] Neutron client was not able 
to generate a valid 
  admin token, please verify Neutron admin credential located in nova.conf: 
Unauthorized: 401-{u'error': {u'message': u'The request you have made requires 
authentication.', u'code': 401, u'title': u'Unauthorized'}}
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi 
[req-4ff4645f-47be-4c66-bff1-2c8dbb4cca99 5af84d7c91ce4def8dad829fdd707e00 
0c71a300399e4d759ef8b9dc6b00accf - default default] Unexpected exception in API 
method: NeutronAdminCre
  dentialConfigurationInvalid: Networking client is experiencing an 
unauthorized exception.
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi Traceback (most 
recent call last):
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 801, in 
wrapped
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi return 
f(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File 

[Yahoo-eng-team] [Bug 1763043] Re: Unnecessary "Instance not resizing, skipping migration" warning in n-cpu logs during live migration

2019-09-16 Thread Matt Riedemann
This is no longer valid on master (Train) due to this change:

https://review.opendev.org/#/c/634606/86/nova/compute/resource_tracker.py

I'm not sure it's worth trying to do a stable-only change to avoid the
warning messages during live migration at this point since they have
been around for years.

** Changed in: nova
   Status: In Progress => Invalid

** Changed in: nova
 Assignee: Matt Riedemann (mriedem) => (unassigned)

** No longer affects: nova/queens

** No longer affects: nova/rocky

** Changed in: nova
   Status: Invalid => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1763043

Title:
  Unnecessary "Instance not resizing, skipping migration" warning in
  n-cpu logs during live migration

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  In a 7 day CI run, we have over 40K hits of this warning in the logs:

  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Instance%20not%20resizing%2C%20skipping%20migration%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22=7d

  http://logs.openstack.org/54/507854/4/gate/legacy-tempest-dsvm-
  multinode-live-
  migration/d723002/logs/subnode-2/screen-n-cpu.txt#_Apr_11_13_54_16_225676

  Apr 11 13:54:16.225676 ubuntu-xenial-rax-dfw-0003443206 nova-
  compute[29642]: WARNING nova.compute.resource_tracker [None req-
  61a6f9c9-3355-4594-acfa-ebf31ba995aa tempest-
  LiveMigrationTest-1725408283 tempest-LiveMigrationTest-1725408283]
  [instance: 6f4923e3-bf1f-4cb7-bd37-00e5d437759e] Instance not
  resizing, skipping migration.

  That warning was written back in 2012 when resize support was added to
  the resource tracker:

  https://review.openstack.org/#/c/15799/

  And since https://review.openstack.org/#/c/226411/ in 2015 it doesn't
  apply to evacuations.

  We shouldn't see a warning in the nova-compute logs during a normal
  operation like a live migration, so we really should either just drop
  this down to debug or remove it completely.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1763043/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1823215] Re: TestInstanceNotificationSampleWithMultipleComputeOldAttachFlow._test_live_migration_force_complete intermittent fails with MismatchError: 6 != 7

2019-09-12 Thread Matt Riedemann
*** This bug is a duplicate of bug 1843615 ***
https://bugs.launchpad.net/bugs/1843615

This was fixed with https://review.opendev.org/#/c/681540/ since I
didn't remember we already had a bug for this.

** This bug has been marked a duplicate of bug 1843615
   
TestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions 
intermittently failing since Sept 10, 2019

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1823215

Title:
  
TestInstanceNotificationSampleWithMultipleComputeOldAttachFlow._test_live_migration_force_complete
  intermittent fails with MismatchError: 6 != 7

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Seen here:

  http://logs.openstack.org/47/638047/9/check/nova-tox-
  functional/71f64ae/job-output.txt.gz#_2019-04-02_00_07_32_290065

  2019-04-02 00:07:32.290065 | ubuntu-bionic | {2} 
nova.tests.functional.notification_sample_tests.test_instance.TestInstanceNotificationSampleWithMultipleComputeOldAttachFlow.test_multiple_compute_actions
 [14.302238s] ... FAILED
  2019-04-02 00:07:32.290219 | ubuntu-bionic |
  2019-04-02 00:07:32.290275 | ubuntu-bionic | Captured traceback:
  2019-04-02 00:07:32.290318 | ubuntu-bionic | ~~~
  2019-04-02 00:07:32.290378 | ubuntu-bionic | Traceback (most recent call 
last):
  2019-04-02 00:07:32.290525 | ubuntu-bionic |   File 
"nova/tests/functional/notification_sample_tests/test_instance.py", line 68, in 
test_multiple_compute_actions
  2019-04-02 00:07:32.290569 | ubuntu-bionic | action(server)
  2019-04-02 00:07:32.290726 | ubuntu-bionic |   File 
"nova/tests/functional/notification_sample_tests/test_instance.py", line 311, 
in _test_live_migration_force_complete
  2019-04-02 00:07:32.290822 | ubuntu-bionic | self.assertEqual(6, 
len(fake_notifier.VERSIONED_NOTIFICATIONS))
  2019-04-02 00:07:32.291011 | ubuntu-bionic |   File 
"/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/testtools/testcase.py",
 line 411, in assertEqual
  2019-04-02 00:07:32.291148 | ubuntu-bionic | 
self.assertThat(observed, matcher, message)
  2019-04-02 00:07:32.291351 | ubuntu-bionic |   File 
"/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/testtools/testcase.py",
 line 498, in assertThat
  2019-04-02 00:07:32.291402 | ubuntu-bionic | raise mismatch_error
  2019-04-02 00:07:32.291475 | ubuntu-bionic | 
testtools.matchers._impl.MismatchError: 6 != 7
  2019-04-02 00:07:32.291497 | ubuntu-bionic |
  2019-04-02 00:07:32.291515 | ubuntu-bionic |
  2019-04-02 00:07:32.291558 | ubuntu-bionic | Captured pythonlogging:
  2019-04-02 00:07:32.291602 | ubuntu-bionic | ~~~
  2019-04-02 00:07:32.291737 | ubuntu-bionic | 2019-04-02 00:07:19,024 
WARNING [placement.db_api] TransactionFactory already started, not 
reconfiguring.
  2019-04-02 00:07:32.291908 | ubuntu-bionic | 2019-04-02 00:07:19,053 INFO 
[nova.service] Starting conductor node (version 19.1.0)
  2019-04-02 00:07:32.292181 | ubuntu-bionic | 2019-04-02 00:07:19,073 INFO 
[nova.service] Starting scheduler node (version 19.1.0)
  2019-04-02 00:07:32.292326 | ubuntu-bionic | 2019-04-02 00:07:19,089 INFO 
[nova.network.driver] Loading network driver 'nova.network.linux_net'
  2019-04-02 00:07:32.292438 | ubuntu-bionic | 2019-04-02 00:07:19,090 INFO 
[nova.service] Starting network node (version 19.1.0)
  2019-04-02 00:07:32.292606 | ubuntu-bionic | 2019-04-02 00:07:19,118 INFO 
[nova.virt.driver] Loading compute driver 'fake.FakeLiveMigrateDriver'
  2019-04-02 00:07:32.292820 | ubuntu-bionic | 2019-04-02 00:07:19,118 
WARNING [nova.compute.monitors] Excluding nova.compute.monitors.cpu monitor 
virt_driver. Not in the list of enabled monitors (CONF.compute_monitors).
  2019-04-02 00:07:32.292945 | ubuntu-bionic | 2019-04-02 00:07:19,119 INFO 
[nova.service] Starting compute node (version 19.1.0)
  2019-04-02 00:07:32.293174 | ubuntu-bionic | 2019-04-02 00:07:19,141 
WARNING [nova.compute.manager] No compute node record found for host compute. 
If this is the first time this service is starting on this host, then you can 
ignore this warning.
  2019-04-02 00:07:32.293304 | ubuntu-bionic | 2019-04-02 00:07:19,144 
WARNING [nova.compute.resource_tracker] No compute node record for 
compute:fake-mini
  2019-04-02 00:07:32.293484 | ubuntu-bionic | 2019-04-02 00:07:19,148 INFO 
[nova.compute.resource_tracker] Compute node record created for 
compute:fake-mini with uuid: 109a2d73-cdf9-4d76-8e6e-74dc79ff7359
  2019-04-02 00:07:32.293687 | ubuntu-bionic | 2019-04-02 00:07:19,187 INFO 
[placement.requestlog] 127.0.0.1 "GET 
/placement/resource_providers?in_tree=109a2d73-cdf9-4d76-8e6e-74dc79ff7359" 
status: 200 len: 26 microversion: 1.14
  

[Yahoo-eng-team] [Bug 1843615] Re: TestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions intermittently failing since Sept 10, 2019

2019-09-11 Thread Matt Riedemann
** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Changed in: nova/stein
   Status: New => Confirmed

** Changed in: nova/stein
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1843615

Title:
  
TestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions
  intermittently failing since Sept 10, 2019

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  Seen here:

  
https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_c4c/671072/18/gate
  /nova-tox-functional/c4ca604/job-output.txt

  2019-09-11 16:01:31.460243 | ubuntu-bionic | {3} 
nova.tests.functional.notification_sample_tests.test_instance.TestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions
 [15.126947s] ... FAILED
  2019-09-11 16:01:31.460323 | ubuntu-bionic |
  2019-09-11 16:01:31.460383 | ubuntu-bionic | Captured traceback:
  2019-09-11 16:01:31.460442 | ubuntu-bionic | ~~~
  2019-09-11 16:01:31.460525 | ubuntu-bionic | Traceback (most recent call 
last):
  2019-09-11 16:01:31.460714 | ubuntu-bionic |   File 
"nova/tests/functional/notification_sample_tests/test_instance.py", line 61, in 
test_multiple_compute_actions
  2019-09-11 16:01:31.460775 | ubuntu-bionic | action(server)
  2019-09-11 16:01:31.460975 | ubuntu-bionic |   File 
"nova/tests/functional/notification_sample_tests/test_instance.py", line 306, 
in _test_live_migration_force_complete
  2019-09-11 16:01:31.461065 | ubuntu-bionic | 
fake_notifier.VERSIONED_NOTIFICATIONS)
  2019-09-11 16:01:31.461297 | ubuntu-bionic |   File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/testtools/testcase.py",
 line 411, in assertEqual
  2019-09-11 16:01:31.461394 | ubuntu-bionic | 
self.assertThat(observed, matcher, message)
  2019-09-11 16:01:31.461628 | ubuntu-bionic |   File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/testtools/testcase.py",
 line 498, in assertThat
  2019-09-11 16:01:31.461695 | ubuntu-bionic | raise mismatch_error
  2019-09-11 16:01:31.484778 | ubuntu-bionic | 
testtools.matchers._impl.MismatchError: 6 != 7: [{'priority': 'INFO', 
'payload': {'nova_object.namespace': 'nova', 'nova_object.name': 
'RequestSpecPayload', 'nova_object.version': '1.1', 'nova_object.data': 
{'flavor': {'nova_object.namespace': 'nova', 'nova_object.name': 
'FlavorPayload', 'nova_object.version': '1.4', 'nova_object.data': {'flavorid': 
u'a22d5517-147c-4147-a0d1-e698df5cd4e3', 'is_public': True, 'ephemeral_gb': 0, 
'vcpus': 1, 'root_gb': 1, 'disabled': False, 'description': None, 'projects': 
None, 'vcpu_weight': 0, 'memory_mb': 512, 'name': u'test_flavor', 
'rxtx_factor': 1.0, 'extra_specs': {'trait:COMPUTE_STATUS_DISABLED': 
u'forbidden', u'hw:watchdog_action': u'disabled'}, 'swap': 0}}, 'image': 
{'nova_object.namespace': 'nova', 'nova_object.name': 'ImageMetaPayload', 
'nova_object.version': '1.0', 'nova_object.data': {'direct_url': None, 
'container_format': u'raw', 'visibility': u'public', 'size': 25165824, 
'disk_format': u'raw', 'virtual_size': None, 'protected': False, 'status': 
u'active', 'updated_at': '2011-01-01T01:02:03Z', 'tags': [u'tag1', u'tag2'], 
'name': u'fakeimage123456', 'created_at': '2011-01-01T01:02:03Z', 'min_disk': 
0, 'checksum': None, 'owner': None, 'id': 
u'155d900f-4e14-4e4c-a73d-069cbf4541e6', 'properties': 
{'nova_object.namespace': 'nova', 'nova_object.name': 'ImageMetaPropsPayload', 
'nova_object.version': '1.1', 'nova_object.data': {'hw_architecture': 
u'x86_64'}}, 'min_ram': 0}}, 'requested_destination': {'nova_object.namespace': 
'nova', 'nova_object.name': 'DestinationPayload', 'nova_object.version': '1.0', 
'nova_object.data': {'host': u'host2', 'aggregates': None, 'node': u'host2', 
'cell': {'nova_object.namespace': 'nova', 'nova_object.name': 
'CellMappingPayload', 'nova_object.version': '2.0', 'nova_object.data': 
{'disabled': False, 'uuid': u'49bb4305-6acb-4b60-abff-382e2e85108a', 'name': 
u'cell1', 'security_groups': [u'default'], 'scheduler_hints': {}, 
'project_id': u'6f70656e737461636b20342065766572', 'retry': None, 
'num_instances': 1, 'instance_group': None, 'force_nodes': None, 
'ignore_hosts': [u'compute'], 'force_hosts': None, 'numa_topology': None, 
'instance_uuid': u'8d65a36d-36e8-4994-9bdd-89a455166ab9', 'availability_zone': 
None, 'user_id': u'fake', 'pci_requests': {'nova_object.namespace': 'nova', 
'nova_object.name': 'InstancePCIRequestsPayload', 'nova_object.version': '1.0', 
'nova_object.data': {'requests': [], 'instance_uuid': 
u'8d65a36d-36e8-4994-9bdd-89a455166ab9', 'publisher_id': 
u'nova-scheduler:host2', 

[Yahoo-eng-team] [Bug 1843615] [NEW] TestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions intermittently failing since Sept 10, 2019

2019-09-11 Thread Matt Riedemann
r', 'launched_at': '2012-10-29T13:42:11Z', 
'state': u'active', 'action_initiator_project': 
u'6f70656e737461636b20342065766572', 'architecture': u'x86_64', 'deleted_at': 
None, 'host': u'compute', 'availability_zone': u'nova', 'locked': False, 
'ip_addresses': [{'nova_object.namespace': 'nova', 'nova_object.name': 
'IpPayload', 'nova_object.version': '1.0', 'nova_object.data': {'label': 
u'private-network', 'meta': {}, 'address': '192.168.1.3', 'device_name': 
u'tapce531f90-19', 'mac': u'fa:16:3e:4c:2c:30', 'version': 4, 'port_uuid': 
u'ce531f90-199f-48c0-816c-13e38010b442'}}], 'auto_disk_config': u'MANUAL', 
'block_devices': [{'nova_object.namespace': 'nova', 'nova_object.name': 
'BlockDevicePayload', 'nova_object.version': '1.0', 'nova_object.data': 
{'boot_index': None, 'device_name': u'/dev/sdb', 'delete_on_termination': 
False, 'volume_id': u'a07f71dc-8151-4e7d-a0cc-cd24a3f3', 'tag': None}}], 
'node': u'fake-mini', 'request_id': 
u'req-5b6c791d-5709-4f36-8fbe-c3e02869e35d', 'locked_reason': None, 
'tenant_id': u'6f70656e737461636b20342065766572', 'metadata': {}, 'task_state': 
u'migrating', 'terminated_at': None, 'image_uuid': 
u'155d900f-4e14-4e4c-a73d-069cbf4541e6', 'display_name': u'some-server', 
'updated_at': '2012-10-29T13:42:11Z', 'power_state': u'running', 'user_id': 
u'fake', 'uuid': u'8d65a36d-36e8-4994-9bdd-89a455166ab9'}}, 'publisher_id': 
u'nova-compute:compute', 'event_type': 
u'instance.live_migration_force_complete.end'}, {'priority': 'INFO', 'payload': 
{'nova_object.namespace': 'nova', 'nova_object.name': 'InstanceActionPayload', 
'nova_object.version': '1.8', 'nova_object.data': {'os_type': None, 'flavor': 
{'nova_object.namespace': 'nova', 'nova_object.name': 'FlavorPayload', 
'nova_object.version': '1.4', 'nova_object.data': {'flavorid': 
u'a22d5517-147c-4147-a0d1-e698df5cd4e3', 'is_public': True, 'ephemeral_gb': 0, 
'vcpus': 1, 'root_gb': 1, 'disabled': False, 'description': None, 'projects': 
None, 'vcpu_weight': 0, 'memory_mb': 512, 'name': u'test_flavor', 
'rxtx_factor': 1.0, 'extra_specs': {u'hw:watchdog_action': u'disabled'}, 
'swap': 0}}, 'display_description': u'some-server', 'action_initiator_user': 
u'admin', 'kernel_id': u'', 'host_name': u'some-server', 'created_at': 
'2012-10-29T13:42:11Z', 'ramdisk_id': u'', 'key_name': u'my-key', 'fault': 
None, 'progress': 0, 'reservation_id': u'r-7gm79j0r', 'launched_at': 
'2012-10-29T13:42:11Z', 'state': u'active', 'action_initiator_project': 
u'6f70656e737461636b20342065766572', 'architecture': u'x86_64', 'deleted_at': 
None, 'host': u'compute', 'availability_zone': u'nova', 'locked': False, 
'ip_addresses': [{'nova_object.namespace': 'nova', 'nova_object.name': 
'IpPayload', 'nova_object.version': '1.0', 'nova_object.data': {'label': 
u'private-network', 'meta': {}, 'address': '192.168.1.3', 'device_name': 
u'tapce531f90-19', 'mac': u'fa:16:3e:4c:2c:30', 'version': 4, 'port_uuid': 
u'ce531f90-199f-48c0-816c-13e38010b442'}}], 'auto_disk_config': u'MANUAL', 
'block_devices': [{'nova_object.namespace': 'nova', 'nova_object.name': 
'BlockDevicePayload', 'nova_object.version': '1.0', 'nova_object.data': 
{'boot_index': None, 'device_name': u'/dev/sdb', 'delete_on_termination': 
False, 'volume_id': u'a07f71dc-8151-4e7d-a0cc-cd24a3f3', 'tag': None}}], 
'node': u'fake-mini', 'request_id': 
u'req-5b6c791d-5709-4f36-8fbe-c3e02869e35d', 'locked_reason': None, 
'tenant_id': u'6f70656e737461636b20342065766572', 'metadata': {}, 'task_state': 
u'migrating', 'terminated_at': None, 'image_uuid': 
u'155d900f-4e14-4e4c-a73d-069cbf4541e6', 'display_name': u'some-server', 
'updated_at': '2012-10-29T13:42:11Z', 'power_state': u'running', 'user_id': 
u'fake', 'uuid': u'8d65a36d-36e8-4994-9bdd-89a455166ab9'}}, 'publisher_id': 
u'nova-compute:compute', 'event_type': u'instance.live_migration_post.start'}]

The test code is expecting 6 notifications but got 7:

self._wait_for_notification(
'instance.live_migration_force_complete.end')

# 0. scheduler.select_destinations.start
# 1. scheduler.select_destinations.end
# 2. instance.live_migration_pre.start
# 3. instance.live_migration_pre.end
# 4. instance.live_migration_force_complete.start
# 5. instance.live_migration_force_complete.end
self.assertEqual(6, len(fake_notifier.VERSIONED_NOTIFICATIONS),
 fake_notifier.VERSIONED_NOTIFICATIONS)

The 7th is instance.live_migration_post.start:

http://paste.openstack.org/show/775148/

so it appears either something has changed in when that notification is sent,
or we're losing a race with when force complete is triggered, meaning maybe
we don't catch the force complete in time before post live migration starts.

** Affects: nova
     Importance: High
 Assignee: Matt Riedemann (mriedem)
 Status: Confirmed

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova
   Status: New => Confirmed

-- 
You received this bug notification because you are a member o

[Yahoo-eng-team] [Bug 1843098] [NEW] Compute API in nova - host_numa_node field in server topology API is wrong

2019-09-06 Thread Matt Riedemann
Public bug reported:

- [x] This doc is inaccurate in this way:

https://docs.openstack.org/api-ref/compute/?expanded=show-server-
topology-detail#id401

There is no 'host_numa_node' parameter in the response, it's called
'host_node'.

---
Release:  on 2019-08-06 17:29:30
SHA: 3882cc5bb6c74b1df60475b6b7ec907d6ddf54f5
Source: https://opendev.org/openstack/nova/src/api-ref/source/index.rst
URL: https://docs.openstack.org/api-ref/compute/

** Affects: nova
 Importance: High
 Status: Confirmed


** Tags: api-ref

** Changed in: nova
   Status: New => Confirmed

** Changed in: nova
   Importance: Undecided => Medium

** Changed in: nova
   Importance: Medium => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1843098

Title:
  Compute API in nova - host_numa_node field in server topology API is
  wrong

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  - [x] This doc is inaccurate in this way:

  https://docs.openstack.org/api-ref/compute/?expanded=show-server-
  topology-detail#id401

  There is no 'host_numa_node' parameter in the response, it's called
  'host_node'.

  ---
  Release:  on 2019-08-06 17:29:30
  SHA: 3882cc5bb6c74b1df60475b6b7ec907d6ddf54f5
  Source: https://opendev.org/openstack/nova/src/api-ref/source/index.rst
  URL: https://docs.openstack.org/api-ref/compute/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1843098/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1843090] [NEW] ComputeTaskManager._cold_migrate could get a legacy request spec dict from stein computes if rpc pinned and not convert it properly

2019-09-06 Thread Matt Riedemann
Public bug reported:

As of this change in Stein https://review.opendev.org/#/c/582417/ the
compute service will pass a request spec back to conductor when
rescheduling during a resize or cold migration. If the compute RPC API
version is pinned below 5.1, however, that request spec will be a legacy
dict rather than a full RequestSpec object so the code here:

https://github.com/openstack/nova/blob/19.0.0/nova/conductor/manager.py#L302-L321

Needs to account for the legacy dict case.
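
A minimal sketch of the kind of guard needed in _cold_migrate (an
illustration, not the actual patch):

  # Accept both the RequestSpec object and the legacy dict form that a
  # pinned (< 5.1) Stein compute sends back on reschedule.
  if request_spec and not isinstance(request_spec, objects.RequestSpec):
      request_spec = objects.RequestSpec.from_primitives(
          context, request_spec, filter_properties)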

** Affects: nova
 Importance: Low
 Assignee: Matt Riedemann (mriedem)
 Status: In Progress

** Affects: nova/stein
 Importance: Low
 Status: Triaged


** Tags: conductor upgrade

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Changed in: nova/stein
   Status: New => Triaged

** Changed in: nova/stein
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1843090

Title:
  ComputeTaskManager._cold_migrate could get a legacy request spec dict
  from stein computes if rpc pinned and not convert it properly

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) stein series:
  Triaged

Bug description:
  As of this change in Stein https://review.opendev.org/#/c/582417/ the
  compute service will pass a request spec back to conductor when
  rescheduling during a resize or cold migration. If the compute RPC API
  version is pinned below 5.1, however, that request spec will be a
  legacy dict rather than a full RequestSpec object so the code here:

  
https://github.com/openstack/nova/blob/19.0.0/nova/conductor/manager.py#L302-L321

  Needs to account for the legacy dict case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1843090/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1843058] [NEW] libvirt live migration fails intermittently in grenade live migration job with "error while loading state for instance 0x0 of device 'kvm-tpr-opt'"

2019-09-06 Thread Matt Riedemann
Public bug reported:

This may be related to bug 1838309 but I'm not sure so I'm reporting it
separately so we can track it in elastic-recheck. This is the traceback
in the nova-compute logs:

Sep 06 01:28:11.837685 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
DEBUG nova.virt.libvirt.driver [None req-e6bcaa2e-aa66-4107-b0c6-9b3976d45c76 
None None] [instance: 64689c1f-27b6-4889-8206-3bc458427197] Migration operation 
thread notification {{(pid=3855) thread_finished 
/opt/stack/old/nova/nova/virt/libvirt/driver.py:8039}}
Sep 06 01:28:11.838031 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
Traceback (most recent call last):
Sep 06 01:28:11.838031 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 460, 
in fire_timers
Sep 06 01:28:11.838282 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
timer()
Sep 06 01:28:11.838282 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 59, 
in __call__
Sep 06 01:28:11.838561 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
cb(*args, **kw)
Sep 06 01:28:11.838561 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 175, in 
_do_send
Sep 06 01:28:11.838774 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
waiter.switch(result)
Sep 06 01:28:11.838774 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 
219, in main
Sep 06 01:28:11.839008 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
result = function(*args, **kwargs)
Sep 06 01:28:11.839008 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/opt/stack/old/nova/nova/utils.py", line 800, in context_wrapper
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
return func(*args, **kwargs)
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 7711, in 
_live_migration_operation
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
LOG.error("Live Migration failure: %s", e, instance=instance)
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, 
in __exit__
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
self.force_reraise()
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, 
in force_reraise
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
six.reraise(self.type_, self.value, self.tb)
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 7704, in 
_live_migration_operation
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
bandwidth=CONF.libvirt.live_migration_bandwidth)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/opt/stack/old/nova/nova/virt/libvirt/guest.py", line 682, in migrate
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
destination, params=params, flags=flags)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 190, in 
doit
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
result = proxy_call(self._autowrap, f, *args, **kwargs)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 148, in 
proxy_call
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
rv = execute(f, *args, **kwargs)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 129, in 
execute
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
six.reraise(c, e, tb)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in 
tworker
Sep 06 01:28:11.841508 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
rv = meth(*args, **kwargs)
Sep 06 01:28:11.841508 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   
File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1745, in 
migrateToURI3
Sep 06 01:28:11.843021 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 
if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
Sep 06 01:28:11.843239 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 

[Yahoo-eng-team] [Bug 1842985] [NEW] Testing Zero Downtime Upgrade Process in nova - broken reference link

2019-09-05 Thread Matt Riedemann
Public bug reported:

- [x] This doc is inaccurate in this way:

The reference link here is broken:

https://docs.openstack.org/nova/latest/contributor/testing/zero-
downtime-upgrade.html#zero-downtime-upgrade-process

---
Release:  on 2017-09-06 22:01:01
SHA: 4476e6218499bf1ae757973b500acfa59a5a9cbe
Source: 
https://opendev.org/openstack/nova/src/doc/source/contributor/testing/zero-downtime-upgrade.rst
URL: 
https://docs.openstack.org/nova/latest/contributor/testing/zero-downtime-upgrade.html

** Affects: nova
 Importance: Low
 Status: Confirmed


** Tags: doc low-hanging-fruit

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1842985

Title:
  Testing Zero Downtime Upgrade Process in nova - broken reference link

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  - [x] This doc is inaccurate in this way:

  The reference link here is broken:

  https://docs.openstack.org/nova/latest/contributor/testing/zero-
  downtime-upgrade.html#zero-downtime-upgrade-process

  ---
  Release:  on 2017-09-06 22:01:01
  SHA: 4476e6218499bf1ae757973b500acfa59a5a9cbe
  Source: 
https://opendev.org/openstack/nova/src/doc/source/contributor/testing/zero-downtime-upgrade.rst
  URL: 
https://docs.openstack.org/nova/latest/contributor/testing/zero-downtime-upgrade.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1842985/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1838666] Re: lxml 4.4.0 causes failed tests in nova

2019-08-30 Thread Matt Riedemann
** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Changed in: nova
   Importance: Undecided => Medium

** Changed in: nova/stein
   Importance: Undecided => Medium

** Changed in: nova/stein
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838666

Title:
  lxml 4.4.0 causes failed tests in nova

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  It looks like it's just an ordering issue for the elements that are
  returned.

  See https://review.opendev.org/673848 for details on the failure (you
  can depend on it for testing fixes as well).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1838666/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1842087] [NEW] _check_can_migrate_pci in the LiveMigrationTask has host agnostic validation that is redundant/expensive

2019-08-30 Thread Matt Riedemann
Public bug reported:

This PCI validation code in the live migration task in conductor is run
per possible dest host for the migration:

https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L212-L228

But it is host agnostic, meaning if I have 100 possible dest hosts for the
live migration and an instance with a flavor-defined pci request, it's
going to fail that validation the same way 100 times.

That validation should be pulled up to a point before we even start
asking the scheduler for hosts, e.g. like the numa live migration
support:

https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L85
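
A rough sketch of the re-ordering (all names below are illustrative, not
the actual nova conductor code):

    def execute_live_migration(instance, request_spec, candidate_hosts,
                               check_pci_host_agnostic,
                               check_destination):
        # Run the flavor-based, host-agnostic PCI validation exactly once,
        # before iterating over scheduler-provided destinations, mirroring
        # how the NUMA live migration check linked above is handled.
        check_pci_host_agnostic(instance, request_spec)

        for host in candidate_hosts:
            # Only validation that actually depends on the chosen
            # destination remains inside the per-host loop.
            check_destination(instance, host)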

** Affects: nova
 Importance: Low
 Status: Triaged


** Tags: conductor live-migration

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1842087

Title:
  _check_can_migrate_pci in the LiveMigrationTask has host agnostic
  validation that is redundant/expensive

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  This PCI validation code in the live migration task in conductor is
  run per possible dest host for the migration:

  
https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L212-L228

  But it is host agnostic, meaning if I have 100 possible dest hosts for
  the live migration and an instance with a flavor-defined pci request,
  it's going to fail that validation the same way 100 times.

  That validation should be pulled up to a point before we even start
  asking the scheduler for hosts, e.g. like the numa live migration
  support:

  
https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L85

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1842087/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1842081] [NEW] Error during ComputeManager._cleanup_running_deleted_instances: VirtDriverNotReady: Virt driver is not ready. (ironic)

2019-08-30 Thread Matt Riedemann
Public bug reported:

Seeing this on start of nova-compute with ironic when ironic-api isn't
yet available:

Aug 24 01:06:39.710754 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR nova.virt.ironic.driver [None req-
9542c6c8-a038-45f5-bd18-e18f83c17755 None None] An unknown error has
occurred when trying to get the list of nodes from the Ironic inventory.
Error: StrictVersion instance has no attribute 'version'

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task [None req-
9542c6c8-a038-45f5-bd18-e18f83c17755 None None] Error during
ComputeManager._cleanup_running_deleted_instances: VirtDriverNotReady:
Virt driver is not ready.

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task Traceback (most recent
call last):

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task   File
"/usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py",
line 222, in run_periodic_tasks

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task task(self, context)

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task   File
"/opt/stack/nova/nova/compute/manager.py", line 8369, in
_cleanup_running_deleted_instances

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task for instance in
self._running_deleted_instances(context):

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task   File
"/opt/stack/nova/nova/compute/manager.py", line 8423, in
_running_deleted_instances

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task instances =
self._get_instances_on_driver(context, filters)

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task   File
"/opt/stack/nova/nova/compute/manager.py", line 634, in
_get_instances_on_driver

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task driver_uuids =
self.driver.list_instance_uuids()

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task   File
"/opt/stack/nova/nova/virt/ironic/driver.py", line 685, in
list_instance_uuids

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task
fields=['instance_uuid'], limit=0)

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task   File
"/opt/stack/nova/nova/virt/ironic/driver.py", line 656, in
_get_node_list

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task raise
exception.VirtDriverNotReady()

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task VirtDriverNotReady: Virt
driver is not ready.

Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
compute[7945]: ERROR oslo_service.periodic_task

Looks like this is due to https://review.opendev.org/#/c/657132/ in
Train where the _cleanup_running_deleted_instances periodic task runs
immediately on startup of the nova-compute service which could be before
the hypervisor (in this case ironic) is ready.

This doesn't really break anything, but it's an ugly traceback in the
logs that could be avoided. We should handle the VirtDriverNotReady
error and return from the periodic.
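
A minimal sketch of that handling (simplified, not the actual nova code):

    from nova import exception

    def _cleanup_running_deleted_instances(self, context):
        try:
            instances = self._running_deleted_instances(context)
        except exception.VirtDriverNotReady:
            # The virt driver (ironic in this case) is not reachable yet
            # shortly after startup; skip this periodic run instead of
            # logging a full traceback.
            return
        # ... the existing cleanup of 'instances' continues unchanged ...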

** Affects: nova
 Importance: Low
 Assignee: Matt Riedemann (mriedem)
 Status: Triaged


** Tags: compute ironic

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1842081

Title:
  Error during ComputeManager._cleanup_running_deleted_instances:
  VirtDriverNotReady: Virt driver is not ready. (ironic)

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  Seeing this on start of nova-compute with ironic when ironic-api isn't
  yet available:

  Aug 24 01:06:39.710754 ubuntu-bionic-rax-iad-0010410623 nova-
  compute[7945]: ERROR nova.virt.ironic.driver [None req-
  9542c6c8-a038-45f5-bd18-e18f83c17755 None None] An unknown error has
  occurred when trying to get the list of nodes from the Ironic
  inventory. Error: StrictVersion instance has no attribute 'version'

  Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova-
  compute[7945]: ERROR oslo_service.periodic_task [None req-
  9542c6c8-a038-45f5-bd18-e18f83c17755 None None] Error during
  ComputeManager._cleanup_running_deleted_instances: VirtDriverNotReady:
  Virt driver is not ready

[Yahoo-eng-team] [Bug 1842061] [NEW] Compute schedulers in nova - AggregateInstanceExtraSpecsFilter docs are not clear

2019-08-30 Thread Matt Riedemann
Public bug reported:

- [x] This is a doc addition request.

The description for the AggregateInstanceExtraSpecsFilter filter is not
clear:

https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregateinstanceextraspecsfilter

(note it's also described here:
https://docs.openstack.org/nova/latest/user/filter-scheduler.html)

It's not clear what aggregate_instance_extra_specs is used for.

Note that further down in the document there are some examples:

https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#example-specify-compute-
hosts-with-ssds

So I guess based on that, it means you would just add metadata to a host
aggregate like foo=bar and then tie a flavor to that by setting an extra
spec of aggregate_instance_extra_specs:foo=bar on the flavor. But what
about other standard extra specs like hide_hypervisor_id? You can't put
the aggregate_instance_extra_specs prefix on that in the flavor, since it
would break the extra spec for the actual code that checks for that
standard extra spec. Does that mean the flavor has to have both the
scoped and unscoped spec? Or that the filter will handle the unscoped
spec? It would be nice to have the documentation on the filter itself
explain this and give examples of how to use it, for both a standard and
custom flavor extra spec (note the latter has an example linked above
for the ssd example).
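
For the documented ssd example the pairing looks roughly like this (the
values are only examples):

    # metadata set on the host aggregate
    aggregate_metadata = {'ssd': 'true'}

    # extra spec set on the flavor; the scope prefix tells the filter the
    # key refers to aggregate metadata rather than some other extra spec
    flavor_extra_specs = {'aggregate_instance_extra_specs:ssd': 'true'}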

This originally came up while triaging bug 1841932 and trying to make
sense of the filter (it's not very clear even by looking at the code).

---
Release:  on 2019-08-22 20:13:47
SHA: 0882ea69ea0c46cf97ecd5a1ec49a3007f293c28
Source: 
https://opendev.org/openstack/nova/src/doc/source/admin/configuration/schedulers.rst
URL: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: doc

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1842061

Title:
  Compute schedulers in nova - AggregateInstanceExtraSpecsFilter docs
  are not clear

Status in OpenStack Compute (nova):
  New

Bug description:
  - [x] This is a doc addition request.

  The description for the AggregateInstanceExtraSpecsFilter filter is
  not clear:

  
https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregateinstanceextraspecsfilter

  (note it's also described here:
  https://docs.openstack.org/nova/latest/user/filter-scheduler.html)

  It's not clear what aggregate_instance_extra_specs is used for.

  Note that further down in the document there are some examples:

  
https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#example-specify-compute-
  hosts-with-ssds

  So I guess based on that, it means you would just add metadata to a
  host aggregate like foo=bar and then tie a flavor to that by setting
  an extra spec of aggregate_instance_extra_specs:foo=bar on the flavor.
  But what about other standard extra specs like hide_hypervisor_id? You
  can't put the aggregate_instance_extra_specs prefix on that in the
  flavor, since it would break the extra spec for the actual code that
  checks for that standard extra spec. Does that mean the flavor has to
  have both the scoped and unscoped spec? Or that the filter will handle
  the unscoped spec? It would be nice to have the documentation on the
  filter itself explain this and give examples of how to use it, for
  both a standard and custom flavor extra spec (note the latter has an
  example linked above for the ssd example).

  This originally came up while triaging bug 1841932 and trying to make
  sense of the filter (it's not very clear even by looking at the code).

  ---
  Release:  on 2019-08-22 20:13:47
  SHA: 0882ea69ea0c46cf97ecd5a1ec49a3007f293c28
  Source: 
https://opendev.org/openstack/nova/src/doc/source/admin/configuration/schedulers.rst
  URL: 
https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1842061/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1512645] Re: Security groups incorrectly applied on new additional interfaces

2019-08-29 Thread Matt Riedemann
** Changed in: nova
   Status: New => Opinion

** Changed in: nova
   Importance: Undecided => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1512645

Title:
  Security groups incorrectly applied on new additional interfaces

Status in neutron:
  Invalid
Status in OpenStack Compute (nova):
  Opinion

Bug description:
  When launching an instance with one network interface and enabling 2
  security groups, everything works as it is supposed to.

  But when attaching additional network interfaces only the default
  security group is applied to those new interfaces. The additional
  security group isn't enabled at all on those extra interfaces.

  We had to dig into the iptables chains to discover this behavior. Once
  we added the rules manually, or added them to the default security
  group, everything worked fine.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1512645/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841411] Re: Instances recovered after failed migrations enter error state (hyper-v)

2019-08-27 Thread Matt Riedemann
** Summary changed:

- Instances recovered after failed migrations enter error state
+ Instances recovered after failed migrations enter error state (hyper-v)

** Tags added: live-migration

** Changed in: nova
   Importance: Undecided => Medium

** Also affects: nova/ocata
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/pike
   Importance: Undecided
   Status: New

** No longer affects: nova/ocata

** No longer affects: nova/pike

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841411

Title:
  Instances recovered after failed migrations enter error state
  (hyper-v)

Status in compute-hyperv:
  In Progress
Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  New

Bug description:
  Most users expect that if a live migration fails but the instance is
  fully recovered, it shouldn't enter 'error' state. Setting the
  migration status to 'error' should be enough. This simplifies
  debugging, making it clear that the instance doesn't have to be
  manually recovered.

  This patch changed this behavior, indirectly affecting the Hyper-V
  driver, which propagates migration errors:
  Idfdce9e7dd8106af01db0358ada15737cb846395

  When using the Hyper-V driver, instances enter error state even after
  successful recoveries. We may copy the Libvirt driver behavior and
  avoid propagating exceptions in this case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/compute-hyperv/+bug/1841411/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841667] Re: failing libvirt tests: need ordering

2019-08-27 Thread Matt Riedemann
*** This bug is a duplicate of bug 1838666 ***
https://bugs.launchpad.net/bugs/1838666

The actual version of libvirt on the system shouldn't matter, these
tests should not be running against a real libvirt, everything should be
faked out. My guess is the tests are using unordered dicts and that's
why the keys are in a different order, or something with the way the xml
comparison code is asserting the attributes.

** Tags added: libvirt testing

** Also affects: nova/stein
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841667

Title:
  failing libvirt tests: need ordering

Status in OpenStack Compute (nova):
  New
Status in OpenStack Compute (nova) stein series:
  New

Bug description:
  When rebuilding Nova from Stein in Debian Sid, I get 3 unit test
  errors, probably due to a more recent libvirt (i.e. 5.6.0). See, for
  example, this first one:

  

  we get bus= and dev= inverted.

  ==
  FAIL: 
nova.tests.unit.virt.libvirt.test_driver.LibvirtDriverTestCase.test_get_disk_xml
  
nova.tests.unit.virt.libvirt.test_driver.LibvirtDriverTestCase.test_get_disk_xml
  --
  _StringException: pythonlogging:'': {{{2019-08-27 20:26:05,026 WARNING 
[os_brick.initiator.connectors.remotefs] Connection details not present. 
RemoteFsClient may not initialize properly.}}}

  Traceback (most recent call last):
File "/<>/nova/tests/unit/virt/libvirt/test_driver.py", line 
20926, in test_get_disk_xml
  self.assertEqual(diska_xml.strip(), actual_diska_xml.strip())
File "/usr/lib/python3/dist-packages/testtools/testcase.py", line 411, in 
assertEqual
  self.assertThat(observed, matcher, message)
File "/usr/lib/python3/dist-packages/testtools/testcase.py", line 498, in 
assertThat
  raise mismatch_error
  testtools.matchers._impl.MismatchError: !=:
  reference = '''\
  


0e38683e-f0af-418f-a3f1-6b67ea0f919d
  '''
  actual= '''\
  


0e38683e-f0af-418f-a3f1-6b67ea0f919d
  '''

  
  ==
  FAIL: 
nova.tests.unit.virt.libvirt.test_driver.LibvirtConnTestCase.test_detach_volume_with_vir_domain_affect_live_flag
  
nova.tests.unit.virt.libvirt.test_driver.LibvirtConnTestCase.test_detach_volume_with_vir_domain_affect_live_flag
  --
  _StringException: pythonlogging:'': {{{2019-08-27 20:26:31,189 WARNING 
[os_brick.initiator.connectors.remotefs] Connection details not present. 
RemoteFsClient may not initialize properly.}}}

  Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/mock/mock.py", line 1330, in patched
  return func(*args, **keywargs)
File "/<>/nova/tests/unit/virt/libvirt/test_driver.py", line 
7955, in test_detach_volume_with_vir_domain_affect_live_flag
  """, flags=flags)
File "/usr/lib/python3/dist-packages/mock/mock.py", line 944, in 
assert_called_with
  six.raise_from(AssertionError(_error_message(cause)), cause)
File "", line 3, in raise_from
  AssertionError: expected call not found.
  Expected: detachDeviceFlags('\n  \n  \n\n', 
flags=3)
  Actual: detachDeviceFlags('\n  \n  \n\n', 
flags=3)

  
  ==
  FAIL: 
nova.tests.unit.virt.libvirt.test_driver.LibvirtConnTestCase.test_update_volume_xml
  
nova.tests.unit.virt.libvirt.test_driver.LibvirtConnTestCase.test_update_volume_xml
  --
  _StringException: pythonlogging:'': {{{2019-08-27 20:26:37,451 WARNING 
[os_brick.initiator.connectors.remotefs] Connection details not present. 
RemoteFsClient may not initialize properly.}}}

  Traceback (most recent call last):
File "/<>/nova/tests/unit/virt/libvirt/test_driver.py", line 
10157, in test_update_volume_xml
  etree.tostring(config, encoding='unicode'))
File "/usr/lib/python3/dist-packages/testtools/testcase.py", line 411, in 
assertEqual
  self.assertThat(observed, matcher, message)
File "/usr/lib/python3/dist-packages/testtools/testcase.py", line 498, in 
assertThat
  raise mismatch_error
  testtools.matchers._impl.MismatchError: !=:
  reference = '58a84f6d-3f0c-4e19-a0af-eb657b790657'
  actual= '58a84f6d-3f0c-4e19-a0af-eb657b790657'

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1841667/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841481] [NEW] Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache

2019-08-26 Thread Matt Riedemann
Public bug reported:

Seen with an ironic re-balance in this job:

https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check
/ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode/92c65ac/

On the subnode we see the RT detect that the node is moving hosts:

Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova-
compute[747]: INFO nova.compute.resource_tracker [None req-a894abee-
a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42
-b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to
ubuntu-bionic-rax-ord-0010443319

On that new host, the ProviderTree cache is getting updated with
refreshed associations for inventory:

Aug 26 18:41:38.881026 ubuntu-bionic-rax-ord-0010443319 nova-
compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee-
a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing inventories for
resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f {{(pid=747)
_refresh_associations
/opt/stack/nova/nova/scheduler/client/report.py:761}}

aggregates:

Aug 26 18:41:38.953685 ubuntu-bionic-rax-ord-0010443319 nova-
compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee-
a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing aggregate associations
for resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f, aggregates:
None {{(pid=747) _refresh_associations
/opt/stack/nova/nova/scheduler/client/report.py:770}}

and traits - but when we get traits the provider is gone:

Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None 
None] Error updating resources for node 61dbc9c7-828b-4c42-b19c-a3716037965f.: 
ResourceProviderTraitRetrievalFailed: Failed to get traits for resource 
provider with UUID 61dbc9c7-828b-4c42-b19c-a3716037965f
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager Traceback (most recent call last):
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", 
line 8250, in _update_available_resource_for_node
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager startup=startup)
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/opt/stack/nova/nova/compute/resource_tracker.py", line 715, in 
update_available_resource
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager self._update_available_resource(context, 
resources, startup=startup)
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 
328, in inner
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager return f(*args, **kwargs)
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/opt/stack/nova/nova/compute/resource_tracker.py", line 738, in 
_update_available_resource
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager is_new_compute_node = 
self._init_compute_node(context, resources)
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/opt/stack/nova/nova/compute/resource_tracker.py", line 561, in 
_init_compute_node
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager if self._check_for_nodes_rebalance(context, 
resources, nodename):
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/opt/stack/nova/nova/compute/resource_tracker.py", line 516, in 
_check_for_nodes_rebalance
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager self._update(context, cn)
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/opt/stack/nova/nova/compute/resource_tracker.py", line 1054, in _update
Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager self._update_to_placement(context, compute_node, 
startup)
Aug 26 18:41:38.996935 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/usr/local/lib/python2.7/dist-packages/retrying.py", line 49, in wrapped_f
Aug 26 18:41:38.996935 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, 
**kw)
Aug 26 18:41:38.996935 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: 
ERROR nova.compute.manager   File 
"/usr/local/lib/python2.7/dist-packages/retrying.py", line 206, in call

[Yahoo-eng-team] [Bug 1841476] [NEW] Spurious ComputeHostNotFound warnings in nova-compute logs during ironic node re-balance

2019-08-26 Thread Matt Riedemann
Public bug reported:

Seen here:

https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check
/ironic-tempest-ipa-wholedisk-direct-tinyipa-
multinode/92c65ac/compute1/logs/screen-n-cpu.txt.gz

We see a warning that a compute node could not be found by host and node
but then later is found just by nodename and is moving to the current
host:

Aug 26 18:41:38.800657 ubuntu-bionic-rax-ord-0010443319 nova-
compute[747]: WARNING nova.compute.resource_tracker [None req-a894abee-
a2f1-4423-8ede-2a1b9eef28a4 None None] No compute node record for
ubuntu-bionic-rax-ord-0010443319:61dbc9c7-828b-4c42-b19c-a3716037965f:
ComputeHostNotFound_Remote: Compute host ubuntu-bionic-rax-
ord-0010443319 could not be found.

Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova-
compute[747]: INFO nova.compute.resource_tracker [None req-a894abee-
a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42
-b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to
ubuntu-bionic-rax-ord-0010443319

The warning comes from this call:

https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L554

And the re-balance is found here:

https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L561

The warning is then a red herring. We could:

1. add something to the warning message saying this could be due to a
re-balance but that might be confusing for non-ironic computes

and/or

2. check if self.driver.rebalances_nodes and if True, change the warning
to an info level message (and potentially modify the message with the
re-balance wording in #1 above).
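
A rough sketch of option 2, paraphrasing the code around the linked
warning (the exact surrounding code differs):

    try:
        cn = objects.ComputeNode.get_by_host_and_nodename(
            context, self.host, nodename)
    except exception.ComputeHostNotFound:
        msg = 'No compute node record for %s:%s' % (self.host, nodename)
        if self.driver.rebalances_nodes:
            # Expected while an ironic node is re-balancing between
            # compute services, so don't make it look like a problem.
            LOG.info(msg)
        else:
            LOG.warning(msg)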

** Affects: nova
 Importance: Low
 Status: Triaged


** Tags: ironic resource-tracker serviceability

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841476

Title:
  Spurious ComputeHostNotFound warnings in nova-compute logs during
  ironic node re-balance

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  Seen here:

  
https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check
  /ironic-tempest-ipa-wholedisk-direct-tinyipa-
  multinode/92c65ac/compute1/logs/screen-n-cpu.txt.gz

  We see a warning that a compute node could not be found by host and
  node but then later is found just by nodename and is moving to the
  current host:

  Aug 26 18:41:38.800657 ubuntu-bionic-rax-ord-0010443319 nova-
  compute[747]: WARNING nova.compute.resource_tracker [None req-
  a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] No compute node record
  for ubuntu-bionic-rax-ord-0010443319:61dbc9c7-828b-4c42-b19c-
  a3716037965f: ComputeHostNotFound_Remote: Compute host ubuntu-bionic-
  rax-ord-0010443319 could not be found.

  Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova-
  compute[747]: INFO nova.compute.resource_tracker [None req-a894abee-
  a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42
  -b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to
  ubuntu-bionic-rax-ord-0010443319

  The warning comes from this call:

  
https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L554

  And the re-balance is found here:

  
https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L561

  The warning is then a red herring. We could:

  1. add something to the warning message saying this could be due to a
  re-balance but that might be confusing for non-ironic computes

  and/or

  2. check if self.driver.rebalances_nodes and if True, change the
  warning to an info level message (and potentially modify the message
  with the re-balance wording in #1 above).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1841476/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1833902] Re: Revert resize tests are failing in jobs with iptables_hybrid fw driver

2019-08-26 Thread Matt Riedemann
** No longer affects: neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1833902

Title:
  Revert resize tests are failing in jobs with iptables_hybrid fw driver

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Tests:

  
tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_resize_server_revert_deleted_flavor
  
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert
  
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert_with_volume_attached

  are failing 100% of the time for the last ~2 days.
  And it happens only in jobs with the iptables_hybrid fw driver, but I don't know
  whether this is really the source of the issue or just a red herring.

  Logstash query:

  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_resize_server_revert_deleted_flavor%5C%22%20AND%20message%3A%5C%22FAILED%5C%22

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1833902/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1840978] [NEW] nova-manage commands with unexpected errors returning 1 conflict with expected cases of 1 for flow control

2019-08-21 Thread Matt Riedemann
Public bug reported:

The archive_deleted_rows command returns 1 meaning some records were
archived and the code documents that if automating and not using
--until-complete, you should keep going while you get rc=1 until you get
rc=0:

https://github.com/openstack/nova/blob/0bf81cfe73340ba5cfd9cf44a38905014ba780f0/nova/cmd/manage.py#L505

The problem is if some unexpected error happens, let's say there is a
TypeError in the code or something, the command will also return 1:

https://github.com/openstack/nova/blob/0bf81cfe73340ba5cfd9cf44a38905014ba780f0/nova/cmd/manage.py#L2625

That unexpected error should probably be a 255 which generally means a
command failed in some unexpected way. There might be other nova-manage
commands that return 1 for flow control as well.

Note that changing the "unexpected error" code from 1 to 255 is an
upgrade impacting change worth a release note.
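
To illustrate the overlap, a minimal automation loop following the
documented contract might look like this (the command and --max_rows flag
are real; the rest is just an example):

    import subprocess

    while True:
        rc = subprocess.call(['nova-manage', 'db', 'archive_deleted_rows',
                              '--max_rows', '1000'])
        if rc == 0:
            break      # nothing left to archive
        elif rc == 1:
            # Records were archived, so keep going -- but today an
            # unexpected failure also returns 1, so this loop cannot tell
            # the difference. Returning 255 for failures would fix that.
            continue
        else:
            raise RuntimeError('archive_deleted_rows failed: rc=%d' % rc)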

** Affects: nova
 Importance: Low
 Status: Triaged


** Tags: nova-manage

** Tags added: nova-manage

** Changed in: nova
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1840978

Title:
  nova-manage commands with unexpected errors returning 1 conflict with
  expected cases of 1 for flow control

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  The archive_deleted_rows command returns 1 meaning some records were
  archived and the code documents that if automating and not using
  --until-complete, you should keep going while you get rc=1 until you
  get rc=0:

  
https://github.com/openstack/nova/blob/0bf81cfe73340ba5cfd9cf44a38905014ba780f0/nova/cmd/manage.py#L505

  The problem is if some unexpected error happens, let's say there is a
  TypeError in the code or something, the command will also return 1:

  
https://github.com/openstack/nova/blob/0bf81cfe73340ba5cfd9cf44a38905014ba780f0/nova/cmd/manage.py#L2625

  That unexpected error should probably be a 255 which generally means a
  command failed in some unexpected way. There might be other nova-
  manage commands that return 1 for flow control as well.

  Note that changing the "unexpected error" code from 1 to 255 is an
  upgrade impacting change worth a release note.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1840978/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1704179] Re: Too many period db actions in large scale clusters increase the load of database

2019-08-21 Thread Matt Riedemann
*** This bug is a duplicate of bug 1729621 ***
https://bugs.launchpad.net/bugs/1729621

** This bug has been marked a duplicate of bug 1729621
   Inconsistent value for vcpu_used

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1704179

Title:
  Too many period db actions in large scale clusters increase the  load
  of database

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  Too many periodic db actions in large scale clusters increase the load
  on the database, especially unnecessary db updates or queries.

  For example, with over 1000 nodes there will be 2 * 1000 = 2000 db
  updates to the compute_node table every 60s from
  _update_available_resource, but these two db updates are not necessary
  if resource usage has not changed.

  Deleting the first and second _update() calls in _init_compute_node can
  save two db updates per node every 60s, if resource usage has not
  changed for that compute_node.

  Then the function self._resource_change(compute_node) in _update()
  makes sense.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1704179/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1729621] Re: Inconsistent value for vcpu_used

2019-08-21 Thread Matt Riedemann
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/queens
   Status: New => Fix Released

** Changed in: nova/rocky
   Status: New => Fix Released

** Changed in: nova/pike
 Assignee: Tony Breeds (o-tony) => Radoslav Gerganov (rgerganov)

** Changed in: nova/pike
   Status: In Progress => Won't Fix

** Changed in: nova/queens
 Assignee: (unassigned) => Radoslav Gerganov (rgerganov)

** Changed in: nova/rocky
 Assignee: (unassigned) => Radoslav Gerganov (rgerganov)

** No longer affects: nova/ocata

** Changed in: nova/queens
   Importance: Undecided => High

** Changed in: nova/rocky
   Importance: Undecided => High

** Changed in: nova/pike
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1729621

Title:
  Inconsistent value for vcpu_used

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  Won't Fix
Status in OpenStack Compute (nova) queens series:
  Fix Released
Status in OpenStack Compute (nova) rocky series:
  Fix Released

Bug description:
  Description
  ===

  Nova updates hypervisor resources using function called
  ./nova/compute/resource_tracker.py:update_available_resource().

  In the case of *shut-down* instances it can produce inconsistent values
  for resources like vcpu_used.

  Resources are taken from function self.driver.get_available_resource():
  
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L617
  
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5766

  This function calculates allocated vcpus based on the function _get_vcpu_total().
  
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5352

  As we can see, the _get_vcpu_total() function calls
  *self._host.list_guests()* without the "only_running=False" parameter,
  so it does not take shut-down instances into account.

  At the end of the resource update process the function
  _update_available_resource() is being called:
  > /opt/stack/nova/nova/compute/resource_tracker.py(733)

   677 @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
   678 def _update_available_resource(self, context, resources):
   679
   681 # initialize the compute node object, creating it
   682 # if it does not already exist.
   683 self._init_compute_node(context, resources)

  It initializes the compute node object with resources that are
  calculated without shut-down instances. If the compute node object
  already exists, it *UPDATES* its fields - *for a while nova-api reports
  different resource values than the real ones.*

   731 # update the compute_node
   732 self._update(context, cn)

  The inconsistency is automatically fixed during other code execution:
  
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L709

  But for heavily loaded hypervisors (like 100 active instances and 30
  shut-down instances) it creates wrong information in the nova database
  for about 4-5 seconds (in my use case) - which can trigger other issues,
  like spawning on an already full hypervisor (because the scheduler has
  wrong information about hypervisor usage).

  Steps to reproduce
  ==

  1) Start devstack
  2) Create 120 instances
  3) Stop some instances
  4) Watch blinking values in nova hypervisor-show
  nova hypervisor-show e6dfc16b-7914-48fb-a235-6fe3a41bb6db

  Expected result
  ===
  Returned values should be the same during test.

  Actual result
  =
  while true; do echo -n "$(date) "; echo "select hypervisor_hostname, 
vcpus_used from compute_nodes where 
hypervisor_hostname='example.compute.node.com';" | mysql nova_cell1; sleep 0.3; 
done

  Thu Nov  2 14:50:09 UTC 2017 example.compute.node.com  120
  Thu Nov  2 14:50:10 UTC 2017 example.compute.node.com  120
  Thu Nov  2 14:50:10 UTC 2017 example.compute.node.com  120
  Thu Nov  2 14:50:10 UTC 2017 example.compute.node.com  120
  Thu Nov  2 14:50:11 UTC 2017 example.compute.node.com  120
  Thu Nov  2 14:50:11 UTC 2017 example.compute.node.com  120
  Thu Nov  2 14:50:11 UTC 2017 example.compute.node.com  120
  Thu Nov  2 14:50:11 UTC 2017 example.compute.node.com  120
  Thu Nov  2 14:50:12 UTC 2017 example.compute.node.com  117
  Thu Nov  2 14:50:12 UTC 2017 example.compute.node.com  117
  Thu Nov  2 14:50:12 UTC 2017 example.compute.node.com  117
  Thu Nov  2 14:50:13 UTC 2017 example.compute.node.com  117
  Thu Nov  2 14:50:13 UTC 2017 example.compute.node.com  117
  Thu Nov  2 14:50:13 UTC 2017 example.compute.node.com  117
  Thu Nov  2 14:50:14 UTC 

[Yahoo-eng-team] [Bug 1789991] Re: nova-compute error after enrolling ironic baremetal nodes

2019-08-21 Thread Matt Riedemann
*** This bug is a duplicate of bug 1839674 ***
https://bugs.launchpad.net/bugs/1839674

** This bug has been marked a duplicate of bug 1839674
   ResourceTracker.compute_nodes won't try to create a ComputeNode a second 
time if the first create() fails

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1789991

Title:
  nova-compute error after enrolling ironic baremetal nodes

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===
  After enrolling some ironic baremetal nodes, I noticed the following in 
nova-compute.log (longer trace below): 

  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager [req-73ba9d4b-
  b51d-4ab7-88c8-5fc3f27fd89e - - - - -] Error updating resources for
  node 0e5705cc-e872-49aa-aff4-1a91278b5cb3.: NotImplementedError:
  Cannot load 'id' in the base class

  Steps to reproduce
  ==

  * Enroll ironic baremetal nodes (openstack baremetal node provide)
  * Wait
  * Error repeatedly appears in nova-compute.log

  Expected result
  ===
  No errors in log

  Actual result
  =
  Errors in log

  Environment
  ===
  openstack-nova-compute-18.0.0-0.20180829095234.45fc232.el7.noarch
  puppet-nova-13.3.1-0.20180825165256.5d1889b.el7.noarch
  python-nova-18.0.0-0.20180829095234.45fc232.el7.noarch
  python-novajoin-1.0.19-0.20180828183900.3d58511.el7.noarch
  openstack-nova-common-18.0.0-0.20180829095234.45fc232.el7.noarch
  python2-novaclient-11.0.0-0.20180807085257.f1005ce.el7.noarch

  
  Logs & Configs
  =
  2018-08-30 17:00:51.142 7 DEBUG oslo_concurrency.lockutils 
[req-73ba9d4b-b51d-4ab7-88c8-5fc3f27fd89e - - - - -] Lock "compute_resources" 
release\
  d by "nova.compute.resource_tracker._update_available_resource" :: held 
0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils\
  .py:285
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager 
[req-73ba9d4b-b51d-4ab7-88c8-5fc3f27fd89e - - - - -] Error updating resources 
for node 0e57\
  05cc-e872-49aa-aff4-1a91278b5cb3.: NotImplementedError: Cannot load 'id' in 
the base class
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager Traceback (most recent 
call last):
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7729, in 
_update_av\
  ailable_resource_for_node
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager 
rt.update_available_resource(context, nodename)
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 700, 
in up\
  date_available_resource
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager 
self._update_available_resource(context, resources)
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in 
inner
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager return f(*args, 
**kwargs)
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 723, 
in _u\
  pdate_available_resource
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager 
self._init_compute_node(context, resources)
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 563, 
in _i\
  nit_compute_node
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager 
self._setup_pci_tracker(context, cn, resources)
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 594, 
in _s\
  etup_pci_tracker
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager n_id = 
compute_node.id
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 67, in 
getter
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager 
self.obj_load_attr(name)
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 603, in 
obj_l\
  oad_attr
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager _("Cannot load '%s' 
in the base class") % attrname)
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager NotImplementedError: 
Cannot load 'id' in the base class
  2018-08-30 17:00:51.142 7 ERROR nova.compute.manager

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1789991/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1840930] [NEW] Networking service in neutron - install guide says to configure nova with [neutron]/url which is deprecated

2019-08-21 Thread Matt Riedemann
Public bug reported:

- [x] This doc is inaccurate in this way:

The [neutron]/url option
https://docs.openstack.org/nova/latest/configuration/config.html#neutron.url
in nova has been deprecated since the Queens release and is being
removed in Train. The neutron/compute config guide in the neutron
install guides still says to use the url option though. Since Queens
when nova started using KSA adapters for working with neutron config:

https://review.opendev.org/#/c/509892/

I think we want to avoid configuring the [neutron] section in nova.conf
with url or endpoint_override and instead rely on KSA to use the service
types authority to find the endpoint to use based on service name/type
and interface. In other words, things should just work without needing to
explicitly define an endpoint url for nova talking to neutron - nova can
go through KSA and the service catalog to get the endpoint it needs.
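
For example, a minimal [neutron] section along those lines only carries
authentication and region selection and omits url/endpoint_override
entirely (the values below are placeholders):

    [neutron]
    auth_type = password
    auth_url = http://controller:5000
    project_name = service
    project_domain_name = Default
    username = neutron
    user_domain_name = Default
    password = NEUTRON_PASS
    region_name = RegionOne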

---
Release: 14.1.0.dev665 on 2017-06-30 05:58:47
SHA: 490471ebd3ac56d0cee164b9c1c1211687e49437
Source: https://opendev.org/openstack/neutron/src/doc/source/install/index.rst
URL: https://docs.openstack.org/neutron/latest/install/

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: doc

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1840930

Title:
  Networking service in neutron - install guide says to configure nova
  with [neutron]/url which is deprecated

Status in neutron:
  New

Bug description:
  - [x] This doc is inaccurate in this way:

  The [neutron]/url option
  https://docs.openstack.org/nova/latest/configuration/config.html#neutron.url
  in nova has been deprecated since the Queens release and is being
  removed in Train. The neutron/compute config guide in the neutron
  install guides still says to use the url option though. Since Queens
  when nova started using KSA adapters for working with neutron config:

  https://review.opendev.org/#/c/509892/

  I think we want to avoid configuring the [neutron] section in
  nova.conf with url or endpoint_override and instead rely on KSA to use
  the service types authority to find the endpoint to use based on
  service name/type and interface. In other words, things should just
  work without needing to explicitly define an endpoint url for
  nova talking to neutron - nova can go through KSA and the service
  catalog to get the endpoint it needs.

  ---
  Release: 14.1.0.dev665 on 2017-06-30 05:58:47
  SHA: 490471ebd3ac56d0cee164b9c1c1211687e49437
  Source: https://opendev.org/openstack/neutron/src/doc/source/install/index.rst
  URL: https://docs.openstack.org/neutron/latest/install/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1840930/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1840430] Re: 创建虚拟机出错 (error creating a virtual machine)

2019-08-19 Thread Matt Riedemann
Looks like the nova-api service isn't configured properly for
authenticating to neutron, make sure the [neutron] section of your nova
configuration is set for working with neutron. See:

https://docs.openstack.org/neutron/latest/install/controller-install-
ubuntu.html#configure-the-compute-service-to-use-the-networking-service

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1840430

Title:
  创建虚拟机出错 (error creating a virtual machine)

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  2019-08-16 17:13:51.274 8884 INFO nova.osapi_compute.wsgi.server 
[req-71681a14-7b44-471f-8060-30419a0924b2 7a29e10f16db4e4a938f9b73b2599310 
81018e90418c4a708459ee88bf9a734c - default default] 192.168.1.115 "GET /v2.1 
HTTP/1.1" status: 302 len: 249 time: 0.1194441
  2019-08-16 17:13:51.279 8884 INFO nova.osapi_compute.wsgi.server 
[req-45b7dc8c-f676-40ee-9c41-6698fce3a636 7a29e10f16db4e4a938f9b73b2599310 
81018e90418c4a708459ee88bf9a734c - default default] 192.168.1.115 "GET /v2.1/ 
HTTP/1.1" status: 200 len: 720 time: 0.0041459
  2019-08-16 17:13:51.402 8884 INFO nova.api.openstack.wsgi 
[req-15a014b2-b96a-43bf-b0c7-dd378bb551b3 7a29e10f16db4e4a938f9b73b2599310 
81018e90418c4a708459ee88bf9a734c - default default] HTTP exception thrown: Flavor m1.tiny 
could not be found.
  2019-08-16 17:13:51.403 8884 INFO nova.osapi_compute.wsgi.server 
[req-15a014b2-b96a-43bf-b0c7-dd378bb551b3 7a29e10f16db4e4a938f9b73b2599310 
81018e90418c4a708459ee88bf9a734c - default default] 192.168.1.115 "GET 
/v2.1/flavors/m1.tiny HTTP/1.1" status: 404 len: 472 time: 0.0161059
  2019-08-16 17:13:51.422 8884 INFO nova.osapi_compute.wsgi.server 
[req-9136c449-1258-44ce-abbd-46e560724f29 7a29e10f16db4e4a938f9b73b2599310 
81018e90418c4a708459ee88bf9a734c - default default] 192.168.1.115 "GET 
/v2.1/flavors?is_public=None HTTP/1.1" status: 200 len: 1780 time: 0.0171521
  2019-08-16 17:13:51.437 8884 INFO nova.osapi_compute.wsgi.server 
[req-66e55bfa-f1de-4727-9853-d6bc833abf36 7a29e10f16db4e4a938f9b73b2599310 
81018e90418c4a708459ee88bf9a734c - default default] 192.168.1.115 "GET 
/v2.1/flavors/cadf12b6-fa82-4e33-a933-b222a2525622 HTTP/1.1" status: 200 len: 
800 time: 0.0120480
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions 
[req-752381d2-f28e-4907-a536-7169473f9698 7a29e10f16db4e4a938f9b73b2599310 
81018e90418c4a708459ee88bf9a734c - default default] Unexpected exception occurred in an API method
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions Traceback 
(most recent call last):
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/extensions.py", line 338, 
in wrapped
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return 
f(*args, **kwargs)
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, 
in wrapper
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return 
func(*args, **kwargs)
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, 
in wrapper
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return 
func(*args, **kwargs)
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, 
in wrapper
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return 
func(*args, **kwargs)
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, 
in wrapper
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return 
func(*args, **kwargs)
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, 
in wrapper
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return 
func(*args, **kwargs)
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, 
in wrapper
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return 
func(*args, **kwargs)
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 
642, in create
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions 
**create_kwargs)
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions   File 
"/usr/lib/python2.7/site-packages/nova/hooks.py", line 154, in inner
  2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions rv = 
f(*args, **kwargs)
  2019-08-16 17:13:51.709 

[Yahoo-eng-team] [Bug 1784874] Re: ResourceTracker doesn't clean up compute_nodes or stats entries

2019-08-14 Thread Matt Riedemann
** Also affects: nova/ocata
   Importance: Undecided
   Status: New

** Changed in: nova/ocata
   Status: New => In Progress

** Changed in: nova/ocata
   Importance: Undecided => Low

** Changed in: nova/ocata
 Assignee: (unassigned) => Matt Riedemann (mriedem)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784874

Title:
  ResourceTracker doesn't clean up compute_nodes or stats entries

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  In Progress
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  This was noted in review:

  https://review.openstack.org/#/c/587636/4/nova/compute/resource_tracker.py@141

  That the ResourceTracker.compute_nodes and ResourceTracker.stats (and
  old_resources) entries only grow and are never cleaned up as we
  rebalance nodes or nodes are deleted, which means it just takes up
  memory over time.

  When we cleanup compute nodes here:

  
https://github.com/openstack/nova/blob/47ef500f4492c731ebfa33a12822ef6b5db4e7e2/nova/compute/manager.py#L7759

  We should probably call a cleanup hook into the ResourceTracker to
  cleanup those entries as well.
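
  A hypothetical sketch of such a hook (names are illustrative and this
  assumes compute_nodes, stats and old_resources are dicts keyed by
  nodename):

      class ResourceTracker(object):
          def remove_node(self, nodename):
              """Forget a node that was deleted or re-balanced away."""
              self.compute_nodes.pop(nodename, None)
              self.stats.pop(nodename, None)
              self.old_resources.pop(nodename, None)

  The compute manager's node cleanup path linked above could then call
  this for every node it no longer owns.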

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784874/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1784874] Re: ResourceTracker doesn't clean up compute_nodes or stats entries

2019-08-14 Thread Matt Riedemann
** Also affects: nova/pike
   Importance: Undecided
   Status: New

** Changed in: nova/pike
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784874

Title:
  ResourceTracker doesn't clean up compute_nodes or stats entries

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  This was noted in review:

  https://review.openstack.org/#/c/587636/4/nova/compute/resource_tracker.py@141

  That the ResourceTracker.compute_nodes and ResourceTracker.stats (and
  old_resources) entries only grow and are never cleaned up as we
  rebalance nodes or nodes are deleted, which means it just takes up
  memory over time.

  When we cleanup compute nodes here:

  
https://github.com/openstack/nova/blob/47ef500f4492c731ebfa33a12822ef6b5db4e7e2/nova/compute/manager.py#L7759

  We should probably call a cleanup hook into the ResourceTracker to
  cleanup those entries as well.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784874/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1840159] [NEW] nova-grenade-live-migration intermittently fails with "Error monitoring migration: Timed out during operation: cannot acquire state change lock (held by remoteDisp

2019-08-14 Thread Matt Riedemann
Public bug reported:

Seen here:

https://logs.opendev.org/21/655721/14/check/nova-grenade-live-
migration/2ee634d/logs/subnode-2/screen-n-cpu.txt.gz?level=TRACE#_Aug_13_10_03_49_974378

Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: WARNING nova.virt.libvirt.driver [-] [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6] Error monitoring migration: Timed out 
during operation: cannot acquire state change lock (held by 
remoteDispatchDomainMigratePerform3Params): libvirtError: Timed out during 
operation: cannot acquire state change lock (held by 
remoteDispatchDomainMigratePerform3Params)
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6] Traceback (most recent call last):
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6]   File 
"/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 8052, in _live_migration
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6] finish_event, disk_paths)
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6]   File 
"/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 7857, in 
_live_migration_monitor
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6] info = guest.get_job_info()
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6]   File 
"/opt/stack/old/nova/nova/virt/libvirt/guest.py", line 709, in get_job_info
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6] stats = self._domain.jobStats()
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6]   File 
"/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 190, in doit
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6] result = proxy_call(self._autowrap, 
f, *args, **kwargs)
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6]   File 
"/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 148, in 
proxy_call
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6] rv = execute(f, *args, **kwargs)
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6]   File 
"/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 129, in execute
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6] six.reraise(c, e, tb)
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6]   File 
"/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6] rv = meth(*args, **kwargs)
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6]   File 
"/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1403, in jobStats
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6] if ret is None: raise libvirtError 
('virDomainGetJobStats() failed', dom=self)
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 
nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: 
a1637e8b-6f2d-4127-9799-31cefb3f43a6] libvirtError: Timed out during operation: 
cannot acquire state change lock (held by 
remoteDispatchDomainMigratePerform3Params)
Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 

[Yahoo-eng-team] [Bug 1784874] Re: ResourceTracker doesn't clean up compute_nodes or stats entries

2019-08-13 Thread Matt Riedemann
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Changed in: nova/queens
   Importance: Undecided => Low

** Changed in: nova/queens
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784874

Title:
  ResourceTracker doesn't clean up compute_nodes or stats entries

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Confirmed

Bug description:
  This was noted in review:

  https://review.openstack.org/#/c/587636/4/nova/compute/resource_tracker.py@141

  That the ResourceTracker.compute_nodes and ResourceTracker.stats (and
  old_resources) entries only grow and are never cleaned up as we
  rebalance nodes or nodes are deleted, which means it just takes up
  memory over time.

  When we cleanup compute nodes here:

  
https://github.com/openstack/nova/blob/47ef500f4492c731ebfa33a12822ef6b5db4e7e2/nova/compute/manager.py#L7759

  We should probably call a cleanup hook into the ResourceTracker to
  cleanup those entries as well.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784874/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1840068] Re: (lxc) Instance failed to spawn: TypeError: object of type 'filter' has no len()

2019-08-13 Thread Matt Riedemann
This filter code goes back to 2012 so we could backport the fix further
(to pike and ocata) but no one is really using the libvirt+lxc code as
far as I can tell, at least not with python3, so we can just backport to
the non-extended-maintenance branches unless someone wants to backport
them upstream to pike and ocata.
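
For reference, the python2/python3 difference behind the TypeError, with
illustrative names only (this is not the nova code):

    bdms = [{'mount_device': '/dev/vda'}, {}]
    block_device_mapping = filter(lambda bdm: bdm.get('mount_device'), bdms)
    len(block_device_mapping)   # python2: 1 (filter returns a list)
                                # python3: TypeError, filter is an iterator

    # Wrapping the call in list() keeps the old semantics on both:
    block_device_mapping = list(
        filter(lambda bdm: bdm.get('mount_device'), bdms))
    len(block_device_mapping)   # 1 on both interpreters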

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Summary changed:

- (lxc) Instance failed to spawn: TypeError: object of type 'filter' has no 
len()
+ (lxc) Instance failed to spawn: TypeError: object of type 'filter' has no 
len() - python3

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova
   Importance: High => Medium

** Changed in: nova/queens
   Status: New => Confirmed

** Changed in: nova/rocky
   Importance: Undecided => Medium

** Changed in: nova/stein
   Status: New => Confirmed

** Changed in: nova/stein
   Importance: Undecided => Medium

** Changed in: nova/rocky
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1840068

Title:
  (lxc) Instance failed to spawn: TypeError: object of type 'filter' has
  no len() - python3

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  Seen in the nova-lxc CI job here:

  https://logs.opendev.org/24/676024/2/experimental/nova-
  lxc/f9a892c/controller/logs/screen-n-cpu.txt.gz#_Aug_12_23_31_05_043911

  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [None req-55d6dd1b-96ca-4afe-9a0c-cec902d3bd87 
tempest-ServerAddressesTestJSON-1311986476 
tempest-ServerAddressesTestJSON-1311986476] [instance: 
842a9704-3700-42ef-adb3-b038ca525127] Instance failed to spawn: TypeError: 
object of type 'filter' has no len()
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
Traceback (most recent call last):
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/opt/stack/nova/nova/compute/manager.py", line 2495, in _build_resources
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
yield resources
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/opt/stack/nova/nova/compute/manager.py", line 2256, in 
_build_and_run_instance
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
block_device_info=block_device_info)
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3231, in spawn
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
destroy_disks_on_failure=True)
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5823, in 
_create_domain_and_network
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
destroy_disks_on_failure)
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, 
in __exit__
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
self.force_reraise()
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, 
in force_reraise
  Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 

[Yahoo-eng-team] [Bug 1840068] [NEW] (lxc) Instance failed to spawn: TypeError: object of type 'filter' has no len()

2019-08-13 Thread Matt Riedemann
Public bug reported:

Seen in the nova-lxc CI job here:

https://logs.opendev.org/24/676024/2/experimental/nova-
lxc/f9a892c/controller/logs/screen-n-cpu.txt.gz#_Aug_12_23_31_05_043911

Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [None req-55d6dd1b-96ca-4afe-9a0c-cec902d3bd87 
tempest-ServerAddressesTestJSON-1311986476 
tempest-ServerAddressesTestJSON-1311986476] [instance: 
842a9704-3700-42ef-adb3-b038ca525127] Instance failed to spawn: TypeError: 
object of type 'filter' has no len()
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
Traceback (most recent call last):
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/opt/stack/nova/nova/compute/manager.py", line 2495, in _build_resources
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
yield resources
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/opt/stack/nova/nova/compute/manager.py", line 2256, in 
_build_and_run_instance
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
block_device_info=block_device_info)
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3231, in spawn
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
destroy_disks_on_failure=True)
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5823, in 
_create_domain_and_network
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
destroy_disks_on_failure)
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, 
in __exit__
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
self.force_reraise()
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, 
in force_reraise
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
six.reraise(self.type_, self.value, self.tb)
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
raise value
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5789, in 
_create_domain_and_network
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
block_device_info):
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
return next(self.gen)
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]   
File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5701, in 
_lxc_disk_handler
Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] 
block_device_info)
Aug 12 

[Yahoo-eng-team] [Bug 1839961] Re: Test tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc failing often

2019-08-13 Thread Matt Riedemann
*** This bug is a duplicate of bug 1669468 ***
https://bugs.launchpad.net/bugs/1669468

** This bug has been marked a duplicate of bug 1669468
   tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc fails 
intermittently in neutron multinode nv job

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839961

Title:
  Test
  tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc
  failing often

Status in OpenStack Compute (nova):
  New

Bug description:
  I see that Tempest API test 
tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc is 
failing quite often on tempest-multinode-full and tempest-multinode-full-py3 
jobs.
  Example: 
https://logs.opendev.org/12/672612/4/check/tempest-multinode-full-py3/72623e0/testr_results.html.gz

  Logstash query which I used to find other occurrences:
  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AssertionError%3A%20True%20is%20not%20false%20%3A%20Token%20must%20be%20invalid%20because%20the%20connection%20closed.%5C%22

  I found 61 entries in last 7 days.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839961/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1839853] [NEW] Misuse of nova.objects.base.obj_equal_prims in tests

2019-08-12 Thread Matt Riedemann
Public bug reported:

There are some tests, mostly related to BuildRequest objects, that are
calling nova.objects.base.obj_equal_prims, which does not assert
anything; it only returns True or False - the test code itself must
assert the expected result of the obj_equal_prims method.

https://github.com/openstack/nova/blob/ab34c941be28f3486cd2699af8d9237e9edac351/nova/tests/functional/db/test_build_request.py

https://github.com/openstack/nova/blob/d89579a66ac38fd1e30cea55306e6e7b69bab5b9/nova/tests/unit/objects/test_build_request.py
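
A minimal sketch of the intended usage (the object names are placeholders
and base refers to nova.objects.base):

    # Wrong: the boolean result is silently discarded, nothing is verified.
    base.obj_equal_prims(expected_req, actual_req)

    # Right: the test must assert the result itself.
    self.assertTrue(base.obj_equal_prims(expected_req, actual_req))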

** Affects: nova
 Importance: Medium
 Status: Confirmed


** Tags: low-hanging-fruit testing

** Changed in: nova
   Status: New => Confirmed

** Tags added: testing

** Changed in: nova
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839853

Title:
  Misuse of nova.objects.base.obj_equal_prims in tests

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  There are some tests, mostly related to BuildRequest objects, that are
  calling nova.objects.base.obj_equal_prims, which does not assert
  anything; it only returns True or False - the test code itself must
  assert the expected result of the obj_equal_prims method.

  
https://github.com/openstack/nova/blob/ab34c941be28f3486cd2699af8d9237e9edac351/nova/tests/functional/db/test_build_request.py

  
https://github.com/openstack/nova/blob/d89579a66ac38fd1e30cea55306e6e7b69bab5b9/nova/tests/unit/objects/test_build_request.py

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839853/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1839674] Re: ResourceTracker.compute_nodes won't try to create a ComputeNode a second time if the first create() fails

2019-08-09 Thread Matt Riedemann
** Also affects: nova/ocata
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/pike
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Changed in: nova/ocata
   Status: New => Triaged

** Changed in: nova/pike
   Status: New => Triaged

** Changed in: nova/queens
   Status: New => Triaged

** Changed in: nova/stein
   Status: New => Triaged

** Changed in: nova/pike
   Importance: Undecided => Medium

** Changed in: nova/rocky
   Importance: Undecided => Medium

** Changed in: nova/queens
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839674

Title:
  ResourceTracker.compute_nodes won't try to create a ComputeNode a
  second time if the first create() fails

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) ocata series:
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  Triaged

Bug description:
  I found this while writing a functional recreate test for bug 1839560.

  As of this change in Ocata:

  
https://github.com/openstack/nova/commit/1c967593fbb0ab8b9dc8b0b509e388591d32f537

  The ResourceTracker.compute_nodes dict will store the ComputeNode
  object *before* trying to create it:

  
https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L570-L571

  The problem is if ComputeNode.create() fails for whatever reason, the
  next run through update_available_resource won't try to create the
  ComputeNode again because of this:

  
https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L546

  And eventually you get errors like this:

  b'2019-08-09 17:02:59,356 ERROR [nova.compute.manager] Error updating 
resources for node node2.'
  b'Traceback (most recent call last):'
  b'  File "/home/osboxes/git/nova/nova/compute/manager.py", line 8250, in 
_update_available_resource_for_node'
  b'startup=startup)'
  b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 
715, in update_available_resource'
  b'self._update_available_resource(context, resources, 
startup=startup)'
  b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_concurrency/lockutils.py",
 line 328, in inner'
  b'return f(*args, **kwargs)'
  b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 
796, in _update_available_resource'
  b'self._update(context, cn, startup=startup)'
  b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 
1052, in _update'
  b'self.old_resources[nodename] = old_compute'
  b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py",
 line 220, in __exit__'
  b'self.force_reraise()'
  b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py",
 line 196, in force_reraise'
  b'six.reraise(self.type_, self.value, self.tb)'
  b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/six.py",
 line 693, in reraise'
  b'raise value'
  b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 
1046, in _update'
  b'compute_node.save()'
  b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py",
 line 226, in wrapper'
  b'return fn(self, *args, **kwargs)'
  b'  File "/home/osboxes/git/nova/nova/objects/compute_node.py", line 352, 
in save'
  b'db_compute = db.compute_node_update(self._context, self.id, 
updates)'
  b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py",
 line 67, in getter'
  b'self.obj_load_attr(name)'
  b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py",
 line 603, in obj_load_attr'
  b'_("Cannot load \'%s\' in the base class") % attrname)'
  b"NotImplementedError: Cannot load 'id' in the base class"

  We should only map the ComputeNode when we've successfully created it.
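
  A rough sketch of that ordering (abbreviated, not the actual patch):

      cn = objects.ComputeNode(context=context)
      # ... populate fields from the virt driver's resources dict ...
      cn.create()                          # may raise; nothing is cached yet
      self.compute_nodes[nodename] = cn    # map only after a successful create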

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839674/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : 

[Yahoo-eng-team] [Bug 1839674] [NEW] ResourceTracker.compute_nodes won't try to create a ComputeNode a second time if the first create() fails

2019-08-09 Thread Matt Riedemann
Public bug reported:

I found this while writing a functional recreate test for bug 1839560.

As of this change in Ocata:

https://github.com/openstack/nova/commit/1c967593fbb0ab8b9dc8b0b509e388591d32f537

The ResourceTracker.compute_nodes dict will store the ComputeNode object
*before* trying to create it:

https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L570-L571

The problem is if ComputeNode.create() fails for whatever reason, the
next run through update_available_resource won't try to create the
ComputeNode again because of this:

https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L546

And eventually you get errors like this:

b'2019-08-09 17:02:59,356 ERROR [nova.compute.manager] Error updating 
resources for node node2.'
b'Traceback (most recent call last):'
b'  File "/home/osboxes/git/nova/nova/compute/manager.py", line 8250, in 
_update_available_resource_for_node'
b'startup=startup)'
b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 
715, in update_available_resource'
b'self._update_available_resource(context, resources, startup=startup)'
b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_concurrency/lockutils.py",
 line 328, in inner'
b'return f(*args, **kwargs)'
b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 
796, in _update_available_resource'
b'self._update(context, cn, startup=startup)'
b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 
1052, in _update'
b'self.old_resources[nodename] = old_compute'
b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py",
 line 220, in __exit__'
b'self.force_reraise()'
b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py",
 line 196, in force_reraise'
b'six.reraise(self.type_, self.value, self.tb)'
b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/six.py",
 line 693, in reraise'
b'raise value'
b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 
1046, in _update'
b'compute_node.save()'
b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py",
 line 226, in wrapper'
b'return fn(self, *args, **kwargs)'
b'  File "/home/osboxes/git/nova/nova/objects/compute_node.py", line 352, 
in save'
b'db_compute = db.compute_node_update(self._context, self.id, updates)'
b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py",
 line 67, in getter'
b'self.obj_load_attr(name)'
b'  File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py",
 line 603, in obj_load_attr'
b'_("Cannot load \'%s\' in the base class") % attrname)'
b"NotImplementedError: Cannot load 'id' in the base class"

We should only map the ComputeNode when we've successfully created it.

** Affects: nova
 Importance: Medium
 Assignee: Matt Riedemann (mriedem)
 Status: Triaged


** Tags: resource-tracker

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839674

Title:
  ResourceTracker.compute_nodes won't try to create a ComputeNode a
  second time if the first create() fails

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  I found this while writing a functional recreate test for bug 1839560.

  As of this change in Ocata:

  
https://github.com/openstack/nova/commit/1c967593fbb0ab8b9dc8b0b509e388591d32f537

  The ResourceTracker.compute_nodes dict will store the ComputeNode
  object *before* trying to create it:

  
https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L570-L571

  The problem is if ComputeNode.create() fails for whatever reason, the
  next run through update_available_resource won't try to create the
  ComputeNode again because of this:

  
https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L546

  And eventually you get errors like this:

  b'2019-08-09 17:02:59,356 ERROR [nova.compute.manager] Error updating 
resources for node node2.'
  b'Traceback (most recent call last):'
  b'  File "/home/osboxes/git/nova/nova/compute/manager.py", line 8250, in 
_update_available_resource_for_node'
  b'startup=startup)'
  b'  File "

[Yahoo-eng-team] [Bug 1833278] Re: nova-status upgrade check should fail if db sync has not been performed

2019-08-09 Thread Matt Riedemann
Some related discussion in IRC today:

http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-
nova.2019-08-09.log.html#t2019-08-09T17:21:09

** Changed in: nova
   Status: In Progress => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1833278

Title:
  nova-status upgrade check should fail if db sync has not been
  performed

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  When performing an upgrade, the upgrade check is supposed to be run
  after the DB schema syncs and data migration. This should be something
  that is checked by the upgrade check command.

  Steps to reproduce
  ==

  Tested in Queens -> Rocky upgrade.

  Prior to an upgrade, using new code:

  nova-status upgrade check

  Expected results
  

  Command fails, saying DB sync needs to be performed.

  Actual results
  ==

  Command succeeds.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1833278/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1839621] Re: Inappropriate split of transport_url string

2019-08-09 Thread Matt Riedemann
** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839621

Title:
  Inappropriate split of transport_url string

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  In /etc/nova/nova.conf (line 3085), if your password for the messaging
  provider (such as rabbit) contains the "#" character, the string will be
  split incorrectly, preventing the nova services from starting.
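
  Percent-encoding the reserved character in the URL avoids the mis-parse; a
  minimal illustration with a hypothetical password (the values below are
  placeholders, not taken from this report):

      # '#' starts the URI fragment, so it must be written as %23:
      transport_url=rabbit://openstack:secret%23pass@controller.host.example.com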

  Steps to reproduce

  1. In /etc/nova/nova.conf set transport url to

  transport_url=rabbit://openstack:test#passw...@controller.host.example.com

  2.  systemctl start openstack-nova-api.service   openstack-nova-
  consoleauth.service openstack-nova-scheduler.service   openstack-nova-
  conductor.service openstack-nova-novncproxy.service

  this will produce:

  Job for openstack-nova-consoleauth.service failed because the control process 
exited with error code. See "systemctl status 
openstack-nova-consoleauth.service" and "journalctl -xe" for details.
  Job for openstack-nova-api.service failed because the control process exited 
with error code. See "systemctl status openstack-nova-api.service" and 
"journalctl -xe" for details.
  Job for openstack-nova-conductor.service failed because the control process 
exited with error code. See "systemctl status openstack-nova-conductor.service" 
and "journalctl -xe" for details.
  Job for openstack-nova-scheduler.service failed because the control process 
exited with error code. See "systemctl status openstack-nova-scheduler.service" 
and "journalctl -xe" for details.

  3. Check journalctl -xe logs and notice:

   nova-conductor[31437]: ValueError: invalid literal for int() with base 10: 
'test'
   systemd[1]: openstack-nova-conductor.service: main process exited, 
code=exited, status=1/FAILURE
   systemd[1]: Failed to start OpenStack Nova Conductor Server.


  Environment:
  OS: CentOS Linux release 7.6.1810 
  kernel: 3.10.0-957.21.3.el7.x86_64

  rpm -qa | grep nova

  python2-novaclient-13.0.1-1.el7.noarch
  openstack-nova-conductor-19.0.1-1.el7.noarch
  openstack-nova-console-19.0.1-1.el7.noarch
  openstack-nova-common-19.0.1-1.el7.noarch
  openstack-nova-novncproxy-19.0.1-1.el7.noarch
  python2-nova-19.0.1-1.el7.noarch
  openstack-nova-api-19.0.1-1.el7.noarch
  openstack-nova-scheduler-19.0.1-1.el7.noarch

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839621/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1669468] Re: tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc fails intermittently in neutron multinode nv job

2019-08-09 Thread Matt Riedemann
Patch here: https://review.opendev.org/#/c/675652/

** Also affects: devstack
   Importance: Undecided
   Status: New

** No longer affects: nova

** Changed in: devstack
   Status: New => In Progress

** Changed in: devstack
   Importance: Undecided => Medium

** Changed in: devstack
 Assignee: (unassigned) => Matt Riedemann (mriedem)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1669468

Title:
  tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc
  fails intermittently in neutron multinode nv job

Status in devstack:
  In Progress

Bug description:
  Example output:

  2017-02-21 06:42:10.010442 | ==
  2017-02-21 06:42:10.010458 | Failed 1 tests - output below:
  2017-02-21 06:42:10.010471 | ==
  2017-02-21 06:42:10.010477 | 
  2017-02-21 06:42:10.010507 | 
tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc[id-c640fdff-8ab4-45a4-a5d8-7e6146cbd0dc]
  2017-02-21 06:42:10.010542 | 
---
  2017-02-21 06:42:10.010548 | 
  2017-02-21 06:42:10.010558 | Captured traceback:
  2017-02-21 06:42:10.010569 | ~~~
  2017-02-21 06:42:10.010583 | Traceback (most recent call last):
  2017-02-21 06:42:10.010606 |   File 
"tempest/api/compute/servers/test_novnc.py", line 152, in test_novnc
  2017-02-21 06:42:10.010621 | self._validate_rfb_negotiation()
  2017-02-21 06:42:10.010646 |   File 
"tempest/api/compute/servers/test_novnc.py", line 77, in 
_validate_rfb_negotiation
  2017-02-21 06:42:10.010665 | 'Token must be invalid because the 
connection '
  2017-02-21 06:42:10.010721 |   File 
"/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/unittest2/case.py",
 line 696, in assertFalse
  2017-02-21 06:42:10.010737 | raise self.failureException(msg)
  2017-02-21 06:42:10.010762 | AssertionError: True is not false : Token 
must be invalid because the connection closed.
  2017-02-21 06:42:10.010768 | 
  2017-02-21 06:42:10.010774 | 
  2017-02-21 06:42:10.010785 | Captured pythonlogging:
  2017-02-21 06:42:10.010796 | ~~~
  2017-02-21 06:42:10.010848 | 2017-02-21 06:07:18,545 16286 INFO 
[tempest.lib.common.rest_client] Request (NoVNCConsoleTestJSON:test_novnc): 200 
POST 
https://10.27.33.58:8774/v2.1/servers/82d4d4ca-c263-4ac5-85bc-a33488af5ff5/action
 0.165s
  2017-02-21 06:42:10.010905 | 2017-02-21 06:07:18,545 16286 DEBUG
[tempest.lib.common.rest_client] Request - Headers: {'Accept': 
'application/json', 'X-Auth-Token': '', 'Content-Type': 
'application/json'}
  2017-02-21 06:42:10.010925 | Body: {"os-getVNCConsole": {"type": 
"novnc"}}
  2017-02-21 06:42:10.011109 | Response - Headers: {u'content-type': 
'application/json', 'content-location': 
'https://10.27.33.58:8774/v2.1/servers/82d4d4ca-c263-4ac5-85bc-a33488af5ff5/action',
 u'date': 'Tue, 21 Feb 2017 06:07:18 GMT', u'x-openstack-nova-api-version': 
'2.1', 'status': '200', u'content-length': '121', u'server': 'Apache/2.4.18 
(Ubuntu)', u'connection': 'close', u'openstack-api-version': 'compute 2.1', 
u'vary': 'OpenStack-API-Version,X-OpenStack-Nova-API-Version', 
u'x-compute-request-id': 'req-d9681919-5b5e-4477-b38d-2734b660a099'}
  2017-02-21 06:42:10.011153 | Body: {"console": {"url": 
"http://10.27.33.58:6080/vnc_auto.html?token=f8a52df3-8e0d-4d64-8877-07f607f84b74;,
 "type": "novnc"}}
  2017-02-21 06:42:10.011161 | 
  2017-02-21 06:42:10.011167 | 
  2017-02-21 06:42:10.011172 | 

  
  Full logs at: 
http://logs.openstack.org/38/431038/3/check/gate-tempest-dsvm-neutron-multinode-full-ubuntu-xenial-nv/5e1d485/console.html#_2017-02-21_06_07_18_740230

  This started at 2017-02-21

  The very first change which failed here was
  https://review.openstack.org/#/c/431038/ but is not related to the
  error.

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1669468/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1839524] Re: resize on same host

2019-08-09 Thread Matt Riedemann
Just because you configure the API to allow resizing to the same host
doesn't mean the scheduler is going to pick the same host, e.g. if the
host the instance is on is already full, or does not have spare capacity
for the new flavor you're resizing *to* then the scheduler will pick
another host. Or if the scheduler weights are configured such that the
scheduler picks another host, etc.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839524

Title:
  resize on same host

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Resizing an instance on the same host does not work when there is more than
  one openstack compute node.
  allow_resize_to_same_host=True only works on an all-in-one openstack; after
  adding more computes it no longer works.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839524/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1839560] Re: ironic: moving node to maintenance makes it unusable afterwards

2019-08-08 Thread Matt Riedemann
There are some ideas about hard-deleting the compute node records when
they are (soft) deleted, but only for ironic nodes. That gets messy (it is
called from lots of places, like when a nova-compute service record is
deleted), so it's probably easiest to just revert this:

https://review.opendev.org/#/c/571535/

Note you'd also have to revert this to avoid conflicts:

https://review.opendev.org/#/c/611162/

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Changed in: nova/rocky
   Status: New => Confirmed

** Changed in: nova/stein
   Status: New => Confirmed

** Changed in: nova/rocky
   Importance: Undecided => High

** Changed in: nova/stein
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839560

Title:
  ironic: moving node to maintenance makes it unusable afterwards

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  If you use the Ironic API to set a node into maintenance (for whatever
  reason), it will no longer be included in the list of nodes available to
  Nova.

  When Nova refreshes its resources periodically, it will find that it
  is no longer in the list of available nodes and delete it from the
  database.

  Once you enable the node again and Nova attempts to create the
  ComputeNode again, it fails due to the duplicate UUID in the database,
  because the old record is soft deleted and had the same UUID.
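
  Roughly, the failing sequence looks like this (a sketch only; the calls are
  abbreviated and the index detail is inferred from the DBDuplicateEntry in
  the traces below):

      # Soft delete leaves the row in place (deleted != 0):
      cn = objects.ComputeNode.get_by_host_and_nodename(ctxt, host, nodename)
      cn.destroy()

      # Re-creating a ComputeNode with the same uuid later fails because the
      # unique index 'compute_nodes_uuid_idx' only covers the uuid column, so
      # the soft-deleted row still collides:
      new_cn = objects.ComputeNode(context=ctxt, uuid=cn.uuid)
      new_cn.create()   # DBDuplicateEntry, as in the trace below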

  ref:
  
https://github.com/openstack/nova/commit/9f28727eb75e05e07bad51b6eecce667d09dfb65
  - this made computenode.uuid match the baremetal uuid

  
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8304-L8316
  - this soft-deletes the computenode record when it doesn't see it in the list 
of active nodes

  
  traces:
  2019-08-08 17:20:13.921 6379 INFO nova.compute.manager 
[req-c71e5c81-eb34-4f72-a260-6aa7e802f490 - - - - -] Deleting orphan compute 
node 31 hypervisor host is 77788ad5-f1a4-46ac-8132-2d88dbd4e594, nodes are 
set([u'6d556617-2bdc-42b3-a3fe-b9218a1ebf0e', 
u'a634fab2-ecea-4cfa-be09-032dce6eaf51', 
u'2dee290d-ef73-46bc-8fc2-af248841ca12'])
  ...
  2019-08-08 22:21:25.284 82770 WARNING nova.compute.resource_tracker 
[req-a58eb5e2-9be0-4503-bf68-dff32ff87a3a - - - - -] No compute node record for 
ctl1-:77788ad5-f1a4-46ac-8132-2d88dbd4e594: ComputeHostNotFound_Remote: 
Compute host ctl1- could not be found.
  
  Remote error: DBDuplicateEntry (pymysql.err.IntegrityError) (1062, 
u"Duplicate entry '77788ad5-f1a4-46ac-8132-2d88dbd4e594' for key 
'compute_nodes_uuid_idx'")
  

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839560/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1839515] [NEW] Weird functional test failures hitting neutron API in unrelated resize flows since 8/5

2019-08-08 Thread Matt Riedemann
Public bug reported:

Noticed here:

https://logs.opendev.org/32/634832/43/check/nova-tox-functional-
py36/d4f3be5/testr_results.html.gz

With this test:

nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled

That's a simple test which disables a service and then asserts there is
a service.update notification, but there is another notification
happening as well:


Traceback (most recent call last):
  File 
"/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py",
 line 122, in test_service_disabled
'uuid': self.service_uuid})
  File 
"/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py",
 line 37, in _verify_notification
base._verify_notification(sample_file_name, replacements, actual)
  File 
"/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/notification_sample_base.py",
 line 148, in _verify_notification
self.assertEqual(1, len(fake_notifier.VERSIONED_NOTIFICATIONS))
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py",
 line 411, in assertEqual
self.assertThat(observed, matcher, message)
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py",
 line 498, in assertThat
raise mismatch_error
testtools.matchers._impl.MismatchError: 1 != 2

And in the error output, we can see this weird traceback of a resize
revert failure b/c the NeutronFixture isn't being used:

2019-08-07 23:22:23,621 ERROR [nova.network.neutronv2.api] The [neutron] 
section of your nova configuration file must be configured for authentication 
with the networking service endpoint. See the networking service install guide 
for details: https://docs.openstack.org/neutron/latest/install/
2019-08-07 23:22:23,634 ERROR [nova.compute.manager] Setting instance vm_state 
to ERROR
Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", 
line 8656, in _error_out_instance_on_exception
yield
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", 
line 4830, in _resize_instance
migration_p)
  File 
"/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 
2697, in migrate_instance_start
client = _get_ksa_client(context, admin=True)
  File 
"/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 
215, in _get_ksa_client
auth_plugin = _get_auth_plugin(context, admin=admin)
  File 
"/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 
151, in _get_auth_plugin
_ADMIN_AUTH = _load_auth_plugin(CONF)
  File 
"/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 
82, in _load_auth_plugin
raise neutron_client_exc.Unauthorized(message=err_msg)
neutronclient.common.exceptions.Unauthorized: Unknown auth type: None

According to logstash this started showing up around 8/5:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ERROR%20%5Bnova.network.neutronv2.api%5D%20The%20%5Bneutron%5D%20section%20of%20your%20nova%20configuration%20file%20must%20be%20configured%20for%20authentication%20with%20the%20networking%20service%20endpoint.%5C%22%20AND%20tags%3A%5C%22console%5C%22=7d

Which makes me think this change, which is restarting a compute service
and sleeping in a stub:

https://review.opendev.org/#/c/670393/

Might be screwing up concurrently running tests.

Looking at when that test runs and the ones that fails:

2019-08-07 23:21:54.157918 | ubuntu-bionic | {4}
nova.tests.functional.compute.test_init_host.ComputeManagerInitHostTestCase.test_migrate_disk_and_power_off_crash_finish_revert_migration
[4.063814s] ... ok

2019-08-07 23:25:00.073443 | ubuntu-bionic | {4}
nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled
[160.155643s] ... FAILED

We can see they are on the same worker process and run at about the same
time.

Furthermore, we can see that
TestServiceUpdateNotificationSampleLatest.test_service_disabled
eventually times out after 160 seconds and this is in the error output:

2019-08-07 23:24:59,911 ERROR [nova.compute.api] An error occurred while 
updating the COMPUTE_STATUS_DISABLED trait on compute node resource providers 
managed by host host1. The trait will be synchronized automatically by the 
compute service when the update_available_resource periodic task runs.
Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 
5034, in _update_compute_provider_status
self.rpcapi.set_host_enabled(context, service.host, enabled)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/rpcapi.py", line 
996, in set_host_enabled
 

[Yahoo-eng-team] [Bug 1735009] Re: Cannot rebuild baremetal instance when vm_state is ERROR

2019-08-08 Thread Matt Riedemann
** Also affects: nova/ocata
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/pike
   Importance: Undecided
   Status: New

** Tags added: rebuild

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1735009

Title:
  Cannot rebuild baremetal instance when vm_state is ERROR

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) ocata series:
  New
Status in OpenStack Compute (nova) pike series:
  New
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  New

Bug description:
  You can rebuild an instance in ERROR since Havana:
  
http://git.openstack.org/cgit/openstack/nova/commit/?id=99c51e34230394cadf0b82e364ea10c38e193979

  This change broke this feature for Ironic since Liberty:
  
http://git.openstack.org/cgit/openstack/nova/commit/?id=ea3967a1fb47297608defd680286fd9051ff5bbe

  The change adds a check for vm_state=ERROR when waiting for baremetal
  instance to be ACTIVE.

  The vm_state is only restored to ACTIVE after a successful build. This
  means rebuilding a baremetal instance using the Ironic
  driver is impossible because wait_for_active fails if vm_state=ERROR
  is found.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1735009/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1788527] Re: Redundant instance group lookup during scheduling of move operations

2019-08-08 Thread Matt Riedemann
** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/rocky
   Status: New => In Progress

** Changed in: nova/rocky
 Assignee: (unassigned) => Balazs Gibizer (balazs-gibizer)

** Changed in: nova/rocky
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1788527

Title:
  Redundant instance group lookup during scheduling of move operations

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  Fix Committed

Bug description:
  This change:

  
https://github.com/openstack/nova/commit/459ca56de2366aea53efc9ad3295fdf4ddcd452c

  Added code to the setup_instance_group flow to get the instance group
  fresh so we had the latest hosts for members of the group.

  Then change:

  
https://github.com/openstack/nova/commit/94fd36f0582c5dbcf2b9886da7c7bf986d3ad5d1
  #diff-cbbdc4d7c140314a7e0b2d97ebcd1f9c

  Was added to not persist group hosts/members in the RequestSpec since
  they could be stale after the initial server create. This means when
  we move a server (evacuate, resize, unshelve, live migrate), we get
  the request spec with the group plus the current hosts/members of the
  group. So if the request spec has the group hosts set by the time it
  gets to setup_instance_group, the call in _get_group_details to get
  the group fresh is redundant.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1788527/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1839391] Re: archive_deleted_rows docs and user-facing messages say CONF.api_database.connection

2019-08-07 Thread Matt Riedemann
** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/queens
   Status: New => Confirmed

** Changed in: nova/stein
   Status: New => Confirmed

** Changed in: nova/rocky
   Status: New => Confirmed

** Changed in: nova/rocky
   Importance: Undecided => Critical

** Changed in: nova/rocky
   Importance: Critical => Low

** Changed in: nova/stein
   Importance: Undecided => Low

** Changed in: nova/queens
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839391

Title:
  archive_deleted_rows docs and user-facing messages say
  CONF.api_database.connection

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  The docs here:

  https://docs.openstack.org/nova/latest/cli/nova-manage.html

  and error message here:

  
https://github.com/openstack/nova/blob/af40e3d1a67c8542683368fd6927ac9c0363a3b8/nova/cmd/manage.py#L526

  Those are talking about a variable in code and should be saying
  something like [api_database]/connection instead.
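
  For example, the message could point users at the option as it appears in
  nova.conf (the connection string here is only a placeholder):

      [api_database]
      connection = mysql+pymysql://nova:PASSWORD@controller/nova_api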

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839391/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1839391] [NEW] archive_deleted_rows docs and user-facing messages say CONF.api_database.connection

2019-08-07 Thread Matt Riedemann
Public bug reported:

The docs here:

https://docs.openstack.org/nova/latest/cli/nova-manage.html

and error message here:

https://github.com/openstack/nova/blob/af40e3d1a67c8542683368fd6927ac9c0363a3b8/nova/cmd/manage.py#L526

Those are talking about a variable in code and should be saying
something like [api_database]/connection instead.

** Affects: nova
 Importance: Low
 Assignee: Matt Riedemann (mriedem)
 Status: In Progress


** Tags: docs nova-manage

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839391

Title:
  archive_deleted_rows docs and user-facing messages say
  CONF.api_database.connection

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  The docs here:

  https://docs.openstack.org/nova/latest/cli/nova-manage.html

  and error message here:

  
https://github.com/openstack/nova/blob/af40e3d1a67c8542683368fd6927ac9c0363a3b8/nova/cmd/manage.py#L526

  Those are talking about a variable in code and should be saying
  something like [api_database]/connection instead.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839391/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1839360] Re: nova-compute fails with DBNotAllowed error

2019-08-07 Thread Matt Riedemann
https://review.opendev.org/#/q/Icddbe4760eaff30e4e13c1e8d3d5d3f489dac3c4
goes back to stable/rocky so this should go back that far as well.

** Changed in: nova
   Importance: Undecided => Medium

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Changed in: nova/rocky
   Status: New => Confirmed

** Changed in: nova/rocky
   Importance: Undecided => Medium

** Changed in: nova/stein
   Importance: Undecided => Medium

** Changed in: nova/stein
   Status: New => Confirmed

** Tags added: docs serviceability

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839360

Title:
  nova-compute fails with DBNotAllowed error

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  Description
  ===

  During routine operations, or things like running regular tempest checks,
  nova-compute tries to reach the database and fails with a DBNotAllowed error:
  
https://logs.opendev.org/33/660333/10/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/97d8bc3/logs/host/nova-compute.service.journal-23-20-40.log.txt.gz#_Aug_06_22_51_25

  Steps to reproduce
  ==

  This can be reproduced by deploying all nova components (api, scheduler,
  conductor, compute) on the same host (an OSA all-in-one deployment). In such
  a setup a single configuration file (nova.conf) is shared by all services.

  As a solution it's possible to log more helpful information about why this
  happens and to add some description to the docs.
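  For illustration of the likely doc guidance (my wording and layout, not the
  merged fix and not what OSA ships today): the configuration read by
  nova-compute should simply not contain database credentials, i.e. neither of
  these sections should have a connection set on a compute host:

      [database]
      # connection = ...   <- must not be set for nova-compute
      [api_database]
      # connection = ...   <- must not be set for nova-compute

  nova-compute is expected to do all persistence through the conductor over
  RPC, so only the api/conductor/scheduler config needs the connection options.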

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839360/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1838811] Re: /opt/stack/devstack/tools/outfilter.py failing in neutron functional jobs since 8/2

2019-08-05 Thread Matt Riedemann
** No longer affects: devstack

** Changed in: neutron
   Importance: Undecided => Critical

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1838811

Title:
  /opt/stack/devstack/tools/outfilter.py failing in neutron functional
  jobs since 8/2

Status in neutron:
  Fix Released

Bug description:
  Seen here:

  https://logs.opendev.org/86/673486/4/gate/neutron-functional-
  python27/c3fe4df/ara-report/result/28d8d223-313a-49ba-b8aa-
  8af15fdda973/

  ++ ./stack.sh:main:500  :   
/opt/stack/devstack/tools/outfilter.py -v --no-timestamp -o 
/opt/stack/logs/devstacklog.txt.2019-08-02-160322
  Traceback (most recent call last):
File "/opt/stack/devstack/tools/outfilter.py", line 104, in 
  sys.exit(main())
File "/opt/stack/devstack/tools/outfilter.py", line 61, in main
  outfile = open(opts.outfile, 'ab', 0)
  IOError: [Errno 13] Permission denied: 
'/opt/stack/logs/devstacklog.txt.2019-08-02-160322'

  Looks like it's a result of:

  https://review.opendev.org/#/c/203698/

  Based on logstash data of that failure:

  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22IOError%3A%20%5BErrno%2013%5D%20Permission%20denied%3A%20'%2Fopt%2Fstack%2Flogs%2Fdevstacklog.txt%5C%22%20AND%20tags%3A%5C%22console%5C%22=7d

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1838811/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1838819] [NEW] Docs needed for tunables at large scale

2019-08-02 Thread Matt Riedemann
Public bug reported:

Various things come up in IRC every once in a while about configuration
options that need to be tweaked at large scale (blizzard, cern, etc)
which once you hit hundreds or thousands of compute nodes need to be
changed to avoid killing the control plane.

One such option is this:

https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.heal_instance_info_cache_interval

From a blizzard operator:

(3:04:18 PM) eandersson: mriedem, we had to set heal_instance_info_cache high 
because it was killing our control plane
(3:05:41 PM) eandersson: It was getting real heavy on large sites with 1k nodes
(3:06:26 PM) eandersson: We also ended up adding a variance

Similarly, CERN had to totally disable this one:

https://docs.openstack.org/nova/latest/configuration/config.html#compute.resource_provider_association_refresh

And rely on SIGHUP / restart of the service if they needed to refresh
that cache.

We should put these things in the admin docs as we come across them so
we don't forget about this stuff when new operators/users come along and
hit scaling issues.
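As a starting point for such a doc, the two options above are tuned in
nova.conf roughly like this (the values are illustrative, not recommendations):

    [DEFAULT]
    # default is 60 seconds; large sites raise this (and/or add jitter) so the
    # periodic info-cache heal does not hammer the control plane
    heal_instance_info_cache_interval = 600

    [compute]
    # 0 disables the periodic placement aggregate/trait refresh entirely,
    # relying on SIGHUP/restart to refresh the cache (the CERN approach above)
    resource_provider_association_refresh = 0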

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: docs performance

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838819

Title:
  Docs needed for tunables at large scale

Status in OpenStack Compute (nova):
  New

Bug description:
  Various things come up in IRC every once in a while about
  configuration options that need to be tweaked at large scale
  (blizzard, cern, etc) which once you hit hundreds or thousands of
  compute nodes need to be changed to avoid killing the control plane.

  One such option is this:

  
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.heal_instance_info_cache_interval

  From a blizzard operator:

  (3:04:18 PM) eandersson: mriedem, we had to set heal_instance_info_cache high 
because it was killing our control plane
  (3:05:41 PM) eandersson: It was getting real heavy on large sites with 1k 
nodes
  (3:06:26 PM) eandersson: We also ended up adding a variance

  Similarly, CERN had to totally disable this one:

  
https://docs.openstack.org/nova/latest/configuration/config.html#compute.resource_provider_association_refresh

  And rely on SIGHUP / restart of the service if they needed to refresh
  that cache.

  We should put these things in the admin docs as we come across them so
  we don't forget about this stuff when new operators/users come along
  and hit scaling issues.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1838819/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1838817] [NEW] neutron: refreshing vif model for a server gets the same network multiple times

2019-08-02 Thread Matt Riedemann
Public bug reported:

As of this change in Rocky:

https://review.opendev.org/#/c/585339/

When refreshing the vif model for an instance, e.g. when we get a
network-changed event with a specific port ID:

https://logs.opendev.org/26/674326/2/experimental/nova-osprofiler-
redis/899a204/controller/logs/screen-n-cpu.txt.gz#_Aug_02_18_35_50_884613

Aug 02 18:35:50.884613 ubuntu-bionic-vexxhost-sjc1-0009723918 nova-
compute[20428]: DEBUG nova.network.neutronv2.api [req-1a4c2dbf-df86-4044
-a59f-f751a53c5ea6 req-b0e1e2f7-d126-4e55-909d-4803816ca80f service
nova] [instance: 5bbe0419-fbeb-4667-8c56-785fdc1d0a62] Refreshing
network info cache for port 252040d6-4469-46ec-88c3-85e599a43104
{{(pid=20428) _get_instance_nw_info
/opt/stack/nova/nova/network/neutronv2/api.py:1756}}

We get the network for the port multiple times, first here:

https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L2966

https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L392

And then we pass that list of 1 network dict to _build_vif_model here:

https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L2850

and pass it to _nw_info_build_network here:

https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L2883

Which then calls _get_physnet_tunneled_info which gets the network again
here:

https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L1904

and/or here:

https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L1928

Furthermore, when we're doing forced _heal_instance_info_cache (stein+)
we'll refresh the vif model for all ports that are currently attached to
the server:

https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L3015

And rebuild the vif model per port here:

https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L3027

If there is more than one port on the same network attached to the
server, we'll be calling show_network for each port even though we're
getting the same data when those ports are on the same network.
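The direction of a fix is straightforward; below is a minimal sketch
(hypothetical helper name, not the actual nova change) that fetches each
network only once per refresh by memoizing show_network results:

    # Sketch: reuse one show_network() result per unique network id instead of
    # calling it once per port on that network.
    def get_networks_for_ports(neutron, ports):
        networks_by_id = {}
        for port in ports:
            net_id = port['network_id']
            if net_id not in networks_by_id:
                networks_by_id[net_id] = neutron.show_network(net_id)['network']
        return networks_by_id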

I noticed this while checking some osprofiler results: handling the
network-changed event for the port-targeted refresh took a relatively long
time:

https://logs.opendev.org/26/674326/2/experimental/nova-osprofiler-
redis/899a204/osprofiler-traces/trace-fc50ca23-a6c2-474a-
ac07-e61e706eb27d.html.gz

** Affects: nova
 Importance: Medium
 Assignee: Matt Riedemann (mriedem)
 Status: Triaged

** Affects: nova/rocky
 Importance: Medium
 Status: Confirmed

** Affects: nova/stein
 Importance: Medium
 Status: Confirmed


** Tags: neutron performance

** Changed in: nova
   Status: New => Triaged

** Changed in: nova
   Importance: Undecided => Medium

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Changed in: nova/rocky
   Status: New => Confirmed

** Changed in: nova/stein
   Importance: Undecided => Medium

** Changed in: nova/stein
   Status: New => Confirmed

** Changed in: nova/rocky
   Importance: Undecided => Medium

** Changed in: nova
 Assignee: (unassigned) => Matt Riedemann (mriedem)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838817

Title:
  neutron: refreshing vif model for a server gets the same network
  multiple times

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  As of this change in Rocky:

  https://review.opendev.org/#/c/585339/

  When refreshing the vif model for an instance, e.g. when we get a
  network-changed event with a specific port ID:

  https://logs.opendev.org/26/674326/2/experimental/nova-osprofiler-
  redis/899a204/controller/logs/screen-n-cpu.txt.gz#_Aug_02_18_35_50_884613

  Aug 02 18:35:50.884613 ubuntu-bionic-vexxhost-sjc1-0009723918 nova-
  compute[20428]: DEBUG nova.network.neutronv2.api [req-1a4c2dbf-
  df86-4044-a59f-f751a53c5ea6 req-b0e1e2f7-d126-4e55-909d-4803816ca80f
  service nova] [instance: 5bbe0419-fbeb-4667-8c56-785fdc1d0a62]
  Refreshing network info cache for port 252040d6-4469-46ec-
  88c3-85e599a43104 {{(pid=20428) _get_instance_nw_info
  /opt/stack/nova/nova/network/neutronv2/api.py:1756}}

  We get the network for the port multiple times, first here:

  
https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L2966


[Yahoo-eng-team] [Bug 1838811] Re: /opt/stack/devstack/tools/outfilter.py failing in neutron functional jobs since 8/2

2019-08-02 Thread Matt Riedemann
** Also affects: neutron
   Importance: Undecided
   Status: New

** Changed in: neutron
   Status: New => Confirmed

** Changed in: devstack
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1838811

Title:
  /opt/stack/devstack/tools/outfilter.py failing in neutron functional
  jobs since 8/2

Status in devstack:
  Confirmed
Status in neutron:
  Confirmed

Bug description:
  Seen here:

  https://logs.opendev.org/86/673486/4/gate/neutron-functional-
  python27/c3fe4df/ara-report/result/28d8d223-313a-49ba-b8aa-
  8af15fdda973/

  ++ ./stack.sh:main:500  :   
/opt/stack/devstack/tools/outfilter.py -v --no-timestamp -o 
/opt/stack/logs/devstacklog.txt.2019-08-02-160322
  Traceback (most recent call last):
File "/opt/stack/devstack/tools/outfilter.py", line 104, in 
  sys.exit(main())
File "/opt/stack/devstack/tools/outfilter.py", line 61, in main
  outfile = open(opts.outfile, 'ab', 0)
  IOError: [Errno 13] Permission denied: 
'/opt/stack/logs/devstacklog.txt.2019-08-02-160322'

  Looks like it's a result of:

  https://review.opendev.org/#/c/203698/

  Based on logstash data of that failure:

  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22IOError%3A%20%5BErrno%2013%5D%20Permission%20denied%3A%20'%2Fopt%2Fstack%2Flogs%2Fdevstacklog.txt%5C%22%20AND%20tags%3A%5C%22console%5C%22=7d

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1838811/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1838807] [NEW] UnsupportedServiceVersion: Service placement has no discoverable version. The resulting Proxy object will only have direct passthrough REST capabilities.

2019-08-02 Thread Matt Riedemann
Public bug reported:

I'm seeing this all over the nova tox functional job console logs since
the placement client code in nova was changed to use the openstacksdk:

https://logs.opendev.org/61/673961/1/gate/nova-tox-functional-
py36/a4cb2af/job-output.txt.gz#_2019-08-01_17_51_24_070487

2019-08-01 17:51:24.070487 | ubuntu-bionic |
b'/home/zuul/src/opendev.org/openstack/nova/.tox/functional-
py36/lib/python3.6/site-packages/openstack/service_description.py:224:
UnsupportedServiceVersion: Service placement has no discoverable
version. The resulting Proxy object will only have direct passthrough
REST capabilities.'

I don't know if this is a nova problem, or an sdk problem, or a
placement problem, but it's chewing up the functional job logs so if
it's external to nova we should add a warnings filter in our tests to
only log this once.
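If it does turn out to be external to nova, the test-side band-aid is a
one-liner; a sketch only (the exact message pattern and where the filter gets
installed are assumptions, not the merged fix):

    import warnings

    # Collapse the repeated SDK warning so it is only emitted once per location.
    warnings.filterwarnings(
        'once',
        message='Service placement has no discoverable version.*')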

** Affects: nova
 Importance: Medium
 Status: Confirmed


** Tags: placement testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838807

Title:
  UnsupportedServiceVersion: Service placement has no discoverable
  version. The resulting Proxy object will only have direct passthrough
  REST capabilities.

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  I'm seeing this all over the nova tox functional job console logs
  since the placement client code in nova was changed to use the
  openstacksdk:

  https://logs.opendev.org/61/673961/1/gate/nova-tox-functional-
  py36/a4cb2af/job-output.txt.gz#_2019-08-01_17_51_24_070487

  2019-08-01 17:51:24.070487 | ubuntu-bionic |
  b'/home/zuul/src/opendev.org/openstack/nova/.tox/functional-
  py36/lib/python3.6/site-packages/openstack/service_description.py:224:
  UnsupportedServiceVersion: Service placement has no discoverable
  version. The resulting Proxy object will only have direct passthrough
  REST capabilities.'

  I don't know if this is a nova problem, or an sdk problem, or a
  placement problem, but it's chewing up the functional job logs so if
  it's external to nova we should add a warnings filter in our tests to
  only log this once.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1838807/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1838541] Re: Spurious warnings in compute logs while building/unshelving an instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being actively managed by this compute

2019-07-31 Thread Matt Riedemann
Technically this goes back to Pike but I'm not sure we care about fixing
it there at this point since Pike is in Extended Maintenance mode
upstream. Someone can backport it to stable/pike if they care to.

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838541

Title:
  Spurious warnings in compute logs while building/unshelving an
  instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being
  actively managed by this compute host but has allocations referencing
  this compute host: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}.
  Skipping heal of allocation because we do not know what to do.

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  This warning log from the ResourceTracker is logged quite a bit in CI
  runs:

  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20being%20actively%20managed%20by%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22=7d

  2601 hits in 7 days.

  Looking at one of these the warning shows up while spawning the
  instance during an unshelve operation. This is a possible race for the
  rt.instance_claim call because the instance.host/node are set here:

  
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L208

  before the instance would be added to the rt.tracked_instances dict
  started here:

  
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L217

  If the update_available_resource periodic task runs between those
  times, we'll call _remove_deleted_instances_allocations with the
  instance and it will have allocations on the node, created by the
  scheduler, but may not be in tracked_instances yet so we don't short-
  circuit here:

  
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1339

  And hit the log condition here:

  
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1397

  We should probably downgrade that warning to DEBUG if the instance
  task_state is set since clearly the instance is undergoing some state
  transition. We should log the task_state and only log the message as a
  warning if the instance does not have a task_state set but is also not
  tracked on the host.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1838541/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1838541] [NEW] Spurious warnings in compute logs while building/unshelving an instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being actively managed by this comput

2019-07-31 Thread Matt Riedemann
Public bug reported:

This warning log from the ResourceTracker is logged quite a bit in CI
runs:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20being%20actively%20managed%20by%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22=7d

2601 hits in 7 days.

Looking at one of these the warning shows up while spawning the instance
during an unshelve operation. This is a possible race for the
rt.instance_claim call because the instance.host/node are set here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L208

before the instance would be added to the rt.tracked_instances dict
started here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L217

If the update_available_resource periodic task runs between those times,
we'll call _remove_deleted_instances_allocations with the instance and
it will have allocations on the node, created by the scheduler, but may
not be in tracked_instances yet so we don't short-circuit here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1339

And hit the log condition here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1397

We should probably downgrade that warning to DEBUG if the instance
task_state is set since clearly the instance is undergoing some state
transition. We should log the task_state and only log the message as a
warning if the instance does not have a task_state set but is also not
tracked on the host.
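In code terms the proposal amounts to roughly the following; this is a
standalone, illustrative sketch and the names do not match the actual
ResourceTracker internals:

    import logging

    LOG = logging.getLogger(__name__)

    def log_untracked_allocation(instance_uuid, task_state):
        # Only warn when the instance has no task_state; an in-progress task
        # (build, unshelve, migrate, ...) makes this situation expected.
        if task_state is not None:
            LOG.debug('Instance %s has task_state %s set; skipping heal of '
                      'allocations without a warning.',
                      instance_uuid, task_state)
        else:
            LOG.warning('Instance %s is not tracked by this host but has '
                        'allocations referencing it; skipping heal.',
                        instance_uuid)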

** Affects: nova
 Importance: Medium
 Status: Triaged


** Tags: resource-tracker serviceability

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838541

Title:
  Spurious warnings in compute logs while building/unshelving an
  instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being
  actively managed by this compute host but has allocations referencing
  this compute host: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}.
  Skipping heal of allocation because we do not know what to do.

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  This warning log from the ResourceTracker is logged quite a bit in CI
  runs:

  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20being%20actively%20managed%20by%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22=7d

  2601 hits in 7 days.

  Looking at one of these the warning shows up while spawning the
  instance during an unshelve operation. This is a possible race for the
  rt.instance_claim call because the instance.host/node are set here:

  
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L208

  before the instance would be added to the rt.tracked_instances dict
  started here:

  
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L217

  If the update_available_resource periodic task runs between those
  times, we'll call _remove_deleted_instances_allocations with the
  instance and it will have allocations on the node, created by the
  scheduler, but may not be in tracked_instances yet so we don't short-
  circuit here:

  
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1339

  And hit the log condition here:

  
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1397

  We should probably downgrade that warning to DEBUG if the instance
  task_state is set since clearly the instance is undergoing some state
  transition. We should log the task_state and only log the message as a
  warning if the instance does not have a task_state set but is also not
  tracked on the host.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1838541/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1819460] Re: instance stuck in BUILD state due to unhandled exceptions in conductor

2019-07-30 Thread Matt Riedemann
Actually ignore comment 15, claim_resources didn't raise
AllocationUpdateFailed until Stein:

https://github.com/openstack/nova/commit/37301f2f278a3702369eec957402e36d53068973

So the bug doesn't apply to Rocky or Queens.

** No longer affects: nova/rocky

** No longer affects: nova/queens

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1819460

Title:
  instance stuck in BUILD state due to unhandled exceptions in conductor

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) stein series:
  Fix Committed

Bug description:
  There are two calls[1][2] during ConductorTaskManager.build_instances,
  used during re-schedule, that could potentially raise unhandled exceptions,
  leaving the instance stuck in BUILD state instead of going to ERROR state.


  [1] 
https://github.com/openstack/nova/blob/892ead1438abc9a8a876209343e6a85c80f0059f/nova/conductor/manager.py#L670
  [2] 
https://github.com/openstack/nova/blob/892ead1438abc9a8a876209343e6a85c80f0059f/nova/conductor/manager.py#L679
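  The shape of the missing handling around those two calls is simple; the
  following is a self-contained sketch of the idea only, with made-up helper
  names, not the merged fix:

      import logging

      LOG = logging.getLogger(__name__)

      def reschedule_with_error_handling(do_reschedule, set_instance_error):
          # Any unhandled exception while re-scheduling should push the
          # instance to ERROR instead of leaving it stuck in BUILD.
          try:
              do_reschedule()
          except Exception:
              LOG.exception('Unhandled exception during re-schedule')
              set_instance_error()
              raise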

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1819460/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1819460] Re: instance stuck in BUILD state due to unhandled exceptions in conductor

2019-07-30 Thread Matt Riedemann
I'll be backporting the non-fill provider mapping part of this to rocky
and queens since the code fix and functional tests related to bug
1837955 rely on changes from the series that fixed this bug.

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1819460

Title:
  instance stuck in BUILD state due to unhandled exceptions in conductor

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Fix Committed

Bug description:
  There are two calls[1][2] during ConductorTaskManager.build_instances,
  used during re-schedule, that could potentially raise unhandled exceptions,
  leaving the instance stuck in BUILD state instead of going to ERROR state.


  [1] 
https://github.com/openstack/nova/blob/892ead1438abc9a8a876209343e6a85c80f0059f/nova/conductor/manager.py#L670
  [2] 
https://github.com/openstack/nova/blob/892ead1438abc9a8a876209343e6a85c80f0059f/nova/conductor/manager.py#L679

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1819460/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1838389] Re: Nova-compute try to flush wrong device mapper when live migrate VM

2019-07-30 Thread Matt Riedemann
What version of os-brick are you using? There might be fixes in newer
releases of os-brick but you'd have to check the change log probably.
Lee Yarwood might be familiar with any related changes to os-brick as
well.

** Tags added: libvirt live-migration volumes

** Also affects: os-brick
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838389

Title:
  Nova-compute try to flush wrong device mapper when live migrate VM

Status in OpenStack Compute (nova):
  New
Status in os-brick:
  New

Bug description:
  Description
  ===

  When I live-migrate a VM booted from a volume on 3par storage (we're using
  multipath for redundancy), it fails because nova-compute calls os-brick to
  flush the wrong device mapper: that device mapper belongs to a volume of
  another VM that lives on the same compute host.

  
  Environment
  ===

  OpenStack version Rocky

  Hypervisors: Libvirt + KVM

  Multipath version 0.4.9-123.el7.x86_64

  Storage: 3par8440

  Networking: Neutron with OpenVSwitch

  compute_server_1 have 10 VMs, 2 of them is

  VM-1 with UUID 35940aef-cf19-465a-84e7-8aa14da7fe28, 
- boots from volume /dev/vda with wwn 360002ac0031a0002107b
- has a volume attached to /dev/vdb with wwn 
360002ac003190002107b

  VM-2 with UUID b2c3f475-b916-4811-9614-2c81a79868e8,
- boots from volume /dev/vda with wwn 360002ac003130002107b
- has a volume attached to /dev/vdb with wwn 
360002ac001ac0002107b

  Trying to live-migrate VM-1 to another compute host fails because os-brick
  tries to flush the device mapper with wwn 360002ac001ac0002107b, which
  belongs to VM-2.

  I also tried to live-migrate some other VMs on compute_server_1 and all of
  those worked fine.

  
  Expected result
  ===
  os-brick flushes the correct device mapper for the VM being migrated.

  Actual result
  =
  os-brick flushes the wrong device mapper, one belonging to another VM that
  lives on the same compute host as the VM being live-migrated.
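  For anyone triaging this, one way to cross-check which multipath map a given
  WWN belongs to on the compute host (standard multipath-tools CLI; shown only
  for illustration) and compare it against the volume WWNs listed above:

      multipath -ll | grep -A5 360002ac001ac0002107b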

  Logs of nova-compute
  ==

  2019-07-30 14:16:09.293 6 INFO nova.virt.libvirt.driver [-] [instance: 
35940aef-cf19-465a-84e7-8aa14da7fe28] Migration running for 30 secs, memory 0% 
remaining; (bytes processed=20294869659, remaining=298622976, total=34377375744)
  2019-07-30 14:16:09.628 6 INFO nova.compute.manager [-] [instance: 
35940aef-cf19-465a-84e7-8aa14da7fe28] VM Migration completed (Lifecycle Event)
  2019-07-30 14:16:09.760 6 INFO nova.compute.manager 
[req-99b22dd0-8cb2-45d8-b7b7-4241e1ffcfe0 - - - - -] [instance: 
35940aef-cf19-465a-84e7-8aa14da7fe28] During sync_power_state the instance has 
a pending task (migrating). Skip.
  2019-07-30 14:16:10.521 6 WARNING nova.compute.manager 
[req-ea4ac52f-9cac-4d1f-b282-d9e99d76f3d7 f295657702674882b2aab02bd9b15b42 
c7fe4b7c1a824f738fe12e32b31c1650 - default default] [instance: 
35940aef-cf19-465a-84e7-8aa14da7fe28] Received unexpected event 
network-vif-unplugged-883d1c97-164f-4c73-a423-afdd8b6ee0f6 for instance with 
vm_state active and task_state migrating.
  2019-07-30 14:16:11.254 6 INFO nova.virt.libvirt.driver [-] [instance: 
35940aef-cf19-465a-84e7-8aa14da7fe28] Migration operation has completed
  2019-07-30 14:16:11.254 6 INFO nova.compute.manager [-] [instance: 
35940aef-cf19-465a-84e7-8aa14da7fe28] _post_live_migration() is started..
  2019-07-30 14:16:11.319 6 INFO oslo.privsep.daemon [-] Running privsep 
helper: ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', 
'--config-file', '/usr/share/nova/nova-dist.conf', '--config-file', 
'/etc/nova/nova.conf', '--privsep_context', 'os_brick.privileged.default', 
'--privsep_sock_path', '/tmp/tmpzyR_mV/privsep.sock']
  2019-07-30 14:16:12.131 6 INFO oslo.privsep.daemon [-] Spawned new privsep 
daemon via rootwrap
  2019-07-30 14:16:12.050 260 INFO oslo.privsep.daemon [-] privsep daemon 
starting
  2019-07-30 14:16:12.054 260 INFO oslo.privsep.daemon [-] privsep process 
running with uid/gid: 0/0
  2019-07-30 14:16:12.056 260 INFO oslo.privsep.daemon [-] privsep process 
running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
  2019-07-30 14:16:12.057 260 INFO oslo.privsep.daemon [-] privsep daemon 
running as pid 260
  2019-07-30 14:16:12.575 6 INFO os_brick.initiator.linuxscsi [-] Find 
Multipath device file for volume WWN 360002ac001ac0002107b
  2019-07-30 14:16:14.065 6 WARNING nova.compute.manager 
[req-e1ecb028-7af8-4d2c-8a3c-10ecbd627337 f295657702674882b2aab02bd9b15b42 
c7fe4b7c1a824f738fe12e32b31c1650 - default default] [instance: 
35940aef-cf19-465a-84e7-8aa14da7fe28] Received unexpected event 
network-vif-plugged-883d1c97-164f-4c73-a423-afdd8b6ee0f6 for instance with 
vm_state active and task_state migrating.
  2019-07-30 14:16:26.253 6 INFO nova.compute.manager [-] [instance: 

[Yahoo-eng-team] [Bug 1781391] Re: cellv2_delete_host when host not found by ComputeNodeList

2019-07-26 Thread Matt Riedemann
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Changed in: nova/queens
   Status: New => In Progress

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1781391

Title:
  cellv2_delete_host when host not found by ComputeNodeList

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  Problematic Situation:

  1 check the hosts visible to nova compute
  nova hypervisor-list
  id   hypervisor hostname  state  status
  xx   compute2   up enabled
   
  2 check the hosts visible to cellv2
  nova-manage cell_v2 list_hosts
  cell name   cell uuid  hostname
  cell1  compute1
  cell1  compute2
  Here compute1, which does not actually exist any more (e.g. it was renamed),
  still remains in cell_mappings.

  3 now try to delete host compute1
  nova-manage cell_v2 delete_host --cell_uuid  --host compute1
  then the following error is shown:
  Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 1620, in 
main
  ret = fn(*fn_args, **fn_kwargs)
File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 1558, in 
delete_host
  nodes = objects.ComputeNodeList.get_all_by_host(cctxt, host)
File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 
184, in wrapper
  result = fn(cls, context, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/nova/objects/compute_node.py", line 
437, in get_all_by_host
  use_slave=use_slave)
File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 
225, in wrapper
  return f(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/nova/objects/compute_node.py", line 
432, in _db_compute_node_get_all_by_host
  return db.compute_node_get_all_by_host(context, host)
File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 297, in 
compute_node_get_all_by_host
  return IMPL.compute_node_get_all_by_host(context, host)
File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 
270, in wrapped
  return f(context, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 
672, in compute_node_get_all_by_host
  raise exception.ComputeHostNotFound(host=host)
  ComputeHostNotFound: Compute host compute1 could not be found.

  Not quite sure of the exact way to reproduce it, but I think it would be
  nicer to allow the delete_host operation for situations like this.
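  One way to make the command tolerant of this situation (a sketch of the idea
  only; helper names are invented and this is not necessarily what the released
  fix does) is to treat the missing compute node rows as an empty list and
  still remove the host mapping:

      def delete_host_tolerant(get_nodes_for_host, destroy_host_mapping, host):
          # Proceed with removing the host mapping even when no compute node
          # records exist for the host any more (e.g. it was renamed).
          try:
              nodes = get_nodes_for_host(host)
          except LookupError:  # stand-in for nova's ComputeHostNotFound
              nodes = []
          for node in nodes:
              node.mapped = 0
              node.save()
          destroy_host_mapping(host)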

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1781391/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

