There are some ideas about hard-deleting compute node records when they
are (soft) deleted, but only for ironic nodes. That gets messy, though
(the deletion is called from lots of places, like when a nova-compute
service record is deleted), so it's probably easiest to just revert this:

https://review.opendev.org/#/c/571535/

Note you'd also have to revert this to avoid conflicts:

https://review.opendev.org/#/c/611162/
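
Back to the hard-delete idea above: a rough sketch of what it could look
like is below. The hard_delete flag and the stand-in class are hypothetical,
not existing Nova code; the sketch mostly shows why the idea gets messy,
since every caller that deletes a ComputeNode would need the ironic special
case.

# Hypothetical sketch only -- not existing Nova code. The hard_delete flag
# does not exist today; it illustrates that every caller which deletes a
# ComputeNode (orphan cleanup, nova-compute service delete, etc.) would
# have to know whether the node is ironic-backed.

class FakeComputeNode:
    """Minimal stand-in for nova.objects.ComputeNode for this sketch."""
    def __init__(self, uuid, hypervisor_type):
        self.uuid = uuid
        self.hypervisor_type = hypervisor_type

    def destroy(self, hard_delete=False):
        action = 'hard-deleting' if hard_delete else 'soft-deleting'
        print('%s compute node %s' % (action, self.uuid))


def delete_compute_node(cn):
    # Assumption: ironic-backed nodes report 'ironic' as hypervisor_type.
    if cn.hypervisor_type == 'ironic':
        # Hard delete so the row doesn't keep occupying the unique uuid
        # index after the baremetal node leaves maintenance.
        cn.destroy(hard_delete=True)
    else:
        cn.destroy()  # normal soft-delete path


delete_compute_node(FakeComputeNode('fake-uuid', 'ironic'))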

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Changed in: nova/rocky
       Status: New => Confirmed

** Changed in: nova/stein
       Status: New => Confirmed

** Changed in: nova/rocky
   Importance: Undecided => High

** Changed in: nova/stein
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839560

Title:
  ironic: moving node to maintenance makes it unusable afterwards

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
If you use the Ironic API to set a node into maintenance (for
  whatever reason), it will no longer be included in the list of nodes
  available to Nova.

  When Nova refreshes its resources periodically, it finds that the
  node is no longer in the list of available nodes and soft-deletes its
  compute node record from the database.

  Once you take the node out of maintenance and Nova attempts to create
  the ComputeNode record again, the creation fails with a duplicate
  UUID error, because the old, soft-deleted record still holds the same
  UUID.
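
  A reproduction sketch using openstacksdk follows; the cloud name and
  sleep interval are placeholders, and the baremetal proxy method names
  are as I recall them, so verify against the installed SDK version:

  # Reproduction sketch using openstacksdk; 'mycloud' and the sleep are
  # placeholders, and the method names should be double-checked.
  import time
  import openstack

  conn = openstack.connect(cloud='mycloud')
  node_uuid = '77788ad5-f1a4-46ac-8132-2d88dbd4e594'

  # 1. Put the baremetal node into maintenance via the Ironic API.
  conn.baremetal.set_node_maintenance(node_uuid, reason='testing')

  # 2. Wait for nova-compute's update_available_resource periodic task;
  #    it no longer sees the node and soft-deletes the ComputeNode record.
  time.sleep(120)

  # 3. Take the node back out of maintenance.
  conn.baremetal.unset_node_maintenance(node_uuid)

  # 4. On the next periodic run nova-compute tries to re-create the
  #    ComputeNode with the same uuid and hits the DBDuplicateEntry shown
  #    in the traces below.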

ref:
  https://github.com/openstack/nova/commit/9f28727eb75e05e07bad51b6eecce667d09dfb65
  - this made computenode.uuid match the baremetal uuid

  https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8304-L8316
  - this soft-deletes the computenode record when it doesn't see it in
    the list of active nodes
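
  Roughly what the orphan cleanup in the second link does (paraphrased,
  not a verbatim copy of manager.py):

  # Paraphrased sketch of the orphan cleanup in
  # ComputeManager.update_available_resource(); not a verbatim copy.
  import logging

  LOG = logging.getLogger(__name__)

  def delete_orphan_compute_nodes(compute_nodes_in_db, nodenames):
      for cn in compute_nodes_in_db:
          if cn.hypervisor_hostname not in nodenames:
              LOG.info("Deleting orphan compute node %(id)s hypervisor "
                       "host is %(hh)s, nodes are %(nodes)s",
                       {'id': cn.id, 'hh': cn.hypervisor_hostname,
                        'nodes': nodenames})
              # ComputeNode.destroy() soft-deletes the row, leaving the
              # uuid behind in the unique compute_nodes_uuid_idx index.
              cn.destroy()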

  
traces:
  2019-08-08 17:20:13.921 6379 INFO nova.compute.manager [req-c71e5c81-eb34-4f72-a260-6aa7e802f490 - - - - -] Deleting orphan compute node 31 hypervisor host is 77788ad5-f1a4-46ac-8132-2d88dbd4e594, nodes are set([u'6d556617-2bdc-42b3-a3fe-b9218a1ebf0e', u'a634fab2-ecea-4cfa-be09-032dce6eaf51', u'2dee290d-ef73-46bc-8fc2-af248841ca12'])
  ...
  2019-08-08 22:21:25.284 82770 WARNING nova.compute.resource_tracker [req-a58eb5e2-9be0-4503-bf68-dff32ff87a3a - - - - -] No compute node record for ctl1-xxxx:77788ad5-f1a4-46ac-8132-2d88dbd4e594: ComputeHostNotFound_Remote: Compute host ctl1-xxxx could not be found.
  ....
  Remote error: DBDuplicateEntry (pymysql.err.IntegrityError) (1062, u"Duplicate entry '77788ad5-f1a4-46ac-8132-2d88dbd4e594' for key 'compute_nodes_uuid_idx'")
  ....
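
  The DBDuplicateEntry comes down to the unique index on
  compute_nodes.uuid not taking the soft-delete marker into account, so
  a soft-deleted row still blocks re-insertion of the same uuid. A toy
  SQLAlchemy sketch of the same pattern (simplified columns, not the
  actual Nova schema):

  # Toy demonstration of the constraint behaviour; simplified columns,
  # not the actual Nova compute_nodes schema.
  import sqlalchemy as sa

  engine = sa.create_engine('sqlite://')
  meta = sa.MetaData()
  compute_nodes = sa.Table(
      'compute_nodes', meta,
      sa.Column('id', sa.Integer, primary_key=True),
      sa.Column('uuid', sa.String(36)),
      sa.Column('deleted', sa.Integer, default=0),
      # Plain unique index on uuid: soft-deleted rows still occupy it.
      sa.Index('compute_nodes_uuid_idx', 'uuid', unique=True),
  )
  meta.create_all(engine)

  uuid = '77788ad5-f1a4-46ac-8132-2d88dbd4e594'
  with engine.begin() as conn:
      conn.execute(compute_nodes.insert().values(uuid=uuid))
      # "Soft delete" the record, as the orphan cleanup does.
      conn.execute(compute_nodes.update()
                   .where(compute_nodes.c.uuid == uuid)
                   .values(deleted=1))
  with engine.begin() as conn:
      # Re-creating the compute node after maintenance is lifted fails
      # with an IntegrityError (MySQL reports it as 1062 /
      # DBDuplicateEntry).
      conn.execute(compute_nodes.insert().values(uuid=uuid))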

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839560/+subscriptions
