Public bug reported:

Description
===========
When a compute node goes down, we evacuate the instances that live on that
node. In a concurrent scenario, several instances may select the same
destination node, and if the destination does not have enough memory for some
of them, it raises a ComputeResourcesUnavailable exception and finally sets
those instances to the ERROR state. I think that on a
ComputeResourcesUnavailable exception we should not set the instance to the
ERROR state, because the instance in fact remains on the source node.
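
A minimal, self-contained sketch of this flow is below. The names here
(error_out_instance_on_exception, rebuild_claim, Instance) are simplified
stand-ins for nova's internals, not the real API; it only illustrates the
suggested behavior of catching the resource failure before it escapes the
error-out handler, so the instance state is left untouched:

    import contextlib


    class ComputeResourcesUnavailable(Exception):
        pass


    class Instance(object):
        def __init__(self):
            self.vm_state = 'active'


    @contextlib.contextmanager
    def error_out_instance_on_exception(instance):
        # Any exception escaping this block flips the instance to ERROR,
        # mirroring nova's _error_out_instance_on_exception().
        try:
            yield
        except Exception:
            instance.vm_state = 'error'
            raise


    def rebuild_claim(free_mb, requested_mb):
        # Stand-in for the destination node's resource claim.
        if free_mb < requested_mb:
            raise ComputeResourcesUnavailable(
                'Free memory %d MB < requested %d MB'
                % (free_mb, requested_mb))


    def evacuate(instance, free_mb, requested_mb):
        with error_out_instance_on_exception(instance):
            try:
                rebuild_claim(free_mb, requested_mb)
            except ComputeResourcesUnavailable:
                # Suggested behavior: the claim failed before anything was
                # changed on the destination, so report the failure without
                # letting the exception trip the ERROR handler.
                return False
        return True


    instance = Instance()
    assert evacuate(instance, free_mb=1141, requested_mb=2048) is False
    assert instance.vm_state == 'active'  # still usable on the source node

Today the exception is instead wrapped into a BuildAbortException (visible in
the traceback below), which is exactly what drives the instance to ERROR.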

Steps to reproduce
==================
* Create many instances on one source node, and choose a destination node
with little free resource, such as memory.
* Power off the source compute node, or stop the compute service on it.
* Concurrently evacuate all instances from the source node, specifying the
destination node (a sketch of this step follows the list).
* One or more instances will end up in the ERROR state.
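
For the concurrency step, a reproduction sketch using python-novaclient is
below. The credentials, endpoint, and host names are placeholders for your
deployment, and the exact client signatures may vary with your novaclient
version:

    import threading

    from keystoneauth1 import loading, session
    from novaclient import client

    # Placeholder credentials/endpoint -- adjust for your deployment.
    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(auth_url='http://controller:5000/v3',
                                    username='admin',
                                    password='ADMIN_PASSWORD',
                                    project_name='admin',
                                    user_domain_name='Default',
                                    project_domain_name='Default')
    nova = client.Client('2', session=session.Session(auth=auth))

    # All instances still hosted on the dead source node.
    servers = nova.servers.list(
        search_opts={'host': 'source-node', 'all_tenants': 1})

    # Fire every evacuation at once so several of them race for the same
    # destination node's remaining memory.
    threads = [threading.Thread(target=nova.servers.evacuate,
                                args=(s,), kwargs={'host': 'dest-node'})
               for s in servers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()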


Expected result
===============
No instance should be set to the ERROR state when the destination has
insufficient resources; the evacuation should fail, and the instance should
remain intact on the source node.

Actual result
=============
Some instances are set to the ERROR state.

Environment
===========
Pike release, but I found the issue also exists on the master branch.


Logs & Configs
==============
2018-08-01 16:21:45.739 41514 DEBUG nova.notifications.objects.base 
[req-1710e7e5-9073-47f1-8ae8-1e68c65272c9 855c20651d244348b10c91d907aa59ca - - 
- -] Defaulting the value of the field 'projects' to None in FlavorPayload due 
to 'Cannot call _load_projects on orphaned Flavor object' populate_schema 
/usr/lib/python2.7/site-packages/nova/notifications/objects/base.py:125
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager 
[req-1710e7e5-9073-47f1-8ae8-1e68c65272c9 855c20651d244348b10c91d907aa59ca - - 
- -] [instance: 5b8ae80d-7e33-4099-8732-905355cee045] Setting instance vm_state 
to ERROR: BuildAbortException: Build of instance 
5b8ae80d-7e33-4099-8732-905355cee045 aborted: Insufficient compute resources: 
Free memory 1141.00 MB < requested 2048 MB.
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 
5b8ae80d-7e33-4099-8732-905355cee045] Traceback (most recent call last):
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 
5b8ae80d-7e33-4099-8732-905355cee045]   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7142, in 
_error_out_instance_on_exception
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 
5b8ae80d-7e33-4099-8732-905355cee045]     yield
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 
5b8ae80d-7e33-4099-8732-905355cee045]   File 
"/usr/lib/python2.7/site-packages/nova/fh/compute/manager.py", line 700, in 
rebuild_instance
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 
5b8ae80d-7e33-4099-8732-905355cee045]     instance_uuid=instance.uuid, 
reason=e.format_message())
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 
5b8ae80d-7e33-4099-8732-905355cee045] BuildAbortException: Build of instance 
5b8ae80d-7e33-4099-8732-905355cee045 aborted: Insufficient compute resources: 
Free memory 1141.00 MB < requested 2048 MB.
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 
5b8ae80d-7e33-4099-8732-905355cee045]

** Affects: nova
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1784983

Title:
  we should not set instance to ERROR state when rebuild_claim fails

Status in OpenStack Compute (nova):
  New
