Public bug reported:

Description
===========
In an environment with multiple compute nodes using the ironic driver, when a compute node goes down, another compute node cannot take over the ironic nodes it managed.
Steps to reproduce
==================
1. Start multiple compute nodes with the ironic driver.
2. Register one node to ironic.
3. Stop the compute node which manages the ironic node.
4. Create an instance.

Expected result
===============
The instance is created.

Actual result
=============
The instance creation fails.

Environment
===========
1. Exact version of OpenStack you are running.

openstack-nova-scheduler-15.0.6-2.el7.noarch
openstack-nova-console-15.0.6-2.el7.noarch
python2-novaclient-7.1.0-1.el7.noarch
openstack-nova-common-15.0.6-2.el7.noarch
openstack-nova-serialproxy-15.0.6-2.el7.noarch
openstack-nova-placement-api-15.0.6-2.el7.noarch
python-nova-15.0.6-2.el7.noarch
openstack-nova-novncproxy-15.0.6-2.el7.noarch
openstack-nova-api-15.0.6-2.el7.noarch
openstack-nova-conductor-15.0.6-2.el7.noarch

2. Which hypervisor did you use?
ironic

Details
=======
When a nova-compute goes down, another nova-compute takes over the ironic nodes managed by the failed nova-compute by re-balancing the hash ring. The active nova-compute then tries to create a new resource provider with a new ComputeNode object UUID and the hypervisor name (the ironic node name) [1][2][3]. This creation fails with a conflict (409) because a resource provider with the same name was already created by the failed nova-compute. When a new instance is requested, the scheduler gets only the old resource provider for the ironic node [4].
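The failure mode above can be sketched in a few lines. This is not nova or placement code, and the UUIDs and node name are illustrative; it only models the relevant rule, namely that placement keys resource providers by unique name, so a create request that reuses the ironic node name under a new ComputeNode UUID is rejected (HTTP 409):

```python
class ConflictError(Exception):
    """Stands in for the HTTP 409 placement returns on a duplicate name."""


class FakePlacement:
    """Toy model of placement's per-name uniqueness for resource providers."""

    def __init__(self):
        self._uuid_by_name = {}

    def create_resource_provider(self, uuid, name):
        existing = self._uuid_by_name.get(name)
        if existing is not None and existing != uuid:
            # Same provider name, different UUID: placement refuses.
            raise ConflictError(
                "409: provider named %r already exists with uuid %s"
                % (name, existing))
        self._uuid_by_name[name] = uuid
        return uuid


placement = FakePlacement()

# The now-failed nova-compute had created a provider for the ironic node.
placement.create_resource_provider("uuid-old", "ironic-node-01")

# After the hash ring re-balances, the surviving nova-compute builds a new
# ComputeNode object (new UUID) and tries to create a provider under the
# same hypervisor (ironic node) name -- this is the conflict in the report.
try:
    placement.create_resource_provider("uuid-new", "ironic-node-01")
except ConflictError as exc:
    print(exc)
```

Because the create never succeeds, the surviving nova-compute keeps reporting against nothing, while the scheduler continues to see only the stale provider owned by the dead service.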
As a result, the ironic node is not selected:

WARNING nova.scheduler.filters.compute_filter [req-a37d68b5-7ab1-4254-8698-502304607a90 7b55e61a07304f9cab1544260dcd3e41 e21242f450d948d7af2650ac9365ee36 - - -] (compute02, 8904aeeb-a35b-4ba3-848a-73269fdde4d3) ram: 4096MB disk: 849920MB io_ops: 0 instances: 0 has not been heard from in a while

[1] https://github.com/openstack/nova/blob/stable/ocata/nova/compute/resource_tracker.py#L464
[2] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L630
[3] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L410
[4] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/filter_scheduler.py#L183

** Affects: nova
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1714248

Title:
  Compute node HA for ironic doesn't work due to the name duplication of Resource Provider

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1714248/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp