Public bug reported:

I had a cluster of 12 ESX nova-compute nodes with 100 ESX hypervisors. For some reason one of the nova-compute nodes went down. After a couple of attempts, nova-compute came up fine. But,
1. Nova deleted all the instances running on that particular node (esx-compute11) from its DB.
2. All the instances were deleted from the backend as well.

Filing this bug to track whether there is any issue with the nova scheduler on an ESX setup.

Logs:

stack@runner:~/nsbu_cqe_openstack/nested$ nova service-list | grep nova-compute | grep esx
| 6  | nova-compute | esx-compute2  | nova | enabled | up   | 2016-02-03T09:45:15.000000 | - |
| 7  | nova-compute | esx-compute1  | nova | enabled | up   | 2016-02-03T09:45:17.000000 | - |
| 8  | nova-compute | esx-compute4  | nova | enabled | up   | 2016-02-03T09:45:18.000000 | - |
| 9  | nova-compute | esx-compute3  | nova | enabled | up   | 2016-02-03T09:45:21.000000 | - |
| 10 | nova-compute | esx-compute8  | nova | enabled | up   | 2016-02-03T09:45:20.000000 | - |
| 11 | nova-compute | esx-compute7  | nova | enabled | up   | 2016-02-03T09:45:19.000000 | - |
| 12 | nova-compute | esx-compute12 | nova | enabled | up   | 2016-02-03T09:45:19.000000 | - |
| 13 | nova-compute | esx-compute5  | nova | enabled | up   | 2016-02-03T09:45:19.000000 | - |
| 14 | nova-compute | esx-compute9  | nova | enabled | up   | 2016-02-03T09:45:17.000000 | - |
| 15 | nova-compute | esx-compute6  | nova | enabled | up   | 2016-02-03T09:45:19.000000 | - |
| 16 | nova-compute | esx-compute10 | nova | enabled | up   | 2016-02-03T09:45:20.000000 | - |
| 17 | nova-compute | esx-compute11 | nova | enabled | down | 2016-02-03T09:26:53.000000 | - |
stack@runner:~/nsbu_cqe_openstack/nested$

stack@controller:~$ sudo netstat -anp | grep 62.24.1.87
tcp6       0      0 62.24.1.111:5672        62.24.1.87:58180        ESTABLISHED 8687/beam.smp
tcp6       0      0 62.24.1.111:5672        62.24.1.87:58179        ESTABLISHED 8687/beam.smp
stack@controller:~$

2016-02-03 01:27:03.217 INFO nova.service [-] Starting compute node (version 13.0.0)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 457, in fire_timers
    timer()
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 58, in __call__
    cb(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/oslo_service/service.py", line 671, in run_service
    service.start()
  File "/opt/stack/nova/nova/service.py", line 183, in start
    self.manager.init_host()
  File "/opt/stack/nova/nova/compute/manager.py", line 1313, in init_host
    context, self.host, expected_attrs=['info_cache', 'metadata'])
  File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 172, in wrapper
    args, kwargs)
  File "/opt/stack/nova/nova/conductor/rpcapi.py", line 241, in object_class_action_versions
    args=args, kwargs=kwargs)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
    retry=self.retry)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
    timeout=timeout, retry=retry)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 464, in send
    retry=retry)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 453, in _send
    result = self._waiter.wait(msg_id, timeout)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 336, in wait
    message = self.waiters.get(msg_id, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 239, in get
    'to message ID %s' % msg_id)
MessagingTimeout: Timed out waiting for a reply to message ID 5a19ba4d2a694453b5db95fb2f73f9e8
2016-02-03 01:28:58.448 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : 5a19ba4d2a694453b5db95fb2f73f9e8

Logs: M-Release, master branch

stack@esx-compute3:/opt/stack/nova$ git log -1
commit 197bd6dd1231f1f57cdd6c0acb1dfbdc3b2b0989
Merge: 1ec0b56 5f5590f
Author: Jenkins <jenk...@review.openstack.org>
Date:   Sun Feb 7 04:08:54 2016 +0000

    Merge "libvirt: use osinfo when configuring
    the disk bus"
stack@esx-compute3:/opt/stack/nova$

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1543010

Title:
  Nova clears DB if ESX nova-compute node restarted

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1543010/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp