Public bug reported:

Description
===========
If a QEMU process crashes (OOM, etc.), libvirt sends an event saying the instance stopped, with a detail saying that the stop was a failure. But nova only handles the stopped event; it does not check the detail.
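As an illustration of checking the detail, here is a minimal sketch of how a stopped event could be classified. It is not nova's code: `is_abnormal_stop` and `ABNORMAL_STOP_DETAILS` are hypothetical names, the numeric constants are copied by hand from libvirt's `virDomainEventType` and `virDomainEventStoppedDetailType` enums (real code should use the constants from the libvirt Python module), and treating CRASHED the same as FAILED is an assumption.

```python
# Values mirrored by hand from libvirt's virDomainEventType and
# virDomainEventStoppedDetailType enums. Assumption: real code would use
# libvirt.VIR_DOMAIN_EVENT_* from the libvirt Python module instead.
VIR_DOMAIN_EVENT_STOPPED = 5
VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN = 0
VIR_DOMAIN_EVENT_STOPPED_DESTROYED = 1
VIR_DOMAIN_EVENT_STOPPED_CRASHED = 2
VIR_DOMAIN_EVENT_STOPPED_FAILED = 5

# Details that indicate the guest did not stop cleanly. Including CRASHED
# alongside FAILED is an assumption made for this sketch.
ABNORMAL_STOP_DETAILS = frozenset({
    VIR_DOMAIN_EVENT_STOPPED_CRASHED,
    VIR_DOMAIN_EVENT_STOPPED_FAILED,
})


def is_abnormal_stop(event, detail):
    """Return True if a lifecycle event is a stop caused by a failure.

    Hypothetical helper: `event` and `detail` are the integers that libvirt
    passes to a domain lifecycle event callback.
    """
    return (event == VIR_DOMAIN_EVENT_STOPPED
            and detail in ABNORMAL_STOP_DETAILS)
```

A handler could call this helper first and only fall back to its normal handling of stopped events when it returns False.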
When the event handler receives a stopped event, it sleeps for 15 seconds to make sure the event was not sent by a reboot operation.
https://github.com/openstack/nova/blob/stable/train/nova/virt/libvirt/host.py#L352
As a result, nova takes a long time to detect the crashed instance.

Steps to reproduce
==================
1. Launch a VM.
2. Log in to the compute node, find the corresponding process, and kill it: "kill -SIGBUS pid"

Expected result
===============
The nova service detects the crash event within a second.

Actual result
=============
Nova needs more than 10 seconds to handle the event.

Environment
===========
1. OpenStack cluster version
   master build 2019.11.11 (all-in-one)
2. Hypervisor
   Libvirt + KVM
3. Storage type
   Ceph
4. Networking type
   Neutron with OVS

** Affects: nova
     Importance: Undecided
         Status: New

** Tags: libvirt

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1853259

Title:
  performance gaps on detect crashed instance

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1853259/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp