Thanks Swen! Is this occurring because the agent is only periodically being queried?
When I took no action, it took around 15 minutes for CloudStack to notice that the host was offline. When I did a force reconnect, it detected that the host was disconnected, as expected. Waiting 15 minutes seems excessive for detecting a host failure.

Given this, what I envision is a Zabbix or ping source that checks host status and performs a series of API interactions if the host fails the external monitoring check.

Thanks,
Alex

From: m...@swen.io <m...@swen.io>
Date: Saturday, April 13, 2024 at 11:27 AM
To: users@cloudstack.apache.org <users@cloudstack.apache.org>
Subject: RE: Handling KVM host failure

We are monitoring our hosts via Zabbix and take manual action when a host fails. If a host is in state "Disconnected" or "Alert", you can declare it as degraded via the API (https://cloudstack.apache.org/api/apidocs-4.19/apis/declareHostAsDegraded.html) or the UI (icon).

Daniel Salvador (gutoveronezi) also provided a very good explanation on the 10th of April in response to a similar question.

Regards,
Swen

-----Original Message-----
From: Dietrich, Alex <adietr...@ussignal.com.INVALID>
Sent: Friday, April 12, 2024 17:46
To: users <users@cloudstack.apache.org>
Subject: Handling KVM host failure

Hello All,

How are folks handling KVM host failure in CloudStack? For example, when a host loses power or is hard powered off, CloudStack takes nearly 15 minutes to detect that the host is offline. This creates a challenge, as CloudStack considers the VMs to be running during that time even though they are unreachable.

Is there a knob I am missing for speeding up the detection?

Thanks,
Alex
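
The external-monitor idea discussed in this thread (a ping check that triggers API calls when a host stops responding) could look roughly like the Python sketch below. It is only an illustration under several assumptions: the `ping` flags are those of Linux iputils, the request signing follows the HMAC-SHA1 scheme described in the CloudStack developer documentation, and the "force reconnect" mentioned above is assumed to correspond to the `reconnectHost` API command. The function names, retry count, and delay are hypothetical, and the sketch only builds the request URLs rather than sending them.

```python
import base64
import hashlib
import hmac
import subprocess
import time
import urllib.parse


def host_is_reachable(address, timeout_s=2):
    """Send one ICMP echo request with the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def signed_query(params, secret_key):
    """Build a signed query string: HMAC-SHA1 over the sorted, lowercased query."""
    pairs = sorted(params.items(), key=lambda kv: kv[0].lower())
    query = "&".join(
        f"{key}={urllib.parse.quote(str(value), safe='')}" for key, value in pairs
    )
    digest = hmac.new(
        secret_key.encode(), query.lower().encode(), hashlib.sha1
    ).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"


def api_url(endpoint, command, host_id, api_key, secret_key):
    """Return a signed URL for a host-level command that takes the host id."""
    params = {
        "command": command,
        "id": host_id,
        "response": "json",
        "apikey": api_key,
    }
    return f"{endpoint}?{signed_query(params, secret_key)}"


def plan_actions(address, host_id, endpoint, api_key, secret_key,
                 probe=host_is_reachable, retries=3, delay_s=10):
    """Return the API URLs to call if the host fails `retries` probes in a row.

    An empty list means at least one probe succeeded and nothing should be done.
    """
    for _ in range(retries):
        if probe(address):
            return []
        time.sleep(delay_s)
    # Assumed order: force a reconnect so CloudStack notices the disconnect,
    # then declare the host as degraded once it is "Disconnected" or "Alert".
    return [
        api_url(endpoint, "reconnectHost", host_id, api_key, secret_key),
        api_url(endpoint, "declareHostAsDegraded", host_id, api_key, secret_key),
    ]
```

A caller would pass the management server's API endpoint and a pair of API keys, then issue the returned URLs in order (for example with `urllib.request.urlopen`), checking that the host has actually reached the "Disconnected" or "Alert" state before the second call. Requiring several consecutive failed probes is a deliberate choice to avoid declaring a host degraded because of a single dropped packet.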