Thanks Swen!

Is this occurring because the agent is only periodically being queried?

When I took no action, it took around 15 minutes for CloudStack to notice
that the host was offline. When I forced a reconnect, it detected that the
host was disconnected, as expected. Waiting 15 minutes to notice a host
failure just seems a bit excessive.

Given this, what I envision is a Zabbix or ping source that can check host 
status and perform a series of API interactions if it fails the external 
monitoring action.
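
As a rough, untested sketch of what I have in mind: the external check pings the host, and only if the ping fails and CloudStack already reports the host as "Disconnected" or "Alert" does it build a signed declareHostAsDegraded call. The function names are my own, the ping flags assume Linux, and the signing steps and the `id` parameter follow my reading of the API docs, so treat them as assumptions to verify.

```python
import base64
import hashlib
import hmac
import subprocess
import urllib.parse

# Host states in which the host can be declared degraded (per Swen's note).
DEGRADABLE_STATES = {"Disconnected", "Alert"}


def host_is_reachable(address, count=3, timeout_s=2):
    """Return True if the host answers a ping (flags assume Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def signed_query(params, api_key, secret_key):
    """Build a signed API query string.

    Assumed signing scheme: sort the parameters by name, URL-encode the
    values, lowercase the whole string, HMAC-SHA1 it with the secret key,
    then Base64- and URL-encode the digest.
    """
    full = dict(params, apikey=api_key, response="json")
    pairs = sorted(full.items(), key=lambda kv: kv[0].lower())
    query = "&".join(
        f"{key}={urllib.parse.quote(str(value), safe='')}" for key, value in pairs
    )
    digest = hmac.new(
        secret_key.encode(), query.lower().encode(), hashlib.sha1
    ).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"


def should_declare_degraded(state, reachable):
    """Act only when the external check fails AND CloudStack agrees the host is down."""
    return (not reachable) and state in DEGRADABLE_STATES
```

The monitor (Zabbix action script or cron job) would then append the result of `signed_query({"command": "declareHostAsDegraded", "id": host_id}, api_key, secret_key)` to the management server's API endpoint and issue the request. The host state itself would have to come from a prior listHosts call, which I have left out here.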

Thanks,
Alex

From: m...@swen.io <m...@swen.io>
Date: Saturday, April 13, 2024 at 11:27 AM
To: users@cloudstack.apache.org <users@cloudstack.apache.org>
Subject: Re: Handling KVM host failure

We monitor our hosts via Zabbix and take manual action when a host fails.
If a host is in state "Disconnected" or "Alert", you can declare it as
degraded via the API
(https://cloudstack.apache.org/api/apidocs-4.19/apis/declareHostAsDegraded.html)
or the UI (icon).

Daniel Salvador (gutoveronezi) also provided a very good explanation on the
10th of April in response to a similar question.

Regards,
Swen

-----Original Message-----
From: Dietrich, Alex <adietr...@ussignal.com.INVALID>
Sent: Friday, April 12, 2024 17:46
To: users <users@cloudstack.apache.org>
Subject: Handling KVM host failure


Hello All,

How are folks handling KVM host failure in CloudStack?

For example, when a host loses power or is hard powered off, CloudStack
takes nearly 15 minutes to detect that the host is offline. This creates a
challenge, as CloudStack considers the VMs to be running during that time
even though they are unreachable.

Is there a knob I am missing for speeding up the detection?

Thanks,
Alex

