Correct, the user instance and the compute offering have HA enabled.

Regards,
Murilo Moura

On Mon, Apr 15, 2024 at 4:00 AM <m...@swen.io> wrote:

> Hi Murilo,
>
> Just checking: the user instances you are talking about are using a
> service offering with HA enabled, correct?
>
> Regards,
> Swen
>
> -----Original Message-----
> From: Murilo Moura <a...@bigsys.com.br>
> Sent: Sunday, April 14, 2024 06:31
> To: users@cloudstack.apache.org
> Subject: Re: AW: Manual fence KVM Host
>
> Hello Guto!
>
> I carefully checked the instructions that you and Daniel left in this
> thread that I opened, but one point is not working, and I would like to
> know if you have experienced something similar.
>
> By putting the host in the "Disconnected" state, I can trigger the API
> to mark the host as degraded; so far, everything is ok. Right after
> this action, I see that the system VMs are recreated on the node that
> remained active, but the user instances (user VMs) are not recreated.
>
> Checking the NFS host where the image of this VM is located, I noticed
> that I cannot read the instance's volume file with the "qemu-img info"
> command (error: Failed to get shared "write" lock).
>
> Is there any way to release the lock, or some other parameter that
> makes KVM start a VM without locking the volume on the NFS primary
> storage? (I tried switching the NFS storage to version 4, but it had
> no effect.)
>
> Regards,
>
> Murilo Moura
>
> On Wed, Apr 10, 2024 at 2:38 PM Guto Veronezi <gutoveron...@apache.org>
> wrote:
>
> > Hello Murilo,
> >
> > Complementing Swen's answer: if your host is still up and you can
> > manage it, you could also put the host in maintenance mode in ACS.
> > This process will evacuate (migrate to another host) every VM from
> > the host, not only the ones that have HA enabled. Is this your
> > situation? If not, could you provide more details about your
> > configuration and the environment's state?
> >
> > Depending on what you have in your setup, the HA might not work as
> > expected.
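[Aside on the shared "write" lock error quoted above: a sketch of how the volume could be inspected without taking the lock, using qemu-img's --force-share (-U) option; the NFS mount path and volume file name are placeholders.]

```shell
# Inspect a qcow2 volume that another QEMU process still holds open.
# -U / --force-share opens the image without acquiring the write lock;
# only safe for read-only inspection, never for writes.
qemu-img info -U /mnt/primary/<volume-uuid>.qcow2
```

[Note that this lock usually means some QEMU process still has the image open, i.e. the VM may still be running somewhere.]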
> > For VMware and XenServer, the process is expected to happen at the
> > hypervisor level. For KVM, ACS does not support HA; what ACS supports
> > is failover (although it is named HA in ACS), and this process works
> > only when certain criteria are met. Furthermore, there are two ways
> > to implement failover for ACS + KVM: the VM's failover and the host's
> > failover. In both cases, when ACS identifies that a host crashed or a
> > VM suddenly stopped working, it will start the VM on another host.
> >
> > In ACS + KVM, the VM's failover requires at least one NFS primary
> > storage; the KVM Agent of every host writes its heartbeat there. The
> > VM's failover is triggered only if the VM's compute offering has the
> > property "Offer HA" enabled OR the global setting "force.ha" is
> > enabled. VRs have failover triggered independently of the offering or
> > the global setting. In this approach, ACS checks the VM state
> > periodically (sending commands to the KVM Agent) and triggers the
> > failover if the VM meets the previously mentioned criteria AND the
> > determined limit (defined by the global settings "ping.interval" and
> > "ping.timeout") has elapsed. Bear in mind that, if you lose your
> > host, ACS will trigger the failover; however, if you gracefully shut
> > down the KVM Agent or the host, the Agent sends a disconnect command
> > to the Management Server and ACS no longer checks the VM state for
> > that host. Therefore, if you lose your host while the service is
> > down, the failover will not be triggered. Also, if a host loses
> > access to the NFS primary storage used for the heartbeat while the VM
> > uses some other primary storage, ACS might trigger the failover too.
> > As there is no STONITH/fencing in this scenario, it is possible for
> > the VM to still be running on the host while ACS tries to start it on
> > another host.
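[Aside: the settings named above can be checked from CloudMonkey (cmk); a sketch only, the setting names are from Daniel's explanation and the commands assume an already-configured cmk profile.]

```shell
# Inspect the global settings that drive the VM failover behavior
cmk list configurations name=force.ha
cmk list configurations name=ping.interval
cmk list configurations name=ping.timeout

# Enable failover for all VMs regardless of the compute offering
# (restart of the management server may be required for some settings)
cmk update configuration name=force.ha value=true
```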
> >
> > In ACS + KVM, to work with the host's failover, it is necessary to
> > configure OOBM in ACS for each host that should trigger the failover.
> > In this approach, ACS monitors the Agent's state and triggers the
> > failover in case it cannot re-establish the connection. In this
> > scenario, ACS will shut down the host via OOBM and start the VMs on
> > another host; therefore, it does not depend on an NFS primary
> > storage. This behavior is driven by the "kvm.ha.*" global settings.
> > Furthermore, be aware that stopping the Agent might trigger the
> > failover; therefore, it is recommended to disable the failover
> > feature while doing operations on the host (such as upgrading
> > packages or other maintenance procedures).
> >
> > Best regards,
> > Daniel Salvador (gutoveronezi)
> >
> > On 10/04/2024 03:52, m...@swen.io wrote:
> > > What exactly do you mean? In which state is the host?
> > > If a host is in the state "Disconnected" or "Alert", you can
> > > declare the host as degraded via the API
> > > (https://cloudstack.apache.org/api/apidocs-4.19/apis/declareHostAsDegraded.html)
> > > or the UI (icon).
> > > CloudStack will then start all VMs with HA enabled on other hosts,
> > > if the storage is accessible.
> > >
> > > Regards,
> > > Swen
> > >
> > > -----Original Message-----
> > > From: Murilo Moura <a...@bigsys.com.br>
> > > Sent: Wednesday, April 10, 2024 02:10
> > > To: users@cloudstack.apache.org
> > > Subject: Manual fence KVM Host
> > >
> > > Hey guys!
> > >
> > > Is there any way to manually fence a KVM host and then
> > > automatically start the migration of VMs that have HA enabled?
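[For reference, the degraded-host flow Swen describes can also be driven from CloudMonkey; a sketch under the assumption that cmk splits the declareHostAsDegraded API name as shown, with the host UUID left as a placeholder.]

```shell
# Confirm the host is in the Disconnected or Alert state
cmk list hosts type=Routing state=Alert

# Declare the host as degraded so HA-enabled VMs are restarted elsewhere
cmk declare hostasdegraded id=<host-uuid>
```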