Hey Daniel,

PR #4586 (https://github.com/apache/cloudstack/pull/4586) addressed your issue
as well. I'm currently working on it. Could you tell me how I can
reproduce your reboot problem?

Kind regards,
Sina

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Saturday, October 16th, 2021 at 05:40, Daniel Augusto Veronezi Salvador 
<dvsalvador...@gmail.com> wrote:

> Hi Mauro,
>
> In the KVM monitor, when there is an inconsistency in the heartbeat
> file or the heartbeat timeout is exceeded several times, the host is
> restarted by default.
>
> PR 4586 (https://github.com/apache/cloudstack/pull/4586) already
> addresses this issue by externalizing a property that allows the
> operator to decide whether the host must be restarted (the default is
> 'true', meaning that the host will be restarted). However, this feature
> will only be available after release 4.16.
>
> Best regards,
>
> Daniel Salvador
>
> On 15/10/2021 20:43, Mauro Ferraro - G2K Hosting wrote:
>
> > Hi guys, how are you?
> >
> > We are having this problem with ACS when a primary storage fails.
> >
> > We have several primary storages running Linux with an NFS server,
> > serving KVM images. Every host mounts all the NFS servers, because a
> > single host can run VMs from different storages. The main problem is
> > that when one storage fails for any reason, the whole cluster goes
> > crazy and starts rebooting the hosts to reconnect to that storage, and
> > all the VMs in the cluster (including the VMs that were working fine)
> > go down because the connection to one storage failed.
> >
> > If the storage problem is permanent, the cluster never comes back and
> > the hosts reboot indefinitely.
> >
> > When this problem appears, the logs say this:
> >
> >     host heartbeat: kvmheartbeat.sh will reboot system because it was
> >     unable to write the heartbeat to the storage.
> >
> > Many users edit the kvmheartbeat.sh script to avoid the host reboots,
> > or restart the agent on the host, but I am really not sure that this
> > is the real solution.
> >
> > Can someone help propose a better solution to this high-risk problem?
> >
> > Regards,
> >
> > Mauro