Hey Daniel, PR #4586 (https://github.com/apache/cloudstack/pull/4586) addressed your issue, as well. I'm currently working on it. Could you share with me how I can reproduce your reboot problem?
Kind regards,
Sina

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, October 16th, 2021 at 05:40, Daniel Augusto Veronezi Salvador <dvsalvador...@gmail.com> wrote:

> Hi Mauro,
>
> In the KVM heartbeat monitor, when the heartbeat file is inconsistent
> or the heartbeat timeout is exceeded several times, the host is
> restarted by default.
>
> PR 4586 (https://github.com/apache/cloudstack/pull/4586) already
> addressed this issue by externalizing a property that lets the
> operator decide whether the host must be restarted (the default is
> 'true', meaning that the host will be restarted). However, this
> feature will be available only after release 4.16.
>
> Best regards,
> Daniel Salvador
>
> On 15/10/2021 20:43, Mauro Ferraro - G2K Hosting wrote:
>
> > Hi guys, how are you?
> >
> > We are having this problem with ACS when a primary storage fails.
> >
> > We have several primary storages: Linux NFS servers serving KVM
> > images. Every host mounts all of the NFS servers, because a single
> > host can run VMs from different storages. The main problem is that
> > when one storage fails, for any reason, the whole cluster goes
> > crazy and starts rebooting the hosts to reconnect to that storage,
> > and all the VMs on the cluster (including the VMs that were
> > working fine) go down because the connection to one storage
> > failed. If the storage problem is permanent, the cluster never
> > starts again and the hosts reboot indefinitely.
> >
> > When this problem appears, the logs say this:
> >
> > host heartbeat: kvmheartbeat.sh will reboot system because it was
> > unable to write the heartbeat to the storage.
> >
> > Many users edit the script kvmheartbeat.sh to avoid the host
> > reboot, or restart the agent on the host, but I am not sure that
> > this is the real solution.
> >
> > Can someone help by proposing a better solution to this high-risk
> > problem?
> >
> > Regards,
> > Mauro
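
For readers following the thread, the behaviour Daniel describes can be sketched roughly as below. This is only an illustration under stated assumptions, not the CloudStack implementation: the names `HeartbeatPolicy`, `max_failures`, `reboot_on_timeout` and `decide_action` are hypothetical, and the failure threshold is an arbitrary value. The only detail taken from the thread is that the reboot switch defaults to true.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatPolicy:
    # Hypothetical threshold: how many consecutive failed heartbeat
    # writes are tolerated before the host acts.
    max_failures: int = 5
    # Hypothetical name for the externalized switch; the thread only
    # states that the default is 'true' (the host is restarted).
    reboot_on_timeout: bool = True


def decide_action(consecutive_failures: int, policy: HeartbeatPolicy) -> str:
    """Return 'retry', 'reboot' or 'alert' for the current failure count."""
    if consecutive_failures < policy.max_failures:
        # Still within tolerance: keep trying to write the heartbeat.
        return "retry"
    # Threshold reached: reboot only if the operator left the switch on.
    return "reboot" if policy.reboot_on_timeout else "alert"


if __name__ == "__main__":
    default_policy = HeartbeatPolicy()
    no_reboot_policy = HeartbeatPolicy(reboot_on_timeout=False)
    print(decide_action(2, default_policy))
    print(decide_action(5, default_policy))
    print(decide_action(5, no_reboot_policy))
```

With the switch turned off, a failing storage in this sketch produces an alert instead of a reboot, which is the kind of operator choice the thread attributes to the PR.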