Hi Daan,

It seems CloudStack did know the host had died: it tried to fence the host but couldn't, because we have host HA disabled. It also reported that an OOB stop had occurred on the HA-enabled VMs and started them all again on the same host. We then had to put the host into MM because the iDRAC logs were showing issues with 2 memory DIMMs.
All I know is that whichever host the corrupt VR was running on, we could not console to it or to any other running VM on the same host, because the agent comms were messed up. We have found in the host agent log a line stating that PublicKey authentication to the VR had failed (because the VR was corrupt at the guest OS level). At the time we did not see this, and any command sent from within ACS mgmt. to either reboot the VR or restart the VPC with cleanup resulted in the host agent not servicing that request or any other request, such as viewing the console of any VM or live migrating any VM to another host. We're still sifting through both agent and mgmt. logs to try to determine exactly what was causing this behaviour. All other running VMs on the host were actually fine, as we could connect to them by external methods.

We are hoping to upgrade the environment ASAP so we can get better Host HA with StorPool primary storage.

BR
Gary

Gary Dixon
Quadris Cloud Manager
0161 537 4980
+44 7989717661
gary.di...@quadris.co.uk
www.quadris.com
Innovation House, 12-13 Bredbury Business Park
Bredbury Park Way, Bredbury, Stockport, SK6 2SN

-----Original Message-----
From: Daan Hoogland <daan.hoogl...@gmail.com>
Sent: Monday, February 26, 2024 1:03 PM
To: users@cloudstack.apache.org
Subject: Re: corrupt RVR causing host agent issues

Gary,
the mail does not display the screenshot for me. Also, this is an old version (4.15); I think you should upgrade. What might be the root of your issue is that *you* have seen that the physical host crashed, but CloudStack could not determine that. To prevent starting the same VM twice, it withholds taking any action in such situations. You may call this a bug or a "lack of feature", but the bottom line is that this is expected behaviour. I do not think a corrupt VR would crash a host.

On Mon, Feb 26, 2024 at 1:25 PM Gary Dixon <gary.di...@quadris.co.uk.invalid> wrote:

> ACS 4.15.2
> KVM
> Ubuntu 20.04
>
> Hi all
>
> We had a physical host crash on Friday due to hardware failure. This
> appeared to have caused issues with some RVRs going into an ‘unknown’
> state.
>
> The strange thing was that on any host where an RVR in an unknown
> state was running, we could not console onto any VMs on that host,
> nor could we SSH directly to the RVR from the host.
>
> The UI was showing all hosts' agent state as ‘UP’.
>
> Only when we restarted the ACS mgmt. service did we notice that the
> host agent where an RVR was running in an ‘unknown’ state was then in
> a ‘connecting’ state for some time. There were no networking issues
> either; the host was pingable from the mgmt. server.
>
> We were then briefly able to console onto one of the RVRs in an
> unknown state and discovered that the RVR was indeed corrupt.
> This is the screenshot of the RVR terminal:
>
> We then marked the RVR in the DB as ‘stopped’ and virsh destroyed it
> directly on the host. We were then able to restart the VPC with
> cleanup, which then re-created the corrupt RVR.
>
> It then appeared that once the corrupt RVR had gone, all other RVRs
> in an unknown state transitioned to the ‘backup’ state.
>
> We are wondering if we have encountered a bug where a corrupt RVR
> crashes the host's CloudStack agent if ACS tries to do anything with
> the RVR, such as restart it.
>
> BR
>
> Gary

--
Daan