Hi, could you please post the whole engine.log (starting from the time when you turned off the host running the engine VM) and also vdsm.log from both hosts?

Thanks

Martin Perina

----- Original Message -----
> From: "Michael Hölzl" <m...@ins.jku.at>
> To: users@ovirt.org
> Sent: Monday, September 21, 2015 10:27:08 AM
> Subject: [ovirt-users] HA - Fencing not working when host with engine gets shutdown
>
> Hi all,
>
> we are trying to set up an oVirt environment with two hosts, both
> connected to an iSCSI storage device, a hosted engine, and power
> management configured over iLO. So far it seems to work fine in our
> testing setup, and starting/stopping VMs works smoothly with proper
> scheduling between those hosts. So we wanted to test HA for the VMs now
> and started to manually shut down a host while there are still VMs
> running on that machine (to simulate a power failure or a kernel panic).
> The expected outcome was that all machines where HA is enabled are
> booted again. This works if the machine with the failure does not have
> the engine running. If the machine with the hosted engine VM gets
> shut down, the host goes into the "Not Responsive" state and all VMs end
> up in an unknown state. However, the engine itself starts correctly on
> the second host and it seems like it tries to fence the other host (as
> expected). Events which we get in the open virtualization manager:
>
> 1. Host hosted_engine_2 is non responsive
> 2. Host hosted_engine_1 from cluster Default was chosen as a proxy to
>    execute Status command on Host hosted_engine_2.
> 3. Host hosted_engine_2 became non responsive. It has no power
>    management configured. Please check the host status, manually reboot
>    it, and click "Confirm Host Has Been Rebooted"
> 4. Host hosted_engine_2 is not responding. It will stay in Connecting
>    state for a grace period of 124 seconds and after that an attempt to
>    fence the host will be issued.
>
> Event 4 keeps recurring every 3 minutes. Complete engine.log file
> during engine boot up: http://pastebin.com/D6xS3Wfy
>
> So the host detects that the machine is not responding and wants to
> fence it. But although the host has power management configured over
> iLO, the engine thinks that it does not. As a result, the second host
> does not get fenced and the VMs are not migrated to the running machine.
> The log files also contain a lot of timeout exceptions, but I guess
> that this is because the host cannot connect to the other machine.
>
> Has anybody faced similar problems with HA? Or any clue what the
> problem might be?
>
> Thanks,
> Michael
>
> ----
> ovirt version: 3.5.4
> Hosted engine VM OS: CentOS 6.5
> Host Machines OS: CentOS 7
>
> P.S. We also have to note that we had problems with the command
> fence_ipmilan at the beginning. We were receiving the message "Unable to
> obtain correct plug status or plug is not available," whenever the
> command fence_ipmilan was called. However, the command fence_ilo4
> worked. So we now use a simple script in place of fence_ipmilan that
> calls fence_ilo4 and passes the arguments through.
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
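
The wrapper script mentioned in the P.S. is not shown in the thread. A minimal sketch of such a pass-through, assuming it is installed under the name fence_ipmilan and that fence_ilo4 is reachable via PATH (the names `TARGET_AGENT`, `build_command` and `main` are hypothetical, not taken from the poster's script), might look like this:

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for fence_ipmilan that hands every call to fence_ilo4."""
import os
import sys

# Assumption: fence_ilo4 is installed and reachable via PATH.
TARGET_AGENT = "fence_ilo4"


def build_command(argv):
    """Return the fence_ilo4 command line for the given fence_ipmilan argv."""
    # argv[0] is the wrapper's own name; everything after it is passed through.
    return [TARGET_AGENT] + list(argv[1:])


def main():
    command = build_command(sys.argv)
    # execvp replaces this process, so stdin, stdout and the exit status
    # are those of fence_ilo4 itself.
    os.execvp(command[0], command)


# Only hand off when the file is actually installed under the wrapped name,
# so importing or testing it never triggers a fence call.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "fence_ipmilan":
    main()
```

Because the process is replaced rather than spawned, options given as command-line arguments or on standard input both reach fence_ilo4 unchanged, and the caller sees fence_ilo4's own exit code.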