On Tue, Sep 19, 2017 at 2:26 PM, Arik Hadas <aha...@redhat.com> wrote:
>
> On Tue, Sep 19, 2017 at 12:44 PM, Alex K <rightkickt...@gmail.com> wrote:
>
>> A second test did not yield the same result.
>> This time the VMs were restarted to another host and when the lost host
>> recovered no VMs were running on it.
>> Seems that there is a racing issue somewhere.
>>
>
> Did you test with the same VM?
>

Yes

> were the disks + lease located on the same storage domains in both tests?
>

Yes. In all cases the leases are on the same storage domain, the same one where the VM disks reside.

> did the VM run on the same host (and if not, is the libvirt + qemu
> versions different between the two?).
>

Yes

> It may be a racing issue but not necessarily. There is an observation in
> the bug I mentioned before that it happens only (/more) with certain
> storage types...
>

The storage is based on a gluster volume, replica 3 with 1 arbiter. The gluster version is 3.8.12.

A third test yielded the same issue: VMs on the recovered host remained in paused status.

>
>>
>> Thanx,
>> Alex
>>
>>
>> On Tue, Sep 19, 2017 at 11:52 AM, Arik Hadas <aha...@redhat.com> wrote:
>>
>>>
>>> On Tue, Sep 19, 2017 at 11:41 AM, Alex K <rightkickt...@gmail.com>
>>> wrote:
>>>
>>>> Hi again,
>>>>
>>>> I performed a different test by isolating one host (say host A) through
>>>> removing all its network interfaces (thus power management through IPMI
>>>> was also not available).
>>>> The VMs (with VM lease enabled) were successfully restarted to another
>>>> host.
>>>> When connecting back the host A, the cluster performed a power
>>>> management and the host became a member of the cluster.
>>>> The VMs that were running on the host A were found "paused", which is
>>>> normal.
>>>> After 15 minutes I see that the VMs at host A are still in "paused"
>>>> state and I would expect that the cluster should decide at some point to
>>>> shutdown the paused VMs and continue with the VMs that are already
>>>> running at other hosts.
>>>>
>>>> Is this behavior normal?
>>>>
>>>
>>> I believe it is not the expected behavior - the VM should not stay in
>>> paused state when its lease expires. But we know about this, see comment 9
>>> in [1].
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1459865
>>>
>>>>
>>>> Thanx,
>>>> Alex
>>>>
>>>> On Tue, Sep 19, 2017 at 10:18 AM, Alex K <rightkickt...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Just completed the tests and it works great.
>>>>> VM leases is just what I needed.
>>>>>
>>>>> Thanx,
>>>>> Alex
>>>>>
>>>>> On Tue, Sep 19, 2017 at 10:16 AM, Yaniv Kaul <yk...@redhat.com> wrote:
>>>>>
>>>>>> On Tue, Sep 19, 2017 at 1:00 AM, Alex K <rightkickt...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Enabling VM leases could be an answer to this. Will test tomorrow.
>>>>>>>
>>>>>> Indeed. Let us know how it worked for you.
>>>>>>
>>>>>>> Thanx,
>>>>>>> Alex
>>>>>>>
>>>>>>> On Sep 18, 2017 7:50 PM, "Alex K" <rightkickt...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have the following issue with the HA behavior of oVirt 4.1 and
>>>>>>> need to check with you if there is any work around from your experience.
>>>>>>>
>>>>>>> I have 3 servers (A, B, C) with hosted engine in self hosted setup
>>>>>>> on top gluster with replica 3 + 1 arbiter. All good except one point:
>>>>>>>
>>>>>>> The hosts have been configured with power management using IPMI
>>>>>>> (server iLO).
>>>>>>> If I disconnect power from one host (say C) (or disconnect all
>>>>>>> network cables of the host) the two other hosts go to a loop where they
>>>>>>> try to verify the status of the host C by issuing power management
>>>>>>> commands to the host C. Since power of host is off the server iLO does
>>>>>>> not respond on the network and the power management of host C fails,
>>>>>>> leaving the VMs that were running on the host C in an unknown state and
>>>>>>> they are never restarted to the other hosts.
>>>>>>>
>>>>>>> Is there any fencing option to change this behavior so as if both
>>>>>>> available hosts fail to do power management of the unresponsive host to
>>>>>>> decide that the host is down and to restart the VMs of that host to the
>>>>>>> other available hosts.
>>>>>>>
>>>>>> No, this is a bad assumption. Perhaps they are the ones isolated from
>>>>>> it?
>>>>>> Y.
>>>>>>
>>>>>>> I could also add additional power management through UPS to avoid
>>>>>>> this issue but this is not currently an option and I am interested to
>>>>>>> see if this behavior can be tweaked.
>>>>>>>
>>>>>>> Thanx,
>>>>>>> Alex
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users@ovirt.org
>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
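
The reasoning in this thread can be summarized as a small toy model. It is only a sketch of the decision discussed above, not oVirt engine code: the names HostView and safe_to_restart_vm are hypothetical, and the model assumes that a VM lease lives on shared storage and can be taken over by another host only once the original host has stopped renewing it. It shows why a failed fence attempt alone is not treated as proof that the host is down (the remaining hosts may be the isolated ones), while a confirmed fence or an expired lease is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostView:
    """What the remaining hosts can observe about an unresponsive host.

    Hypothetical structure for illustration only.
    """
    fence_succeeded: bool          # power management confirmed the host is off
    lease_expired: Optional[bool]  # None means the VM has no storage lease


def safe_to_restart_vm(view: HostView) -> bool:
    """Return True if restarting the VM elsewhere cannot create a second copy."""
    if view.fence_succeeded:
        # The host is known to be powered off, so its VMs are gone.
        return True
    if view.lease_expired is None:
        # No fence confirmation and no lease: the VM state is unknown,
        # so restarting it risks two running copies of the same VM.
        return False
    # With a lease, the storage itself tells us whether the old copy
    # can still be running.
    return view.lease_expired


if __name__ == "__main__":
    # Original report: IPMI unreachable, no lease -> VM is never restarted.
    print(safe_to_restart_vm(HostView(fence_succeeded=False, lease_expired=None)))
    # With VM leases enabled and the lease expired -> restart is allowed.
    print(safe_to_restart_vm(HostView(fence_succeeded=False, lease_expired=True)))
```

In this model the first case corresponds to the behavior in the original report, and the second to the tests with VM leases enabled.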