On Tue, Sep 19, 2017 at 3:27 PM, Alex K <rightkickt...@gmail.com> wrote:
> On Tue, Sep 19, 2017 at 2:26 PM, Arik Hadas <aha...@redhat.com> wrote:
>
>> On Tue, Sep 19, 2017 at 12:44 PM, Alex K <rightkickt...@gmail.com> wrote:
>>
>>> A second test did not yield the same result.
>>> This time the VMs were restarted to another host and when the lost host
>>> recovered no VMs were running on it.
>>> Seems that there is a racing issue somewhere.
>>
>> Did you test with the same VM?
>
> Yes
>
>> were the disks + lease located on the same storage domains in both tests?
>
> Yes. In all cases the leases are on the same storage domain, the same one
> where the VM disks reside.
>
>> did the VM run on the same host (and if not, are the libvirt + qemu
>> versions different between the two?).
>
> Yes
>
>> It may be a racing issue but not necessarily. There is an observation in
>> the bug I mentioned before that it happens only (/more) with certain
>> storage types...
>
> The storage is based on gluster volume, replica 3 with 1 arbiter.
> The gluster version is 3.8.12.
> A third test yielded the same issue, VMs on recovered host remained in
> paused status.

Ack, thanks.
I suggest you add yourself (as CC) to [1] so that you will be informed
about its resolution.
In light of your answers it does look like a racing issue.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1459865

>>> Thanx,
>>> Alex
>>>
>>> On Tue, Sep 19, 2017 at 11:52 AM, Arik Hadas <aha...@redhat.com> wrote:
>>>
>>>> On Tue, Sep 19, 2017 at 11:41 AM, Alex K <rightkickt...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi again,
>>>>>
>>>>> I performed a different test by isolating one host (say host A)
>>>>> through removing all its network interfaces (thus power management through
>>>>> IPMI was also not available).
>>>>> The VMs (with VM lease enabled) were successfully restarted to another
>>>>> host.
>>>>> When connecting back the host A, the cluster performed a power
>>>>> management and the host became a member of the cluster.
>>>>> The VMs that were running on the host A were found "paused", which is
>>>>> normal.
>>>>> After 15 minutes I see that the VMs at host A are still in "paused"
>>>>> state and I would expect that the cluster should decide at some point to
>>>>> shut down the paused VMs and continue with the VMs that are already running
>>>>> at other hosts.
>>>>>
>>>>> Is this behavior normal?
>>>>
>>>> I believe it is not the expected behavior - the VM should not stay in
>>>> paused state when its lease expires. But we know about this, see comment 9
>>>> in [1].
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1459865
>>>>
>>>>> Thanx,
>>>>> Alex
>>>>>
>>>>> On Tue, Sep 19, 2017 at 10:18 AM, Alex K <rightkickt...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Just completed the tests and it works great.
>>>>>> VM leases is just what I needed.
>>>>>>
>>>>>> Thanx,
>>>>>> Alex
>>>>>>
>>>>>> On Tue, Sep 19, 2017 at 10:16 AM, Yaniv Kaul <yk...@redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Tue, Sep 19, 2017 at 1:00 AM, Alex K <rightkickt...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Enabling VM leases could be an answer to this. Will test tomorrow.
>>>>>>>
>>>>>>> Indeed. Let us know how it worked for you.
>>>>>>>
>>>>>>>> Thanx,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> On Sep 18, 2017 7:50 PM, "Alex K" <rightkickt...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I have the following issue with the HA behavior of oVirt 4.1 and
>>>>>>>> need to check with you if there is any workaround from your
>>>>>>>> experience.
>>>>>>>>
>>>>>>>> I have 3 servers (A, B, C) with hosted engine in self hosted setup
>>>>>>>> on top of gluster with replica 3 + 1 arbiter. All good except one point:
>>>>>>>>
>>>>>>>> The hosts have been configured with power management using IPMI
>>>>>>>> (server iLO).
>>>>>>>> If I disconnect power from one host (say C) (or disconnect all
>>>>>>>> network cables of the host) the two other hosts go to a loop where
>>>>>>>> they try to verify the status of the host C by issuing power
>>>>>>>> management commands to the host C. Since power of host is off the
>>>>>>>> server iLO does not respond on the network and the power management
>>>>>>>> of host C fails, leaving the VMs that were running on the host C in
>>>>>>>> an unknown state and they are never restarted to the other hosts.
>>>>>>>>
>>>>>>>> Is there any fencing option to change this behavior so as if both
>>>>>>>> available hosts fail to do power management of the unresponsive host to
>>>>>>>> decide that the host is down and to restart the VMs of that host to the
>>>>>>>> other available hosts.
>>>>>>>
>>>>>>> No, this is a bad assumption. Perhaps they are the ones isolated
>>>>>>> from it?
>>>>>>> Y.
>>>>>>>
>>>>>>>> I could also add additional power management through UPS to avoid
>>>>>>>> this issue but this is not currently an option and I am interested to
>>>>>>>> see if this behavior can be tweaked.
>>>>>>>>
>>>>>>>> Thanx,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> Users@ovirt.org
>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users