Or, in other words: the SPM and the VMs should move almost immediately after the storage connections on the hypervisor are gone. I know, maybe I'm asking too much, but we would be very happy :-) :-).
So, a sketch:

Mercury1 (SPM)    Mercury2

Mercury1 loses both fibre connections --> it goes non-operational, and the VM goes into a paused state and stays that way until I manually reboot the host so that it fences. What I would like is that when Mercury1 loses both fibre connections, it fences immediately, so the VMs are also moved almost instantly... if this is possible... :-)

Kind regards and thanks for all the help!

2014-04-08 14:26 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:

> Ok,
> Thanks already for all the help. I adapted some things for a quicker response:
>
> engine-config --get FenceQuietTimeBetweenOperationsInSec --> 180
> engine-config --set FenceQuietTimeBetweenOperationsInSec=60
>
> engine-config --get StorageDomainFalureTimeoutInMinutes --> 180
> engine-config --set StorageDomainFalureTimeoutInMinutes=1
>
> engine-config --get SpmCommandFailOverRetries --> 5
> engine-config --set SpmCommandFailOverRetries
>
> engine-config --get SPMFailOverAttempts --> 3
> engine-config --set SPMFailOverAttempts=1
>
> engine-config --get NumberOfFailedRunsOnVds --> 3
> engine-config --set NumberOfFailedRunsOnVds=1
>
> engine-config --get vdsTimeout --> 180
> engine-config --set vdsTimeout=30
>
> engine-config --get VDSAttemptsToResetCount --> 2
> engine-config --set VDSAttemptsToResetCount=1
>
> engine-config --get TimeoutToResetVdsInSeconds --> 60
> engine-config --set TimeoutToResetVdsInSeconds=30
>
> The result of this is that when the VM is not running on the SPM, it will migrate before going into pause mode.
> But when we tried it with the VM running on the SPM, it went into paused mode (for safety reasons, I know ;-) ) and stayed there until the host was MANUALLY fenced by rebooting it. So now my question is: how can I make the hypervisor fence (so it reboots and the VM is moved) more quickly?
>
> Kind regards,
>
> Koen
>
>
> 2014-04-04 16:28 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:
>
> Yes, that's true. But I was driving... so I'm just forwarding it then :-).
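For reference, the tuning pass above can be scripted instead of typed one setting at a time. A minimal dry-run sketch, under two caveats: the key names are copied verbatim from the thread (including `StorageDomainFalureTimeoutInMinutes`, which really is spelled that way in oVirt), and `SpmCommandFailOverRetries` is left out because no value for it is shown above. It only prints the commands; applying them on the engine host also requires restarting ovirt-engine before the changes take effect.

```shell
#!/bin/sh
# Dry-run sketch: print the engine-config commands used in this thread.
# Pipe the output to sh on the engine host to actually apply them, then
# restart ovirt-engine for the new values to take effect.
print_engine_config_cmds() {
    for kv in \
        FenceQuietTimeBetweenOperationsInSec=60 \
        StorageDomainFalureTimeoutInMinutes=1 \
        SPMFailOverAttempts=1 \
        NumberOfFailedRunsOnVds=1 \
        vdsTimeout=30 \
        VDSAttemptsToResetCount=1 \
        TimeoutToResetVdsInSeconds=30
    do
        echo "engine-config --set $kv"
    done
    echo "service ovirt-engine restart"
}

print_engine_config_cmds
```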
> I have already adjusted the timeout. It was set to 5 min before it would give the timeout. It is now at 2 min.
>
> On Apr 4, 2014 4:14 PM, "David Van Zeebroeck" <david.van.zeebro...@brusselsairport.be> wrote:
>
>> I have them too, you know.
>>
>> But normally the fencing should have worked, as I read it.
>>
>> So something went wrong somewhere, by the looks of it.
>>
>> *From:* Koen Vanoppen [mailto:vanoppen.k...@gmail.com]
>> *Sent:* Friday, 4 April 2014 16:07
>> *To:* David Van Zeebroeck
>> *Subject:* Fwd: Re: [Users] HA
>>
>> David Van Zeebroeck
>> Product Manager Unix Infrastructure
>> Information & Communication Technology
>> *Brussels Airport Company*
>> T +32 (0)2 753 66 24
>> M +32 (0)497 02 17 31
>> david.van.zeebro...@brusselsairport.be
>> *www.brusselsairport.be <http://www.brusselsairport.be>*
>>
>> ---------- Forwarded message ----------
>> From: "Michal Skrivanek" <michal.skriva...@redhat.com>
>> Date: Apr 4, 2014 3:39 PM
>> Subject: Re: [Users] HA
>> To: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>> Cc: "ovirt-users Users" <users@ovirt.org>
>>
>> On 4 Apr 2014, at 15:14, Sander Grendelman wrote:
>>
>> Do you have power management configured?
>> Was the "failed" host fenced/rebooted?
>>
>> On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen <vanoppen.k...@gmail.com> wrote:
>>
>> So... is it possible to have a fully automatic migration of the VM to another hypervisor in case the storage connection fails?
>> How can we make this happen? Because for the moment, when we tested the situation, they stayed in a paused state.
>> (Test situation:
>>
>> - Unplug the 2 fibre cables from the hypervisor
>> - VMs go into a paused state
>> - VMs stayed in a paused state until the failure was solved
>>
>> As said before, it's not safe, hence we (try to) not migrate them.
>> They only get paused when they actually access the storage, which may not always be the case. I.e. the storage connection is severed, the host is deemed NonOperational and VMs are being migrated from it; some of them will succeed if they didn't access that "bad" storage … the paused VMs will remain (mostly; it can still happen that they appear paused after migrating to another host, when the disk access occurs only at the last stage of migration).
>>
>> So in other words, if you want to migrate the VMs without interruption, it's sometimes not possible.
>> If you are fine with the VMs being restarted after a short time on another host, then power management/fencing will help here.
>>
>> Thanks,
>> michal
>>
>> )
>>
>> They only returned when we restored the fibre connection to the hypervisor…
>>
>> Yes, since 3.3 we have the auto-resume feature.
>>
>> Thanks,
>> michal
>>
>> Kind regards,
>>
>> Koen
>>
>> 2014-04-04 13:52 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:
>>
>> So... is it possible to have a fully automatic migration of the VM to another hypervisor in case the storage connection fails?
>> How can we make this happen? Because for the moment, when we tested the situation, they stayed in a paused state.
>>
>> (Test situation:
>>
>> - Unplug the 2 fibre cables from the hypervisor
>> - VMs go into a paused state
>> - VMs stayed in a paused state until the failure was solved
>> )
>>
>> They only returned when we restored the fibre connection to the hypervisor...
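The auto-resume behaviour described above (paused VMs coming back once storage returns) can be sanity-checked by hand on the hypervisor. A small sketch, under stated assumptions: libvirt's `virsh` is what reports the `paused` state on the host, the VM names below are hypothetical, and the function only prints the `virsh resume` commands rather than running them, so nothing is touched.

```shell
#!/bin/sh
# Sketch: read "virsh list"-style lines (id name state) on stdin and print
# a "virsh resume" command for every domain reported as paused.
# Since oVirt 3.3, VMs paused on an I/O error auto-resume once the error
# clears, so this is for manual inspection only.
list_resume_cmds() {
    while read -r dom_id dom_name dom_state; do
        if [ "$dom_state" = "paused" ]; then
            echo "virsh resume $dom_name"
        fi
    done
}

# Example input, shaped like 'virsh list' output (hypothetical VM names):
printf '1 vm01 running\n2 vm02 paused\n' | list_resume_cmds
```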
>> Kind regards,
>>
>> Koen
>>
>> 2014-04-03 16:53 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:
>>
>> ---------- Forwarded message ----------
>> From: "Doron Fediuck" <dfedi...@redhat.com>
>> Date: Apr 3, 2014 4:51 PM
>> Subject: Re: [Users] HA
>> To: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>> Cc: "Omer Frenkel" <ofren...@redhat.com>, <users@ovirt.org>, "Federico Simoncelli" <fsimo...@redhat.com>, "Allon Mureinik" <amure...@redhat.com>
>>
>> ----- Original Message -----
>> > From: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>> > To: "Omer Frenkel" <ofren...@redhat.com>, users@ovirt.org
>> > Sent: Wednesday, April 2, 2014 4:17:36 PM
>> > Subject: Re: [Users] HA
>> >
>> > Yes, indeed, I meant not-operational. Sorry.
>> > So, if I understand this correctly: whenever we get into a situation where we lose both storage connections on our hypervisor, we will have to manually restore the connections first?
>> >
>> > And thanks for the tip on speeding things up :-).
>> >
>> > Kind regards,
>> >
>> > Koen
>> >
>> > 2014-04-02 15:14 GMT+02:00 Omer Frenkel <ofren...@redhat.com>:
>> >
>> > ----- Original Message -----
>> > > From: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>> > > To: users@ovirt.org
>> > > Sent: Wednesday, April 2, 2014 4:07:19 PM
>> > > Subject: [Users] HA
>> > >
>> > > Dear all,
>> > >
>> > > During our acceptance testing, we discovered something. (Document will follow.)
>> > > When we disable one fibre path, no problem: multipath finds its way and no pings are lost.
>> > > BUT when we disabled both fibre paths (so one of the storage domains is gone on this host, but still available on the other host), VMs go into paused mode...
>> > > It chooses a new SPM (can we speed this up?), puts the host into non-responsive (can we speed this up? more important), and the VMs stay in paused mode... I would expect that they would be migrated (yes, HA is
>> >
>> > I guess you mean the host moves to not-operational (in contrast to non-responsive)?
>> > If so, the engine will not migrate VMs that are paused due to an I/O error, because of the data corruption risk.
>> >
>> > To speed things up you can look at the storage domain monitoring timeout:
>> > engine-config --get StorageDomainFalureTimeoutInMinutes
>> >
>> > > enabled) to the other host and reboot there... Any solution? We are still using oVirt 3.3.1, but we are planning an upgrade to 3.4 after the Easter holiday.
>> > >
>> > > Kind regards,
>> > >
>> > > Koen
>> > >
>>
>> Hi Koen,
>> Resuming from paused due to I/O issues is supported (adding relevant folks).
>> Regardless, if you did not define power management, you should manually approve that the source host was rebooted in order for the migration to proceed. Otherwise we risk a split-brain scenario.
>>
>> Doron
>>
>> _______________________________________________
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
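Doron's point is the crux: without power management defined on the host, the engine cannot fence it automatically and a human has to confirm the reboot. Before relying on automatic fencing, it is worth verifying the fence device by hand with one of the standard fence agents. A hedged sketch: `fence_ipmilan` (from the fence-agents package) is one common choice, but the BMC address, user, and agent below are placeholders, and the function only builds the command line rather than contacting any hardware.

```shell
#!/bin/sh
# Sketch: build the fence-agent command used to check that a host's power
# management (IPMI/iLO BMC) responds, which oVirt needs for automatic
# fencing. "-o status" only queries power state; it does not reboot anything.
build_fence_status_cmd() {
    bmc_addr=$1
    bmc_user=$2
    echo "fence_ipmilan -a $bmc_addr -l $bmc_user -p <password> -o status"
}

# Hypothetical BMC address for the mercury1 host:
build_fence_status_cmd mercury1-ipmi.example.com admin
```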
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users