Or, in other words: the SPM and the VMs should move almost immediately after the storage connections on the hypervisor are gone. I know, maybe I'm asking too much, but we would be very happy :-) :-).
So, a sketch:

Mercury1 (SPM)    Mercury2

Mercury1 loses both fibre connections --> it goes non-operational, and the VM goes into a paused state and stays that way until I manually reboot the host so that it fences. What I would like is that when Mercury1 loses both fibre connections, it fences immediately, so the VMs are also moved almost instantly... if this is possible... :-)

Kind regards and thanks for all the help!

2014-04-08 14:26 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:

> Ok,
> Thanks already for all the help. I adapted some things for a quicker response:
>
> engine-config --get FenceQuietTimeBetweenOperationsInSec --> 180
> engine-config --set FenceQuietTimeBetweenOperationsInSec=60
>
> engine-config --get StorageDomainFalureTimeoutInMinutes --> 180
> engine-config --set StorageDomainFalureTimeoutInMinutes=1
>
> engine-config --get SpmCommandFailOverRetries --> 5
> engine-config --set SpmCommandFailOverRetries
>
> engine-config --get SPMFailOverAttempts --> 3
> engine-config --set SPMFailOverAttempts=1
>
> engine-config --get NumberOfFailedRunsOnVds --> 3
> engine-config --set NumberOfFailedRunsOnVds=1
>
> engine-config --get vdsTimeout --> 180
> engine-config --set vdsTimeout=30
>
> engine-config --get VDSAttemptsToResetCount --> 2
> engine-config --set VDSAttemptsToResetCount=1
>
> engine-config --get TimeoutToResetVdsInSeconds --> 60
> engine-config --set TimeoutToResetVdsInSeconds=30
>
> The result of this is that when the VM is not running on the SPM, it will migrate before going into pause mode.
> But when we tried it with the VM running on the SPM, it went into paused mode (for safety reasons, I know ;-) ) and stayed there until the host was MANUALLY fenced by rebooting it. So now my question is: how can I make the hypervisor fence (so it reboots and the VM is moved) more quickly?
>
> Kind regards,
>
> Koen
>
>
> 2014-04-04 16:28 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:
>
> Yes, that's true. But I was driving... so I'm just forwarding it then :-).
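For reference, the tuning pass above can be scripted instead of typed one setting at a time. A minimal dry-run sketch, under two caveats: the key names are copied verbatim from the thread (including `StorageDomainFalureTimeoutInMinutes`, which really is spelled that way in oVirt), and `SpmCommandFailOverRetries` is left out because no value for it is shown above. It only prints the commands; applying them on the engine host also requires restarting ovirt-engine before the changes take effect.

```shell
#!/bin/sh
# Dry-run sketch: print the engine-config commands used in this thread.
# Pipe the output to sh on the engine host to actually apply them, then
# restart ovirt-engine for the new values to take effect.
print_engine_config_cmds() {
    for kv in \
        FenceQuietTimeBetweenOperationsInSec=60 \
        StorageDomainFalureTimeoutInMinutes=1 \
        SPMFailOverAttempts=1 \
        NumberOfFailedRunsOnVds=1 \
        vdsTimeout=30 \
        VDSAttemptsToResetCount=1 \
        TimeoutToResetVdsInSeconds=30
    do
        echo "engine-config --set $kv"
    done
    echo "service ovirt-engine restart"
}

print_engine_config_cmds
```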
> I have already adjusted the timeout. It was set to 5 min before it would give the timeout. It is now at 2 min.
>
> On Apr 4, 2014 4:14 PM, "David Van Zeebroeck" <david.van.zeebro...@brusselsairport.be> wrote:
>
>> I have them too, you know.
>>
>> But normally the fencing should have worked, as I read it.
>>
>> So something went wrong somewhere, by the looks of it.
>>
>> *From:* Koen Vanoppen [mailto:vanoppen.k...@gmail.com]
>> *Sent:* Friday, 4 April 2014 16:07
>> *To:* David Van Zeebroeck
>> *Subject:* Fwd: Re: [Users] HA
>>
>> David Van Zeebroeck
>> Product Manager Unix Infrastructure
>> Information & Communication Technology
>> *Brussels Airport Company*
>> T +32 (0)2 753 66 24
>> M +32 (0)497 02 17 31
>> david.van.zeebro...@brusselsairport.be
>> *www.brusselsairport.be <http://www.brusselsairport.be>*
>>
>> ---------- Forwarded message ----------
>> From: "Michal Skrivanek" <michal.skriva...@redhat.com>
>> Date: Apr 4, 2014 3:39 PM
>> Subject: Re: [Users] HA
>> To: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>> Cc: "ovirt-users Users" <users@ovirt.org>
>>
>> On 4 Apr 2014, at 15:14, Sander Grendelman wrote:
>>
>> Do you have power management configured?
>> Was the "failed" host fenced/rebooted?
>>
>> On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen <vanoppen.k...@gmail.com> wrote:
>>
>> So... is it possible to have a fully automatic migration of the VM to another hypervisor in case the storage connection fails?
>> How can we make this happen? Because for the moment, when we tested the situation, they stayed in a paused state.
>> (Test situation:
>>
>> - Unplug the 2 fibre cables from the hypervisor
>> - VMs go into a paused state
>> - VMs stayed in a paused state until the failure was solved
>>
>> As said before, it's not safe, hence we (try to) not migrate them.
>> They only get paused when they actually access the storage, which may not always be the case. I.e. the storage connection is severed, the host is deemed NonOperational and VMs are being migrated from it; some of them will succeed if they didn't access that "bad" storage … the paused VMs will remain (mostly; it can still happen that they appear paused after migrating to another host, when the disk access occurs only at the last stage of migration).
>>
>> So in other words, if you want to migrate the VMs without interruption, it's sometimes not possible.
>> If you are fine with the VMs being restarted after a short time on another host, then power management/fencing will help here.
>>
>> Thanks,
>> michal
>>
>> )
>>
>> They only returned when we restored the fibre connection to the hypervisor…
>>
>> Yes, since 3.3 we have the auto-resume feature.
>>
>> Thanks,
>> michal
>>
>> Kind regards,
>>
>> Koen
>>
>> 2014-04-04 13:52 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:
>>
>> So... is it possible to have a fully automatic migration of the VM to another hypervisor in case the storage connection fails?
>> How can we make this happen? Because for the moment, when we tested the situation, they stayed in a paused state.
>>
>> (Test situation:
>>
>> - Unplug the 2 fibre cables from the hypervisor
>> - VMs go into a paused state
>> - VMs stayed in a paused state until the failure was solved
>> )
>>
>> They only returned when we restored the fibre connection to the hypervisor...
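The auto-resume behaviour described above (paused VMs coming back once storage returns) can be sanity-checked by hand on the hypervisor. A small sketch, under stated assumptions: libvirt's `virsh` is what reports the `paused` state on the host, the VM names below are hypothetical, and the function only prints the `virsh resume` commands rather than running them, so nothing is touched.

```shell
#!/bin/sh
# Sketch: read "virsh list"-style lines (id name state) on stdin and print
# a "virsh resume" command for every domain reported as paused.
# Since oVirt 3.3, VMs paused on an I/O error auto-resume once the error
# clears, so this is for manual inspection only.
list_resume_cmds() {
    while read -r dom_id dom_name dom_state; do
        if [ "$dom_state" = "paused" ]; then
            echo "virsh resume $dom_name"
        fi
    done
}

# Example input, shaped like 'virsh list' output (hypothetical VM names):
printf '1 vm01 running\n2 vm02 paused\n' | list_resume_cmds
```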
>> Kind regards,
>>
>> Koen
>>
>> 2014-04-03 16:53 GMT+02:00 Koen Vanoppen <vanoppen.k...@gmail.com>:
>>
>> ---------- Forwarded message ----------
>> From: "Doron Fediuck" <dfedi...@redhat.com>
>> Date: Apr 3, 2014 4:51 PM
>> Subject: Re: [Users] HA
>> To: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>> Cc: "Omer Frenkel" <ofren...@redhat.com>, <users@ovirt.org>, "Federico Simoncelli" <fsimo...@redhat.com>, "Allon Mureinik" <amure...@redhat.com>
>>
>> ----- Original Message -----
>> > From: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>> > To: "Omer Frenkel" <ofren...@redhat.com>, users@ovirt.org
>> > Sent: Wednesday, April 2, 2014 4:17:36 PM
>> > Subject: Re: [Users] HA
>> >
>> > Yes, indeed, I meant not-operational. Sorry.
>> > So, if I understand this correctly: whenever we get into a situation where we lose both storage connections on our hypervisor, we will have to manually restore the connections first?
>> >
>> > And thanks for the tip on speeding things up :-).
>> >
>> > Kind regards,
>> >
>> > Koen
>> >
>> > 2014-04-02 15:14 GMT+02:00 Omer Frenkel <ofren...@redhat.com>:
>> >
>> > ----- Original Message -----
>> > > From: "Koen Vanoppen" <vanoppen.k...@gmail.com>
>> > > To: users@ovirt.org
>> > > Sent: Wednesday, April 2, 2014 4:07:19 PM
>> > > Subject: [Users] HA
>> > >
>> > > Dear all,
>> > >
>> > > During our acceptance testing, we discovered something. (Document will follow.)
>> > > When we disable one fibre path, no problem: multipath finds its way and no pings are lost.
>> > > BUT when we disabled both fibre paths (so one of the storage domains is gone on this host, but still available on the other host), VMs go into paused mode...
>> > > It chooses a new SPM (can we speed this up?), puts the host into non-responsive (can we speed this up? more important), and the VMs stay in paused mode... I would expect that they would be migrated (yes, HA is
>> >
>> > I guess you mean the host moves to not-operational (in contrast to non-responsive)?
>> > If so, the engine will not migrate VMs that are paused due to an I/O error, because of the data corruption risk.
>> >
>> > To speed things up you can look at the storage domain monitoring timeout:
>> > engine-config --get StorageDomainFalureTimeoutInMinutes
>> >
>> > > enabled) to the other host and reboot there... Any solution? We are still using oVirt 3.3.1, but we are planning an upgrade to 3.4 after the Easter holiday.
>> > >
>> > > Kind regards,
>> > >
>> > > Koen
>> > >
>>
>> Hi Koen,
>> Resuming from paused due to I/O issues is supported (adding relevant folks).
>> Regardless, if you did not define power management, you should manually approve that the source host was rebooted in order for the migration to proceed. Otherwise we risk a split-brain scenario.
>>
>> Doron
>>
>> _______________________________________________
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
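Doron's point is the crux: without power management defined on the host, the engine cannot fence it automatically and a human has to confirm the reboot. Before relying on automatic fencing, it is worth verifying the fence device by hand with one of the standard fence agents. A hedged sketch: `fence_ipmilan` (from the fence-agents package) is one common choice, but the BMC address, user, and agent below are placeholders, and the function only builds the command line rather than contacting any hardware.

```shell
#!/bin/sh
# Sketch: build the fence-agent command used to check that a host's power
# management (IPMI/iLO BMC) responds, which oVirt needs for automatic
# fencing. "-o status" only queries power state; it does not reboot anything.
build_fence_status_cmd() {
    bmc_addr=$1
    bmc_user=$2
    echo "fence_ipmilan -a $bmc_addr -l $bmc_user -p <password> -o status"
}

# Hypothetical BMC address for the mercury1 host:
build_fence_status_cmd mercury1-ipmi.example.com admin
```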
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users