Hello,

It was 4.0.5. However, we’ve decided to pull the plug on oVirt for now, as the 
risk of this issue taking down a large number of servers is too high. 
I think oVirt should be a little less “picky”, if you will, about storage 
connections. For example, this specific issue prevented anything storage 
related from being done. Because the “master” was locked, you cannot:

Add other storage
Activate hosts
Start VM’s
Reinitialize the datacenter
Remove storage

These points above are huge. While oVirt is indeed open source, upstream of 
RHEV, and doesn’t cost anything, I feel that scenarios like this could be the 
downfall of oVirt, because they make it too risky to use.

The logging with oVirt seems to be crazy though. We’ve been testing it now for 
about 2.5 years, maybe 3 years? Once oVirt gets in a state where it cannot 
connect to something, it just goes haywire. Many likely don’t see this; 
however, every time these things happened, it was while we were testing 
failover scenarios to see how oVirt responds.

A few recommendations I would make are:

Drop the whole “master” storage thing. It complicates setting storage up. 
Either connect, or don’t connect. If there are connectivity issues, oVirt gets 
hung up on switching to this “master” storage. If you have a single storage 
domain, you’ll likely have problems as we’ve experienced, because once oVirt 
cannot find the “master” it begins to go berserk, then spirals out of control 
from there. It might not on small setups with a few hypervisors, but on an 
install with a few hundred VMs, a large number of hypervisors, etc., it seems 
to get ugly real quick.

Stop trying to reconnect things endlessly; I think that’s what I’m looking 
for. When something fails, oVirt just goes in a loop over and over, which 
eventually causes dashboard issues, crazy amounts of logs, etc. It would be 
better if oVirt would just stop, make a log entry and then quit, maybe after a 
few attempts.
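
As a rough illustration of the "try a few times, log, then stop" behaviour 
suggested above, here is a minimal Python sketch. It is not oVirt or VDSM 
code: connect_with_limit, the connect callable, and the max_attempts and 
delay_seconds parameters are all hypothetical names chosen for the example.

```python
import logging
import time

log = logging.getLogger("storage-reconnect")  # hypothetical logger name


def connect_with_limit(connect, max_attempts=3, delay_seconds=0.0):
    """Call connect() up to max_attempts times, then give up.

    Returns True on the first success, False once all attempts have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            connect()
            return True
        except OSError as exc:
            # One log line per failed attempt, not an endless stream.
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    # Final entry, after which the caller is expected to stop retrying.
    log.error("giving up after %d attempts", max_attempts)
    return False
```

The point of the design is that the number of log entries and connection 
attempts is bounded, so a permanently unreachable domain produces a handful of 
lines and a clear final state instead of a loop.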

In my case, I could mount the storage manually on ALL hosts, and I could even 
force start the VMs with virsh. The oVirt dashboard just kept saying it was 
locked, and wouldn’t let you do anything at all with the entire datacenter.

At this time, we’ve pushed these servers back into production using our current 
hypervisor software, which is stable but does not have the benefits of oVirt. 
oVirt will be revisited later on and is still in use for non-production things.


From: Maor Lipchuk<mailto:mlipc...@redhat.com>
Sent: Sunday, January 22, 2017 7:33 AM
To: Bill Bill<mailto:jax2...@outlook.com>
Cc: users<mailto:users@ovirt.org>
Subject: Re: [ovirt-users] master storage domain stuck in locked state



On Sun, Jan 22, 2017 at 2:31 PM, Maor Lipchuk 
<mlipc...@redhat.com<mailto:mlipc...@redhat.com>> wrote:
Hi Bill,

Can you please attach the engine and VDSM logs.
Is the storage domain still stuck?

Also which oVirt version are you using?


Regards,
Maor

On Sat, Jan 21, 2017 at 3:11 AM, Bill Bill 
<jax2...@outlook.com<mailto:jax2...@outlook.com>> wrote:

Also cannot reinitialize the datacenter because the storage domain is locked.

From: Bill Bill<mailto:jax2...@outlook.com>
Sent: Friday, January 20, 2017 8:08 PM
To: users<mailto:users@ovirt.org>
Subject: RE: master storage domain stuck in locked state

Spoke too soon. Some hosts came back up but the storage domain is still locked, 
so no VMs can be started. What is the proper way to force this to be unlocked? 
Each time we look to move into production after successful testing, something 
like this always seems to pop up at the last minute, rendering oVirt 
questionable in terms of reliability because of some unknown issue.



From: Bill Bill<mailto:jax2...@outlook.com>
Sent: Friday, January 20, 2017 7:54 PM
To: users<mailto:users@ovirt.org>
Subject: RE: master storage domain stuck in locked state


So apparently something didn’t change the metadata to master before the 
connection was lost. I changed the metadata role to master and it came back up. 
Seems emailing in helped, because every time I can’t figure something out, I 
email in and find it shortly after.
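
For anyone hitting the same thing, a read-only check of the role recorded in a 
domain's metadata might look like the sketch below. It assumes the metadata is 
plain KEY=VALUE text with a ROLE key whose value is Master for the master 
domain, which may not hold for every storage type or oVirt version. The sample 
text and the helper names parse_metadata and is_master are made up for 
illustration, and this is not necessarily the procedure used above.

```python
def parse_metadata(text):
    """Parse KEY=VALUE lines into a dict, skipping lines without '='."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_master(text):
    """Return True if the metadata text records ROLE=Master (assumed key)."""
    return parse_metadata(text).get("ROLE", "").lower() == "master"


# Made-up sample content, not taken from a real storage domain.
sample = "ROLE=Regular\nVERSION=3\n"
print(is_master(sample))
```

The sketch only reads and reports; it deliberately does not write the file.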


From: Bill Bill<mailto:jax2...@outlook.com>
Sent: Friday, January 20, 2017 7:43 PM
To: users<mailto:users@ovirt.org>
Subject: master storage domain stuck in locked state

No clue how to get this out. I can mount all storage manually on the 
hypervisors. It seems like after a reboot oVirt is now having some issue and 
the storage domain is stuck in locked state. Because of this, can’t activate 
any other storage either, so the other domains are in maintenance and the 
master sits in locked state, has been for hours.

This sticks out on a hypervisor:

StoragePoolWrongMaster: Wrong Master domain or its version: 
u'SD=d8a0172e-837f-4552-92c7-566dc4e548e4, 
pool=3fd2ad92-e1eb-49c2-906d-00ec233f610a'

Not sure, nothing changed other than a reboot of the storage.

Engine log shows:

[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] 
(DefaultQuartzScheduler8) [5696732b] START, SetVdsStatusVDSCommand(HostName = 
U31U32NodeA, SetVdsStatusVDSCommandParameters:{runAsync='true', 
hostId='70e2b8e4-0752-47a8-884c-837a00013e79', status='NonOperational', 
nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE', 
stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 6db9820a
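
When there are many entries like this, pulling out the quoted name='value' 
pairs makes it easier to see which hosts went NonOperational and why. The 
sketch below is only a convenience built on the format of the entry quoted 
above; parse_fields is a hypothetical helper, and unquoted parts such as 
HostName are not captured.

```python
import re

# Matches name='value' pairs as they appear in the engine log entry above.
FIELD = re.compile(r"(\w+)='([^']*)'")


def parse_fields(entry):
    """Return the quoted name='value' pairs of a log entry as a dict."""
    return dict(FIELD.findall(entry))


entry = (
    "SetVdsStatusVDSCommandParameters:{runAsync='true', "
    "hostId='70e2b8e4-0752-47a8-884c-837a00013e79', status='NonOperational', "
    "nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE', "
    "stopSpmFailureLogged='false', maintenanceReason='null'}"
)
fields = parse_fields(entry)
print(fields["status"], fields["nonOperationalReason"])
# prints: NonOperational STORAGE_DOMAIN_UNREACHABLE
```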

No idea why it says unreachable; it certainly is reachable, because I can 
manually mount ALL storage on the hypervisor.



_______________________________________________
Users mailing list
Users@ovirt.org<mailto:Users@ovirt.org>
http://lists.ovirt.org/mailman/listinfo/users


