Hello John,

On Wed, 2014-07-23 at 19:47 -0400, Jason Brooks wrote:
>
> ----- Original Message -----
> > From: "John Gardeniers" <jgardeni...@objectmastery.com>
> > To: "users" <users@ovirt.org>
> > Sent: Wednesday, July 23, 2014 4:29:45 PM
> > Subject: [ovirt-users] Self-hosted engine won't start
> >
> > Hi All,
> >
> > I have created a lab with 2 hypervisors and a self-hosted engine. Today
> > I followed the upgrade instructions as described in
> > http://www.ovirt.org/Hosted_Engine_Howto and rebooted the engine. I
> > didn't really do an upgrade but simply wanted to test what would happen
> > when the engine was rebooted.
> >
> > When the engine didn't restart I re-ran hosted-engine
> > --set-maintenance=none and restarted the vdsm, ovirt-ha-agent and
> > ovirt-ha-broker services on both nodes. 15 minutes later it still hadn't
> > restarted, so I then tried rebooting both hypervisers. After an hour
> > there was still no sign of the engine starting. The agent logs don't
> > help me much. The following bits are repeated over and over.
> >
> > ovirt1 (192.168.19.20):
> >
> > MainThread::INFO::2014-07-24
> > 09:18:40,272::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> > Trying: notify time=1406157520.27 type=state_transition
> > detail=EngineDown-EngineDown hostname='ovirt1.om.net'
> > MainThread::INFO::2014-07-24
> > 09:18:40,272::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> > Success, was notification of state_transition (EngineDown-EngineDown)
> > sent? ignored
> > MainThread::INFO::2014-07-24
> > 09:18:40,594::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > Current state EngineDown (score: 2400)
> > MainThread::INFO::2014-07-24
> > 09:18:40,594::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > Best remote host 192.168.19.21 (id: 2, score: 2400)
> >
> > ovirt2 (192.168.19.21):
> >
> > MainThread::INFO::2014-07-24
> > 09:18:04,005::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> > Trying: notify time=1406157484.01 type=state_transition
> > detail=EngineDown-EngineDown hostname='ovirt2.om.net'
> > MainThread::INFO::2014-07-24
> > 09:18:04,006::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> > Success, was notification of state_transition (EngineDown-EngineDown)
> > sent? ignored
> > MainThread::INFO::2014-07-24
> > 09:18:04,324::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > Current state EngineDown (score: 2400)
> > MainThread::INFO::2014-07-24
> > 09:18:04,324::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > Best remote host 192.168.19.20 (id: 1, score: 2400)
> >
> > From the above information I decided to simply shut down one hypervisor
> > and see what happens. The engine did start back up again a few minutes
> > later.
>
> I've seen this behavior, too.
>
> Jason
>
> >
> > The interesting part is that each hypervisor seems to think the other is
> > a better host.

Where do you get this from? From the line 'Best remote host
192.168.19.20 (id: 1, score: 2400)'? I assume that is not what it means;
the HA broker is just looking for the best remote candidate.

But I also have trouble with this behavior, especially when I had the
cluster in global maintenance. I resolve this by starting the hosted
engine manually while in global maintenance, waiting for
{"health": "good", "vm": "up", "detail": "up"}, and disabling global
maintenance afterwards.

I found that the HA feature is indeed working; it is best tried out by
manually stopping the engine service (service hosted-engine stop). IIRC
this should trigger a failover and a reboot of the engine.

> > The two machines are identical, so there's no reason I
> > can see for this odd behaviour. In a lab environment this is little more
> > than an annoying inconvenience. In a production environment it would be
> > completely unacceptable.
> >
> > May I suggest that this issue be looked into and some means found to
> > eliminate this kind of mutual exclusion? e.g. After a few minutes of
> > such an issue one hypervisor could be randomly given a slightly higher
> > weighting, which should result in it being chosen to start the engine.
> >
> > regards,
> > John
> > _______________________________________________
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users

Cheers,
Daniel

--
Daniel Helgenberger
m box bewegtbild GmbH

P: +49/30/2408781-22
F: +49/30/2408781-10

ACKERSTR. 19
D-10115 BERLIN

www.m-box.de  www.monkeymen.tv

Managing directors: Martin Retschitzegger / Michaela Göllner
Commercial register: Amtsgericht Charlottenburg / HRB 112767
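
A minimal sketch of the "wait for a good status" step in the workaround described above, in case anyone wants to script it. It only checks a status string of the form quoted in the mail; the function names and the `read_status` callback are made up for illustration. How the string is obtained is left open; I would assume it is taken from the "Engine status" line of `hosted-engine --vm-status`, but please verify that on your own version.

```python
import json
import time
from typing import Callable


def engine_is_healthy(status_json: str) -> bool:
    """True only if the status reports health "good" and vm "up"."""
    try:
        status = json.loads(status_json)
    except ValueError:
        # Not valid JSON (e.g. empty output while the agent is restarting).
        return False
    if not isinstance(status, dict):
        return False
    return status.get("health") == "good" and status.get("vm") == "up"


def wait_until_healthy(read_status: Callable[[], str],
                       attempts: int = 60,
                       delay_s: float = 10.0) -> bool:
    """Poll read_status() until the engine looks healthy or attempts run out.

    read_status is a hypothetical callback supplied by the caller; it must
    return the engine status as a JSON string.
    """
    for _ in range(attempts):
        if engine_is_healthy(read_status()):
            return True
        time.sleep(delay_s)
    return False
```

The idea would be to leave global maintenance only after `wait_until_healthy(...)` has returned True, and to keep the cluster in maintenance (and look at the logs) if it returns False.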