Hi all, I had set a specific alert email address during deploy and then wanted to change it. I did the following:
On one of the hosts I ran:

hosted-engine --set-shared-config destination-emails ale...@domain.com --type=broker
systemctl restart ovirt-ha-broker.service

I had to do the above since changing the email from the GUI did not have any effect. After that, the emails are received at the new address, but the cluster seems to have some issue recognizing the state of the engine: I am flooded with "EngineMaybeAway-EngineUnexpectedlyDown" emails.

I have also restarted ovirt-ha-agent.service on each host, and I put the cluster into global maintenance and then disabled global maintenance.

In the agent logs of one host I have:

MainThread::ERROR::2018-02-18 11:12:20,751::hosted_engine::720::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) cannot get lock on host id 1: host already holds lock on a different host id

Another host logs:

MainThread::INFO::2018-02-18 11:20:23,692::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Sun Feb 18 11:15:13 2018
MainThread::INFO::2018-02-18 11:20:23,692::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)

The engine status on the 3 hosts is:

hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : v0
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : cfd15dac
local_conf_timestamp               : 4721144
Host timestamp                     : 4721144
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=4721144 (Sun Feb 18 11:20:33 2018)
    host-id=1
    score=0
    vm_conf_refresh_time=4721144 (Sun Feb 18 11:20:33 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False
    timeout=Tue Feb 24 15:29:44 1970

--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : v1
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 5cbcef4c
local_conf_timestamp               : 2499416
Host timestamp                     : 2499416
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=2499416 (Sun Feb 18 11:20:46 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=2499416 (Sun Feb 18 11:20:46 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False
    timeout=Thu Jan 29 22:18:42 1970

--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : v2
Host ID                            : 3
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : f064d529
local_conf_timestamp               : 2920612
Host timestamp                     : 2920611
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=2920611 (Sun Feb 18 10:47:31 2018)
    host-id=3
    score=3400
    vm_conf_refresh_time=2920612 (Sun Feb 18 10:47:32 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False

Putting each host into maintenance and then activating it again does not resolve the issue. It seems I should have avoided defining an email address during deploy and set it only later from the GUI. How can one recover from this situation?

Thanx,
Alex
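P.S. For completeness, here is the change as I applied it, collected in one place. The --get-shared-config line is an extra sanity check I did not run originally; I believe that option is part of the same hosted-engine CLI, but please correct me if the name differs in your version.

```shell
# Change the alert destination in the broker's shared configuration
# (address elided, as in the text above)
hosted-engine --set-shared-config destination-emails ale...@domain.com --type=broker

# Sanity check (my addition, not run originally):
# read the value back to confirm it was stored
hosted-engine --get-shared-config destination-emails --type=broker

# Restart the HA services on each host so they pick up the new configuration
systemctl restart ovirt-ha-broker.service
systemctl restart ovirt-ha-agent.service
```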
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users