Please attach the output of hosted-engine --vm-status and the /var/log/ovirt-hosted-engine-ha/agent.log file from both hosts.
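One way to gather both items on each host is sketched below. This is only an illustration, assuming Python 3.6+ is available on the host; the function name `collect_diagnostics`, the output directory, and the file naming are hypothetical choices, while the command and the log path are the ones named above.

```python
import shutil
import socket
import subprocess
from pathlib import Path

# Command and log path as named in the request above.
STATUS_CMD = ("hosted-engine", "--vm-status")
AGENT_LOG = Path("/var/log/ovirt-hosted-engine-ha/agent.log")


def collect_diagnostics(out_dir, status_cmd=STATUS_CMD, agent_log=AGENT_LOG):
    """Save the status command output and a copy of the agent log in out_dir.

    Returns the list of files written. File names are prefixed with the
    local hostname (an illustrative convention) so that the results from
    both hosts can be attached side by side.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    host = socket.gethostname()
    written = []

    # Capture stdout and stderr together so error messages are not lost.
    result = subprocess.run(
        list(status_cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    status_file = out_dir / f"{host}-vm-status.txt"
    status_file.write_text(result.stdout)
    written.append(status_file)

    # Copy the agent log only if it exists on this host.
    agent_log = Path(agent_log)
    if agent_log.is_file():
        log_copy = out_dir / f"{host}-agent.log"
        shutil.copyfile(str(agent_log), str(log_copy))
        written.append(log_copy)

    return written
```

Calling `collect_diagnostics("he-diagnostics")` on each of the two hosts would leave one status file and one log copy per host in that directory, ready to attach.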
The VM will restart if the ovirt-engine service does not become available within the timeout. That can mean a couple of things: the FQDN of the engine is wrong, the engine needs something that was only available on the dead host (A), such as some storage, or host B cannot ping the gateway.

Best regards

Martin Sivak

On Wed, Apr 25, 2018 at 11:33 AM, <dhy...@sina.com> wrote:
> Sorry, I misspoke.
>
> I have two nodes with hosted-engine, A: 192.168.122.65 and B: 192.168.122.66.
>
> Testing engine HA:
>
> First both nodes are up and the hosted-engine VM runs on A. Then I power off A, and
> after 3 minutes B starts its hosted-engine VM.
> But its ovirt-engine still connects to host A, and this continues for about 10 minutes,
> then the hosted-engine VM restarts.
> ----- Original Message -----
> From: Martin Sivak <msi...@redhat.com>
> To: dhy336 <dhy...@sina.com>
> Subject: Re: Re: Re: Re: [ovirt-users] Reply: Re: Hosted-engine can not_switch
> Date: 2018-04-25 17:11
>
>
> Your hosted engine VM has its own address that does not depend on
> which host it is currently running on. So it should be available on the
> same address no matter where the VM is running.
> Best regards
> Martin Sivak
> On Wed, Apr 25, 2018 at 9:07 AM, <dhy...@sina.com> wrote:
>>>> I deployed two nodes for hosted engine. At first the hosted-engine VM ran on
>>>> 192.168.122.65. I powered off this host and the hosted-engine VM switched to
>>>> another host, but ovirt-engine still connects to 192.168.122.65. If I restart
>>>> the ovirt-engine server, it works.
>>
>> I think this behavior is an error: the hosted-engine VM has powered up on
>> another host (192.168.122.66), so the hosted engine should
>> connect to host 192.168.122.66, not to host 192.168.122.65?
>>
>> thanks
>>
>> ----- Original Message -----
>> From: Martin Sivak <msi...@redhat.com>
>> To: dhy336 <dhy...@sina.com>
>> Cc: users <users@ovirt.org>
>> Subject: Re: Re: Re: [ovirt-users] Reply: Re: Hosted-engine can not_switch
>> Date: 2018-04-20 18:28
>>
>>
>> Hi,
>> No, this is not an error.
>> You killed the host without moving it to
>> maintenance first. The engine has no way to distinguish this from
>> a temporary network failure, for example. Give it some time and the host
>> will move its status to one of the error states and handle the highly
>> available VMs on it (if fencing is properly configured).
>> Best regards
>> Martin Sivak
>> On Fri, Apr 20, 2018 at 12:13 PM, <dhy...@sina.com> wrote:
>>> So this process is not an error?
>>> ----- Original Message -----
>>> From: Martin Sivak <msi...@redhat.com>
>>> To: dhy336 <dhy...@sina.com>
>>> Cc: users <users@ovirt.org>
>>> Subject: Re: Re: [ovirt-users] Reply: Re: Hosted-engine can not_switch
>>> Date: 2018-04-20 18:05
>>>
>>>
>>> Hi,
>>> the engine does not know you killed the host. It will notice
>>> eventually and handle the situation. Just give it time (5 minutes or
>>> so).
>>> Best regards
>>> --
>>> Martin Sivak
>>> SLA / oVirt
>>> On Fri, Apr 20, 2018 at 12:00 PM, <dhy...@sina.com> wrote:
>>>> Hi, thanks for your feedback. I have another question.
>>>>
>>>> I deployed two nodes for hosted engine. At first the hosted-engine VM ran on
>>>> 192.168.122.65. I powered off this host and the hosted-engine VM switched to
>>>> another host, but ovirt-engine still connects to 192.168.122.65. If I restart
>>>> the ovirt-engine server, it works.
>>>>
>>>>
>>>> 2018-04-20 17:13:04,692+08 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-98) [] Command 'GetAllVmStatsVDSCommand(HostName = hosted-engine2, VdsIdVDSCommandParametersBase:{hostId='a5428ef7-9df6-4a86-91de-7e36fda340fa'})' execution failed: java.net.NoRouteToHostException: No route to host
>>>> 2018-04-20 17:13:04,693+08 INFO [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (EE-ManagedThreadFactory-engineScheduled-Thread-98) [] Failed to fetch vms info for host 'hosted-engin2' - skipping VMs monitoring.
>>>> 2018-04-20 17:13:19,710+08 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to hosted-engine2/192.168.122.65
>>>> 2018-04-20 17:13:22,730+08 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [] Command 'GetAllVmStatsVDSCommand(HostName = hosted-engine-tchyp2, VdsIdVDSCommandParametersBase:{hostId='a5428ef7-9df6-4a86-91de-7e36fda340fa'})' execution failed: java.net.NoRouteToHostException: No route to host
>>>> 2018-04-20 17:13:22,732+08 INFO [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [] Failed to fetch vms info for host 'hosted-engine2' - skipping VMs monitoring.
>>>>
>>>> ----- Original Message -----
>>>> From: Martin Sivak <msi...@redhat.com>
>>>> To: dhy336 <dhy...@sina.com>
>>>> Cc: users <users@ovirt.org>
>>>> Subject: Re: [ovirt-users] Reply: Re: Hosted-engine can not_switch
>>>> Date: 2018-04-20 16:40
>>>>
>>>>
>>>> Hi,
>>>> your ovirt-hosted-engine-ha package is too old. You need at least
>>>> 2.1.9 to properly support the 4.2 engine. The same applies to vdsm. Please
>>>> upgrade the node.
>>>> Best regards
>>>> Martin Sivak
>>>> On Fri, Apr 20, 2018 at 3:58 AM, <dhy...@sina.com> wrote:
>>>>> Hi, I found some error logs in /var/log/ovirt-hosted-engine-ha/broker.
>>>>>
>>>>> [root@hosted-engine2 ~]# ll /rhev/data-center/mnt
>>>>> total 0
>>>>> drwxr-xr-x. 3 vdsm kvm 76 Apr 18 22:28 192.168.122.218:_exports_data
>>>>> drwxr-xr-x. 3 vdsm kvm 76 Apr 18 22:12 192.168.122.218:_exports_hosted-engine-test1
>>>>> [root@hosted-engine2 ~]# ll /rhev/data-center/mnt/192.168.122.218\:_exports_hosted-engine-test1/
>>>>> total 0
>>>>> drwxr-xr-x. 5 vdsm kvm 50 Apr 18 22:14 8a734205-65b7-4801-b7f0-d380eb45dbae
>>>>> -rwxr-xr-x. 1 vdsm kvm 0 Apr 20 09:54 __DIRECT_IO_TEST__
>>>>>
>>>>> The UUID 8a734205-65b7-4801-b7f0-d380eb45dbae is in
>>>>> /rhev/data-center/mnt/192.168.122.218\:_exports_hosted-engine-test1/
>>>>> but the broker looks for it in /rhev/data-center/mnt. Is my version wrong?
>>>>> My ovirt-hosted-engine-ha version is 2.1.5, vdsm is 4.20.5, and
>>>>> ovirt-engine is 4.2.
>>>>>
>>>>> MainThread::INFO::2018-04-19 19:26:31,479::listener::41::ovirt_hosted_engine_ha.broker.listener.Listener::(__init__) Initializing SocketServer
>>>>> MainThread::INFO::2018-04-19 19:26:31,480::listener::56::ovirt_hosted_engine_ha.broker.listener.Listener::(__init__) SocketServer ready
>>>>> Thread-1::INFO::2018-04-19 19:26:31,558::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>>>>> Thread-1::ERROR::2018-04-19 19:26:31,559::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=8a734205-65b7-4801-b7f0-d380eb45dbae'
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
>>>>>     data)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
>>>>>     .set_storage_domain(client, sd_type, **options)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
>>>>>     self._backends[client].connect()
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
>>>>>     self._dom_type)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path
>>>>>     " in {1}".format(sd_uuid, parent))
>>>>> BackendFailureException: path to storage domain 8a734205-65b7-4801-b7f0-d380eb45dbae not found in /rhev/data-center/mnt
>>>>> Thread-1::INFO::2018-04-19 19:26:31,563::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>>>> Thread-2::INFO::2018-04-19 19:26:44,601::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>>>>>
>>>>> ----- Original Message -----
>>>>> From: <dhy...@sina.com>
>>>>> To: "Martin Sivak" <msi...@redhat.com>
>>>>> Cc: users <users@ovirt.org>
>>>>> Subject: [ovirt-users] Reply: Re: Hosted-engine can not_switch
>>>>> Date: 2018-04-20 09:30
>>>>>
>>>>> libvirt has no error logs. I only found some errors for vdsm.
>>>>> The vdsm log is:
>>>>> 2018-04-20 09:24:52,610+0800 INFO (jsonrpc/1) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '8a734205-65b7-4801-b7f0-d380eb45dbae', 'voltype': 'LEAF', 'description': 'hosted-engine.lockspace', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': '611272bd-c2cc-42bc-94e2-9aa52e754c35', 'ctime': '1524032037', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '1048576', 'children': [], 'pool': '', 'capacity': '1048576', 'uuid': u'7037aac6-7c8e-4efd-82f7-ca618c953fe6', 'truesize': '1048576', 'type': 'PREALLOCATED', 'lease': {'owners': [], 'version': None}}} from=::1,48306, task_id=03a7938e-8afb-4b16-b8dd-126c2b1f5d52 (api:52)
>>>>> 2018-04-20 09:24:52,611+0800 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Volume.getInfo succeeded in 0.03 seconds (__init__:630)
>>>>> 2018-04-20 09:24:54,113+0800 ERROR (periodic/3) [virt.periodic.Operation] <vdsm.virt.sampling.VMBulkstatsMonitor object at 0x1e92f90> operation failed (periodic:215)
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 213, in __call__
>>>>>     self._func()
>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 522, in __call__
>>>>>     self._send_metrics()
>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 538, in _send_metrics
>>>>>     vm_sample.interval)
>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 45, in produce
>>>>>     networks(vm, stats, first_sample, last_sample, interval)
>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 322, in networks
>>>>>     if nic.name.startswith('hostdev'):
>>>>> AttributeError: name
>>>>> 2018-04-20 09:24:54,800+0800 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48308 (protocoldetector:61)
>>>>> 2018-04-20 09:24:54,810+0800 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:48308 (protocoldetector:125)
>>>>> 2018-04-20 09:24:54,810+0800 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompreactor:103)
>>>>> 2018-04-20 09:24:54,818+0800 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompreactor:132)
>>>>> 2018-04-20 09:24:55,119+0800 INFO (jsonrpc/6) [api.host] START getHardwareInfo() from=::1,48308 (api:46)
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Martin Sivak <msi...@redhat.com>
>>>>> To: dhy336 <dhy...@sina.com>
>>>>> Cc: users <users@ovirt.org>
>>>>> Subject: Re: [ovirt-users] Hosted-engine can not switch
>>>>> Date: 2018-04-19 20:16
>>>>>
>>>>>
>>>>> We need more than just this small log snippet. Please check the vdsm
>>>>> and libvirt logs as well.
>>>>> Best regards
>>>>> Martin Sivak
>>>>> On Thu, Apr 19, 2018 at 2:05 PM, <dhy...@sina.com> wrote:
>>>>>> Hi,
>>>>>> I deployed three nodes with hosted engine. I forced a shutdown of the node on which
>>>>>> the hosted-engine VM was running, but the hosted-engine VM cannot run on the other
>>>>>> nodes.
>>>>>>
>>>>>> I found some errors in /var/log/ovirt-hosted-engine-ha/agent.log:
>>>>>>
>>>>>> MainThread::INFO::2018-04-19 19:56:35,787::hosted_engine::1192::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Cleaning state for non-running VM
>>>>>> MainThread::INFO::2018-04-19 19:56:42,587::hosted_engine::1176::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Vdsm state for VM clean
>>>>>> MainThread::INFO::2018-04-19 19:56:42,589::hosted_engine::1125::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Starting vm using `/usr/sbin/hosted-engine --vm-start`
>>>>>> MainThread::INFO::2018-04-19 19:56:47,599::hosted_engine::1131::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stdout:
>>>>>> MainThread::INFO::2018-04-19 19:56:47,600::hosted_engine::1132::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: Virtual machine does not exist: {'vmId': u'08bbd680-a8a7-4267-82e7-89f36e87e930'}
>>>>>>
>>>>>> MainThread::INFO::2018-04-19 19:56:47,600::hosted_engine::1144::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
>>>>>> MainThread::INFO::2018-04-19 19:56:47,609::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1524139007.61 type=state_transition detail=EngineStart-EngineStarting hostname='hosted-engine2'
>>>>>> MainThread::INFO::2018-04-19 19:56:47,670::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? sent
>>>>>> MainThread::INFO::2018-04-19 19:56:47,670::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
>>>>>> MainThread::INFO::2018-04-19 19:56:50,095::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
>>>>>> MainThread::INFO::2018-04-19 19:56:50,096::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
>>>>>> MainThread::INFO::2018-04-19 19:56:52,449::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users@ovirt.org
>>>>>> http://lists.ovirt.org/mailman/listinfo/users