On Fri, Feb 3, 2017 at 5:22 PM, Ralf Schenk <r...@databay.de> wrote:

> Hello,
>
> of course:
>
> [root@microcloud27 mnt]# sanlock client status
> daemon 8a93c9ea-e242-408c-a63d-a9356bb22df5.microcloud
> p -1 helper
> p -1 listener
> p -1 status
>
> sanlock.log attached. (Beginning 2017-01-27 where everything was fine)
>

Thanks, the issue is here:
2017-02-02 19:01:22+0100 4848 [1048]: s36 lockspace 7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96:3:/rhev/data-center/mnt/glusterSD/glusterfs.rxmgmt.databay.de:_engine/7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96/dom_md/ids:0
2017-02-02 19:03:42+0100 4988 [12983]: s36 delta_acquire host_id 3 busy1 3 15 13129 7ad427b1-fbb6-4cee-b9ee-01f596fddfbb.microcloud
2017-02-02 19:03:43+0100 4989 [1048]: s36 add_lockspace fail result -262

Could you please check whether other hosts are contending for the same ID (id=3 in this case)?

> Bye
>
> On 03.02.2017 at 16:12, Simone Tiraboschi wrote:
>
> The hosted-engine storage domain is mounted for sure,
> but the issue is here:
> Exception: Failed to start monitoring domain
> (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96,
> host_id=3): timeout during domain acquisition
>
> The point is that in the VDSM logs I see just something like:
> 2017-02-02 21:05:22,283 INFO (jsonrpc/1) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
> 2017-02-02 21:05:22,285 INFO (jsonrpc/1) [dispatcher] Run and protect: repoStats, Return response: {u'a7fbaaad-7043-4391-9523-3bedcdc4fb0d': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000748727', 'lastCheck': '0.1', 'valid': True}, u'2b2a44fc-f2bd-47cd-b7af-00be59e30a35': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.00082529', 'lastCheck': '0.1', 'valid': True}, u'5d99af76-33b5-47d8-99da-1f32413c7bb0': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000349356', 'lastCheck': '5.3', 'valid': True}, u'7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96': {'code': 0, 'actual': True, 'version': 4, 'acquired': False, 'delay': '0.000377052', 'lastCheck': '0.6', 'valid': True}} (logUtils:52)
>
> Where the other storage domains have 'acquired': True while it's
> always 'acquired': False for the hosted-engine storage domain.
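
The 'acquired' comparison described in the quoted message can be automated. The following is only a rough sketch, not part of VDSM: it assumes the repoStats response is logged as a Python dict literal after the text "Return response: ", as in the vdsm.log line quoted in this thread. The helper names parse_repo_stats and unacquired_domains are made up for this example, and SAMPLE_LINE is an abbreviated copy of that log line with only two of the four domains.

```python
import ast

# Abbreviated copy of the repoStats line quoted in this thread (two domains only).
SAMPLE_LINE = (
    "2017-02-02 21:05:22,285 INFO (jsonrpc/1) [dispatcher] Run and protect: "
    "repoStats, Return response: {u'a7fbaaad-7043-4391-9523-3bedcdc4fb0d': "
    "{'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': "
    "'0.000748727', 'lastCheck': '0.1', 'valid': True}, "
    "u'7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96': {'code': 0, 'actual': True, "
    "'version': 4, 'acquired': False, 'delay': '0.000377052', 'lastCheck': "
    "'0.6', 'valid': True}} (logUtils:52)"
)


def parse_repo_stats(log_line):
    """Extract the dict that follows 'Return response: ' in a repoStats log line.

    Assumption: the response is printed as a Python literal, as in the
    quoted log. ast.literal_eval only evaluates literals, never code.
    """
    marker = "Return response: "
    start = log_line.index(marker) + len(marker)
    end = log_line.rindex("}") + 1  # drop the trailing "(logUtils:52)"
    return ast.literal_eval(log_line[start:end])


def unacquired_domains(repo_stats):
    """Return the sorted UUIDs of domains whose 'acquired' flag is not True."""
    return sorted(
        sd_uuid
        for sd_uuid, stats in repo_stats.items()
        if not stats.get("acquired", False)
    )


if __name__ == "__main__":
    for sd_uuid in unacquired_domains(parse_repo_stats(SAMPLE_LINE)):
        print("host id lease not acquired for storage domain", sd_uuid)
```

Run against the full vdsm.log line, this would be expected to report only the hosted-engine storage domain, which matches what was read off the log by eye.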
>
> Could you please share your /var/log/sanlock.log from the same host and
> the output of
> sanlock client status
> ?
>
> On Fri, Feb 3, 2017 at 3:52 PM, Ralf Schenk <r...@databay.de> wrote:
>
>> Hello,
>>
>> I also put the host into Maintenance and restarted vdsm while ovirt-ha-agent
>> was running. I can mount the gluster volume "engine" manually on the host.
>>
>> I get this repeatedly in /var/log/vdsm.log:
>>
>> 2017-02-03 15:29:28,891 INFO (MainThread) [vds] Exiting (vdsm:167)
>> 2017-02-03 15:29:30,974 INFO (MainThread) [vds] (PID: 11456) I am the actual vdsm 4.19.4-1.el7.centos microcloud27 (3.10.0-514.6.1.el7.x86_64) (vdsm:145)
>> 2017-02-03 15:29:30,974 INFO (MainThread) [vds] VDSM will run with cpu affinity: frozenset([1]) (vdsm:251)
>> 2017-02-03 15:29:31,013 INFO (MainThread) [storage.check] Starting check service (check:91)
>> 2017-02-03 15:29:31,017 INFO (MainThread) [storage.Dispatcher] Starting StorageDispatcher... (dispatcher:47)
>> 2017-02-03 15:29:31,017 INFO (check/loop) [storage.asyncevent] Starting <EventLoop running=True closed=False at 0x37480464> (asyncevent:122)
>> 2017-02-03 15:29:31,156 INFO (MainThread) [dispatcher] Run and protect: registerDomainStateChangeCallback(callbackFunc=<functools.partial object at 0x2881fc8>) (logUtils:49)
>> 2017-02-03 15:29:31,156 INFO (MainThread) [dispatcher] Run and protect: registerDomainStateChangeCallback, Return response: None (logUtils:52)
>> 2017-02-03 15:29:31,160 INFO (MainThread) [MOM] Preparing MOM interface (momIF:49)
>> 2017-02-03 15:29:31,161 INFO (MainThread) [MOM] Using named unix socket /var/run/vdsm/mom-vdsm.sock (momIF:58)
>> 2017-02-03 15:29:31,162 INFO (MainThread) [root] Unregistering all secrets (secret:91)
>> 2017-02-03 15:29:31,164 INFO (MainThread) [vds] Setting channels' timeout to 30 seconds. (vmchannels:223)
>> 2017-02-03 15:29:31,165 INFO (MainThread) [vds.MultiProtocolAcceptor] Listening at :::54321 (protocoldetector:185)
>> 2017-02-03 15:29:31,354 INFO (vmrecovery) [vds] recovery: completed in 0s (clientIF:495)
>> 2017-02-03 15:29:31,371 INFO (BindingXMLRPC) [vds] XMLRPC server running (bindingxmlrpc:63)
>> 2017-02-03 15:29:31,471 INFO (periodic/1) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
>> 2017-02-03 15:29:31,472 INFO (periodic/1) [dispatcher] Run and protect: repoStats, Return response: {} (logUtils:52)
>> 2017-02-03 15:29:31,472 WARN (periodic/1) [MOM] MOM not available. (momIF:116)
>> 2017-02-03 15:29:31,473 WARN (periodic/1) [MOM] MOM not available, KSM stats will be missing. (momIF:79)
>> 2017-02-03 15:29:31,474 ERROR (periodic/1) [root] failed to retrieve Hosted Engine HA info (api:252)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo
>>     stats = instance.get_all_stats()
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
>>     self._configure_broker_conn(broker)
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
>>     dom_type=dom_type)
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain
>>     .format(sd_type, options, e))
>> RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid': '7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
>> 2017-02-03 15:29:35,920 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:49506 (protocoldetector:72)
>> 2017-02-03 15:29:35,929 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:49506 (protocoldetector:127)
>> 2017-02-03 15:29:35,930 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompreactor:102)
>> 2017-02-03 15:29:35,930 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompreactor:129)
>> 2017-02-03 15:29:36,067 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.ping succeeded in 0.00 seconds (__init__:515)
>> 2017-02-03 15:29:36,071 INFO (jsonrpc/1) [throttled] Current getAllVmStats: {} (throttledlog:105)
>> 2017-02-03 15:29:36,071 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)
>> 2017-02-03 15:29:46,435 INFO (periodic/0) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
>> 2017-02-03 15:29:46,435 INFO (periodic/0) [dispatcher] Run and protect: repoStats, Return response: {} (logUtils:52)
>> 2017-02-03 15:29:46,439 ERROR (periodic/0) [root] failed to retrieve Hosted Engine HA info (api:252)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo
>>     stats = instance.get_all_stats()
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
>>     self._configure_broker_conn(broker)
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
>>     dom_type=dom_type)
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain
>>     .format(sd_type, options, e))
>> RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid': '7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
>> 2017-02-03 15:29:51,095 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)
>> 2017-02-03 15:29:51,219 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.setKsmTune succeeded in 0.00 seconds (__init__:515)
>> 2017-02-03 15:30:01,444 INFO (periodic/1) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
>> 2017-02-03 15:30:01,444 INFO (periodic/1) [dispatcher] Run and protect: repoStats, Return response: {} (logUtils:52)
>> 2017-02-03 15:30:01,448 ERROR (periodic/1) [root] failed to retrieve Hosted Engine HA info (api:252)
>>
>> On 03.02.2017 at 13:39, Simone Tiraboschi wrote:
>>
>> I see an ERROR there on stopMonitoringDomain but I cannot see the
>> corresponding startMonitoringDomain; could you please look for it?
>>
>> On Fri, Feb 3, 2017 at 1:16 PM, Ralf Schenk <r...@databay.de> wrote:
>>
>>> Hello,
>>>
>>> attached is my vdsm.log from the host with hosted-engine-ha around the
>>> time-frame of agent timeout that is not working anymore for engine (it
>>> works in Ovirt and is active). It simply isn't working for engine-ha
>>> anymore after Update.
>>>
>>> At 2017-02-02 19:25:34,248 you'll find an error corresponding to the agent
>>> timeout error.
>>>
>>> Bye
>>>
>>> On 03.02.2017 at 11:28, Simone Tiraboschi wrote:
>>>
>>>>> 3. Three of my hosts have the hosted engine deployed for ha. First all
>>>>> three were marked by a crown (running was gold and others were silver).
>>>>> After upgrading the 3 Host deployed hosted engine ha is not active
>>>>> anymore.
>>>>>
>>>>> I can't get this host back with working ovirt-ha-agent/broker. I
>>>>> already rebooted, manually restarted the services but it isn't able to get
>>>>> cluster state according to
>>>>> "hosted-engine --vm-status". The other hosts state the host status as
>>>>> "unknown stale-data"
>>>>>
>>>>> I already shut down all agents on all hosts and issued a
>>>>> "hosted-engine --reinitialize-lockspace" but that didn't help.
>>>>>
>>>>> The agent stops working after a timeout error according to the log:
>>>>>
>>>>> MainThread::INFO::2017-02-02 19:24:52,040::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>> MainThread::INFO::2017-02-02 19:24:59,185::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>> MainThread::INFO::2017-02-02 19:25:06,333::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>> MainThread::INFO::2017-02-02 19:25:13,554::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>> MainThread::INFO::2017-02-02 19:25:20,710::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>> MainThread::INFO::2017-02-02 19:25:27,865::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>> MainThread::ERROR::2017-02-02 19:25:27,866::hosted_engine::815::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
>>>>> MainThread::WARNING::2017-02-02 19:25:27,866::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
>>>>> MainThread::WARNING::2017-02-02 19:25:27,866::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>     self._initialize_domain_monitor()
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 816, in _initialize_domain_monitor
>>>>>     raise Exception(msg)
>>>>> Exception: Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
>>>>> MainThread::ERROR::2017-02-02 19:25:27,866::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>>>>> MainThread::INFO::2017-02-02 19:25:32,087::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>> MainThread::INFO::2017-02-02 19:25:34,250::hosted_engine::769::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96): Storage domain is member of pool: u'domain=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'
>>>>> MainThread::INFO::2017-02-02 19:25:34,254::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>>>>>
>>>> Simone, Martin, can you please follow up on this?
>>>>
>>>
>>> Ralf, could you please attach the vdsm logs from one of your hosts for the
>>> relevant time frame?
>>>
>>> --
>>>
>>> *Ralf Schenk*
>>> phone +49 (0) 24 05 / 40 83 70
>>> fax +49 (0) 24 05 / 40 83 759
>>> mail r...@databay.de
>>>
>>> *Databay AG*
>>> Jens-Otto-Krag-Straße 11
>>> D-52146 Würselen
>>> www.databay.de <http://www.databay.de>
>>>
>>> Registered office / district court: Aachen • HRB:8437 • VAT ID: DE 210844202
>>> Executive board: Ralf Schenk, Dipl.-Ing. Jens Conze, Aresch Yavari, Dipl.-Kfm. Philipp Hermanns
>>> Chairman of the supervisory board: Wilhelm Dohmen
>>>
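
To help with the host-ID check requested near the top of this message, here is a small sketch that scans sanlock.log lines for the two patterns quoted there: "delta_acquire host_id N busy..." and "add_lockspace fail result N". It is not a sanlock tool; the regular expressions are derived only from the three log lines in this thread. The function name find_host_id_contention is hypothetical, and reading the last field of the busy line as the name of the current holder of the ID is an assumption.

```python
import re

# The three sanlock.log lines quoted in this thread.
SAMPLE_LINES = [
    "2017-02-02 19:01:22+0100 4848 [1048]: s36 lockspace "
    "7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96:3:/rhev/data-center/mnt/glusterSD/"
    "glusterfs.rxmgmt.databay.de:_engine/7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96/"
    "dom_md/ids:0",
    "2017-02-02 19:03:42+0100 4988 [12983]: s36 delta_acquire host_id 3 busy1 "
    "3 15 13129 7ad427b1-fbb6-4cee-b9ee-01f596fddfbb.microcloud",
    "2017-02-02 19:03:43+0100 4989 [1048]: s36 add_lockspace fail result -262",
]

# Assumption: the last whitespace-separated field names the current holder.
BUSY_RE = re.compile(
    r"delta_acquire host_id (?P<host_id>\d+) busy\d* .* (?P<owner>\S+)$"
)
FAIL_RE = re.compile(r"add_lockspace fail result (?P<result>-?\d+)")


def find_host_id_contention(lines):
    """Return (kind, value, owner) tuples for busy host IDs and failed joins."""
    findings = []
    for raw in lines:
        line = raw.strip()
        busy = BUSY_RE.search(line)
        if busy:
            findings.append(
                ("busy", int(busy.group("host_id")), busy.group("owner"))
            )
            continue
        fail = FAIL_RE.search(line)
        if fail:
            findings.append(("add_lockspace_fail", int(fail.group("result")), None))
    return findings


if __name__ == "__main__":
    for kind, value, owner in find_host_id_contention(SAMPLE_LINES):
        print(kind, value, owner)
```

If the name reported on a busy line differs from the daemon name shown by "sanlock client status" on the same host, that may be a hint that another host or a stale lease is holding the ID; this is a suggestion for where to look, not a confirmed diagnosis.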
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users