I've checked the ids file in /rhev/data-center/mnt/glusterSD/*...../dom_md/

# -rw-rw----. 1 vdsm kvm 1048576 Mar 12 05:14 ids

seems ok. sanlock.log is showing:

---------------------------
r14 acquire_token open error -13
r14 cmd_acquire 2,11,89283 acquire_token -13
---------------------------

Now I'm not quite sure which direction to take.

Lockspace
---------------
"hosted-engine --reinitialize-lockspace" is throwing an exception:

  Exception("Lockfile reset cannot be performed with"
  Exception: Lockfile reset cannot be performed with an active agent.

@didi - I am in "Global Maintenance". I just noticed that host 1 now shows:

  Engine status: unknown stale-data
  state=AgentStopped

I'm pretty sure I've been able to start the Engine VM while in Global Maintenance. But you raise a good question. I don't see why you would be restricted from running the engine, or even starting the VM, while in Global Maintenance. If so, this is a little backwards.

On 12 March 2017 at 16:28, Yedidyah Bar David <d...@redhat.com> wrote:
> On Fri, Mar 10, 2017 at 12:39 PM, Martin Sivak <msi...@redhat.com> wrote:
> > Hi Ian,
> >
> > it is normal that VDSMs are competing for the lock, one should win
> > though. If that is not the case then the lockspace might be corrupted
> > or the sanlock daemons can't reach it.
> >
> > I would recommend putting the cluster to global maintenance and
> > attempting a manual start using:
> >
> > # hosted-engine --set-maintenance --mode=global
> > # hosted-engine --vm-start
>
> Is that possible? See also:
>
> http://lists.ovirt.org/pipermail/users/2016-January/036993.html
>
> > You will need to check your storage connectivity and sanlock status on
> > all hosts if that does not work.
> >
> > # sanlock client status
> >
> > There are a couple of locks I would expect to be there (ha_agent, spm),
> > but no lock for the hosted engine disk should be visible.
> >
> > Next steps depend on whether you have important VMs running on the
> > cluster and on the Gluster status (I can't help you there
> > unfortunately).
> >
> > Best regards
> >
> > --
> > Martin Sivak
> > SLA / oVirt
> >
> > On Fri, Mar 10, 2017 at 7:37 AM, Ian Neilsen <ian.neil...@gmail.com> wrote:
> >> I just noticed this in the vdsm.logs. The agent looks like it is trying to
> >> start hosted engine on both machines??
> >>
> >> <on_poweroff>destroy</on_poweroff><on_reboot>destroy</on_reboot><on_crash>destroy</on_crash></domain>
> >> Thread-7517::ERROR::2017-03-10 01:26:13,053::vm::773::virt.vm::(_startUnderlyingVm)
> >> vmId=`2419f9fe-4998-4b7a-9fe9-151571d20379`::The vm start process failed
> >> Traceback (most recent call last):
> >>   File "/usr/share/vdsm/virt/vm.py", line 714, in _startUnderlyingVm
> >>     self._run()
> >>   File "/usr/share/vdsm/virt/vm.py", line 2026, in _run
> >>     self._connection.createXML(domxml, flags),
> >>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
> >>     ret = f(*args, **kwargs)
> >>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 917, in wrapper
> >>     return func(inst, *args, **kwargs)
> >>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3782, in createXML
> >>     if ret is None: raise libvirtError('virDomainCreateXML() failed', conn=self)
> >> libvirtError: Failed to acquire lock: Permission denied
> >>
> >> INFO::2017-03-10 01:26:13,054::vm::1330::virt.vm::(setDownStatus)
> >> vmId=`2419f9fe-4998-4b7a-9fe9-151571d20379`::Changed state to Down:
> >> Failed to acquire lock: Permission denied (code=1)
> >> INFO::2017-03-10 01:26:13,054::guestagent::430::virt.vm::(stop)
> >> vmId=`2419f9fe-4998-4b7a-9fe9-151571d20379`::Stopping connection
> >>
> >> DEBUG::2017-03-10 01:26:13,054::vmchannels::238::vds::(unregister)
> >> Delete fileno 56 from listener.
> >> DEBUG::2017-03-10 01:26:13,055::vmchannels::66::vds::(_unregister_fd)
> >> Failed to unregister FD from epoll (ENOENT): 56
> >> DEBUG::2017-03-10 01:26:13,055::__init__::209::jsonrpc.Notification::(emit)
> >> Sending event {"params": {"2419f9fe-4998-4b7a-9fe9-151571d20379": {"status":
> >> "Down", "exitReason": 1, "exitMessage": "Failed to acquire lock: Permission
> >> denied", "exitCode": 1}, "notify_time": 4308740560}, "jsonrpc": "2.0",
> >> "method": "|virt|VM_status|2419f9fe-4998-4b7a-9fe9-151571d20379"}
> >> VM Channels Listener::DEBUG::2017-03-10
> >> 01:26:13,475::vmchannels::142::vds::(_do_del_channels) fileno 56 was removed
> >> from listener.
> >> DEBUG::2017-03-10 01:26:14,430::check::296::storage.check::(_start_process)
> >> START check
> >> u'/rhev/data-center/mnt/glusterSD/192.168.3.10:_data/a08822ec-3f5b-4dba-ac2d-5510f0b4b6a2/dom_md/metadata'
> >> cmd=['/usr/bin/taskset', '--cpu-list', '0-39', '/usr/bin/dd',
> >> u'if=/rhev/data-center/mnt/glusterSD/192.168.3.10:_data/a08822ec-3f5b-4dba-ac2d-5510f0b4b6a2/dom_md/metadata',
> >> 'of=/dev/null', 'bs=4096', 'count=1', 'iflag=direct'] delay=0.00
> >> DEBUG::2017-03-10 01:26:14,481::asyncevent::564::storage.asyncevent::(reap)
> >> Process <cpopen.CPopen object at 0x3ba6550> terminated (count=1)
> >> DEBUG::2017-03-10 01:26:14,481::check::327::storage.check::(_check_completed)
> >> FINISH check
> >> u'/rhev/data-center/mnt/glusterSD/192.168.3.10:_data/a08822ec-3f5b-4dba-ac2d-5510f0b4b6a2/dom_md/metadata'
> >> rc=0 err=bytearray(b'0+1 records in\n0+1 records out\n300 bytes (300 B)
> >> copied, 8.7603e-05 s, 3.4 MB/s\n') elapsed=0.06
> >>
> >>
> >> On 10 March 2017 at 10:40, Ian Neilsen <ian.neil...@gmail.com> wrote:
> >>>
> >>> Hi All
> >>>
> >>> I had a storage issue with my gluster volumes running under ovirt hosted.
> >>> I now cannot start the hosted engine manager vm from
> >>> "hosted-engine --vm-start".
> >>> I've scoured the net to find a way, but can't seem to find anything
> >>> concrete.
> >>>
> >>> Running Centos7, ovirt 4.0 and gluster 3.8.9
> >>>
> >>> How do I recover the engine manager? I'm at a loss!
> >>>
> >>> Engine Status = score between nodes was 0 for all; now node 1 is
> >>> reading 3400, but all others are 0
> >>>
> >>> {"reason": "bad vm status", "health": "bad", "vm": "down", "detail":
> >>> "down"}
> >>>
> >>> Logs from agent.log
> >>> ==================
> >>>
> >>> INFO::2017-03-09 19:32:52,600::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
> >>> Global maintenance detected
> >>> INFO::2017-03-09 19:32:52,603::hosted_engine::612::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> >>> Initializing VDSM
> >>> INFO::2017-03-09 19:32:54,820::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>> Connecting the storage
> >>> INFO::2017-03-09 19:32:54,821::storage_server::219::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> Connecting storage server
> >>> INFO::2017-03-09 19:32:59,194::storage_server::226::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> Connecting storage server
> >>> INFO::2017-03-09 19:32:59,211::storage_server::233::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> Refreshing the storage domain
> >>> INFO::2017-03-09 19:32:59,328::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>> Preparing images
> >>> INFO::2017-03-09 19:32:59,328::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images)
> >>> Preparing images
> >>> INFO::2017-03-09 19:33:01,748::hosted_engine::669::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>> Reloading vm.conf from the shared storage domain
> >>> INFO::2017-03-09 19:33:01,748::config::206::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file)
> >>> Trying to get a fresher copy of vm configuration from the OVF_STORE
> >>> WARNING::2017-03-09 19:33:04,056::ovf_store::107::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
> >>> Unable to find OVF_STORE
> >>> ERROR::2017-03-09 19:33:04,058::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file)
> >>> Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
> >>>
> >>> ovirt-ha-agent logs
> >>> ================
> >>>
> >>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config
> >>> ERROR Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
> >>>
> >>> vdsm
> >>> ======
> >>>
> >>> vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
> >>>
> >>> ovirt-ha-broker
> >>> ============
> >>>
> >>> ovirt-ha-broker cpu_load_no_engine.EngineHealth ERROR Failed to
> >>> getVmStats: 'pid'
> >>>
> >>> --
> >>> Ian Neilsen
> >>>
> >>> Mobile: 0424 379 762
> >>> Linkedin: http://au.linkedin.com/in/ianneilsen
> >>> Twitter : ineilsen
> >>
> >>
> >> --
> >> Ian Neilsen
> >>
> >> Mobile: 0424 379 762
> >> Linkedin: http://au.linkedin.com/in/ianneilsen
> >> Twitter : ineilsen
> >>
> >> _______________________________________________
> >> Users mailing list
> >> Users@ovirt.org
> >> http://lists.ovirt.org/mailman/listinfo/users
>
> --
> Didi

--
Ian Neilsen

Mobile: 0424 379 762
Linkedin: http://au.linkedin.com/in/ianneilsen
Twitter : ineilsen
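P.S. for anyone hitting the same "acquire_token open error -13" from sanlock: -13 is EACCES (permission denied), so a quick permissions sweep on each host is worth doing before touching the lockspace. A rough sketch of the checks I mean (the glusterSD path is a placeholder for your own mount; group membership details may differ between distributions):

```shell
# -13 from sanlock is EACCES, so confirm sanlock can actually reach the
# lease file on the storage domain (path is environment-specific).
ls -lZ /rhev/data-center/mnt/glusterSD/*/*/dom_md/ids   # expect vdsm:kvm, mode 0660

# The sanlock user normally belongs to groups (e.g. kvm/disk) that grant
# access to those files; a missing membership would explain EACCES.
id sanlock

# An SELinux denial also surfaces as EACCES; check current enforcement.
getenforce

# List the lockspaces/resources sanlock currently holds, as Martin suggested.
sanlock client status
```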
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
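[For the archives: the "active agent" exception above suggests an ordering for the lockspace reset. This is only a sketch pieced together from the thread, not a verified procedure; the service names and flags are the standard hosted-engine ones, but check them against your oVirt 4.0 documentation before running anything.]

```shell
# Keep HA from auto-restarting the engine VM while we work.
hosted-engine --set-maintenance --mode=global

# --reinitialize-lockspace refuses to run while an agent is active
# ("Lockfile reset cannot be performed with an active agent"), so stop
# the HA services first -- on every host in the cluster.
systemctl stop ovirt-ha-agent ovirt-ha-broker

# Rebuild the sanlock lockspace for the hosted engine.
hosted-engine --reinitialize-lockspace

# Bring the HA services back, then try a manual start while still in
# global maintenance, and confirm the state.
systemctl start ovirt-ha-broker ovirt-ha-agent
hosted-engine --vm-start
hosted-engine --vm-status
```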