On Wed, Sep 11, 2013 at 10:50:44AM -0400, Hans-Joachim wrote:
> Hello,
> 
> I run in troubles....
> 
> For a testing I completly powered off the oVirt Cluster (Engine as well as 
> the the host).. and now my host
> is unable to fine the Master Domain
> 
> System: CentOS 6.4, vdsm-4.12.1-1.el6, oVirt Engine Version: 3.3.0-1.el6
> 
> Trace vdsm.log
> 
> Thread-98::DEBUG::2013-09-11 16:35:06,859::BindingXMLRPC::177::vds::(wrapper) 
> client [192.168.1.29]
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,859::task::579::TaskManager.Task::(_updateState) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::moving from state init -> state 
> preparing
> Thread-98::INFO::2013-09-11 16:35:06,860::logUtils::44::dispatcher::(wrapper) 
> Run and protect: 
> connectStoragePool(spUUID='5849b030-626e-47cb-ad90-3ce782d831b3', hostID=1, 
> scsiKey='5849b030-626e-47cb-ad90-3ce782d831b3', 
> msdUUID='3401420d-6918-49f9-b9e2-23ba47949f29', masterVersion=1, options=None)
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,860::resourceManager::197::ResourceManager.Request::(__init__) 
> ResName=`Storage.5849b030-626e-47cb-ad90-3ce782d831b3`ReqID=`079c6f43-9453-4598-a9f3-7056ad3a595e`::Request
>  was made in '/usr/share/vdsm/storage/hsm.py' line '1020' at 
> '_connectStoragePool'
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,860::resourceManager::541::ResourceManager::(registerResource) 
> Trying to register resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' 
> for lock type 'exclusive'
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,861::resourceManager::600::ResourceManager::(registerResource) 
> Resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' is free. Now locking 
> as 'exclusive' (1 active user)
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,861::resourceManager::237::ResourceManager.Request::(grant) 
> ResName=`Storage.5849b030-626e-47cb-ad90-3ce782d831b3`ReqID=`079c6f43-9453-4598-a9f3-7056ad3a595e`::Granted
>  request
> Thread-98::INFO::2013-09-11 
> 16:35:06,861::sp::680::Storage.StoragePool::(connect) Connect host #1 to the 
> storage pool 5849b030-626e-47cb-ad90-3ce782d831b3 with master domain: 
> 3401420d-6918-49f9-b9e2-23ba47949f29 (ver = 1)
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,862::lvm::483::OperationMutex::(_invalidateAllPvs) Operation 'lvm 
> invalidate operation' got the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,862::lvm::485::OperationMutex::(_invalidateAllPvs) Operation 'lvm 
> invalidate operation' released the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,862::lvm::494::OperationMutex::(_invalidateAllVgs) Operation 'lvm 
> invalidate operation' got the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,862::lvm::496::OperationMutex::(_invalidateAllVgs) Operation 'lvm 
> invalidate operation' released the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,863::lvm::514::OperationMutex::(_invalidateAllLvs) Operation 'lvm 
> invalidate operation' got the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,863::lvm::516::OperationMutex::(_invalidateAllLvs) Operation 'lvm 
> invalidate operation' released the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,863::misc::807::SamplingMethod::(__call__) Trying to enter sampling 
> method (storage.sdc.refreshStorage)
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,863::misc::809::SamplingMethod::(__call__) Got in to sampling method
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,863::misc::807::SamplingMethod::(__call__) Trying to enter sampling 
> method (storage.iscsi.rescan)
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,864::misc::809::SamplingMethod::(__call__) Got in to sampling method
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,864::iscsiadm::91::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo -n 
> /sbin/iscsiadm -m session -R' (cwd None)
> Thread-98::DEBUG::2013-09-11 
> 16:35:06,875::iscsiadm::91::Storage.Misc.excCmd::(_runCmd) FAILED: <err> = 
> 'iscsiadm: No session found.\n'; <rc> = 21

Before the connectStoragePool attempt, vdsm.log should have had
connectStorageServer. Does it? Is it successful?


> Thread-98::DEBUG::2013-09-11 
> 16:35:06,875::misc::817::SamplingMethod::(__call__) Returning last result
> Thread-98::DEBUG::2013-09-11 
> 16:35:08,902::multipath::111::Storage.Misc.excCmd::(rescan) '/usr/bin/sudo -n 
> /sbin/multipath' (cwd None)
> Thread-98::DEBUG::2013-09-11 
> 16:35:08,916::multipath::111::Storage.Misc.excCmd::(rescan) SUCCESS: <err> = 
> ''; <rc> = 0
> Thread-98::DEBUG::2013-09-11 
> 16:35:08,917::lvm::483::OperationMutex::(_invalidateAllPvs) Operation 'lvm 
> invalidate operation' got the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:08,917::lvm::485::OperationMutex::(_invalidateAllPvs) Operation 'lvm 
> invalidate operation' released the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:08,918::lvm::494::OperationMutex::(_invalidateAllVgs) Operation 'lvm 
> invalidate operation' got the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:08,918::lvm::496::OperationMutex::(_invalidateAllVgs) Operation 'lvm 
> invalidate operation' released the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:08,918::lvm::514::OperationMutex::(_invalidateAllLvs) Operation 'lvm 
> invalidate operation' got the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:08,918::lvm::516::OperationMutex::(_invalidateAllLvs) Operation 'lvm 
> invalidate operation' released the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:08,919::misc::817::SamplingMethod::(__call__) Returning last result
> Thread-98::ERROR::2013-09-11 
> 16:35:08,919::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for 
> unfetched domain 3401420d-6918-49f9-b9e2-23ba47949f29
> Thread-98::ERROR::2013-09-11 
> 16:35:08,919::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) 
> looking for domain 3401420d-6918-49f9-b9e2-23ba47949f29
> Thread-98::DEBUG::2013-09-11 
> 16:35:08,919::lvm::374::OperationMutex::(_reloadvgs) Operation 'lvm reload 
> operation' got the operation mutex
> Thread-98::DEBUG::2013-09-11 
> 16:35:08,921::lvm::311::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n 
> /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] 
> ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 
> filter = [ \'a|/dev/mapper/3600508b1001052395359564d53570200|\', \'r|.*|\' ] 
> } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup 
> { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix 
> --separator | -o 
> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free
>  3401420d-6918-49f9-b9e2-23ba47949f29' (cwd None)
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,096::lvm::311::Storage.Misc.excCmd::(cmd) FAILED: <err> = ' Volume 
> group "3401420d-6918-49f9-b9e2-23ba47949f29" not found\n'; <rc> = 5
> Thread-98::WARNING::2013-09-11 
> 16:35:09,097::lvm::379::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' 
> Volume group "3401420d-6918-49f9-b9e2-23ba47949f29" not found']
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,097::lvm::403::OperationMutex::(_reloadvgs) Operation 'lvm reload 
> operation' released the operation mutex
> Thread-98::ERROR::2013-09-11 
> 16:35:09,098::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 
> 3401420d-6918-49f9-b9e2-23ba47949f29 not found
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,099::resourceManager::615::ResourceManager::(releaseResource) Trying 
> to release resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3'
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,099::resourceManager::634::ResourceManager::(releaseResource) 
> Released resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' (0 active 
> users)
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,099::resourceManager::640::ResourceManager::(releaseResource) 
> Resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' is free, finding out 
> if anyone is waiting for it.
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,099::resourceManager::648::ResourceManager::(releaseResource) No one 
> is waiting for resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3', 
> Clearing records.
> Thread-98::ERROR::2013-09-11 
> 16:35:09,100::task::850::TaskManager.Task::(_setError) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::Unexpected error


Any idea, anyone, why this line is not followed by a proper traceback?
(utterly unrelated - but worrisome - question)

> Thread-98::DEBUG::2013-09-11 
> 16:35:09,100::task::869::TaskManager.Task::(_run) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::Task._run: 
> b2a7a44f-8cb0-453a-bf52-16a200f992f9 ('5849b030-626e-47cb-ad90-3ce782d831b3', 
> 1, '5849b030-626e-47cb-ad90-3ce782d831b3', 
> '3401420d-6918-49f9-b9e2-23ba47949f29', 1) {} failed - stopping task
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,100::task::1194::TaskManager.Task::(stop) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::stopping in state preparing 
> (force False)
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,101::task::974::TaskManager.Task::(_decref) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::ref 1 aborting True
> Thread-98::INFO::2013-09-11 
> 16:35:09,101::task::1151::TaskManager.Task::(prepare) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::aborting: Task is aborted: 
> 'Cannot find master domain' - code 304
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,101::task::1156::TaskManager.Task::(prepare) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::Prepare: aborted: Cannot find 
> master domain
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,101::task::974::TaskManager.Task::(_decref) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::ref 0 aborting True
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,102::task::909::TaskManager.Task::(_doAbort) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::Task._doAbort: force False
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,102::resourceManager::976::ResourceManager.Owner::(cancelAll) 
> Owner.cancelAll requests {}
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,102::task::579::TaskManager.Task::(_updateState) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::moving from state preparing -> 
> state aborting
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,102::task::534::TaskManager.Task::(__state_aborting) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::_aborting: recover policy none
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,102::task::579::TaskManager.Task::(_updateState) 
> Task=`b2a7a44f-8cb0-453a-bf52-16a200f992f9`::moving from state aborting -> 
> state failed
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,103::resourceManager::939::ResourceManager.Owner::(releaseAll) 
> Owner.releaseAll requests {} resources {}
> Thread-98::DEBUG::2013-09-11 
> 16:35:09,103::resourceManager::976::ResourceManager.Owner::(cancelAll) 
> Owner.cancelAll requests {}
> Thread-98::ERROR::2013-09-11 
> 16:35:09,103::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': 
> {'message': "Cannot find master domain: 
> 'spUUID=5849b030-626e-47cb-ad90-3ce782d831b3, 
> msdUUID=3401420d-6918-49f9-b9e2-23ba47949f29'", 'code': 304}}
> 
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

Reply via email to