Before hacking the db, I suggest trying one more thing.

1. In the CS UI, unmanage the cluster with the affected host.
2. From the CLI of a host in the XS pool, clear the host tags of the affected host.
   xe host-param-clear param-name=tags uuid=<UUID of affected host>
3. In the CS UI, manage the cluster with the affected host.

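For step 2, here is a rough sketch of how the UUID lookup and the tag clearing could be scripted from a pool host. The host name lab-cloud-9 is only a guess taken from the xensource log quoted further down, and the DRY_RUN switch and the helper function names are my own illustration, not part of xe:

```shell
#!/bin/sh
# Sketch only: look up a XenServer host's UUID by name-label and clear its tags.
# HOST_NAME and DRY_RUN are illustrative; DRY_RUN=1 (the default) only prints.
HOST_NAME="${HOST_NAME:-lab-cloud-9}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Show the current tags of the host, then clear them (the command from step 2).
clear_host_tags() {
    uuid="$1"
    run xe host-param-get uuid="$uuid" param-name=tags
    run xe host-param-clear uuid="$uuid" param-name=tags
}

if [ "$DRY_RUN" = "1" ]; then
    # No xe call is made here, so the UUID stays a placeholder.
    clear_host_tags "<UUID of affected host>"
else
    # --minimal makes xe print only the UUID.
    uuid="$(xe host-list name-label="$HOST_NAME" --minimal)"
    [ -n "$uuid" ] || { echo "no host named $HOST_NAME" >&2; exit 1; }
    clear_host_tags "$uuid"
fi
```

With DRY_RUN left at 1 it only prints the two xe commands; run it with DRY_RUN=0 HOST_NAME=<your host> on a pool member to actually clear the tags.
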
This usually solves this type of problem (it might take a couple of minutes). Clearing host tags won't affect running VMs (if any).

Best regards,
Kirk

On 05/28/2013 03:03 PM, Ahmad Emneina wrote:
> first, I'd query to make sure it's the right host, judging from the logs it
> has an id of 20... execute the following
> backup command from the cli that can access the db server:
>
> mysqldump <credentials if necessary> --databases cloud > cloud.sql
>
> Then start operating...
>
> select * from host where id=20\G (hit the enter key)
>
> if this is in fact the misbehaving host, execute
>
> update host set status='Up' where id=20; (again hit enter)
>
> then try to launch a vm on that host (in the deployVirtualMachine api
> command you can/should be able to specify the exact host)
>
>
> On Tue, May 28, 2013 at 2:46 PM, Old, Curtis <[email protected]> wrote:
>
>> Oh god where do I find that sucker in the DB? Sorry I'm an admin type
>> that isn't a mysql guru or Ji
>>
>> Curtis Old
>> Neustar, Inc. / Neustarlabs / Senior Research Engineer
>> 46000 Center Oak Plaza Sterling, VA 20166
>> Office: +1.571.434.5384 Mobile: +1.540.532.2230 / [email protected]
>> / www.neustar.biz <http://www.neustar.biz/>
>> ________________________________________
>>
>>
>> The information contained in this e-mail message is intended only for the
>> use of the recipient(s) named above and may contain confidential and/or
>> privileged information. If you are not the intended recipient you have
>> received this e-mail message in error and any review, dissemination,
>> distribution, or copying of this message is strictly prohibited. If you
>> have received this communication in error, please notify us immediately
>> and delete the original message.
>>
>> On 5/28/13 5:43 PM, "Ahmad Emneina" <[email protected]> wrote:
>>
>>> oh this looks like a bug. A potential workaround for it is marking the
>>> host as Up in the cloudstack db. I'd make a backup first then modify the db
>>> entry.
>>>
>>>
>>> On Tue, May 28, 2013 at 2:36 PM, Old, Curtis <[email protected]>
>>> wrote:
>>>
>>>> Tailed last 3000 lines right after trying a forced reconnect:
>>>> http://paste.cloudstack.org/hTOm/
>>>>
>>>> Listed as Management log N*,
>>>>
>>>> Author Curtis Old
>>>>
>>>>
>>>> Can we see the latest logs? paste.cloudstack.org
>>>>
>>>>
>>>> On Tue, May 28, 2013 at 2:15 PM, Old, Curtis
>>>> <[email protected]> wrote:
>>>>
>>>>> Tried that already (like 2 times)
>>>>>
>>>>> On 5/28/13 5:12 PM, "Ahmad Emneina" <[email protected]> wrote:
>>>>>
>>>>>> how about a reboot of the host, then force reconnect.
>>>>>>
>>>>>>
>>>>>> On Tue, May 28, 2013 at 2:05 PM, Old, Curtis <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Bummer no good...
>>>>>>>
>>>>>>> On 5/28/13 5:01 PM, "Ahmad Emneina" <[email protected]> wrote:
>>>>>>>
>>>>>>>> don't do that :D do a toolstack restart on the xenserver host, and a
>>>>>>>> force reconnect from cloudstack. see if that makes them play nice.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, May 28, 2013 at 1:58 PM, Old, Curtis <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Well the hypervisor looks fine from a Xenserver perspective but
>>>>>>>>> CloudStack won't "reconnect" it. I thought about deleting it in CS and
>>>>>>>>> re-adding it, but that bugs me lol
>>>>>>>>>
>>>>>>>>> From: Ahmad Emneina <[email protected]>
>>>>>>>>> Reply-To: "[email protected]" <[email protected]>
>>>>>>>>> Date: Tuesday, May 28, 2013 4:56 PM
>>>>>>>>> To: "Old, Curtis" <[email protected]>
>>>>>>>>> Subject: Re: Xenserver hypervisior will not reconnect after reboot
>>>>>>>>>
>>>>>>>>> are you good to go again, after cleaning up some space?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, May 28, 2013 at 1:24 PM, Old, Curtis
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Sorry the log entries are from the xenserver log
>>>>>>>>>>
>>>>>>>>>> On 5/28/13 4:20 PM, "Old, Curtis" <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> The inodes did fill, so after I cleaned them up I rebooted and
>>>>>>>>>>> then the host was singing and dancing again in XenCenter, and primary
>>>>>>>>>>> storage was mounted. What from the management log, if we can narrow it
>>>>>>>>>>> down?
>>>>>>>>>>> When I try the forced reconnect only this shows up:
>>>>>>>>>>>
>>>>>>>>>>> [20130528T20:17:39.118Z|debug|lab-cloud-9|8622 inet_rpc||http_critical] Connection terminated
>>>>>>>>>>> [20130528T20:17:51.019Z| info|lab-cloud-9|28 heartbeat|Heartbeat D:4de2ede3bcba|xapi] stunnel pid: 23045 (cached = false) connected to 10.31.105.158:443
>>>>>>>>>>> [20130528T20:17:51.019Z| info|lab-cloud-9|28 heartbeat|Heartbeat D:4de2ede3bcba|xapi] with_recorded_stunnelpid task_opt=None s_pid=23063
>>>>>>>>>>> [20130528T20:18:21.089Z| info|lab-cloud-9|28 heartbeat|Heartbeat D:4de2ede3bcba|xapi] stunnel pid: 23063 (cached = false) connected to 10.31.105.158:443
>>>>>>>>>>> [20130528T20:18:21.089Z| info|lab-cloud-9|28 heartbeat|Heartbeat D:4de2ede3bcba|xapi] with_recorded_stunnelpid task_opt=None s_pid=23063
>>>>>>>>>>> [20130528T20:18:39.124Z|debug|lab-cloud-9|8623 inet_rpc||http_critical] Connection terminated
>>>>>>>>>>> [20130528T20:18:51.159Z| info|lab-cloud-9|28 heartbeat|Heartbeat D:4de2ede3bcba|xapi] stunnel pid: 23068 (cached = false) connected to 10.31.105.158:443
>>>>>>>>>>> [20130528T20:18:51.159Z| info|lab-cloud-9|28 heartbeat|Heartbeat D:4de2ede3bcba|xapi] with_recorded_stunnelpid task_opt=None s_pid=23068
>>>>>>>>>>> [20130528T20:19:21.229Z| info|lab-cloud-9|28 heartbeat|Heartbeat D:4de2ede3bcba|xapi] stunnel pid: 23072 (cached = false) connected to 10.31.105.158:443
>>>>>>>>>>> [20130528T20:19:21.229Z| info|lab-cloud-9|28 heartbeat|Heartbeat D:4de2ede3bcba|xapi] with_recorded_stunnelpid task_opt=None s_pid=23072
>>>>>>>>>>>
>>>>>>>>>>> On 5/28/13 4:08 PM, "Ahmad Emneina" <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> pretty odd... how does the disk space look on the xenserver host?
>>>>>>>>>>>> we'd need more from the management log, as well as the xensource.log
>>>>>>>>>>>> off the hypervisor.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 28, 2013 at 9:21 AM, Old, Curtis <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tried the force reconnect from CS 4.0.0 UI and I get "Command Failed
>>>>>>>>>>>>> due to internal server Error" any ideas?
>>>>>>>>>>>>>
>>>>>>>>>>>>> management-server.log
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013-05-24 10:33:00,974 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-7:null) submit async job-230, details: AsyncJobVO {id:230, userId: 2, accountId: 2, sessionKey: null, instanceType: Host, instanceId: 20, cmd: com.cloud.api.commands.ReconnectHostCmd, cmdOriginator: null, cmdInfo: {"response":"json","id":"346bcc09-e56b-4b27-b269-29252bc2a653","sessionkey":"k/+GoOS0g1IOaSr5bKA2BWg1yD4\u003d","ctxUserId":"2","_":"1369405980778","ctxAccountId":"2","ctxStartEventId":"1514"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 112939138816, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
>>>>>>>>>>>>> 2013-05-24 10:33:00,978 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-16:job-230) Executing com.cloud.api.commands.ReconnectHostCmd for job-230
>>>>>>>>>>>>> 2013-05-24 10:33:00,985 INFO [agent.manager.AgentManagerImpl] (Job-Executor-16:job-230) Unable to disconnect host because it is not in the correct state: host=20; Status=Disconnected
>>>>>>>>>>>>> 2013-05-24 10:33:00,986 WARN [api.commands.ReconnectHostCmd] (Job-Executor-16:job-230) Exception:
>>>>>>>>>>>>> com.cloud.api.ServerApiException
>>>>>>>>>>>>> at com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:108)
>>>>>>>>>>>>> at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
>>>>>>>>>>>>> at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
>>>>>>>>>>>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>>>>>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>>>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:679)
>>>>>>>>>>>>> 2013-05-24 10:33:00,986 WARN [cloud.api.ApiDispatcher] (Job-Executor-16:job-230) class com.cloud.api.ServerApiException : null
>>>>>>>>>>>>> 2013-05-24 10:33:00,986 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-16:job-230) Complete async job-230, jobStatus: 2, resultCode: 530, result: Error Code: 534 Error text: null
>>>>>>>>>>>>> 2013-05-24 10:33:05,998 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-4:null) Async job-230 completed
>>>>>>>>>>>>>
>>>>>>>>>>>>> catalina.out
>>>>>>>>>>>>>
>>>>>>>>>>>>> INFO [agent.manager.AgentManagerImpl] (Job-Executor-16:job-230) Unable to disconnect host because it is not in the correct state: host=20; Status=Disconnected
>>>>>>>>>>>>> WARN [api.commands.ReconnectHostCmd] (Job-Executor-16:job-230) Exception:
>>>>>>>>>>>>> com.cloud.api.ServerApiException
>>>>>>>>>>>>> at com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:108)
>>>>>>>>>>>>> at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
>>>>>>>>>>>>> at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
>>>>>>>>>>>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>>>>>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>>>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:679)
>>>>>>>>>>>>> WARN [cloud.api.ApiDispatcher] (Job-Executor-16:job-230) class com.cloud.api.ServerApiException : null
