Hi All, Thank you very much for the help.
Ended up solving the issue. There was an invalid value in our configuration table, which seemed to prevent a lot of DAOs from being autowired. (Rough sketches of the DB checks discussed further down in this thread are appended at the end of this message.)

On 29 August 2014 21:16, Paul Angus <[email protected]> wrote:

> Hi Ian,
>
> I've seen this kind of behaviour before with KVM hosts reconnecting.
>
> There's a SELECT … FOR UPDATE query on the op_ha_work table which locks the table, stopping other hosts from updating their status. If there are a lot of entries in there, they all lock each other out. Deleting the entries fixed the problem, but you then have to deal with hosts and VMs being up/down yourself.
>
> So check the op_ha_work table for lots of entries, which can lock up the database. If you can check the database for the queries that it's handling, that would be best.
>
> Also check that the management server and MySQL DB are tuned for the load that's being thrown at them (http://support.citrix.com/article/CTX132020). Remember, if you have other services such as Nagios or Puppet/Chef directly reading the DB, that adds to the number of connections into the MySQL DB; I have seen the management server starved of MySQL connections when a lot of hosts are brought back online.
>
> Regards
>
> Paul Angus
> Cloud Architect
> S: +44 20 3603 0540 | M: +447711418784 | T: CloudyAngus
> [email protected]
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Carlos Reategui
> Sent: 29 August 2014 20:55
> To: [email protected]
> Subject: Re: Management Server won't connect after cluster shutdown and restart
>
> Hi Ian,
>
> So the root of the problem was that the machines were not started up in the correct order.
>
> My plan had been to stop all VMs from CS, then stop CS, then shut down the VM hosts. On the way back up, the hosts needed to be brought up first and, once they were OK, the CS machine, so that everything was in the same state CS thought it was in when it was shut down. Unfortunately, CS came up before everything else was the way it expected it to be, and I did not realize that at the time.
>
> To resolve it, I went back to my CS DB backup from right after I shut down the MS, made sure the VM hosts were all as expected, and then started the MS.
>
> On Fri, Aug 29, 2014 at 8:02 AM, Ian Duffy <[email protected]> wrote:
>
> > Hi Carlos,
> >
> > Did you ever find a fix for this?
> >
> > I'm seeing the same issue on 4.1.1 with VMware ESXi.
> >
> > On 29 October 2013 04:54, Carlos Reategui <[email protected]> wrote:
> >
> > > Update: I cleared out the async_job table and also reset the system VMs it thought were in Starting mode from my previous attempts by setting them from Starting to Stopped. I also re-set the XS pool master to be the one XS thinks it is.
> > >
> > > Now, when I start the CS MS, here are the logs leading up to the first exception about being unable to reach the pool:
> > >
> > > 2013-10-28 21:27:11,040 DEBUG [cloud.alert.ClusterAlertAdapter] (Cluster-Notification-1:null) Management server node 172.30.45.2 is up, send alert
> > >
> > > 2013-10-28 21:27:11,045 WARN [cloud.cluster.ClusterManagerImpl] (Cluster-Notification-1:null) Notifying management server join event took 9 ms
> > >
> > > 2013-10-28 21:27:23,236 DEBUG [cloud.server.StatsCollector] (StatsCollector-2:null) HostStatsCollector is running...
> > > > > > 2013-10-28 21:27:23,243 DEBUG [cloud.server.StatsCollector] > > > (StatsCollector-3:null) VmStatsCollector is running... > > > > > > 2013-10-28 21:27:23,247 DEBUG [cloud.server.StatsCollector] > > > (StatsCollector-1:null) StorageCollector is running... > > > > > > 2013-10-28 21:27:23,255 DEBUG [cloud.server.StatsCollector] > > > (StatsCollector-1:null) There is no secondary storage VM for > > > secondary storage host nfs://172.30.45.2/store/secondary > > > > > > 2013-10-28 21:27:23,273 DEBUG [agent.manager.ClusteredAgentAttache] > > > (StatsCollector-2:null) Seq 1-201916421: Forwarding null to > > 233845174730255 > > > > > > 2013-10-28 21:27:23,274 DEBUG [agent.manager.ClusteredAgentAttache] > > > (AgentManager-Handler-9:null) Seq 1-201916421: Routing from > > 233845174730253 > > > > > > 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentAttache] > > > (AgentManager-Handler-9:null) Seq 1-201916421: Link is closed > > > > > > 2013-10-28 21:27:23,275 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-9:null) Seq 1-201916421: MgmtId 233845174730253: > > Req: > > > Resource [Host:1] is unreachable: Host 1: Link is c > > > > > > losed > > > > > > 2013-10-28 21:27:23,275 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-9:null) Seq 1--1: MgmtId 233845174730253: Req: > > > Routing to peer > > > > > > 2013-10-28 21:27:23,277 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-11:null) Seq 1--1: MgmtId 233845174730253: Req: > > > Cancel request received > > > > > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache] > > > (AgentManager-Handler-11:null) Seq 1-201916421: Cancelling. > > > > > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time > > > because this is the current command > > > > > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time > > > because this is the current command > > > > > > 2013-10-28 21:27:23,277 INFO [utils.exception.CSExceptionErrorCode] > > > (StatsCollector-2:null) Could not find exception: > > > com.cloud.exception.OperationTimedoutException in error code list > > > for exceptions > > > > > > 2013-10-28 21:27:23,277 WARN [agent.manager.AgentAttache] > > > (StatsCollector-2:null) Seq 1-201916421: Timed out on null > > > > > > 2013-10-28 21:27:23,278 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-2:null) Seq 1-201916421: Cancelling. > > > > > > 2013-10-28 21:27:23,278 WARN [agent.manager.AgentManagerImpl] > > > (StatsCollector-2:null) Operation timed out: Commands 201916421 to > > > Host 1 timed out after 3600 > > > > > > 2013-10-28 21:27:23,278 WARN [cloud.resource.ResourceManagerImpl] > > > (StatsCollector-2:null) Unable to obtain host 1 statistics. 
> > > > > > 2013-10-28 21:27:23,278 WARN [cloud.server.StatsCollector] > > > (StatsCollector-2:null) Received invalid host stats for host: 1 > > > > > > 2013-10-28 21:27:23,281 DEBUG [agent.manager.ClusteredAgentAttache] > > > (StatsCollector-1:null) Seq 1-201916422: Forwarding null to > > 233845174730255 > > > > > > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache] > > > (AgentManager-Handler-12:null) Seq 1-201916422: Routing from > > > 233845174730253 > > > > > > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache] > > > (AgentManager-Handler-12:null) Seq 1-201916422: Link is closed > > > > > > 2013-10-28 21:27:23,283 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-12:null) Seq 1-201916422: MgmtId 233845174730253: > > > Req: Resource [Host:1] is unreachable: Host 1: Link is > > > > > > closed > > > > > > 2013-10-28 21:27:23,284 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-12:null) Seq 1--1: MgmtId 233845174730253: Req: > > > Routing to peer > > > > > > 2013-10-28 21:27:23,286 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-13:null) Seq 1--1: MgmtId 233845174730253: Req: > > > Cancel request received > > > > > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache] > > > (AgentManager-Handler-13:null) Seq 1-201916422: Cancelling. > > > > > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time > > > because this is the current command > > > > > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time > > > because this is the current command > > > > > > 2013-10-28 21:27:23,286 INFO [utils.exception.CSExceptionErrorCode] > > > (StatsCollector-1:null) Could not find exception: > > > com.cloud.exception.OperationTimedoutException in error code list > > > for exceptions > > > > > > 2013-10-28 21:27:23,286 WARN [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 1-201916422: Timed out on null > > > > > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 1-201916422: Cancelling. 
> > > > > > 2013-10-28 21:27:23,288 DEBUG [cloud.storage.StorageManagerImpl] > > > (StatsCollector-1:null) Unable to send storage pool command to > > > Pool[200|LVM] via 1 > > > > > > com.cloud.exception.OperationTimedoutException: Commands 201916422 > > > to > > Host > > > 1 timed out after 3600 > > > > > > at > > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429) > > > > > > at > > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java: > > > 511) > > > > > > at > > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java: > > > 464) > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:2347) > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:422) > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:436) > > > > > > at > > > > > > > > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja > > va:316) > > > > > > at > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4 > > > 71) > > > > > > at > > > > > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java: > > 351) > > > > > > at > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) > > > > > > at > > > > > > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a > > ccess$201(ScheduledThreadPoolExecutor.java:165) > > > > > > at > > > > > > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r > > un(ScheduledThreadPoolExecutor.java:267) > > > > > > at > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j > > ava:1146) > > > > > > at > > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. > > java:615) > > > > > > at java.lang.Thread.run(Thread.java:679) > > > > > > 2013-10-28 21:27:23,289 INFO [cloud.server.StatsCollector] > > > (StatsCollector-1:null) Unable to reach Pool[200|LVM] > > > > > > com.cloud.exception.StorageUnavailableException: Resource > > [StoragePool:200] > > > is unreachable: Unable to send command to the pool > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:2357) > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:422) > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:436) > > > > > > at > > > > > > > > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja > > va:316) > > > > > > at > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4 > > > 71) > > > > > > at > > > > > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java: > > 351) > > > > > > at > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) > > > > > > at > > > > > > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a > > ccess$201(ScheduledThreadPoolExecutor.java:165) > > > > > > at > > > > > > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r > > un(ScheduledThreadPoolExecutor.java:267) > > > > > > at > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j > > ava:1146) > > > > > > at > > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. 
> > java:615) > > > > > > at java.lang.Thread.run(Thread.java:679) > > > > > > 2013-10-28 21:27:23,300 DEBUG [agent.manager.ClusteredAgentAttache] > > > (StatsCollector-2:null) Seq 2-1168703496: Forwarding null to > > > 233845174730255 > > > > > > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache] > > > (AgentManager-Handler-14:null) Seq 2-1168703496: Routing from > > > 233845174730253 > > > > > > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache] > > > (AgentManager-Handler-14:null) Seq 2-1168703496: Link is closed > > > > > > 2013-10-28 21:27:23,302 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-14:null) Seq 2-1168703496: MgmtId > 233845174730253: > > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed > > > > > > 2013-10-28 21:27:23,302 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-14:null) Seq 2--1: MgmtId 233845174730253: Req: > > > Routing to peer > > > > > > 2013-10-28 21:27:23,303 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-15:null) Seq 2--1: MgmtId 233845174730253: Req: > > > Cancel request received > > > > > > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache] > > > (AgentManager-Handler-15:null) Seq 2-1168703496: Cancelling. > > > > > > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time > > > because this is the current command > > > > > > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time > > > because this is the current command > > > > > > 2013-10-28 21:27:23,304 INFO [utils.exception.CSExceptionErrorCode] > > > (StatsCollector-2:null) Could not find exception: > > > com.cloud.exception.OperationTimedoutException in error code list > > > for exceptions > > > > > > 2013-10-28 21:27:23,304 WARN [agent.manager.AgentAttache] > > > (StatsCollector-2:null) Seq 2-1168703496: Timed out on null > > > > > > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-2:null) Seq 2-1168703496: Cancelling. > > > > > > 2013-10-28 21:27:23,304 WARN [agent.manager.AgentManagerImpl] > > > (StatsCollector-2:null) Operation timed out: Commands 1168703496 to > > > Host > > 2 > > > timed out after 3600 > > > > > > 2013-10-28 21:27:23,304 WARN [cloud.resource.ResourceManagerImpl] > > > (StatsCollector-2:null) Unable to obtain host 2 statistics. 
> > > > > > 2013-10-28 21:27:23,304 WARN [cloud.server.StatsCollector] > > > (StatsCollector-2:null) Received invalid host stats for host: 2 > > > > > > 2013-10-28 21:27:23,307 DEBUG [agent.manager.ClusteredAgentAttache] > > > (StatsCollector-1:null) Seq 2-1168703497: Forwarding null to > > > 233845174730255 > > > > > > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache] > > > (AgentManager-Handler-1:null) Seq 2-1168703497: Routing from > > > 233845174730253 > > > > > > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache] > > > (AgentManager-Handler-1:null) Seq 2-1168703497: Link is closed > > > > > > 2013-10-28 21:27:23,308 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-1:null) Seq 2-1168703497: MgmtId 233845174730253: > > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed > > > > > > 2013-10-28 21:27:23,308 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-1:null) Seq 2--1: MgmtId 233845174730253: Req: > > > Routing to peer > > > > > > 2013-10-28 21:27:23,310 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-5:null) Seq 2--1: MgmtId 233845174730253: Req: > > Cancel > > > request received > > > > > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache] > > > (AgentManager-Handler-5:null) Seq 2-1168703497: Cancelling. > > > > > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time > > > because this is the current command > > > > > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time > > > because this is the current command > > > > > > 2013-10-28 21:27:23,310 INFO [utils.exception.CSExceptionErrorCode] > > > (StatsCollector-1:null) Could not find exception: > > > com.cloud.exception.OperationTimedoutException in error code list > > > for exceptions > > > > > > 2013-10-28 21:27:23,310 WARN [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 2-1168703497: Timed out on null > > > > > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 2-1168703497: Cancelling. 
> > > > > > 2013-10-28 21:27:23,310 DEBUG [cloud.storage.StorageManagerImpl] > > > (StatsCollector-1:null) Unable to send storage pool command to > > > Pool[201|LVM] via 2 > > > > > > com.cloud.exception.OperationTimedoutException: Commands 1168703497 > > > to > > Host > > > 2 timed out after 3600 > > > > > > at > > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429) > > > > > > at > > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java: > > > 511) > > > > > > at > > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java: > > > 464) > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:2347) > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:422) > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:436) > > > > > > at > > > > > > > > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja > > va:316) > > > > > > at > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4 > > > 71) > > > > > > at > > > > > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java: > > 351) > > > > > > at > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) > > > > > > at > > > > > > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a > > ccess$201(ScheduledThreadPoolExecutor.java:165) > > > > > > at > > > > > > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r > > un(ScheduledThreadPoolExecutor.java:267) > > > > > > at > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j > > ava:1146) > > > > > > at > > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. > > java:615) > > > > > > at java.lang.Thread.run(Thread.java:679) > > > > > > 2013-10-28 21:27:23,311 INFO [cloud.server.StatsCollector] > > > (StatsCollector-1:null) Unable to reach Pool[201|LVM] > > > > > > com.cloud.exception.StorageUnavailableException: Resource > > [StoragePool:201] > > > is unreachable: Unable to send command to the pool > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:2357) > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:422) > > > > > > at > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:436) > > > > > > at > > > > > > > > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja > > va:316) > > > > > > at > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4 > > > 71) > > > > > > at > > > > > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java: > > 351) > > > > > > at > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) > > > > > > at > > > > > > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a > > ccess$201(ScheduledThreadPoolExecutor.java:165) > > > > > > at > > > > > > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r > > un(ScheduledThreadPoolExecutor.java:267) > > > > > > at > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j > > ava:1146) > > > > > > at > > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. 
> > java:615) > > > > > > at java.lang.Thread.run(Thread.java:679) > > > > > > 2013-10-28 21:27:23,328 DEBUG [agent.manager.ClusteredAgentAttache] > > > (StatsCollector-1:null) Seq 2-1168703498: Forwarding null to > > > 233845174730255 > > > > > > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache] > > > (AgentManager-Handler-3:null) Seq 2-1168703498: Routing from > > > 233845174730253 > > > > > > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache] > > > (AgentManager-Handler-3:null) Seq 2-1168703498: Link is closed > > > > > > 2013-10-28 21:27:23,329 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-3:null) Seq 2-1168703498: MgmtId 233845174730253: > > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed > > > > > > 2013-10-28 21:27:23,330 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-3:null) Seq 2--1: MgmtId 233845174730253: Req: > > > Routing to peer > > > > > > 2013-10-28 21:27:23,331 DEBUG > > > [agent.manager.ClusteredAgentManagerImpl] > > > (AgentManager-Handler-4:null) Seq 2--1: MgmtId 233845174730253: Req: > > Cancel > > > request received > > > > > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache] > > > (AgentManager-Handler-4:null) Seq 2-1168703498: Cancelling. > > > > > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time > > > because this is the current command > > > > > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time > > > because this is the current command > > > > > > 2013-10-28 21:27:23,331 INFO [utils.exception.CSExceptionErrorCode] > > > (StatsCollector-1:null) Could not find exception: > > > com.cloud.exception.OperationTimedoutException in error code list > > > for exceptions > > > > > > 2013-10-28 21:27:23,332 WARN [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 2-1168703498: Timed out on null > > > > > > 2013-10-28 21:27:23,332 DEBUG [agent.manager.AgentAttache] > > > (StatsCollector-1:null) Seq 2-1168703498: Cancelling. 
> > > 2013-10-28 21:27:23,332 DEBUG [cloud.storage.StorageManagerImpl] (StatsCollector-1:null) Unable to send storage pool command to Pool[202|NetworkFilesystem] via 2
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands 1168703498 to Host 2 timed out after 3600
> > >     at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >     at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > >     at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > >     at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> > >     at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >     at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >     at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >     at java.lang.Thread.run(Thread.java:679)
> > >
> > > iptables is disabled on the XS hosts, so the connection problem is not a firewall issue.
> > >
> > > If I do an xe sr-list I see all 3 of the above SRs, and the hosts have mounted the NFS SR and can access it.
> > >
> > > On Mon, Oct 28, 2013 at 9:05 PM, Carlos Reategui <[email protected]> wrote:
> > >
> > > > Using CS 4.1.1 with 2 hosts running XS 6.0.2.
> > > >
> > > > Had to shut everything down and now I am having problems bringing things up.
> > > >
> > > > As suggested, I used CS to stop all my instances as well as the system VMs and the SR. Then I shut down the XS 6.0.2 servers after enabling maintenance mode from the CS console.
> > > >
> > > > After bringing things up, my XS servers had the infamous interface-rename issue, which I resolved by editing the udev rules file manually.
> > > >
> > > > Now I have my XS servers up, but for some reason my pool master got changed, so I used xe pool-designate-new-master to switch it back.
> > > >
> > > > I did not notice that this designation change had been picked up by CS, and when starting it up it keeps trying to connect to the wrong pool master. Should I switch XS to match CS, or what do I need to change in CS to tell it what the pool master is?
> > > > > > > > I tried putting the server that CS thinks is the master in > > > > maintenance mode from CS but that just ends up in an apparent > > > > infinite cycle > > spitting > > > > out endless lines like these: > > > > > > > > 2013-10-28 20:39:02,059 DEBUG > > > > [agent.manager.ClusteredAgentAttache] > > > > (AgentManager-Handler-1:null) Seq 2-855048230: Forwarding Seq > > > 2-855048230: > > > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags > > > > > > > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255 > > > > > > > > 2013-10-28 20:39:02,060 DEBUG > > > > [agent.manager.ClusteredAgentAttache] > > > > (AgentManager-Handler-11:null) Seq 2-855048230: Forwarding Seq > > > > 2-855048230: { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, > > > > Flag > > > > > > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255 > > > > > > > > 2013-10-28 20:39:02,062 DEBUG > > > > [agent.manager.ClusteredAgentAttache] > > > > (AgentManager-Handler-13:null) Seq 2-855048230: Forwarding Seq > > > > 2-855048230: { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, > > > > Flag > > > > > > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255 > > > > > > > > 2013-10-28 20:39:02,063 DEBUG > > > > [agent.manager.ClusteredAgentAttache] > > > > (AgentManager-Handler-7:null) Seq 2-855048230: Forwarding Seq > > > 2-855048230: > > > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags > > > > > > > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255 > > > > > > > > 2013-10-28 20:39:02,064 DEBUG > > > > [agent.manager.ClusteredAgentAttache] > > > > (AgentManager-Handler-15:null) Seq 2-855048230: Forwarding Seq > > > > 2-855048230: { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, > > > > Flag > > > > > > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255 > > > > > > > > 2013-10-28 20:39:02,066 DEBUG > > > > [agent.manager.ClusteredAgentAttache] > > > > (AgentManager-Handler-14:null) Seq 2-855048230: Forwarding Seq > > > > 2-855048230: { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, > > > > Flag > > > > > > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255 > > > > > > > > 2013-10-28 20:39:02,067 DEBUG > > > > [agent.manager.ClusteredAgentAttache] > > > > (AgentManager-Handler-2:null) Seq 2-855048230: Forwarding Seq > > > 2-855048230: > > > > { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags > > > > > > > > : 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255 > > > > > > > > 2013-10-28 20:39:02,068 DEBUG > > > > [agent.manager.ClusteredAgentAttache] > > > > (AgentManager-Handler-12:null) Seq 2-855048230: Forwarding Seq > > > > 2-855048230: { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, > > > > Flag > > > > > > > > s: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255 > > > > > > > > After stopping and restarting the MS, the first error I see is: > > > > > > > > 2013-10-28 20:41:53,749 DEBUG [cloud.api.ApiServlet] > > > > (catalina-exec-1:null) ===START=== 10.110.3.70 -- GET > > > > > > > > > command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88 > > &response=json&sessi > > > > > > > > onkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624 > > > > > > > > 2013-10-28 20:41:53,756 ERROR [cloud.api.ApiServlet] > > > > (catalina-exec-1:null) unknown exception writing api response > > > > > > > > java.lang.NullPointerException > > > > > > > > at > > > > > > > > > com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.jav > > a:280) > > > > > > > > at > 
> > > > > > > > com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.jav > > a:143) > > > > > > > > at > > > > com.cloud.api.ApiServlet.processRequest(ApiServlet.java:238) > > > > > > > > at com.cloud.api.ApiServlet.doGet(ApiServlet.java:66) > > > > > > > > at > > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:617) > > > > > > > > at > > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:717) > > > > > > > > at > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli > > cationFilterChain.java:290) > > > > > > > > at > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi > > lterChain.java:206) > > > > > > > > at > > > > > > > > > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa > > lve.java:233) > > > > > > > > at > > > > > > > > > org.apache.catalina.core.StandardContextValve.invoke(StandardContextVa > > lve.java:191) > > > > > > > > at > > > > > > > > > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.ja > > va:127) > > > > > > > > at > > > > > > > > > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.ja > > va:102) > > > > > > > > at > > > > > > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:6 > > 15) > > > > > > > > at > > > > > > > > > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValv > > e.java:109) > > > > > > > > at > > > > > > > > > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java > > :293) > > > > > > > > at > > > > > > > > > org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor > > .java:889) > > > > > > > > at > > > > > > > > > org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.pro > > cess(Http11NioProtocol.java:744) > > > > > > > > at > > > > > > > > > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint > > .java:2282) > > > > > > > > at > > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j > > ava:1146) > > > > > > > > at > > > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. 
> > java:615) > > > > > > > > at java.lang.Thread.run(Thread.java:679) > > > > > > > > 2013-10-28 20:41:53,761 DEBUG [cloud.api.ApiServlet] > > > > (catalina-exec-1:null) ===END=== 10.110.3.70 -- GET > > > > > > > > > command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88 > > &response=json&session > > > > > > > > key=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624 > > > > > > > > Then I see a few of these: > > > > > > > > 2013-10-28 20:42:01,464 WARN > > > > [agent.manager.ClusteredAgentManagerImpl] > > > > (HA-Worker-4:work-10) Unable to connect to peer management server: > > > > 233845174730255, ip: 172.30.45.2 due to Connection refused > > > > > > > > java.net.ConnectException: Connection refused > > > > > > > > at sun.nio.ch.Net.connect(Native Method) > > > > > > > > at > > > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530) > > > > > > > > at > > > > java.nio.channels.SocketChannel.open(SocketChannel.java:164) > > > > > > > > at > > > > > > > > > com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(Cluste > > redAgentManagerImpl.java:477) > > > > > > > > at > > > > > > > > > com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttac > > he.java:172) > > > > > > > > at > > > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388) > > > > > > > > at > > > > > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:51 > > 1) > > > > > > > > at > > > > > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:46 > > 4) > > > > > > > > at > > > > > > > > > com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigat > > or.java:53) > > > > > > > > at > > > > > > > > > com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManag > > erImpl.java:434) > > > > > > > > at > > > > > > > > > com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabi > > lityManagerImpl.java:829) > > > > > > > > 2013-10-28 20:42:01,468 WARN > > > > [agent.manager.ClusteredAgentManagerImpl] > > > > (HA-Worker-2:work-11) Unable to connect to peer management server: > > > > 233845174730255, ip: 172.30.45.2 due to Connection refused > > > > > > > > java.net.ConnectException: Connection refused > > > > > > > > at sun.nio.ch.Net.connect(Native Method) > > > > > > > > at > > > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530) > > > > > > > > at > > > > java.nio.channels.SocketChannel.open(SocketChannel.java:164) > > > > > > > > at > > > > > > > > > com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(Cluste > > redAgentManagerImpl.java:477) > > > > > > > > at > > > > > > > > > com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttac > > he.java:172) > > > > > > > > at > > > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388) > > > > > > > > at > > > > > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:51 > > 1) > > > > > > > > at > > > > > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:46 > > 4) > > > > > > > > at > > > > > > > > > com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigat > > or.java:53) > > > > > > > > at > > > > > > > > > com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManag > > erImpl.java:434) > > > > > > > > at > > > > > > > > > com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabi > > lityManagerImpl.java:829) > > > > > > > > > > > > The next error is: > > > > > > > > 2013-10-28 20:42:01,845 WARN [utils.nio.Task] > > > > 
(AgentManager-Handler-6:null) Caught the following exception but > > pushing > > > on > > > > > > > > java.lang.NullPointerException > > > > > > > > at > > > > > > > > > com.google.gson.FieldAttributes.getAnnotationFromArray(FieldAttributes > > .java:231) > > > > > > > > at > > > > com.google.gson.FieldAttributes.getAnnotation(FieldAttributes.java > > > > :150) > > > > > > > > at > > > > > > > > > com.google.gson.VersionExclusionStrategy.shouldSkipField(VersionExclus > > ionStrategy.java:38) > > > > > > > > at > > > > > > > > > com.google.gson.DisjunctionExclusionStrategy.shouldSkipField(Disjuncti > > onExclusionStrategy.java:38) > > > > > > > > at > > > > > > > > > com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(Refle > > ctingFieldNavigator.java:58) > > > > > > > > at > > > com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120) > > > > > > > > at > > > > > > > > > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ > > ationContextDefault.java:62) > > > > > > > > at > > > > > > > > > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ > > ationContextDefault.java:53) > > > > > > > > at com.google.gson.Gson.toJsonTree(Gson.java:220) > > > > > > > > at com.google.gson.Gson.toJsonTree(Gson.java:197) > > > > > > > > at > > > > > > > > > com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor. > > java:56) > > > > > > > > at > > > > > > > > > com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor. > > java:37) > > > > > > > > at > > > > > > > > > com.google.gson.JsonSerializationVisitor.findAndInvokeCustomSerializer > > (JsonSerializationVisitor.java:184) > > > > > > > > at > > > > > > > > > com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonS > > erializationVisitor.java:160) > > > > > > > > at > > > com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:101) > > > > > > > > at > > > > > > > > > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ > > ationContextDefault.java:62) > > > > > > > > at > > > > > > > > > com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializ > > ationContextDefault.java:53) > > > > > > > > at com.google.gson.Gson.toJsonTree(Gson.java:220) > > > > > > > > at com.google.gson.Gson.toJson(Gson.java:260) > > > > > > > > at > > > > com.cloud.agent.transport.Request.toBytes(Request.java:316) > > > > > > > > at > > > > com.cloud.agent.transport.Request.getBytes(Request.java:332) > > > > > > > > at > > > > > > > > > com.cloud.agent.manager.ClusteredAgentManagerImpl.cancel(ClusteredAgen > > tManagerImpl.java:435) > > > > > > > > at > > > > > > > > > com.cloud.agent.manager.ClusteredAgentManagerImpl$ClusteredAgentHandle > > r.doTask(ClusteredAgentManagerImpl.java:641) > > > > > > > > at com.cloud.utils.nio.Task.run(Task.java:83) > > > > > > > > at > > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j > > ava:1146) > > > > > > > > at > > > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. 
> > java:615) > > > > > > > > at java.lang.Thread.run(Thread.java:679) > > > > > > > > and then the next set of errors I see over and over are: > > > > > > > > 2013-10-28 20:42:16,433 DEBUG [cloud.storage.StorageManagerImpl] > > > > (StatsCollector-2:null) Unable to send storage pool command to > > > > Pool[200|LVM] via 1 > > > > > > > > com.cloud.exception.OperationTimedoutException: Commands > > > > 1112277002 to Host 1 timed out after 3600 > > > > > > > > at > > > com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429) > > > > > > > > at > > > > > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:51 > > 1) > > > > > > > > at > > > > > > com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:46 > > 4) > > > > > > > > at > > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:2347) > > > > > > > > at > > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:422) > > > > > > > > at > > > > > > > > > com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.jav > > a:436) > > > > > > > > at > > > > > > > > > com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.ja > > va:316) > > > > > > > > at > > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java > > > > :471) > > > > > > > > at > > > > > > > > > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java: > > 351) > > > > > > > > at > > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) > > > > > > > > at > > > > > > > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a > > ccess$201(ScheduledThreadPoolExecutor.java:165) > > > > > > > > at > > > > > > > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r > > un(ScheduledThreadPoolExecutor.java:267) > > > > > > > > at > > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j > > ava:1146) > > > > > > > > at > > > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. 
> > java:615)
> > > >     at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > 2013-10-28 20:42:16,434 INFO [cloud.server.StatsCollector] (StatsCollector-2:null) Unable to reach Pool[200|LVM]
> > > >
> > > > com.cloud.exception.StorageUnavailableException: Resource [StoragePool:200] is unreachable: Unable to send command to the pool
> > > >     at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
> > > >     at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > > >     at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > > >     at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > >     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > > >     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >     at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > I have tried to force reconnect to both hosts but that ends up maxing out a CPU core and filling up the log file with endless log lines.
> > > >
> > > > Any thoughts on how to recover my system?
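
For reference, here is a rough sketch of the DB-side checks discussed above (op_ha_work entries, lock waits, connection pressure, and the global configuration table). It assumes a stock CloudStack 4.x schema in a MySQL database named "cloud" with a "cloud" user; table and column names vary between versions, so treat it as illustrative only. Stop the management server and take a mysqldump before deleting or changing anything.

    # How loaded is MySQL, and is anything stuck waiting on row locks?
    mysql -u cloud -p -e "SHOW FULL PROCESSLIST;"
    mysql -u cloud -p -e "SHOW ENGINE INNODB STATUS\G"

    # Is the management server anywhere near the connection limit?
    mysql -u cloud -p -e "SHOW VARIABLES LIKE 'max_connections'; SHOW STATUS LIKE 'Max_used_connections';"

    # Are there lots of op_ha_work entries that can lock each other out?
    mysql -u cloud -p cloud -e "SELECT COUNT(*) FROM op_ha_work;"

    # Last resort, as Paul describes: clear the HA work queue and then
    # sort out host/VM up-down state yourself (DB backup first).
    # mysql -u cloud -p cloud -e "DELETE FROM op_ha_work;"

    # Scan the global configuration table for obviously bogus values
    # (an invalid entry here was what broke DAO autowiring for us).
    mysql -u cloud -p cloud -e "SELECT category, name, value FROM configuration ORDER BY category, name;"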
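Similarly, a sketch of the cross-checks between what CloudStack believes and what XenServer believes (pool master, host status, and system VMs stuck in Starting), along the lines of what was done by hand earlier in this thread. The host and vm_instance column names are assumptions based on a stock 4.x database; verify them against your own schema, stop the management server, and back up the DB before running any UPDATE.

    # What does XenServer itself say about the pool master?
    xe pool-list params=master --minimal   # UUID of the current master
    xe host-list                           # map that UUID to a host name

    # What does CloudStack currently think about its hosts?
    mysql -u cloud -p cloud -e "SELECT id, name, status, mgmt_server_id FROM host WHERE removed IS NULL;"

    # System VMs left in Starting from earlier attempts (reset by hand above).
    mysql -u cloud -p cloud -e "SELECT id, instance_name, type, state FROM vm_instance WHERE state = 'Starting' AND type <> 'User';"
    # Only after reviewing the SELECT above:
    # mysql -u cloud -p cloud -e "UPDATE vm_instance SET state = 'Stopped' WHERE state = 'Starting' AND type <> 'User';"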
