Re: Management Server won't connect after cluster shutdown and restart

Ian Duffy Sat, 30 Aug 2014 05:30:51 -0700

Hi All,

Thank you very much for the help.


We ended up solving the issue. There was an invalid value in our configuration
table, which seemed to prevent a lot of DAOs from being autowired.




On 29 August 2014 21:16, Paul Angus <[email protected]> wrote:

> Hi Ian,
>
> I've seen this kind of behaviour before with KVM hosts reconnecting.
>
> There's a SELECT … FOR UPDATE query on the op_ha_work table which locks
> the table, stopping other hosts from updating their status. If there are a
> lot of entries in there, they all lock each other out. Deleting the entries
> fixed the problem, but you then have to deal with hosts and VMs being
> up/down yourself.
>
> So check the op_ha_work table for a large number of entries, which can lock
> up the database. If you can check which queries the database is currently
> handling, that would be best.
>
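A minimal sketch of that check in Python, for anyone who wants to script it. It assumes a DB-API connection to the cloud database; the helper names and the threshold of 100 rows are made up for illustration, and the demo uses an in-memory SQLite table only so that it runs without a MySQL server. On MySQL itself, SHOW FULL PROCESSLIST lists the queries currently being handled.

```python
import sqlite3


def count_ha_work(conn):
    """Return the number of rows in op_ha_work, using any DB-API connection."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM op_ha_work")
    (count,) = cur.fetchone()
    cur.close()
    return count


def looks_backed_up(count, threshold=100):
    """Hypothetical rule of thumb: flag the table once it exceeds `threshold` rows."""
    return count > threshold


def demo():
    # Stand-in for the real database: an in-memory table with the same name.
    # The single extra column is a placeholder, not the real schema.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE op_ha_work (id INTEGER PRIMARY KEY, instance_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO op_ha_work (instance_id) VALUES (?)",
        [(i,) for i in range(250)],
    )
    return count_ha_work(conn)


if __name__ == "__main__":
    rows = demo()
    print(rows, looks_backed_up(rows))
```

Against a real installation, the connection would come from a MySQL driver instead of sqlite3; count_ha_work only relies on the standard cursor interface.
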
> Also check that the management server and MySQL DB are tuned for the load
> that is being thrown at them.
> (http://support.citrix.com/article/CTX132020)
> Remember that if you have other services such as Nagios or Puppet/Chef
> directly reading the DB, that adds to the number of connections into the
> MySQL DB. I have seen the management server starved of MySQL connections
> when a lot of hosts are brought back online.
>
>
> Regards
>
> Paul Angus
> Cloud Architect
> S: +44 20 3603 0540 | M: +447711418784 | T: CloudyAngus
> [email protected]
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Carlos Reategui
> Sent: 29 August 2014 20:55
> To: [email protected]
> Subject: Re: Management Server won't connect after cluster shutdown and
> restart
>
> Hi Ian,
>
> So the root of the problem was that the machines were not started up in
> the correct order.
>
> My plan had been to stop all VMs from CS, then stop CS, then shut down the
> VM hosts. On the way back up, the hosts needed to be brought up first and,
> once they were OK, the CS machine brought up, making sure everything was
> in the same state CS thought it was in when it was shut down.
> Unfortunately, CS came up before everything else was the way it expected
> it to be, and I did not realize that at the time.
>
> To resolve it, I went back to my CS DB backup from right after I shut down
> the MS, made sure the VM hosts were all as expected, and then started the MS.
>
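The ordering described above (hypervisor hosts first, management server last) can be guarded with a small pre-start check. A rough Python sketch, assuming each host's management interface listens on TCP port 443 (an assumption to verify for your hypervisor); the helper names are hypothetical and this is not part of CloudStack.

```python
import socket


def tcp_probe(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def hosts_not_ready(hosts, probe=tcp_probe):
    """Return the hosts that do not answer the probe, in the order given."""
    return [host for host in hosts if not probe(host)]


if __name__ == "__main__":
    # Placeholder addresses; replace them with the real hypervisor hosts.
    pending = hosts_not_ready(["192.0.2.11", "192.0.2.12"])
    if pending:
        print("Do not start the management server yet; unreachable:", pending)
    else:
        print("All hosts reachable.")
```

A reachable port only shows that the host is up; it does not confirm that the pool master or the storage is in the state the management server expects.
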
>
>
>
>
>
> On Fri, Aug 29, 2014 at 8:02 AM, Ian Duffy <[email protected]> wrote:
>
> > Hi Carlos,
> >
> > Did you ever find a fix for this?
> >
> > I'm seeing the same issue on 4.1.1 with VMware ESXi.
> >
> >
> > On 29 October 2013 04:54, Carlos Reategui <[email protected]> wrote:
> >
> > > Update: I cleared out the async_job table and also reset the system VMs
> > > that it thought were in Starting state from my previous attempts by
> > > changing them from Starting to Stopped. I also reset the XS pool master
> > > to be the one XS thinks it is.
> > >
> > > Now when I start the CS MS, here are the logs leading up to the first
> > > exception about being unable to reach the pool:
> > >
> > > 2013-10-28 21:27:11,040 DEBUG [cloud.alert.ClusterAlertAdapter]
> > > (Cluster-Notification-1:null) Management server node 172.30.45.2 is
> > > up, send alert
> > >
> > > 2013-10-28 21:27:11,045 WARN  [cloud.cluster.ClusterManagerImpl]
> > > (Cluster-Notification-1:null) Notifying management server join event
> > > took 9 ms
> > >
> > > 2013-10-28 21:27:23,236 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-2:null) HostStatsCollector is running...
> > >
> > > 2013-10-28 21:27:23,243 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-3:null) VmStatsCollector is running...
> > >
> > > 2013-10-28 21:27:23,247 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) StorageCollector is running...
> > >
> > > 2013-10-28 21:27:23,255 DEBUG [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) There is no secondary storage VM for
> > > secondary storage host nfs://172.30.45.2/store/secondary
> > >
> > > 2013-10-28 21:27:23,273 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Forwarding null to
> > > 233845174730255
> > >
> > > 2013-10-28 21:27:23,274 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-9:null) Seq 1-201916421: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-9:null) Seq 1-201916421: Link is closed
> > >
> > > 2013-10-28 21:27:23,275 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-9:null) Seq 1-201916421: MgmtId 233845174730253:
> > > Req: Resource [Host:1] is unreachable: Host 1: Link is closed
> > >
> > > 2013-10-28 21:27:23,275 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-9:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,277 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-11:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Cancel request received
> > >
> > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-11:null) Seq 1-201916421: Cancelling.
> > >
> > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,277 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-2:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list
> > > for exceptions
> > >
> > > 2013-10-28 21:27:23,277 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Timed out on null
> > >
> > > 2013-10-28 21:27:23,278 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 1-201916421: Cancelling.
> > >
> > > 2013-10-28 21:27:23,278 WARN  [agent.manager.AgentManagerImpl]
> > > (StatsCollector-2:null) Operation timed out: Commands 201916421 to
> > > Host 1 timed out after 3600
> > >
> > > 2013-10-28 21:27:23,278 WARN  [cloud.resource.ResourceManagerImpl]
> > > (StatsCollector-2:null) Unable to obtain host 1 statistics.
> > >
> > > 2013-10-28 21:27:23,278 WARN  [cloud.server.StatsCollector]
> > > (StatsCollector-2:null) Received invalid host stats for host: 1
> > >
> > > 2013-10-28 21:27:23,281 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Forwarding null to
> > > 233845174730255
> > >
> > > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-12:null) Seq 1-201916422: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-12:null) Seq 1-201916422: Link is closed
> > >
> > > 2013-10-28 21:27:23,283 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-12:null) Seq 1-201916422: MgmtId 233845174730253:
> > > Req: Resource [Host:1] is unreachable: Host 1: Link is closed
> > >
> > > 2013-10-28 21:27:23,284 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-12:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,286 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-13:null) Seq 1--1: MgmtId 233845174730253: Req:
> > > Cancel request received
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-13:null) Seq 1-201916422: Cancelling.
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,286 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-1:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list
> > > for exceptions
> > >
> > > 2013-10-28 21:27:23,286 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Timed out on null
> > >
> > > 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 1-201916422: Cancelling.
> > >
> > > 2013-10-28 21:27:23,288 DEBUG [cloud.storage.StorageManagerImpl]
> > > (StatsCollector-1:null) Unable to send storage pool command to
> > > Pool[200|LVM] via 1
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands 201916422 to
> > > Host 1 timed out after 3600
> > >         at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >         at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,289 INFO  [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) Unable to reach Pool[200|LVM]
> > >
> > > com.cloud.exception.StorageUnavailableException: Resource
> > > [StoragePool:200] is unreachable: Unable to send command to the pool
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >         at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,300 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Forwarding null to
> > > 233845174730255
> > >
> > > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-14:null) Seq 2-1168703496: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-14:null) Seq 2-1168703496: Link is closed
> > >
> > > 2013-10-28 21:27:23,302 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-14:null) Seq 2-1168703496: MgmtId 233845174730253:
> > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> > >
> > > 2013-10-28 21:27:23,302 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-14:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,303 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-15:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Cancel request received
> > >
> > > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-15:null) Seq 2-1168703496: Cancelling.
> > >
> > > 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,304 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-2:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list
> > > for exceptions
> > >
> > > 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Timed out on null
> > >
> > > 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-2:null) Seq 2-1168703496: Cancelling.
> > >
> > > 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentManagerImpl]
> > > (StatsCollector-2:null) Operation timed out: Commands 1168703496 to
> > > Host 2 timed out after 3600
> > >
> > > 2013-10-28 21:27:23,304 WARN  [cloud.resource.ResourceManagerImpl]
> > > (StatsCollector-2:null) Unable to obtain host 2 statistics.
> > >
> > > 2013-10-28 21:27:23,304 WARN  [cloud.server.StatsCollector]
> > > (StatsCollector-2:null) Received invalid host stats for host: 2
> > >
> > > 2013-10-28 21:27:23,307 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Forwarding null to
> > > 233845174730255
> > >
> > > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-1:null) Seq 2-1168703497: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-1:null) Seq 2-1168703497: Link is closed
> > >
> > > 2013-10-28 21:27:23,308 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-1:null) Seq 2-1168703497: MgmtId 233845174730253:
> > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> > >
> > > 2013-10-28 21:27:23,308 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-1:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,310 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-5:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Cancel request received
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-5:null) Seq 2-1168703497: Cancelling.
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,310 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-1:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list
> > > for exceptions
> > >
> > > 2013-10-28 21:27:23,310 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Timed out on null
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703497: Cancelling.
> > >
> > > 2013-10-28 21:27:23,310 DEBUG [cloud.storage.StorageManagerImpl]
> > > (StatsCollector-1:null) Unable to send storage pool command to
> > > Pool[201|LVM] via 2
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands 1168703497 to
> > > Host 2 timed out after 3600
> > >         at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >         at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,311 INFO  [cloud.server.StatsCollector]
> > > (StatsCollector-1:null) Unable to reach Pool[201|LVM]
> > >
> > > com.cloud.exception.StorageUnavailableException: Resource
> > > [StoragePool:201] is unreachable: Unable to send command to the pool
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >         at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
> > > 2013-10-28 21:27:23,328 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Forwarding null to
> > > 233845174730255
> > >
> > > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-3:null) Seq 2-1168703498: Routing from
> > > 233845174730253
> > >
> > > 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache]
> > > (AgentManager-Handler-3:null) Seq 2-1168703498: Link is closed
> > >
> > > 2013-10-28 21:27:23,329 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-3:null) Seq 2-1168703498: MgmtId 233845174730253:
> > > Req: Resource [Host:2] is unreachable: Host 2: Link is closed
> > >
> > > 2013-10-28 21:27:23,330 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-3:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Routing to peer
> > >
> > > 2013-10-28 21:27:23,331 DEBUG
> > > [agent.manager.ClusteredAgentManagerImpl]
> > > (AgentManager-Handler-4:null) Seq 2--1: MgmtId 233845174730253: Req:
> > > Cancel request received
> > >
> > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > > (AgentManager-Handler-4:null) Seq 2-1168703498: Cancelling.
> > >
> > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time
> > > because this is the current command
> > >
> > > 2013-10-28 21:27:23,331 INFO  [utils.exception.CSExceptionErrorCode]
> > > (StatsCollector-1:null) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list
> > > for exceptions
> > >
> > > 2013-10-28 21:27:23,332 WARN  [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Timed out on null
> > >
> > > 2013-10-28 21:27:23,332 DEBUG [agent.manager.AgentAttache]
> > > (StatsCollector-1:null) Seq 2-1168703498: Cancelling.
> > >
> > > 2013-10-28 21:27:23,332 DEBUG [cloud.storage.StorageManagerImpl]
> > > (StatsCollector-1:null) Unable to send storage pool command to
> > > Pool[202|NetworkFilesystem] via 2
> > >
> > > com.cloud.exception.OperationTimedoutException: Commands 1168703498 to
> > > Host 2 timed out after 3600
> > >         at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > >         at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:679)
> > >
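The timeouts in the log above can be tallied per host with a short script. A rough sketch in Python, assuming the log lines have been unwrapped from the mail quoting first; the function name is hypothetical and the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches e.g. "Commands 201916421 to Host 1 timed out after 3600".
# \s+ tolerates plain line wraps, but not mail quote markers ("> > >").
TIMEOUT_RE = re.compile(
    r"Commands\s+(\d+)\s+to\s+Host\s+(\d+)\s+timed\s+out\s+after\s+(\d+)"
)


def timed_out_commands(log_text):
    """Map host id -> sorted list of distinct command ids that timed out."""
    by_host = defaultdict(set)
    for command_id, host_id, _wait in TIMEOUT_RE.findall(log_text):
        by_host[int(host_id)].add(int(command_id))
    return {host: sorted(commands) for host, commands in by_host.items()}


SAMPLE = """\
2013-10-28 21:27:23,278 WARN  [agent.manager.AgentManagerImpl] (StatsCollector-2:null) Operation timed out: Commands 201916421 to Host 1 timed out after 3600
com.cloud.exception.OperationTimedoutException: Commands 201916422 to Host 1 timed out after 3600
2013-10-28 21:27:23,304 WARN  [agent.manager.AgentManagerImpl] (StatsCollector-2:null) Operation timed out: Commands 1168703496 to Host 2 timed out after 3600
"""

if __name__ == "__main__":
    for host, commands in sorted(timed_out_commands(SAMPLE).items()):
        print(host, commands)
```

Command ids are de-duplicated because the same timeout is reported both in a WARN line and in the exception message.
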
> > > iptables is disabled on the XS hosts, so the connection problem is not
> > > a firewall issue.
> > >
> > > If I do an xe sr-list I see all 3 of the above SRs, and the hosts
> > > have mounted the NFS SR and can access it.
> > >
> > >
> > >
> > >
> > > On Mon, Oct 28, 2013 at 9:05 PM, Carlos Reategui <[email protected]>
> > > wrote:
> > >
> > > > Using CS 4.1.1 with 2 hosts running XS 6.0.2.
> > > >
> > > > I had to shut everything down and now I am having problems bringing
> > > > things up.
> > > >
> > > > As suggested, I used CS to stop all my instances as well as the system
> > > > VMs and the SR. Then I shut down the XS 6.0.2 servers after enabling
> > > > maintenance mode from the CS console.
> > > >
> > > > After bringing things up, my XS servers had the infamous
> > > > interface-rename issue, which I resolved by editing the udev rules file
> > > > manually.
> > > >
> > > > Now I have my XS servers up, but for some reason my pool master got
> > > > changed, so I used xe pool-designate-new-master to switch it back.
> > > >
> > > > I did not notice that this designation change had been picked up by CS,
> > > > and when starting it up it keeps trying to connect to the wrong pool
> > > > master. Should I switch XS to match CS, or what do I need to change in
> > > > CS to tell it what the pool master is?
> > > >
> > > > I tried putting the server that CS thinks is the master in maintenance
> > > > mode from CS, but that just ends up in an apparent infinite cycle
> > > > spitting out endless lines like these:
> > > >
> > > > 2013-10-28 20:39:02,059 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-1:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,060 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-11:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,062 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-13:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,063 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-7:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,064 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-15:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,066 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-14:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,067 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-2:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > 2013-10-28 20:39:02,068 DEBUG
> > > > [agent.manager.ClusteredAgentAttache]
> > > > (AgentManager-Handler-12:null) Seq 2-855048230: Forwarding Seq
> > > > 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1,
> > > > Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
> > > >
> > > > After stopping and restarting the MS, the first error I see is:
> > > >
> > > > 2013-10-28 20:41:53,749 DEBUG [cloud.api.ApiServlet]
> > > > (catalina-exec-1:null) ===START===  10.110.3.70 -- GET
> > > > command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88&response=json&sessionkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
> > > >
> > > > 2013-10-28 20:41:53,756 ERROR [cloud.api.ApiServlet]
> > > > (catalina-exec-1:null) unknown exception writing api response
> > > >
> > > > java.lang.NullPointerException
> > > >         at com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:280)
> > > >         at com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:143)
> > > >         at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:238)
> > > >         at com.cloud.api.ApiServlet.doGet(ApiServlet.java:66)
> > > >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
> > > >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
> > > >         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
> > > >         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > > >         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> > > >         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> > > >         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > > >         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > > >         at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:615)
> > > >         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > > >         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> > > >         at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
> > > >         at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
> > > >         at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2282)
> > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > 2013-10-28 20:41:53,761 DEBUG [cloud.api.ApiServlet]
> > > > (catalina-exec-1:null) ===END===  10.110.3.70 -- GET
> > > > command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88&response=json&sessionkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
> > > >
> > > > Then I see a few of these:
> > > >
> > > > 2013-10-28 20:42:01,464 WARN
> > > > [agent.manager.ClusteredAgentManagerImpl]
> > > > (HA-Worker-4:work-10) Unable to connect to peer management server:
> > > > 233845174730255, ip: 172.30.45.2 due to Connection refused
> > > >
> > > > java.net.ConnectException: Connection refused
> > > >
> > > >         at sun.nio.ch.Net.connect(Native Method)
> > > >         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
> > > >         at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
> > > >         at com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
> > > >         at com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
> > > >         at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
> > > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > > >         at com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
> > > >         at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
> > > >         at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)
> > > >
> > > > 2013-10-28 20:42:01,468 WARN
> > > > [agent.manager.ClusteredAgentManagerImpl]
> > > > (HA-Worker-2:work-11) Unable to connect to peer management server:
> > > > 233845174730255, ip: 172.30.45.2 due to Connection refused
> > > >
> > > > java.net.ConnectException: Connection refused
> > > >
> > > >         at sun.nio.ch.Net.connect(Native Method)
> > > >         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
> > > >         at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
> > > >         at com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
> > > >         at com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
> > > >         at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
> > > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > > >         at com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
> > > >         at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
> > > >         at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)
> > > >
> > > >
> > > > The next error is:
> > > >
> > > > 2013-10-28 20:42:01,845 WARN  [utils.nio.Task]
> > > > (AgentManager-Handler-6:null) Caught the following exception but
> > > > pushing on
> > > >
> > > > java.lang.NullPointerException
> > > >
> > > >         at com.google.gson.FieldAttributes.getAnnotationFromArray(FieldAttributes.java:231)
> > > >         at com.google.gson.FieldAttributes.getAnnotation(FieldAttributes.java:150)
> > > >         at com.google.gson.VersionExclusionStrategy.shouldSkipField(VersionExclusionStrategy.java:38)
> > > >         at com.google.gson.DisjunctionExclusionStrategy.shouldSkipField(DisjunctionExclusionStrategy.java:38)
> > > >         at com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(ReflectingFieldNavigator.java:58)
> > > >         at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120)
> > > >         at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
> > > >         at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
> > > >         at com.google.gson.Gson.toJsonTree(Gson.java:220)
> > > >         at com.google.gson.Gson.toJsonTree(Gson.java:197)
> > > >         at com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:56)
> > > >         at com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:37)
> > > >         at com.google.gson.JsonSerializationVisitor.findAndInvokeCustomSerializer(JsonSerializationVisitor.java:184)
> > > >         at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:160)
> > > >         at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:101)
> > > >         at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
> > > >         at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
> > > >         at com.google.gson.Gson.toJsonTree(Gson.java:220)
> > > >         at com.google.gson.Gson.toJson(Gson.java:260)
> > > >         at com.cloud.agent.transport.Request.toBytes(Request.java:316)
> > > >         at com.cloud.agent.transport.Request.getBytes(Request.java:332)
> > > >         at com.cloud.agent.manager.ClusteredAgentManagerImpl.cancel(ClusteredAgentManagerImpl.java:435)
> > > >         at com.cloud.agent.manager.ClusteredAgentManagerImpl$ClusteredAgentHandler.doTask(ClusteredAgentManagerImpl.java:641)
> > > >         at com.cloud.utils.nio.Task.run(Task.java:83)
> > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > and then the next set of errors I see over and over are:
> > > >
> > > > 2013-10-28 20:42:16,433 DEBUG [cloud.storage.StorageManagerImpl]
> > > > (StatsCollector-2:null) Unable to send storage pool command to
> > > > Pool[200|LVM] via 1
> > > >
> > > > com.cloud.exception.OperationTimedoutException: Commands
> > > > 1112277002 to Host 1 timed out after 3600
> > > >
> > > >         at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
> > > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
> > > >         at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
> > > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
> > > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > > >         at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > >         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > 2013-10-28 20:42:16,434 INFO  [cloud.server.StatsCollector]
> > > > (StatsCollector-2:null) Unable to reach Pool[200|LVM]
> > > >
> > > > com.cloud.exception.StorageUnavailableException: Resource
> > > > [StoragePool:200] is unreachable: Unable to send command to the
> > > > pool
> > > >
> > > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
> > > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
> > > >         at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
> > > >         at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
> > > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > >         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> > > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >         at java.lang.Thread.run(Thread.java:679)
> > > >
> > > > I have tried to force reconnect to both hosts but that ends up
> > > > maxing out a CPU core and filling up the log file with endless log
> > > > lines.
> > > >
> > > > Any thoughts on how to recover my system?
> > > >
