I will be able to re-check some code and answer you with new thoughts on this problem tonight (the log files you provided today contain some very insightful information about it). I believe this is a bug in HA, a very tricky one that has not been reported yet (at least I do not remember seeing anything similar). And thanks to randomness/destiny, you have now found/uncovered a way to reproduce the problem (bug?!).
I do not suggest changing any data at all directly in the database (unless you know the ACS code well and are 100% sure of what you are doing). This can cause even bigger and more complicated problems. If you remove the HA feature from these VMs, this problem will stop. However, if you can/are willing to keep HA enabled until we fully uncover, map, and report the problem, that would be great. BTW: why are you restarting the MSs? What are the configurations of your MSs? Thanks for your time and for sharing information with our community.

On Tue, Jan 31, 2017 at 8:53 AM, Francois Scheurer < francois.scheu...@everyware.ch> wrote:

> Dear Rafael
>
> Many thanks for your deep analysis.
> Nice humor with the planets alignments BTW ;)
>
> We also suspected the HA Flag to be related to this issue, as we never got
> this issue in the past, until we enabled HA on all VMs and Offerings.
>
> Here is also the workers list:
>
> mysql> select * from op_ha_work;
> +------+-------------+-----------+--------------------+----------+----------------+---------+---------------------+-------+---------------------+-------+-------------+---------+
> | id   | instance_id | type      | vm_type            | state    | mgmt_server_id | host_id | created             | tried | taken               | step  | time_to_try | updated |
> +------+-------------+-----------+--------------------+----------+----------------+---------+---------------------+-------+---------------------+-------+-------------+---------+
> | 105 | 343 | HA | User | Running | 345049098498 | 29 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
> | 109 | 407 | HA | ConsoleProxy | Running | 345049098498 | 33 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
> | 113 | 407 | HA | ConsoleProxy | Running | 345049098498 | 33 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
> | 117 | 407 | HA | ConsoleProxy | Running | 345049098498 | 33 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
> | 121 | 343 | HA | User | Running | 345049098498 | 29 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
> | 125 | 405 | HA | SecondaryStorageVm | Running | 345049098498 | 29 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
> | 129 | 393 | HA | User | Running | 345049098498 | 25 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056014 | 30 |
> | 133 | 405 | HA | SecondaryStorageVm | Running | 345049098498 | 29 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056014 | 3 |
> | 137 | 402 | HA | ConsoleProxy | Running | 345049098498 | 25 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056014 | 3 |
> | 345 | 402 | HA | ConsoleProxy | Starting | 345049098498 | 25 | 2015-07-06 13:09:26 | 1 | 2015-07-06 13:09:26 | Error | 1402527506 | 770 |
> | 349 | 687 | HA | SecondaryStorageVm | Starting | 345049098498 | 25 | 2015-07-06 13:09:26 | 1 | 2015-07-06 13:09:26 | Error | 1402527506 | 2 |
> | 2057 | 1178 | Migration | User | Running | 345049098498 | 33 | 2017-01-31 07:56:08 | 0 | 2017-01-31 07:56:08 | Done | 1451024774 | 95 |
> | 2058 | 1736 | Migration | User | Running | 345049098498 | 224 | 2017-01-31 07:56:08 | 0 | 2017-01-31 07:56:08 | Done | 1451024774 | 45 |
> | 2059 | 1240 | Migration | User | Running | 345049098498 | 7 | 2017-01-31 07:56:08 | 0 | 2017-01-31 07:56:08 | Done | 1451024774 | 61 |
> | 2060 | 1178 | Migration | User | Running | 345049098122 | 33 | 2017-01-31 07:56:08 | 1 | 2017-01-31 08:06:37 | Done | 1451025374 | 95 |
> | 2061 | 1178 | Migration | User | Stopped | NULL | 33 | 2017-01-31 07:56:09 | 5 | NULL | Done | 1451027900 | 109 |
> | 2062 | 1690 | Migration | User | Running | 345049098498 | 224 | 2017-01-31 07:56:09 | 0 | 2017-01-31 07:56:09 | Done | 1451024774 | 109 |
> | 2063 | 1177 | Migration | User | Running | 345049098498 | 29 | 2017-01-31 07:56:09 | 0 | 2017-01-31 07:56:09 | Done | 1451024774 | 82 |
> | 2064 | 1690 | Migration | User | Running | 345049098122 | 224 | 2017-01-31 07:56:09 | 1 | 2017-01-31 07:58:18 | Done | 1451024900 | 109 |
> | 2065 | 1240 | Migration | User | Running | 345049098122 | 7 | 2017-01-31 07:56:09 | 1 | 2017-01-31 07:58:18 | Done | 1451024900 | 61 |
> | 2066 | 1736 | Migration | User | Running | 345049098122 | 224 | 2017-01-31 07:56:09 | 1 | 2017-01-31 07:58:18 | Done | 1451024900 | 45 |
> | 2067 | 1240 | Migration | User | Stopped | NULL | 7 | 2017-01-31 07:56:09 | 5 | NULL | Done | 1451028003 | 80 |
> | 2068 | 1178 | Migration | User | Running | 345049098122 | 33 | 2017-01-31 07:56:16 | 0 | 2017-01-31 07:56:16 | Done | 1451024781 | 95 |
> | 2069 | 1736 | Migration | User | Running | 345049098122 | 224 | 2017-01-31 07:56:16 | 0 | 2017-01-31 07:56:16 | Done | 1451024781 | 45 |
> | 2070 | 1240 | Migration | User | Running | 345049098122 | 7 | 2017-01-31 07:56:16 | 0 | 2017-01-31 07:56:16 | Done | 1451024781 | 61 |
> | 2071 | 1177 | Migration | User | Running | 345049098122 | 29 | 2017-01-31 07:56:16 | 0 | 2017-01-31 07:56:16 | Done | 1451024781 | 82 |
> | 2072 | 1177 | Migration | User | Stopped | NULL | 29 | 2017-01-31 07:56:16 | 5 | NULL | Done | 1451027973 | 101 |
> | 2073 | 1177 | Migration | User | Running | 345049098122 | 29 | 2017-01-31 07:56:16 | 1 | 2017-01-31 07:58:22 | Done | 1451024904 | 82 |
> | 2074 | 1690 | Migration | User | Stopped | NULL | 224 | 2017-01-31 07:56:16 | 5 | NULL | Done | 1451027464 | 123 |
> | 2075 | 1736 | Migration | User | Stopped | NULL | 224 | 2017-01-31 07:56:16 | 5 | NULL | Done | 1451027477 | 59 |
> +------+-------------+-----------+--------------------+----------+----------------+---------+---------------------+-------+---------------------+-------+-------------+---------+
>
> If we put a host in maintenance, it may unfortunately happen that a
> timeout occurs and CS aborts the migration.
> As far as we can interpret the logs, it seems (we speculate) that CS
> detects that the VM is then still running on the original host, which is in
> maintenance, and decides somehow to stop it because of the "host not up" status.
> We then need to start the VM several times, until CS gives up stopping it.
>
> After restarting the MSs I guess that a similar issue may happen: the
> hosts are disconnected during the early stage and then reconnect. Maybe CS
> then again stops the VMs because of the "host not up" status?
> Cf. 2017-01-30 08:44:06,157 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
> (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) The
> last host of this VM is not UP or is not enabled, host status is:
> Disconnected, host resource state is: Enabled
>
> But the problem after the MSs restart is definitely only impacting VMs
> listed in the table op_ha_work.
>
> To your questions:
> We have 3 ACS MSs managing several Zones. Each zone has a single
> pod and cluster.
> The Cluster of i-638-1735-VM has this summary: > Total number of VMs: 121 > Total number hypervisors: 4 > Total allocated RAM: 522.0 / 972 GB (53.7 %) > Total allocated cores: 211 > Total allocated storage: 11285 GB > Used Memory on the hosts is now varying from 28 to 77%, so there > is enough RAM. > I did not check the XS logs. The same Problem has arisen this > morning again (shortly after a restart of all MSs), so I will look in the > XS logs now. > > I have some fair experience with Eclipse and Java, I will ask my > boss if we can go this way. > > The main issue for now, is that restarting the MSs is reproducing the > problem, which is quite annoying for the impacted VM's. > I thought to stop the MSs, delete all HA/Migration entries in op_ha_work > and start the MSs. > Is that a viable option? (or should I delete the Migration entries only? > or never touch this table?) > We will also very probably remove the HA Flag on all VMs if it does not > work reliably. > > Best Regards > Francois Scheurer > > > > Here are new logs regarding the same issue of today. > I tried to get the important parts related to the stop of i-189-1178-VM. > > regexp: (i-189-1178-VM|VmWorkMigrateAway|job-15393|job-192989| > StopRequestedvm|job-14965|job-193019|Stopped-Migrating|ctx- > 36aae733|work-2049) > > 2017-01-30 08:44:04,440 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Executing > AsyncJobVO {id:192989, userId: 1, accountId: 1, instanceType: null, > instanceId: null, cmd: com.cloud.vm.VmWorkMigrateAway, cmdInfo: > rO0ABXNyAB5jb20uY2xvdWQudm0uVm1Xb3JrTWlncmF0ZU > F3YXmt4MX4jtcEmwIAAUoACXNyY0hvc3RJZHhyABNjb20uY2xvdWQudm0uVm > 1Xb3Jrn5m2VvAlZ2sCAARKAAlhY2NvdW50SWRKAAZ1c2VySWRKAAR2bUlkTA > ALaGFuZGxlck5hbWV0ABJMamF2YS9sYW5nL1N0cmluZzt4cAAAAAAAAAABAA > AAAAAAAAEAAAAAAAAEmnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAHQ, > cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resu > ltCode: 0, result: null, initMsid: 345049103441, completeMsid: null, > lastUpdated: null, lastPolled: null, created: Mon Jan 30 08:44:03 CET 2017} > 2017-01-30 08:44:04,440 DEBUG [c.c.v.VmWorkJobDispatcher] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Run VM work job: > com.cloud.vm.VmWorkMigrateAway for VM 1178, job origin: 15393 > > 2017-01-30 08:44:05,217 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Deploy avoids pods: [], clusters: [], hosts: [29] > 2017-01-30 08:44:05,220 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] > (AgentConnectTaskPool-4:ctx-5b61f2e5) create ClusteredAgentAttache for 232 > 2017-01-30 08:44:05,220 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > DeploymentPlanner allocation algorithm: com.cloud.deploy.UserDispersin > gPlanner@1734b2e2 > 2017-01-30 08:44:05,220 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Trying to allocate a host and storage pools from dc:17, pod:25,cluster:25, > requested cpu: 4200, requested ram: 4294967296 > 2017-01-30 08:44:05,220 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Is > ROOT volume READY (pool already allocated)?: Yes > 2017-01-30 08:44:05,220 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) This > VM has last host_id specified, trying to choose the same host: 25 > > 
2017-01-30 08:44:05,614 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) The > last host of this VM is not UP or is not enabled, host status is: > Disconnected, host resource state is: Enabled > 2017-01-30 08:44:05,614 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Cannot choose the last host to deploy this VM > 2017-01-30 08:44:05,614 DEBUG [c.c.d.FirstFitPlanner] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Searching resources only under specified Cluster: 25 > > 2017-01-30 08:44:05,777 DEBUG [c.c.a.m.a.i.FirstFitAllocator] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea > FirstFitRoutingAllocator) Found 0 hosts for allocation after > prioritization: [] > 2017-01-30 08:44:05,777 DEBUG [c.c.a.m.a.i.FirstFitAllocator] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea > FirstFitRoutingAllocator) Looking for speed=4200Mhz, Ram=4096 > 2017-01-30 08:44:05,777 DEBUG [c.c.a.m.a.i.FirstFitAllocator] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea > FirstFitRoutingAllocator) Host Allocator returning 0 suitable hosts > 2017-01-30 08:44:05,777 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) No > suitable hosts found > 2017-01-30 08:44:05,777 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) No > suitable hosts found under this Cluster: 25 > 2017-01-30 08:44:05,787 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Could not find suitable Deployment Destination for this VM under any > clusters, returning. > 2017-01-30 08:44:05,787 DEBUG [c.c.d.FirstFitPlanner] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Searching resources only under specified Cluster: 25 > 2017-01-30 08:44:05,798 DEBUG [c.c.d.FirstFitPlanner] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) The > specified cluster is in avoid set, returning. 
> 2017-01-30 08:44:05,798 DEBUG [c.c.v.VirtualMachineManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Unable to find destination for migrating the vm VM[User|i-189-1178-VM] > 2017-01-30 08:44:05,798 WARN [c.c.v.VirtualMachineManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Failed to deploy vm 1178 with original planner, sending HAPlanner > > 2017-01-30 08:44:05,985 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Deploy avoids pods: [], clusters: [], hosts: [29] > 2017-01-30 08:44:05,985 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > DeploymentPlanner allocation algorithm: com.cloud.deploy.SkipHeurestic > sPlanner@4fdf86ce > 2017-01-30 08:44:05,985 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Trying to allocate a host and storage pools from dc:17, pod:25,cluster:25, > requested cpu: 4200, requested ram: 4294967296 > 2017-01-30 08:44:05,985 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Is > ROOT volume READY (pool already allocated)?: Yes > 2017-01-30 08:44:05,985 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) This > VM has last host_id specified, trying to choose the same host: 25 > > 2017-01-30 08:44:06,022 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) The > last host of this VM is not UP or is not enabled, host status is: > Disconnected, host resource state is: Enabled > 2017-01-30 08:44:06,022 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Cannot choose the last host to deploy this VM > 2017-01-30 08:44:06,022 DEBUG [c.c.d.FirstFitPlanner] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Searching resources only under specified Cluster: 25 > 2017-01-30 08:44:06,039 DEBUG [c.c.d.SkipHeuresticsPlanner] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Deploying vm during HA process, so skipping disable threshold check > > 2017-01-30 08:44:06,151 DEBUG [c.c.a.m.a.i.FirstFitAllocator] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea > FirstFitRoutingAllocator) Found 0 hosts for allocation after > prioritization: [] > 2017-01-30 08:44:06,151 DEBUG [c.c.a.m.a.i.FirstFitAllocator] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea > FirstFitRoutingAllocator) Looking for speed=4200Mhz, Ram=4096 > 2017-01-30 08:44:06,151 DEBUG [c.c.a.m.a.i.FirstFitAllocator] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea > FirstFitRoutingAllocator) Host Allocator returning 0 suitable hosts > 2017-01-30 08:44:06,151 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) No > suitable hosts found > 2017-01-30 08:44:06,151 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) No > suitable hosts found under this Cluster: 25 > 2017-01-30 08:44:06,157 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Publish async > job-192986 complete on message bus > 2017-01-30 08:44:06,157 DEBUG 
[o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Wake up jobs > related to job-192986 > 2017-01-30 08:44:06,157 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Update db status > for job-192986 > 2017-01-30 08:44:06,157 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) The > last host of this VM is not UP or is not enabled, host status is: > Disconnected, host resource state is: Enabled > 2017-01-30 08:44:06,157 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) > Cannot choose the last host to deploy this VM > 2017-01-30 08:44:06,157 DEBUG [c.c.d.FirstFitPlanner] > (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) > Searching resources only under specified Cluster: 35 > 2017-01-30 08:44:06,160 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Could not find suitable Deployment Destination for this VM under any > clusters, returning. > 2017-01-30 08:44:06,160 DEBUG [c.c.d.FirstFitPlanner] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Searching resources only under specified Cluster: 25 > 2017-01-30 08:44:06,163 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (AsyncJobMgr-Heartbeat-1:ctx-eb81fc0b) Execute sync-queue item: > SyncQueueItemVO {id:63629, queueId: 64423, contentType: AsyncJob, > contentId: 192991, lastProcessMsid: 345049098498, lastprocessNumber: 27, > lastProcessTime: Mon Jan 30 08:44:06 CET 2017, created: Mon Jan 30 08:44:05 > CET 2017} > 2017-01-30 08:44:06,169 INFO [o.a.c.f.j.i.AsyncJobMonitor] > (Work-Job-Executor-8:ctx-381e8d1a job-14953/job-192992) Add job-192992 > into job monitoring > 2017-01-30 08:44:06,184 DEBUG [c.c.d.SkipHeuresticsPlanner] > (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) > Deploying vm during HA process, so skipping disable threshold check > 2017-01-30 08:44:06,186 DEBUG [c.c.d.FirstFitPlanner] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) The > specified cluster is in avoid set, returning. 
> 2017-01-30 08:44:06,186 DEBUG [c.c.v.VirtualMachineManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Unable to find destination for migrating the vm VM[User|i-189-1178-VM] > 2017-01-30 08:44:06,186 ERROR [c.c.v.VmWorkJobHandlerProxy] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Invocation exception, caused by: > com.cloud.exception.InsufficientServerCapacityException: > Unable to find a server to migrate to.Scope=interface > com.cloud.org.Cluster; id=25 > 2017-01-30 08:44:06,187 INFO [c.c.v.VmWorkJobHandlerProxy] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) > Rethrow exception com.cloud.exception.InsufficientServerCapacityException: > Unable to find a server to migrate to.Scope=interface > com.cloud.org.Cluster; id=25 > 2017-01-30 08:44:06,187 DEBUG [c.c.v.VmWorkJobDispatcher] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Done with run of > VM work job: com.cloud.vm.VmWorkMigrateAway for VM 1178, job origin: 15393 > 2017-01-30 08:44:06,187 ERROR [c.c.v.VmWorkJobDispatcher] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Unable to > complete AsyncJobVO {id:192989, userId: 1, accountId: 1, instanceType: > null, instanceId: null, cmd: com.cloud.vm.VmWorkMigrateAway, cmdInfo: > rO0ABXNyAB5jb20uY2xvdWQudm0uVm1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcE > mwIAAUoACXNyY0hvc3RJZHhyABNjb20uY2xvdWQudm0uVm1Xb3Jrn5m2VvAl > Z2sCAARKAAlhY2NvdW50SWRKAAZ1c2VySWRKAAR2bUlkTAALaGFuZGxlck5h > bWV0ABJMamF2YS9sYW5nL1N0cmluZzt4cAAAAAAAAAABAAAAAAAAAAEAAAAA > AAAEmnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAHQ, cmdVersion: 0, > status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, > initMsid: 345049103441, completeMsid: null, lastUpdated: null, lastPolled: > null, created: Mon Jan 30 08:44:03 CET 2017}, job origin:15393 > com.cloud.exception.InsufficientServerCapacityException: Unable to find a > server to migrate to.Scope=interface com.cloud.org.Cluster; id=25 > at com.cloud.vm.VirtualMachineManagerImpl.orchestrateMigrateAwa > y(VirtualMachineManagerImpl.java:2390) > at com.cloud.vm.VirtualMachineManagerImpl.orchestrateMigrateAwa > y(VirtualMachineManagerImpl.java:4520) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce > ssorImpl.java:57) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe > thodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJob > HandlerProxy.java:107) > at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(Virtu > alMachineManagerImpl.java:4636) > at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher. 
> java:103) > at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImp > l$5.runInContext(AsyncJobManagerImpl.java:537) > at org.apache.cloudstack.managed.context.ManagedContextRunnable > $1.run(ManagedContextRunnable.java:49) > at org.apache.cloudstack.managed.context.impl.DefaultManagedCon > text$1.call(DefaultManagedContext.java:56) > at org.apache.cloudstack.managed.context.impl.DefaultManagedCon > text.callWithContext(DefaultManagedContext.java:103) > at org.apache.cloudstack.managed.context.impl.DefaultManagedCon > text.runWithContext(DefaultManagedContext.java:53) > at org.apache.cloudstack.managed.context.ManagedContextRunnable > .run(ManagedContextRunnable.java:46) > at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImp > l$5.run(AsyncJobManagerImpl.java:494) > at java.util.concurrent.Executors$RunnableAdapter.call( > Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool > Executor.java:1145) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo > lExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 2017-01-30 08:44:06,188 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (AsyncJobMgr-Heartbeat-1:ctx-eb81fc0b) Schedule queued job-192991 > 2017-01-30 08:44:06,192 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Wake up jobs > joined with job-192986 and disjoin all subjobs created from job- 192986 > 2017-01-30 08:44:06,196 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Complete async > job-192989, jobStatus: FAILED, resultCode: 0, result: > rO0ABXNyADdjb20uY2xvdWQuZXhjZXB0aW9uLkluc3VmZmljaWVudFNlcnZl > ckNhcGFjaXR5RXhjZXB0aW9uAAAAAFZNT3ACAAFaABVhZmZpbml0eUdyb3Vw > c0FwcGxpZWR4cgAxY29tLmNsb3VkLmV4Y2VwdGlvbi5JbnN1ZmZpY2llbnRD > YXBhY2l0eUV4Y2VwdGlvbgAAAABWTU91AgACTAACaWR0ABBMamF2YS9sYW5n > L0xvbmc7TAAFc2NvcGV0ABFMamF2YS9sYW5nL0NsYXNzO3hyACJjb20uY2xv > dWQuZXhjZXB0aW9uLkNsb3VkRXhjZXB0aW9ueeiOG_HkRMcCAAJMAAtjc0Vy > cm9yQ29kZXQAE0xqYXZhL2xhbmcvSW50ZWdlcjtMAAZpZExpc3R0ABVMamF2 > YS91dGlsL0FycmF5TGlzdDt4cgATamF2YS5sYW5nLkV4Y2VwdGlvbtD9Hz4a > OxzEAgAAeHIAE2phdmEubGFuZy5UaHJvd2FibGXVxjUnOXe4ywMABEwABWNh > dXNldAAVTGphdmEvbGFuZy9UaHJvd2FibGU7TAANZGV0YWlsTWVzc2FnZXQA > EkxqYXZhL2xhbmcvU3RyaW5nO1sACnN0YWNrVHJhY2V0AB5bTGphdmEvbGFu > Zy9TdGFja1RyYWNlRWxlbWVudDtMABRzdXBwcmVzc2VkRXhjZXB0aW9uc3QA > EExqYXZhL3V0aWwvTGlzdDt4cHEAfgANdAAmVW5hYmxlIHRvIGZpbmQgYSBz > ZXJ2ZXIgdG8gbWlncmF0ZSB0by51cgAeW0xqYXZhLmxhbmcuU3RhY2tUcmFj > ZUVsZW1lbnQ7AkYqPDz9IjkCAAB4cAAAABVzcgAbamF2YS5sYW5nLlN0YWNr > VHJhY2VFbGVtZW50YQnFmiY23YUCAARJAApsaW5lTnVtYmVyTAAOZGVjbGFy > aW5nQ2xhc3NxAH4ACkwACGZpbGVOYW1lcQB-AApMAAptZXRob2ROYW1lcQB- > AAp4cAAACVZ0ACZjb20uY2xvdWQudm0uVmlydHVhbE1hY2hpbmVNYW5hZ2Vy > SW1wbHQAHlZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwuamF2YXQAFm9yY2hl > c3RyYXRlTWlncmF0ZUF3YXlzcQB-ABEAABGocQB-ABNxAH4AFHEAfgAVc3EA > fgAR_____nQAJHN1bi5yZWZsZWN0Lk5hdGl2ZU1ldGhvZEFjY2Vzc29ySW1w > bHQAHU5hdGl2ZU1ldGhvZEFjY2Vzc29ySW1wbC5qYXZhdAAHaW52b2tlMHNx > AH4AEQAAADlxAH4AGHEAfgAZdAAGaW52b2tlc3EAfgARAAAAK3QAKHN1bi5y > ZWZsZWN0LkRlbGVnYXRpbmdNZXRob2RBY2Nlc3NvckltcGx0ACFEZWxlZ2F0 > aW5nTWV0aG9kQWNjZXNzb3JJbXBsLmphdmFxAH4AHHNxAH4AEQAAAl50ABhq > YXZhLmxhbmcucmVmbGVjdC5NZXRob2R0AAtNZXRob2QuamF2YXEAfgAcc3EA > fgARAAAAa3QAImNvbS5jbG91ZC52bS5WbVdvcmtKb2JIYW5kbGVyUHJveHl0 > ABpWbVdvcmtKb2JIYW5kbGVyUHJveHkuamF2YXQAD2hhbmRsZVZtV29ya0pv > YnNxAH4AEQAAEhxxAH4AE3EAfgAUcQB-ACZzcQB-ABEAAABndAAgY29tLmNs > 
b3VkLnZtLlZtV29ya0pvYkRpc3BhdGNoZXJ0ABhWbVdvcmtKb2JEaXNwYXRj > aGVyLmphdmF0AAZydW5Kb2JzcQB-ABEAAAIZdAA_b3JnLmFwYWNoZS5jbG > 91ZHN0YWNrLmZyYW1ld29yay5qb2JzLmltcGwuQXN5bmNKb2JNYW5hZ2VySW > 1wbCQ1dAAYQXN5bmNKb2JNYW5hZ2VySW1wbC5qYXZhdAAMcnVuSW5Db250ZX > h0c3EAfgARAAAAMXQAPm9yZy5hcGFjaGUuY2xvdWRzdGFjay5tYW5hZ2VkLm > NvbnRleHQuTWFuYWdlZENvbnRleHRSdW5uYWJsZSQxdAAbTWFuYWdlZENvbn > RleHRSdW5uYWJsZS5qYXZhdAADcnVuc3EAfgARAAAAOHQAQm9yZy5hcGFjaG > UuY2xvdWRzdGFjay5tYW5hZ2VkLmNvbnRleHQuaW1wbC5EZWZhdWx0TWFuYW > dlZENvbnRleHQkMXQAGkRlZmF1bHRNYW5hZ2VkQ29udGV4dC5qYXZhdAAEY2 > FsbHNxAH4AEQAAAGd0AEBvcmcuYXBhY2hlLmNsb3Vkc3RhY2subWFuYWdlZC > 5jb250ZXh0LmltcGwuRGVmYXVsdE1hbmFnZWRDb250ZXh0cQB-ADZ0AA9jYW > xsV2l0aENvbnRleHRzcQB-ABEAAAA1cQB-ADlxAH4ANnQADnJ1bl > dpdGhDb250ZXh0c3EAfgARAAAALnQAPG9yZy5hcGFjaGUuY2xvdWRzdGFjay > 5tYW5hZ2VkLmNvbnRleHQuTWFuYWdlZENvbnRleHRSdW5uYWJsZXEAfgAycQ > B-ADNzcQB-ABEAAAHucQB-AC1xAH4ALnEAfgAzc3EAfgARAAAB13QALmphdm > EudXRpbC5jb25jdXJyZW50LkV4ZWN1dG9ycyRSdW5uYWJsZUFkYXB0ZXJ0AA > 5FeGVjdXRvcnMuamF2YXEAfgA3c3EAfgARAAABBnQAH2phdmEudXRpbC5jb2 > 5jdXJyZW50LkZ1dHVyZVRhc2t0AA9GdXR1cmVUYXNrLmphdmFxAH4AM3NxAH > 4AEQAABHl0ACdqYXZhLnV0aWwuY29uY3VycmVudC5UaHJlYWRQb29sRXhlY3 > V0b3J0ABdUaHJlYWRQb29sRXhlY3V0b3IuamF2YXQACXJ1bldvcmtlcnNxAH > 4AEQAAAmd0AC5qYXZhLnV0aWwuY29uY3VycmVudC5UaHJlYWRQb29sRXhlY3 > V0b3IkV29ya2VycQB-AEhxAH4AM3NxAH4AEQAAAul0ABBqYXZhLmxhbmcuVG > hyZWFkdAALVGhyZWFkLmphdmFxAH4AM3NyACZqYXZhLnV0aWwuQ29sbGVjdG > lvbnMkVW5tb2RpZmlhYmxlTGlzdPwPJTG17I4QAgABTAAEbGlzdHEAfgAMeH > IALGphdmEudXRpbC5Db2xsZWN0aW9ucyRVbm1vZGlmaWFibGVDb2xsZWN0aW > 9uGUIAgMte9x4CAAFMAAFjdAAWTGphdmEvdXRpbC9Db2xsZWN0aW9uO3hwc3 > IAE2phdmEudXRpbC5BcnJheUxpc3R4gdIdmcdhnQMAAUkABHNpemV4cAAAAA > B3BAAAAAB4cQB-AFR4c3IAEWphdmEubGFuZy5JbnRlZ2VyEuKgpPeBhzgCAA > FJAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAEO > 9zcQB-AFMAAAAAdwQAAAAAeHNyAA5qYXZhLmxhbmcuTG9uZzuL5JDMjyPfAg > ABSgAFdmFsdWV4cQB-AFYAAAAAAAAAGXZyABVjb20uY2xvdWQub3JnLkNsdX > N0ZXKqoV-omaW2FAIAAHhwAA > 2017-01-30 08:44:06,201 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) > Deploy avoids pods: [], clusters: [], hosts: [3] > 2017-01-30 08:44:06,201 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) > DeploymentPlanner allocation algorithm: com.cloud.deploy.SkipHeurestic > sPlanner@4fdf86ce > 2017-01-30 08:44:06,201 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) > Trying to allocate a host and storage pools from dc:14, pod:17,cluster:17, > requested cpu: 4200, requested ram: 4294967296 > 2017-01-30 08:44:06,201 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) Is > ROOT volume READY (pool already allocated)?: Yes > 2017-01-30 08:44:06,201 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) This > VM has last host_id specified, trying to choose the same host: 213 > 2017-01-30 08:44:06,213 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Publish async > job-192989 complete on message bus > 2017-01-30 08:44:06,213 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Wake up jobs > related to job-192989 > 2017-01-30 08:44:06,213 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Update db status > for 
job-192989 > 2017-01-30 08:44:06,219 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) > Checking resources in Cluster: 35 under Pod: 31 > 2017-01-30 08:44:06,227 DEBUG [c.c.a.m.a.i.FirstFitAllocator] > (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f > FirstFitRoutingAllocator) Looking for hosts in dc: 23 pod:31 cluster:35 > 2017-01-30 08:44:06,229 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-8:ctx-381e8d1a job-14953/job-192992) Executing > AsyncJobVO {id:192992, userId: 1, accountId: 1, instanceType: null, > instanceId: null, cmd: com.cloud.vm.VmWorkMigrateAway, cmdInfo: > rO0ABXNyAB5jb20uY2xvdWQudm0uVm1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcE > mwIAAUoACXNyY0hvc3RJZHhyABNjb20uY2xvdWQudm0uVm1Xb3Jrn5m2VvAl > Z2sCAARKAAlhY2NvdW50SWRKAAZ1c2VySWRKAAR2bUlkTAALaGFuZGxlck5h > bWV0ABJMamF2YS9sYW5nL1N0cmluZzt4cAAAAAAAAAABAAAAAAAAAAEAAAAA > AAAGwXQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAEw, cmdVersion: 0, > status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, > initMsid: 345049098122, completeMsid: null, lastUpdated: null, lastPolled: > null, created: Mon Jan 30 08:44:05 CET 2017} > 2017-01-30 08:44:06,229 DEBUG [c.c.v.VmWorkJobDispatcher] > (Work-Job-Executor-8:ctx-381e8d1a job-14953/job-192992) Run VM work job: > com.cloud.vm.VmWorkMigrateAway for VM 1729, job origin: 14953 > 2017-01-30 08:44:06,230 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (AsyncJobMgr-Heartbeat-1:ctx-eb81fc0b) Execute sync-queue item: > SyncQueueItemVO {id:63630, queueId: 67180, contentType: AsyncJob, > contentId: 192990, lastProcessMsid: 345049098498, lastprocessNumber: 61, > lastProcessTime: Mon Jan 30 08:44:06 CET 2017, created: Mon Jan 30 08:44:05 > CET 2017} > 2017-01-30 08:44:06,244 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) The > last host of this VM is not UP or is not enabled, host status is: > Disconnected, host resource state is: Enabled > 2017-01-30 08:44:06,262 DEBUG [c.c.d.DeploymentPlanningManagerImpl] > (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) > Cannot choose the last host to deploy this VM > 2017-01-30 08:44:06,263 DEBUG [c.c.d.FirstFitPlanner] > (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) > Searching resources only under specified Cluster: 17 > 2017-01-30 08:44:06,262 INFO [o.a.c.f.j.i.AsyncJobMonitor] > (Work-Job-Executor-9:ctx-f4d09efb job-14965/job-192991) Add job-192991 > into job monitoring > 2017-01-30 08:44:06,246 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Done executing > com.cloud.vm.VmWorkMigrateAway for job-192986 > 2017-01-30 08:44:06,244 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Wake up jobs > joined with job-192989 and disjoin all subjobs created from job- 192989 > 2017-01-30 08:44:06,269 INFO [o.a.c.f.j.i.AsyncJobMonitor] > (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Remove job-192986 > from job monitoring > 2017-01-30 08:44:06,279 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Done executing > com.cloud.vm.VmWorkMigrateAway for job-192989 > 2017-01-30 08:44:06,305 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-9:ctx-f4d09efb job-14965/job-192991) Executing > AsyncJobVO {id:192991, userId: 1, accountId: 1, instanceType: null, > instanceId: null, cmd: 
com.cloud.vm.VmWorkMigrateAway, cmdInfo: > rO0ABXNyAB5jb20uY2xvdWQudm0uVm1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcE > mwIAAUoACXNyY0hvc3RJZHhyABNjb20uY2xvdWQudm0uVm1Xb3Jrn5m2VvAl > Z2sCAARKAAlhY2NvdW50SWRKAAZ1c2VySWRKAAR2bUlkTAALaGFuZGxlck5h > bWV0ABJMamF2YS9sYW5nL1N0cmluZzt4cAAAAAAAAAABAAAAAAAAAAEAAAAA > AAAGRXQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAABw, cmdVersion: 0, > status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, > initMsid: 345049098122, completeMsid: null, lastUpdated: null, lastPolled: > null, created: Mon Jan 30 08:44:05 CET 2017} > 2017-01-30 08:44:06,305 DEBUG [c.c.v.VmWorkJobDispatcher] > (Work-Job-Executor-9:ctx-f4d09efb job-14965/job-192991) Run VM work job: > com.cloud.vm.VmWorkMigrateAway for VM 1605, job origin: 14965 > 2017-01-30 08:44:06,306 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentManager-Handler-13:null) Ping from 258 > 2017-01-30 08:44:06,319 DEBUG [c.c.a.m.a.i.FirstFitAllocator] > (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f > FirstFitRoutingAllocator) List of hosts in ascending order of number of > VMs: [224, 19, 21, 15] > 2017-01-30 08:44:06,322 DEBUG [c.c.a.m.a.i.FirstFitAllocator] > (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f > FirstFitRoutingAllocator) FirstFitAllocator has 0 hosts to check for > allocation: [] > 2017-01-30 08:44:06,322 INFO [c.c.a.m.AgentManagerImpl] > (AgentManager-Handler-13:null) PingMap for agent: 258 will not be updated > because agent is no longer in the PingMap > 2017-01-30 08:44:06,326 INFO [o.a.c.f.j.i.AsyncJobMonitor] > (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Remove job-192989 > from job monitoring > > 2017-01-30 08:54:38,139 INFO [c.c.v.VirtualMachineManagerImpl] > (Work-Job-Executor-20:ctx-a988090f job-14965/job-193019 ctx-052eb4f5) > Migration cancelled because state has changed: VM[User|i-189-1178-VM] > 2017-01-30 08:54:38,139 DEBUG [c.c.v.VirtualMachineManagerImpl] > (Work-Job-Executor-20:ctx-a988090f job-14965/job-193019 ctx-052eb4f5) > Unable to migrate VM due to: Migration cancelled because state has changed: > VM[User|i-189-1178-VM] > 2017-01-30 08:54:38,167 DEBUG [c.c.c.CapacityManagerImpl] > (Work-Job-Executor-20:ctx-a988090f job-14965/job-193019 ctx-052eb4f5) VM > state transitted from :Running to Stopping with event: StopRequestedvm's > original host id: 29 new host id: 33 host id before state transition: 33 > 2017-01-30 08:54:38,177 DEBUG [c.c.a.m.DirectAgentAttache] > (DirectAgent-325:ctx-e540d7f5) Seq 19-2355664080091480198: Response > Received: > 2017-01-30 08:54:38,177 DEBUG [c.c.a.t.Request] > (DirectAgent-325:ctx-e540d7f5) Seq 19-2355664080091480198: Processing: { > Ans: , MgmtId: 345049098498, via: 19, Ver: v1, Flags: 10, > [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r- > 1706-VM","bytesSent":12276848087,"bytesReceived": > 5226621450,"result":true,"details":"","wait":0}}] } > 2017-01-30 08:54:38,177 DEBUG [c.c.a.t.Request] > (RouterMonitor-1:ctx-2e80f405) Seq 19-2355664080091480198: Received: { > Ans: , MgmtId: 345049098498, via: 19, Ver: v1, Flags: 10, { > NetworkUsageAnswer } } > 2017-01-30 08:54:38,177 DEBUG [c.c.a.m.AgentManagerImpl] > (RouterMonitor-1:ctx-2e80f405) Details from executing class > com.cloud.agent.api.NetworkUsageCommand: > 2017-01-30 08:54:38,178 DEBUG [c.c.a.t.Request] > (Work-Job-Executor-20:ctx-a988090f job-14965/job-193019 ctx-052eb4f5) Seq > 33-1917970491306410035: Sending { Cmd , MgmtId: 345049098498, via: > 33(ewcstack-vh003-prod), Ver: v1, Flags: 100011, > 
[{"com.cloud.agent.api.StopCommand":{"isProxy":false," > executeInSequence":false,"checkBeforeCleanup":false,"vmName":"i-189-1178-VM","wait":0}}] > } > 2017-01-30 08:54:38,178 DEBUG [c.c.a.t.Request] > (Work-Job-Executor-20:ctx-a988090f job-14965/job-193019 ctx-052eb4f5) Seq > 33-1917970491306410035: Executing: { Cmd , MgmtId: 345049098498, via: > 33(ewcstack-vh003-prod), Ver: v1, Flags: 100011, > [{"com.cloud.agent.api.StopCommand":{"isProxy":false," > executeInSequence":false,"checkBeforeCleanup":false,"vmName":"i-189-1178-VM","wait":0}}] > } > 2017-01-30 08:54:38,178 DEBUG [c.c.a.m.DirectAgentAttache] > (DirectAgent-43:ctx-da30f109) Seq 33-1917970491306410035: Executing request > 2017-01-30 08:54:38,202 DEBUG [c.c.a.m.DirectAgentAttache] > (DirectAgent-326:ctx-4fb88aa7) Seq 21-1035546439318503505: Executing request > > 2017-01-30 09:05:55,628 INFO [c.c.h.HighAvailabilityManagerImpl] > (HA-Worker-4:ctx-36aae733 work-2049) Processing > HAWork[2049-Migration-1178-Stopped-Migrating] > 2017-01-30 09:05:55,633 WARN [o.a.c.f.j.AsyncJobExecutionContext] > (HA-Worker-4:ctx-36aae733 work-2049) Job is executed without a context, > setup psudo job for the executing thread > 2017-01-30 09:05:55,645 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (HA-Worker-4:ctx-36aae733 work-2049) Sync job-193045 execution on object > VmWorkJobQueue.1178 > > > > > On 31.01.2017 00:42, Rafael Weingärtner wrote: > >> This seems to be one of those nasty bugs that are very hard to debug. It >> seems to be cause to a combination of different factors, such as planets >> alignments ;) >> >> At class “VirtualMachineManagerImpl” line 2395 (method >> “com.cloud.vm.VirtualMachineManagerImpl.orchestrateMigrateAway(String, >> long, DeploymentPlanner)”) is the start of the code flow that triggers the >> migration. The point here is that, if an error occurs during the >> migration, >> the VM will be stopped; For some reason this is executed only for HA VMs. >> >> Going a little bit further, we find the point that is triggering the error >> in the method “com.cloud.vm.VirtualMachineManagerImpl.migrate( >> VMInstanceVO, >> long, DeployDestination)” at line 1941. This line will call the method >> “com.cloud.vm.VirtualMachineManagerImpl.changeState(VMInstanceVO, Event, >> Long, ItWorkVO, Step)” to change the VM state to migrating/prepare >> migration/or something like that (I am guessing here), but for some >> reason, >> the state of the VM is not the one expected and this is causing a >> “ConcurrentOperationException” to be thrown. The problem is that the >> “com.cloud.vm.VirtualMachineManagerImpl.changeState(VMInstanceVO, Event, >> Long, ItWorkVO, Step)” does not seem to provide much useful log entries. >> >> Just to enable us to discuss the problem a little further, could you tell >> us a little bit about your environment setting? How many ACS MSs do you >> have? What is the VM workload being migrated (I mean, the number of VMs >> and >> computing resources)? Do the other servers have enough computing resources >> to support these VMs? Could you also check the XenServer logs, and check >> what happened on both, source and destination hosts when the migration for >> VM “i-638-1735-VM” was triggered? >> >> Do you have the knowledge/skills to use an IDE (e.g. Eclipse) to connect >> to >> a running JVM and debug a code? Because for me it will not be possible to >> go further without looking at the variables in a running environment. 
>> >> >> On Mon, Jan 30, 2017 at 4:28 AM, Francois Scheurer < >> francois.scheu...@everyware.ch> wrote: >> >> Dear Rafael >>> >>> >>> Sorry for the late answer (was sick a couple of days). >>> >>> I confirm we are migrating inside the same xenserver pool (within a CS >>> Cluster). >>> >>> >>> Best Regards >>> Francois >>> >>> >>> >>> On 26.01.2017 04:35, Rafael Weingärtner wrote: >>> >>> Found the code that is executed. >>>> I need you to confirm if you are migrating within a cluster or across >>>> clusters? >>>> >>>> On Wed, Jan 25, 2017 at 11:57 AM, Rafael Weingärtner < >>>> rafaelweingart...@gmail.com> wrote: >>>> >>>> I do not have access to ACS code right now, but I suggest starting >>>> >>>>> debugging here: >>>>> >>>>> 2017-01-24 18:05:28,427 DEBUG [c.c.c.CapacityManagerImpl] >>>>> >>>>>> (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 >>>>>> ctx-175d37df) >>>>>> VM state transitted from :Running to Stopping with event: >>>>>> StopRequestedvm's >>>>>> original host id: 224 new host id: 15 host id before state transition: >>>>>> 15 >>>>>> >>>>>> I would try to understand first why ACS requested the hypervisor to >>>>>> stop >>>>>> >>>>> the VM. I mean the following, check the source code for conditions that >>>>> would make ACS request the shutdown of VMs. >>>>> BTW: are you migrating within a cluster or across clusters? >>>>> >>>>> I will be able to analyze this further only after work hours, so if you >>>>> find anything, keep me posted. >>>>> >>>>> On Wed, Jan 25, 2017 at 11:45 AM, Francois Scheurer < >>>>> francois.scheu...@everyware.ch> wrote: >>>>> >>>>> Hello Rafael >>>>> >>>>>> >>>>>> I think the important log lines are these: >>>>>> >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,407 INFO >>>>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>>>> job-60216/job-191380 ctx-175d37df) Migration cancelled because state >>>>>> has >>>>>> changed: VM[User|i-638-1736-VM] >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,407 DEBUG >>>>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>>>> job-60216/job-191380 ctx-175d37df) Unable to migrate VM due to: >>>>>> Migration >>>>>> cancelled because state has changed: VM[User|i-638-1736-VM] >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,427 DEBUG >>>>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>>>> job-60216/job-191380 ctx-175d37df) VM state transitted from :Running >>>>>> to >>>>>> Stopping with event: StopRequestedvm's original host id: 224 new host >>>>>> id: >>>>>> 15 host id before state transition: 15 >>>>>> >>>>>> The really disturbing thing is that CS stop again the VM if we start >>>>>> it >>>>>> manually, apparently because the migratingjob is still running. >>>>>> >>>>>> Note the the VM HA Flag is enabled. >>>>>> >>>>>> >>>>>> >>>>>> Thank you. >>>>>> >>>>>> >>>>>> Best Regards. 
>>>>>> >>>>>> Francois >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> more details: >>>>>> >>>>>> >>>>>> >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,148 DEBUG >>>>>> [o.a.c.e.o.VolumeOrchestrator] (Work-Job-Executor-156:ctx-14b86f5e >>>>>> job-60216/job-191380 ctx-175d37df) Preparing 3 volumes for >>>>>> VM[User|i-638-1736-VM] >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,155 DEBUG >>>>>> [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-122:ctx-a8bacc64 >>>>>> job-14953/job-191351) Done with run of VM work job: >>>>>> com.cloud.vm.VmWorkMigrateAway for VM 1411, job origin: 14953 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,155 DEBUG >>>>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-122:ctx-a8bacc64 >>>>>> job-14953/job-191351) Done executing com.cloud.vm.VmWorkMigrateAway >>>>>> for >>>>>> job-191351 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,160 INFO >>>>>> [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-122:ctx-a8bacc64 >>>>>> job-14953/job-191351) Remove job-191351 from job monitoring >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,168 INFO >>>>>> [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-f3bcf43c >>>>>> work-1989) >>>>>> Completed HAWork[1989-Migration-1411-Running-Migrating] >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,246 DEBUG >>>>>> [c.c.a.m.ClusteredAgentAttache] (Work-Job-Executor-156:ctx-14b86f5e >>>>>> job-60216/job-191380 ctx-175d37df) Seq 19-8526158519542516375: >>>>>> Forwarding >>>>>> Seq 19-8526158519542516375: { Cmd , MgmtId: 345049103441, via: >>>>>> 19(ewcstack-vh010-prod), Ver: v1, Flags: 100111, [{ >>>>>> "com.cloud.agent.api.PrepareForMigrationCommand":{"vm":{"id" >>>>>> :1736,"name":"i-638-1736-VM","bootloader":"PyGrub","type":"U >>>>>> ser","cpus":4,"minSpeed":2100,"maxSpeed":2100,"minRam":17179 >>>>>> 869184,"maxRam":17179869184,"arch":"x86_64","os":"Other >>>>>> (64-bit)","platformEmulator":"Other install >>>>>> media","bootArgs":"","enable >>>>>> HA":true,"limitCpuUse":true,"enableDynamicallyScaleVm":true, >>>>>> "vncPassword":"pC1WUC7h1DBH9J36q3H9pg==","params":{"memoryOv >>>>>> ercommitRatio":"1.0","platform":"viridian:true;acpi:1;apic:t >>>>>> rue;pae:true;nx:true;timeoffset:-3601","keyboard":"us","Mess >>>>>> age.ReservedCapacityFreed.Flag":"false","cpuOvercommitRatio":"4.0"," >>>>>> hypervisortoolsversion":"xenserver61"},"uuid":"5bff01de-c033 >>>>>> -4925-9e47-30dd14539272","disks":[{"data":{"org.apache.cloud >>>>>> stack.storage.to.VolumeObjectTO":{"uuid":"1ab514b2-f2de- >>>>>> 4d85-8e0d-9768184a1349","volumeType":"ROOT","dataStore":{"org. 
>>>>>> apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"b8 >>>>>> 751b89-00f >>>>>> 6-5079-ed7a-c073579c7563","id":64,"poolType":"PreSetup","hos >>>>>> t":"localhost","path":"/b8751b89-00f6-5079-ed7a-c073579c7563 >>>>>> ","port":0,"url":"PreSetup://localhost/b8751b89-00f6-5079-ed >>>>>> 7a-c073579c7563/?ROLE=Primary&STOREUUID=b8751b89-00f6-5079- >>>>>> ed7a-c073579c7563"}},"name":"ROOT-1736","size":53687091200,"path":"d3 >>>>>> f87e3c-1efe-42ad-aa93-609d5b980a34","volumeId":8655,"vmName" >>>>>> :"i-638-1736-VM","accountId":638,"format":"VHD","provisionin >>>>>> gType":"THIN","id":8655,"deviceId":0,"hypervisorType":" >>>>>> XenServer"}},"diskSeq":0,"path":"d3f87e3c-1efe-42ad-aa93 >>>>>> -609d5b980a34","type":"ROOT","_details":{"managed":"false"," >>>>>> storagePort":"0", >>>>>> "storageHost":"localhost","volumeSize":"53687091200"}},{"dat >>>>>> a":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid >>>>>> ":"733df9dc-5e24-4ab9-b22b-bce78322428e","volumeType":"DATAD >>>>>> ISK","dataStore":{"org.apache.cloudstack.storage.to.PrimaryD >>>>>> ataStoreTO":{"uuid":"b8751b89-00f6-5079-ed7a-c073579c7563","id":64," >>>>>> poolType":"PreSetup","host":"localhost","path":"/b8751b89-00 >>>>>> f6-5079-ed7a-c073579c7563","port":0,"url":"PreSetup://localh >>>>>> ost/b8751b89-00f6-5079-ed7a-c073579c7563/?ROLE=Primary&STORE >>>>>> UUID=b8751b89-00f6-5079-ed7a-c073579c7563"}},"name":"WELLA- >>>>>> DB1-NCSA_D","size":268435456000,"path":"265f168a-bb92-4591-a32b-ee3e2 >>>>>> 2c8cb68","volumeId":8656,"vmName":"i-638-1736-VM","accountId >>>>>> ":638,"format":"VHD","provisioningType":"THIN","id":8656," >>>>>> deviceId":1,"hypervisorType":"XenServer"}},"diskSeq":1," >>>>>> path":"265f168a-bb92-4591-a32b-ee3e22c8cb68","type":"DATADIS >>>>>> K","_details":{"managed":"false","storagePort":"0","stor >>>>>> ageHost":"localhos >>>>>> t","volumeSize":"268435456000"}},{"data":{"org.apache.clouds >>>>>> tack.storage.to.VolumeObjectTO":{"uuid":"bda6f335-b8a3-479c- >>>>>> 8c0d-41b344d2a50d","volumeType":"DATADISK","dataStore":{"org. 
>>>>>> apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid" >>>>>> :"09df3ca6-0a20-e4f0-3d79-12a6ed67a13d","id":89,"poolType":" >>>>>> PreSetup"," >>>>>> host":"localhost","path":"/09df3ca6-0a20-e4f0-3d79-12a6ed67a >>>>>> 13d","port":0,"url":"PreSetup://localhost/09df3ca6-0a20-e4f0 >>>>>> -3d79-12a6ed67a13d/?ROLE=Primary&STOREUUID=09df3ca6-0a20- >>>>>> >>>>>> e4f0-3d79-12a6ed67a13d"}},"name":"WELLA-DB1-NCSA_E","size >>>>>> ":429496729600,"path":"612cb586-a70c-4896-b120-062ee7a67894" >>>>>> ,"volumeId":86 >>>>>> 57,"vmName":"i-638-1736-VM","accountId":638,"format":"VHD"," >>>>>> provisioningType":"THIN","id":8657,"deviceId":2,"hypervisorT >>>>>> ype":"XenServer"}},"diskSeq":2,"path":"612cb586-a70c-4896-b1 >>>>>> 20-062ee7a67894","type":"DATADISK","_details":{"managed":" >>>>>> false","storagePort":"0","storageHost":"localhost","volumeSi >>>>>> ze":"42949 >>>>>> 6729600"}},{"data":{"org.apache.cloudstack.storage.to.Templa >>>>>> teObjectTO":{"uuid":"0e5bf4ff-caca-4b08-abed-7bbcdf3c42ef"," >>>>>> id":201,"format":"ISO","accountId":1,"hvm":true,"displayText >>>>>> ":"xen-pv-drv-iso","name":"xs-tools.iso","guestOsType":"CentOS 4.5 >>>>>> (32-bit)","hypervisorType":"XenServer"}},"diskSeq":3,"type":" >>>>>> ISO"}],"nics":[{"deviceId":0,"networkRateMbps":1000,"default >>>>>> Nic":true,"pxeDisable":false,"nicUuid":"3523c3d3-9f92-4076-a >>>>>> d99-f8864d0f0edd","uuid":"e6fbb257-16e1-415f-88fb-9a79b263b3 >>>>>> a8","ip":"192.168.1.42","netmask":"255.255.255.0"," >>>>>> gateway":"192.168.1.1","mac":"02:00:0c:ae:00:0f","dns1":"8.8 >>>>>> .8.8","dns2":"8.8 >>>>>> .4.4","broadcastType":"Vlan","type":"Guest","broadcastUri":" >>>>>> vlan://2285","isolationUri":"vlan://2285","isSecurityGroupEn >>>>>> abled":false,"name":"bond0"}],"vcpuMaxLimit":16},"wait":0}}] } to >>>>>> 345049098498 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,399 DEBUG [c.c.a.t.Request] >>>>>> (AgentManager-Handler-10:null) Seq 19-8526158519542516375: Processing: >>>>>> { >>>>>> Ans: , MgmtId: 345049103441, via: 19, Ver: v1, Flags: 110, >>>>>> [{"com.cloud.agent.api.PrepareForMigrationAnswer":{"result": >>>>>> true,"wait":0}}] >>>>>> } >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,399 DEBUG >>>>>> [c.c.a.m.AgentAttache] >>>>>> (AgentManager-Handler-10:null) Seq 19-8526158519542516375: No more >>>>>> commands >>>>>> found >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,399 DEBUG [c.c.a.t.Request] >>>>>> (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 >>>>>> ctx-175d37df) >>>>>> Seq 19-8526158519542516375: Received: { Ans: , MgmtId: 345049103441, >>>>>> via: >>>>>> 19, Ver: v1, Flags: 110, { PrepareForMigrationAnswer } } >>>>>> >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,407 INFO >>>>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>>>> job-60216/job-191380 ctx-175d37df) Migration cancelled because state >>>>>> has >>>>>> changed: VM[User|i-638-1736-VM] >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,407 DEBUG >>>>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>>>> job-60216/job-191380 ctx-175d37df) Unable to migrate VM due to: >>>>>> Migration >>>>>> cancelled because state has changed: VM[User|i-638-1736-VM] >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,427 DEBUG >>>>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>>>> job-60216/job-191380 ctx-175d37df) VM state transitted from :Running >>>>>> to >>>>>> Stopping with event: StopRequestedvm's original host id: 224 new host >>>>>> id: >>>>>> 15 host id before state transition: 15 >>>>>> 
>>>>>> ewcstack-man02-prod: 2017-01-24 18:05:28,438 DEBUG >>>>>> [c.c.a.m.ClusteredAgentAttache] (Work-Job-Executor-156:ctx-14b86f5e >>>>>> job-60216/job-191380 ctx-175d37df) Seq 15-6889944479923437643: >>>>>> Forwarding >>>>>> Seq 15-6889944479923437643: { Cmd , MgmtId: 345049103441, via: >>>>>> 15(ewcstack-vh011-prod), Ver: v1, Flags: 100011, >>>>>> [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"execut >>>>>> eInSequence":false,"checkBeforeCleanup":false,"vmName":"i- >>>>>> 638-1736-VM","wait":0}}] >>>>>> } to 345049098498 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:30,335 DEBUG >>>>>> [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-1:null) SeqA >>>>>> 254-800498: >>>>>> Processing Seq 254-800498: { Cmd , MgmtId: -1, via: 254, Ver: v1, >>>>>> Flags: >>>>>> 11, [{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_ >>>>>> proxyVmId":1530,"_loadInfo":"{\n >>>>>> \"connections\": []\n}","wait":0}}] } >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:30,340 DEBUG >>>>>> [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-1:null) SeqA >>>>>> 254-800498: >>>>>> Sending Seq 254-800498: { Ans: , MgmtId: 345049103441, via: 254, Ver: >>>>>> v1, >>>>>> Flags: 100010, [{"com.cloud.agent.api.AgentCo >>>>>> ntrolAnswer":{"result":true,"wait":0}}] } >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,447 DEBUG [c.c.a.t.Request] >>>>>> (AgentManager-Handler-11:null) Seq 15-6889944479923437632: Processing: >>>>>> { >>>>>> Ans: , MgmtId: 345049103441, via: 15, Ver: v1, Flags: 10, >>>>>> [{"com.cloud.agent.api.StopAnswer":{"platform":"viridian:tru >>>>>> e;acpi:1;apic:true;pae:true;nx:true;timeoffset:3601","result >>>>>> ":true,"details":"Stop >>>>>> VM i-638-1735-VM Succeed","wait":0}}] } >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,447 DEBUG [c.c.a.t.Request] >>>>>> (Work-Job-Executor-115:ctx-0d5db528 job-13733/job-191379 >>>>>> ctx-bbabfeae) >>>>>> Seq 15-6889944479923437632: Received: { Ans: , MgmtId: 345049103441, >>>>>> via: >>>>>> 15, Ver: v1, Flags: 10, { StopAnswer } } >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,470 DEBUG >>>>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) VM[User|i-638-1735-VM] is stopped >>>>>> on >>>>>> the >>>>>> host. Proceeding to release resource held. 
>>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,475 DEBUG >>>>>> [c.c.n.NetworkModelImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) Service SecurityGroup is not >>>>>> supported >>>>>> in the network id=519 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,478 DEBUG >>>>>> [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) Changing active number of nics for >>>>>> network id=519 on -1 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,488 DEBUG >>>>>> [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) Asking VpcVirtualRouter to release >>>>>> NicProfile[17403-1735-3364083b-dcc0-4eb9-a7f9-f3e85bf8c7d3-1 >>>>>> 92.168.1.11-null >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,488 DEBUG >>>>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) Successfully released network >>>>>> resources >>>>>> for the vm VM[User|i-638-1735-VM] >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,488 DEBUG >>>>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) Successfully released storage >>>>>> resources >>>>>> for the vm VM[User|i-638-1735-VM] >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,496 DEBUG >>>>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) VM state transitted from :Stopping >>>>>> to >>>>>> Stopped with event: OperationSucceededvm's original host id: 15 new >>>>>> host >>>>>> id: null host id before state transition: 15 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,505 DEBUG >>>>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) Hosts's actual total CPU: 82976 and >>>>>> CPU >>>>>> after applying overprovisioning: 331904 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,505 DEBUG >>>>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) Hosts's actual total RAM: >>>>>> 261961927808 >>>>>> and RAM after applying overprovisioning: 261961924608 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,505 DEBUG >>>>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) release cpu from host: 15, old >>>>>> used: >>>>>> 71300,reserved: 2100, actual total: 82976, total with >>>>>> overprovisioning: >>>>>> 331904; new used: 67100,reserved:6300; movedfromreserved: >>>>>> false,moveToReserveredtrue >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,505 DEBUG >>>>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) release mem from host: 15, old >>>>>> used: >>>>>> 98784247808,reserved: 1073741824, total: 261961924608; new used: >>>>>> 94489280512,reserved:5368709120 <(53)%206870-9120>; >>>>>> movedfromreserved: >>>>>> >>>>>> false,moveToReserveredtrue >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,538 DEBUG >>>>>> [c.c.n.NetworkModelImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) Service SecurityGroup is not >>>>>> supported >>>>>> in the network id=519 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,557 DEBUG >>>>>> [c.c.n.NetworkModelImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) Service SecurityGroup is not >>>>>> supported >>>>>> in the network id=519 >>>>>> >>>>>> ewcstack-man02-prod: 
2017-01-24 18:05:32,588 ERROR >>>>>> [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) Invocation exception, caused by: >>>>>> com.cloud.utils.exception.CloudRuntimeException: Unable to migrate >>>>>> VM[User|i-638-1735-VM] >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,589 INFO >>>>>> [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379 ctx-bbabfeae) Rethrow exception >>>>>> com.cloud.utils.exception.CloudRuntimeException: Unable to migrate >>>>>> VM[User|i-638-1735-VM] >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,589 DEBUG >>>>>> [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379) Done with run of VM work job: >>>>>> com.cloud.vm.VmWorkMigrateAway for VM 1735, job origin: 13733 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,589 ERROR >>>>>> [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379) Unable to complete AsyncJobVO {id:191379, >>>>>> userId: >>>>>> 1, >>>>>> accountId: 1, instanceType: null, instanceId: null, cmd: >>>>>> com.cloud.vm.VmWorkMigrateAway, cmdInfo: >>>>>> rO0ABXNyAB5jb20uY2xvdWQudm0uVm >>>>>> 1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcEmwIAAUoACXNyY0hvc3RJZHhyABNjb2 >>>>>> 0uY2xvdWQudm0uVm1Xb3Jrn5m2VvAlZ2sCAARKAAlhY2NvdW50SWRKAAZ1c2 >>>>>> VySWRKAAR2bUlkTAALaGFuZGxlck5hbWV0ABJMamF2YS9sYW5nL1N0cmluZz >>>>>> t4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAGx3QAGVZpcnR1YWxNYWNoaW5lTW >>>>>> FuYWdlckltcGwAAAAAAAAA4A, >>>>>> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, >>>>>> result: null, initMsid: 345049098498, completeMsid: null, lastUpdated: >>>>>> null, lastPolled: null, created: Tue Jan 24 18:05:00 CET 2017}, job >>>>>> origin:13733 >>>>>> ewcstack-man02-prod: com.cloud.utils.exception.CloudRuntimeException: >>>>>> Unable to migrate VM[User|i-638-1735-VM] >>>>>> ewcstack-man02-prod: at com.cloud.vm.VirtualMachineMan >>>>>> agerImpl.orchestrateMigrateAway(VirtualMachineManagerImpl.java:2405) >>>>>> ewcstack-man02-prod: at com.cloud.vm.VirtualMachineMan >>>>>> agerImpl.orchestrateMigrateAway(VirtualMachineManagerImpl.java:4517) >>>>>> ewcstack-man02-prod: at sun.reflect.GeneratedMethodAcc >>>>>> essor402.invoke(Unknown >>>>>> Source) >>>>>> ewcstack-man02-prod: at sun.reflect.DelegatingMethodAc >>>>>> cessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>>>> ewcstack-man02-prod: at java.lang.reflect.Method.invok >>>>>> e(Method.java:606) >>>>>> ewcstack-man02-prod: at com.cloud.vm.VmWorkJobHandlerP >>>>>> roxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107) >>>>>> ewcstack-man02-prod: at com.cloud.vm.VirtualMachineMan >>>>>> agerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4636) >>>>>> ewcstack-man02-prod: at com.cloud.vm.VmWorkJobDispatch >>>>>> er.runJob(VmWorkJobDispatcher.java:103) >>>>>> ewcstack-man02-prod: at org.apache.cloudstack.framewor >>>>>> k.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManag >>>>>> erImpl.java:537) >>>>>> ewcstack-man02-prod: at org.apache.cloudstack.managed. >>>>>> context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) >>>>>> ewcstack-man02-prod: at org.apache.cloudstack.managed. >>>>>> context.impl.DefaultManagedContext$1.call(DefaultManagedCont >>>>>> ext.java:56) >>>>>> ewcstack-man02-prod: at org.apache.cloudstack.managed. >>>>>> context.impl.DefaultManagedContext.callWithContext(DefaultMa >>>>>> nagedContext.java:103) >>>>>> ewcstack-man02-prod: at org.apache.cloudstack.managed. 
>>>>>> context.impl.DefaultManagedContext.runWithContext(DefaultMan >>>>>> agedContext.java:53) >>>>>> ewcstack-man02-prod: at org.apache.cloudstack.managed. >>>>>> context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) >>>>>> ewcstack-man02-prod: at org.apache.cloudstack.framewor >>>>>> k.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494) >>>>>> ewcstack-man02-prod: at java.util.concurrent.Executors >>>>>> $RunnableAdapter.call(Executors.java:471) >>>>>> ewcstack-man02-prod: at java.util.concurrent.FutureTas >>>>>> k.run(FutureTask.java:262) >>>>>> ewcstack-man02-prod: at java.util.concurrent.ThreadPoo >>>>>> lExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>>> ewcstack-man02-prod: at java.util.concurrent.ThreadPoo >>>>>> lExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>>> ewcstack-man02-prod: at java.lang.Thread.run(Thread.java:745) >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,590 DEBUG >>>>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379) Complete async job-191379, jobStatus: FAILED, >>>>>> resultCode: 0, result: rO0ABXNyAC9jb20uY2xvdWQudXRpbH >>>>>> MuZXhjZXB0aW9uLkNsb3VkUnVudGltZUV4Y2VwdGlvbgAAAABWTU9yAwABSQ >>>>>> ALY3NFcnJvckNvZGV4cgAaamF2YS5sYW5nLlJ1bnRpbWVFeGNlcHRpb26eXw >>>>>> ZHCjSD5QIAAHhyABNqYXZhLmxhbmcuRXhjZXB0aW9u0P0fPho7HMQCAAB4cg >>>>>> ATamF2YS5sYW5nLlRocm93YWJsZdXGNSc5d7jLAwAETAAFY2F1c2V0ABVMam >>>>>> F2YS9sYW5nL1Rocm93YWJsZTtMAA1kZXRhaWxNZXNzYWdldAASTGphdmEvbG >>>>>> FuZy9TdHJpbmc7WwAKc3RhY2tUcmFjZXQAHltMamF2YS9sYW5nL1N0YWNrVH >>>>>> JhY2VFbGVtZW50O0wAFHN1cHByZXNzZWRFeGNlcHRpb25zdAAQTGphdmEvdX >>>>>> RpbC9MaXN0O3hwcQB-AAh0AChVbmFibGUgdG8gbWlncmF0ZSBWTVtVc2VyfG >>>>>> ktNjM4LTE3MzUtVk1ddXIAHltMamF2YS5sYW5nLlN0YWNrVHJhY2VFbGVtZW >>>>>> 50OwJGKjw8_SI5AgAAeHAAAAAUc3IAG2phdmEubGFuZy5TdGFja1RyYWNlRW >>>>>> xlbWVudGEJxZomNt2FAgAESQAKbGluZU51bWJlckwADmRlY2xhcmluZ0NsYX >>>>>> NzcQB-AAVMAAhmaWxlTmFtZXEAfgAFTAAKbWV0aG9kTmFtZXEAfgAFeHAAAA >>>>>> lldAAmY29tLmNsb3VkLnZtLlZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGx0AB >>>>>> 5WaXJ0dWFsTWFjaGluZU1hbmFnZXJJbXBsLmphdmF0ABZvcmNoZXN0cmF0ZU >>>>>> 1pZ3JhdGVBd2F5c3EAfgAMAAARpXEAfgAOcQB-AA9xAH4AEHNxAH4ADP____ >>>>>> 90ACZzdW4ucmVmbGVjdC5HZW5lcmF0ZWRNZXRob2RBY2Nlc3NvcjQwMnB0AA >>>>>> ZpbnZva2VzcQB-AAwAAAArdAAoc3VuLnJlZmxlY3QuRGVsZWdhdGluZ01ldG >>>>>> hvZEFjY2Vzc29ySW1wbHQAIURlbGVnYXRpbmdNZXRob2RBY2Nlc3NvckltcG >>>>>> wuamF2YXEAfgAUc3EAfgAMAAACXnQAGGphdmEubGFuZy5yZWZsZWN0Lk1ldG >>>>>> hvZHQAC01ldGhvZC5qYXZhcQB-ABRzcQB-AAwAAABrdAAiY29tLmNsb3VkLn >>>>>> ZtLlZtV29ya0pvYkhhbmRsZXJQcm94eXQAGlZtV29ya0pvYkhhbmRsZXJQcm >>>>>> 94eS5qYXZhdAAPaGFuZGxlVm1Xb3JrSm9ic3EAfgAMAAASHHEAfgAOcQB- >>>>>> AA9xAH4AHnNxAH4ADAAAAGd0ACBjb20uY2xvdWQudm0uVm1Xb3JrSm9iRGlz >>>>>> cGF0Y2hlcnQAGFZtV29ya0pvYkRpc3BhdGNoZXIuamF2YXQABnJ1bkpvYnNx >>>>>> AH4ADAAAAhl0AD9vcmcuYXBhY2hlLmNsb3Vkc3RhY2suZnJhbWV3b3JrLmpv >>>>>> YnMuaW1wbC5Bc3luY0pvYk1hbmFnZXJJbXBsJDV0ABhBc3luY0pvYk1hbmFn >>>>>> ZXJJbXBsLmphdmF0AAxydW5JbkNvbnRleHRzcQB-AAwAAAAxdAA-b3JnLmFw >>>>>> YWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQuY29udGV4dC5NYW5hZ2VkQ29udGV4 >>>>>> dFJ1bm5hYmxlJDF0ABtNYW5hZ2VkQ29udGV4dFJ1bm5hYmxlLmphdmF0AANy >>>>>> dW5zcQB-AAwAAAA4dABCb3JnLmFwYWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQu >>>>>> Y29udGV4dC5pbXBsLkRlZmF1bHRNYW5hZ2VkQ29udGV4dCQxdAAaRGVmYXVs >>>>>> dE1hbmFnZWRDb250ZXh0LmphdmF0AARjYWxsc3EAfgAMAAAAZ3QAQG9yZy5h >>>>>> cGFjaGUuY2xvdWRzdGFjay5tYW5hZ2VkLmNvbnRleHQuaW1wbC5EZWZhdWx0 >>>>>> TWFuYWdlZENvbnRleHRxAH4ALnQAD2NhbGxXaXRoQ29udGV4dHNxAH4ADAAA >>>>>> ADVxAH4AMXEAfgAudAAOcnVuV2l0aENvbnRleHRzcQB-AAwAAAAudAA8b3Jn >>>>>> 
LmFwYWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQuY29udGV4dC5NYW5hZ2VkQ29u >>>>>> dGV4dFJ1bm5hYmxlcQB-ACpxAH4AK3NxAH4ADAAAAe5xAH4AJXEAfgAmcQB- >>>>>> ACtzcQB-AAwAAAHXdAAuamF2YS51dGlsLmNvbmN1cnJlbnQuRXhlY3V0b3Jz >>>>>> JFJ1bm5hYmxlQWRhcHRlcnQADkV4ZWN1dG9ycy5qYXZhcQB-AC9zcQB-AAwA >>>>>> AAEGdAAfamF2YS51dGlsLmNvbmN1cnJlbnQuRnV0dXJlVGFza3QAD0Z1dHVy >>>>>> ZVRhc2suamF2YXEAfgArc3EAfgAMAAAEeXQAJ2phdmEudXRpbC5jb25jdXJy >>>>>> ZW50LlRocmVhZFBvb2xFeGVjdXRvcnQAF1RocmVhZFBvb2xFeGVjdXRvci5q >>>>>> YXZhdAAJcnVuV29ya2Vyc3EAfgAMAAACZ3QALmphdmEudXRpbC5jb25jdXJy >>>>>> ZW50LlRocmVhZFBvb2xFeGVjdXRvciRXb3JrZXJxAH4AQHEAfgArc3EAfgAM >>>>>> AAAC6XQAEGphdmEubGFuZy5UaHJlYWR0AAtUaHJlYWQuamF2YXEAfgArc3IA >>>>>> JmphdmEudXRpbC5Db2xsZWN0aW9ucyRVbm1vZGlmaWFibGVMaXN0_A8lMbXs >>>>>> jhACAAFMAARsaXN0cQB-AAd4cgAsamF2YS51dGlsLkNvbGxlY3Rpb25zJFVu >>>>>> bW9kaWZpYWJsZUNvbGxlY3Rpb24ZQgCAy173HgIAAUwAAWN0ABZMamF2YS91 >>>>>> dGlsL0NvbGxlY3Rpb247eHBzcgATamF2YS51dGlsLkFycmF5TGlzdHiB0h2Z >>>>>> x2GdAwABSQAEc2l6ZXhwAAAAAHcEAAAAAHhxAH4ATHgAABCadwgAAAAAAAAAAHg >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,592 DEBUG >>>>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379) Publish async job-191379 complete on message bus >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,592 DEBUG >>>>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379) Wake up jobs related to job-191379 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,592 DEBUG >>>>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379) Update db status for job-191379 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,594 DEBUG >>>>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379) Wake up jobs joined with job-191379 and disjoin >>>>>> all >>>>>> subjobs created from job- 191379 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,604 DEBUG >>>>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379) Done executing com.cloud.vm.VmWorkMigrateAway >>>>>> for >>>>>> job-191379 >>>>>> ewcstack-man02-prod: 2017-01-24 18:05:32,608 INFO >>>>>> [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-115:ctx-0d5db528 >>>>>> job-13733/job-191379) Remove job-191379 from job monitoring >>>>>> >>>>>> >>>>>> >>>>>> On 25.01.2017 15:53, Rafael Weingärtner wrote: >>>>>> >>>>>> Did you check the ACS log files looking for some exceptions when the >>>>>> >>>>>>> migrations for this specific VM started? >>>>>>> >>>>>>> On Wed, Jan 25, 2017 at 9:49 AM, Francois Scheurer < >>>>>>> francois.scheu...@everyware.ch> wrote: >>>>>>> >>>>>>> Dear CS contributors >>>>>>> >>>>>>> We use CS 4.5.1 on a 3 Clusters with XenServer 6.5. and shared >>>>>>>> primary >>>>>>>> storage. >>>>>>>> >>>>>>>> When we put a host in maintenance, cs evacuates all VM's on other >>>>>>>> hosts. >>>>>>>> >>>>>>>> But it happens regularly that some VM's are stopped instead of live >>>>>>>> migrated. >>>>>>>> >>>>>>>> If we restart that stopped VM's, CS will then stop them again later. >>>>>>>> >>>>>>>> We need to start them several times until they stays up. >>>>>>>> >>>>>>>> Is it a known issue? Is it fixed on CS 4.9.2 ? >>>>>>>> >>>>>>>> >>>>>>>> Migrating manually all VM's is working but impracticable because too >>>>>>>> much >>>>>>>> time consuming. >>>>>>>> >>>>>>>> >>>>>>>> Many thanks in advance for your help. 
>>>>>>>>
>>>>>>>> Best Regards
>>>>>>>> Francois
>>>>>>>>
>>>>>>>> --
>>>>>>>> EveryWare AG
>>>>>>>> François Scheurer
>>>>>>>> Senior Systems Engineer
>>>>>>>> Zurlindenstrasse 52a
>>>>>>>> CH-8003 Zürich
>>>>>>>>
>>>>>>>> tel: +41 44 466 60 00
>>>>>>>> fax: +41 44 466 60 10
>>>>>>>> mail: francois.scheu...@everyware.ch
>>>>>>>> web: http://www.everyware.ch


--
Rafael Weingärtner
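For anyone who runs into the same symptom later (VMs that were stopped during host maintenance getting stopped again after a manual restart), a quick read-only check is whether old HA work items are still being retried for the affected VMs. The query below is only a diagnostic sketch, not an official procedure: it assumes the usual 4.5.x schema, and the vm_instance columns used in the join (instance_name, ha_enabled, removed) should be verified against your own database before relying on it.

select w.id, w.instance_id, vm.instance_name, w.type, w.state,
       w.host_id, w.step, w.tried, w.taken, w.updated
from op_ha_work w
join vm_instance vm on vm.id = w.instance_id
where w.step <> 'Done'
  and vm.ha_enabled = 1
  and vm.removed is null
order by w.updated desc;

If rows for the same instance_id keep reappearing with step 'Error' or a growing tried counter, that is consistent with the HA worker repeatedly picking the VM up again, which matches the behaviour described in the thread above.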