Kiran Koneti created CLOUDSTACK-4936: ----------------------------------------
Summary: System VM's Agent goes to disconnected state but VM state still up when they are stopped manually. Key: CLOUDSTACK-4936 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4936 Project: CloudStack Issue Type: Bug Security Level: Public (Anyone can view this level - this is the default.) Components: SystemVM Affects Versions: 4.2.1 Reporter: Kiran Koneti Priority: Critical Fix For: 4.2.1 Steps are as below: 1)Installed the CS using the Trail install.(MS Centos 6.2 and Hypervisor Xen 5.6sp2 + CSP installed) 2)once the setup is up and system Vm's came up tried to deploy Some VM's and the VM deployment was successful. 3)Tried to stop the SSVM and CPVM the Stop command is passed to the hypervisor and the VM's are sopped in the Hypervisor . 4)But the cloudstack UI pops with a message unable to stop the VM and updates the status as follows in UI. 5)The VM state is UP but the Agent state shows disconnected. The Management server logs are as below: "2013-10-23 16:49:07,476 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-19:null) submit async job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ], details: AsyncJobVO {id:212, userId: 2, accountId: 2, sessionKey: null, instanceType: SystemVm, instanceId: 3, cmd: org.apache.cloudstack.api.command.admin.systemvm.StopSystemVmCmd, cmdOriginator: null, cmdInfo: {"response":"json","id":"ffe2558d-516f-4d06-94fe-be20ad83cede","sessionkey":"eEEdlU+E7L8Kb1Op0S4B1PIyhUE\u003d","cmdEventType":"SSVM.STOP","ctxUserId":"2","httpmethod":"GET","_":"1382527274866","ctxAccountId":"2","ctxStartEventId":"966"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 233845173167606, completeMsid: null, lastUpdated: null, lastPolled: null, created: null} 2013-10-23 16:49:07,477 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Executing org.apache.cloudstack.api.command.admin.systemvm.StopSystemVmCmd for job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ] 2013-10-23 16:49:07,478 DEBUG [cloud.api.ApiServlet] (catalina-exec-19:null) ===END=== 10.252.192.62 -- GET command=stopSystemVm&id=ffe2558d-516f-4d06-94fe-be20ad83cede&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527274866 2013-10-23 16:49:07,551 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) VM state transitted from :Running to Stopping with event: StopRequestedvm's original host id: 1 new host id: 1 host id before state transition: 1 2013-10-23 16:49:07,553 DEBUG [agent.transport.Request] (Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Seq 1-964559712: Sending { Cmd , MgmtId: 233845173167606, via: 1, Ver: v1, Flags: 100111, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":true,"vmName":"s-3-VM","wait":0}}] } 2013-10-23 16:49:07,553 DEBUG [agent.transport.Request] (Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Seq 1-964559712: Executing: { Cmd , MgmtId: 233845173167606, via: 1, Ver: v1, Flags: 100111, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":true,"vmName":"s-3-VM","wait":0}}] } 2013-10-23 16:49:07,553 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-17:null) Seq 1-964559712: Executing request 2013-10-23 16:49:07,653 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-17:null) 9. The VM s-3-VM is in Stopping state 2013-10-23 16:49:07,935 INFO [xen.resource.CitrixResourceBase] (DirectAgent-17:null) Removed network rules for vm s-3-VM 2013-10-23 16:49:08,288 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-9:null) SeqA 5-25: Processing Seq 5-25: { Cmd , MgmtId: -1, via: 5, Ver: v1, Flags: 11, [{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":4,"_loadInfo":"{\n \"connections\": []\n}","wait":0}}] } 2013-10-23 16:49:08,368 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-9:null) SeqA 5-25: Sending Seq 5-25: { Ans: , MgmtId: 233845173167606, via: 5, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] } 2013-10-23 16:49:09,162 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-15:null) SeqA 4--1: Processing Seq 4--1: { Cmd , MgmtId: -1, via: 4, Ver: v1, Flags: 111, [{"com.cloud.agent.api.ShutdownCommand":{"reason":"sig.kill","wait":0}}] } 2013-10-23 16:49:09,162 INFO [agent.manager.AgentManagerImpl] (AgentManager-Handler-15:null) Host 4 has informed us that it is shutting down with reason sig.kill and detail null 2013-10-23 16:49:09,163 INFO [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Host 4 is disconnecting with event ShutdownRequested 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) The next status of agent 4is Disconnected, current status is Up 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Deregistering link for 4 with state Disconnected 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Remove Agent : 4 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Remove Agent : 4 2013-10-23 16:49:09,166 DEBUG [agent.manager.ConnectedAgentAttache] (AgentTaskPool-11:null) Processing Disconnect. 2013-10-23 16:49:09,166 DEBUG [agent.manager.ConnectedAgentAttache] (AgentTaskPool-11:null) Processing Disconnect. 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer_EnhancerByCloudStack_621251c1 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.deploy.DeploymentPlanningManagerImpl_EnhancerByCloudStack_28d6dc37 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.network.NetworkManagerImpl_EnhancerByCloudStack_3a67edca 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.storage.secondary.SecondaryStorageListener 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.hypervisor.vmware.manager.VmwareManagerImpl_EnhancerByCloudStack_d1dcecc4 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.network.security.SecurityGroupListener 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.storage.listener.StoragePoolMonitor 2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl_EnhancerByCloudStack_67289ece 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.storage.LocalStoragePoolListener 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl_EnhancerByCloudStack_9a0b6f 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl_EnhancerByCloudStack_7b7dd392 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.storage.upload.UploadListener 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.storage.download.DownloadListener 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.agent.manager.AgentMonitor 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.capacity.StorageCapacityListener 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.capacity.ComputeCapacityListener 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener 2013-10-23 16:49:09,167 DEBUG [cloud.network.NetworkUsageManagerImpl] (AgentTaskPool-11:null) Disconnected called on 4 with status Disconnected 2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Sending Disconnect to listener: com.cloud.consoleproxy.ConsoleProxyListener 2013-10-23 16:49:09,168 DEBUG [cloud.host.Status] (AgentTaskPool-11:null) Transition:[Resource state = Enabled, Agent event = ShutdownRequested, Host id = 4, name = s-3-VM] 2013-10-23 16:49:09,263 DEBUG [cloud.host.Status] (AgentTaskPool-11:null) Agent status update: [id = 4; name = s-3-VM; old status = Up; event = ShutdownRequested; new status = Disconnected; old update count = 23; new update count = 24] 2013-10-23 16:49:09,263 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentTaskPool-11:null) Notifying other nodes of to disconnect 2013-10-23 16:49:10,527 DEBUG [cloud.api.ApiServlet] (catalina-exec-14:null) ===START=== 10.252.192.62 -- GET command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527278181 2013-10-23 16:49:10,591 DEBUG [cloud.api.ApiServlet] (catalina-exec-14:null) ===END=== 10.252.192.62 -- GET command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527278181 2013-10-23 16:49:12,502 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager Timer:null) Resetting hosts suitable for reconnect 2013-10-23 16:49:12,504 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager Timer:null) Completed resetting hosts suitable for reconnect 2013-10-23 16:49:12,504 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager Timer:null) Acquiring hosts for clusters already owned by this management server 2013-10-23 16:49:12,505 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager Timer:null) Completed acquiring hosts for clusters already owned by this management server 2013-10-23 16:49:12,505 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager Timer:null) Acquiring hosts for clusters not owned by any management server 2013-10-23 16:49:12,505 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager Timer:null) Completed acquiring hosts for clusters not owned by any management server 2013-10-23 16:49:13,291 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-13:null) SeqA 5-26: Processing Seq 5-26: { Cmd , MgmtId: -1, via: 5, Ver: v1, Flags: 11, [{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":4,"_loadInfo":"{\n \"connections\": []\n}","wait":0}}] } 2013-10-23 16:49:13,346 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-13:null) SeqA 5-26: Sending Seq 5-26: { Ans: , MgmtId: 233845173167606, via: 5, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] } 2013-10-23 16:49:13,526 DEBUG [cloud.api.ApiServlet] (catalina-exec-22:null) ===START=== 10.252.192.62 -- GET command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527281181 2013-10-23 16:49:13,601 DEBUG [cloud.api.ApiServlet] (catalina-exec-22:null) ===END=== 10.252.192.62 -- GET command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527281181 2013-10-23 16:49:16,526 DEBUG [cloud.api.ApiServlet] (catalina-exec-2:null) ===START=== 10.252.192.62 -- GET command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527284180 2013-10-23 16:49:16,619 DEBUG [cloud.api.ApiServlet] (catalina-exec-2:null) ===END=== 10.252.192.62 -- GET command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527284180 2013-10-23 16:49:18,091 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-17:null) 10. The VM s-3-VM is in Stopped state 2013-10-23 16:49:18,092 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-17:null) Seq 1-964559712: Response Received: 2013-10-23 16:49:18,092 DEBUG [agent.transport.Request] (DirectAgent-17:null) Seq 1-964559712: Processing: { Ans: , MgmtId: 233845173167606, via: 1, Ver: v1, Flags: 110, [{"com.cloud.agent.api.StopAnswer":{"hypervisortoolsversion":"xenserver56","result":true,"details":"Stop VM s-3-VM Succeed","wait":0}}] } 2013-10-23 16:49:18,092 DEBUG [agent.manager.AgentAttache] (DirectAgent-17:null) Seq 1-964559712: No more commands found 2013-10-23 16:49:18,092 DEBUG [agent.transport.Request] (Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Seq 1-964559712: Received: { Ans: , MgmtId: 233845173167606, via: 1, Ver: v1, Flags: 110, { StopAnswer } } 2013-10-23 16:49:18,094 WARN [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Unable to stop vm VM[SecondaryStorageVm|s-3-VM] 2013-10-23 16:49:18,166 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) VM state transitted from :Stopping to Running with event: OperationFailedvm's original host id: 1 new host id: 1 host id before state transition: 1 2013-10-23 16:49:18,167 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Complete async job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ], jobStatus: 2, resultCode: 530, result: Error Code: 530 Error text: Fail to stop system vm 2013-10-23 16:49:18,294 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-11:null) SeqA 5-27: Processing Seq 5-27: { Cmd , MgmtId: -1, via: 5, Ver: v1, Flags: 11, [{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":4,"_loadInfo":"{\n \"connections\": []\n}","wait":0}}] } 2013-10-23 16:49:18,350 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-11:null) SeqA 5-27: Sending Seq 5-27: { Ans: , MgmtId: 233845173167606, via: 5, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] } 2013-10-23 16:49:19,527 DEBUG [cloud.api.ApiServlet] (catalina-exec-25:null) ===START=== 10.252.192.62 -- GET command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527287180 2013-10-23 16:49:19,541 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-25:null) Async job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ] completed" Observations: Once I reboot the System Vm's then they are recreated and the agent status is connected and the VM's are created in the Hypervisor also. Attaching the Management sever logs. -- This message was sent by Atlassian JIRA (v6.1#6144)