Hi, I didn't stop the agent; I did a 'shutdown -h now' on host 'A' in order to simulate a crash.

My goal is to verify that if one of my KVM hosts fails, the VMs with HA enabled on that host 'A' will be migrated to another host (in this case host 'B'), or at least that it will be possible to do it manually. If you need more tests, I can do them. Thanks.

On Mon, Jul 20, 2015 at 12:16 PM, Milamber <milam...@apache.org> wrote:

> On 20/07/2015 15:44, Luciano Castro wrote:
>
>> Hi!
>>
>> My test today: I stopped another instance and changed it to an HA offering.
>> I started this instance.
>>
>> Afterwards, I shut down its KVM host gracefully.
>
> Why a graceful shutdown of the KVM host? The HA process is to (re)start
> the HA VMs on a new host once the current host has crashed or is not
> available, i.e. its CloudStack agent won't respond.
> If you stop the cloudstack-agent gently, the CS manager doesn't consider
> this a crash, so HA won't start.
>
> What behavior do you expect?
>
>> And I checked the investigator processes:
>>
>> [root@1q2 ~]# grep -i Investigator /var/log/cloudstack/management/management-server.log
>>
>> [root@1q2 ~]# date
>> Mon Jul 20 14:39:43 UTC 2015
>>
>> [root@1q2 ~]# ls -ltrh /var/log/cloudstack/management/management-server.log
>> -rw-rw-r--. 1 cloud cloud 14M Jul 20 14:39 /var/log/cloudstack/management/management-server.log
>>
>> Nothing. I don't know how these processes work internally, but it seems
>> they are not working well. Do you agree?
>>
>> option                      value
>> ha.investigators.exclude   (nothing)
>> ha.investigators.order     SimpleInvestigator,XenServerInvestigator,KVMInvestigator,HypervInvestigator,VMwareInvestigator,PingInvestigator,ManagementIPSysVMInvestigator
>> investigate.retry.interval 60
>>
>> Is there a way to check whether these processes are running?
>>
>> [root@1q2 ~]# ps waux | grep -i java
>> root     11408  0.0  0.0 103252   880 pts/0   S+   14:44   0:00 grep -i java
>> cloud    24225  0.7  1.7 16982036 876412 ?
>> Sl Jul16 43:48 /usr/lib/jvm/jre-1.7.0/bin/java -Djava.awt.headless=true
>> -Dcom.sun.management.jmxremote=false -Xmx2g -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=/var/log/cloudstack/management/ -XX:PermSize=512M -XX:MaxPermSize=800m
>> -Djava.security.properties=/etc/cloudstack/management/java.security.ciphers
>> -classpath :::/etc/cloudstack/management:/usr/share/cloudstack-management/setup:/usr/share/cloudstack-management/bin/bootstrap.jar:/usr/share/cloudstack-management/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
>> -Dcatalina.base=/usr/share/cloudstack-management -Dcatalina.home=/usr/share/cloudstack-management
>> -Djava.endorsed.dirs= -Djava.io.tmpdir=/usr/share/cloudstack-management/temp
>> -Djava.util.logging.config.file=/usr/share/cloudstack-management/conf/logging.properties
>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>> org.apache.catalina.startup.Bootstrap start
>>
>> Thanks
>>
>> On Sat, Jul 18, 2015 at 1:53 PM, Milamber <milam...@apache.org> wrote:
>>
>>> On 17/07/2015 22:26, Somesh Naidu wrote:
>>>
>>>>> Perhaps the management server doesn't recognize host 3 as totally down
>>>>> (ping still alive? or some quorum not ok)?
>>>>> The only way for the mgmt server to fully accept that host 3 has a
>>>>> real problem is that host 3 has been rebooted (around 12:44)?
>>>>
>>>> The host disconnect was triggered at 12:19 on host 3. The mgmt server was
>>>> pretty sure the host was down (it was a graceful shutdown, I believe), which
>>>> is why it triggered a disconnect and notified other nodes. There was no
>>>> checkhealth/checkonhost/etc. triggered; just the agent disconnected and all
>>>> listeners (ping/etc.) notified.
>>>>
>>>> At this time the mgmt server should have scheduled HA on all VMs running on
>>>> that host. The HA investigators would then work their way through identifying
>>>> whether the VMs are still running, whether they need to be fenced, etc. But
>>>> this never happened.
>>>
>>> AFAIK, stopping the cloudstack-agent service doesn't allow the HA process
>>> to start for the VMs hosted by that node. It seems normal to me that the HA
>>> process doesn't start at this moment.
>>> If I wanted to start the HA process on a node, I would go to the Web UI (or
>>> cloudmonkey) to change the state of the host from Up to Maintenance.
>>>
>>> (Afterwards I can stop the CS-agent service if I need to, for example to
>>> reboot the node.)
>>>
>>>> Regards,
>>>> Somesh
>>>>
>>>> -----Original Message-----
>>>> From: Milamber [mailto:milam...@apache.org]
>>>> Sent: Friday, July 17, 2015 6:01 PM
>>>> To: users@cloudstack.apache.org
>>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>>>
>>>> On 17/07/2015 21:23, Somesh Naidu wrote:
>>>>
>>>>> Ok, so here are my findings.
>>>>>
>>>>> 1. Host ID 3 was shut down around 2015-07-16 12:19:09, at which point the
>>>>> management server called a disconnect.
>>>>> 2. Based on the logs, it seems VM IDs 32, 18, 39 and 46 were running on
>>>>> the host.
>>>>> 3. No HA tasks for any of these VMs at this time.
>>>>> 5. Management server restarted at around 2015-07-16 12:30:20.
>>>>> 6. Host ID 3 connected back at around 2015-07-16 12:44:08.
>>>>> 7. Management server identified the missing VMs and triggered HA on those.
>>>>> 8. The VMs were eventually started, all 4 of them.
>>>>>
>>>>> I am not 100% sure why HA wasn't triggered until 2015-07-16 12:30 (#3),
>>>>> but I know that the management server restart caused it not to happen
>>>>> until the host was reconnected.
>>>>
>>>> Perhaps the management server doesn't recognize host 3 as totally down
>>>> (ping still alive? or some quorum not ok)?
>>>> The only way for the mgmt server to fully accept that host 3 has a
>>>> real problem is that host 3 has been rebooted (around 12:44)?
>>>>
>>>> What is the storage subsystem? CLVMd?
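As an aside, for readers trying to picture the investigator flow Somesh describes: conceptually, the management server asks each investigator from ha.investigators.order in turn and takes the first definitive verdict. The model below is only a rough Python sketch of that idea, not CloudStack's actual code (the real logic lives in com.cloud.ha.HighAvailabilityManagerImpl and the Investigator implementations):

```python
# Simplified model of the HA investigator chain (illustrative only).
# Each investigator returns True/False if it can decide whether the VM is
# alive, or None if it cannot tell; the first definitive answer wins.
# The order matches the ha.investigators.order value quoted in this thread.

INVESTIGATOR_ORDER = [
    "SimpleInvestigator", "XenServerInvestigator", "KVMInvestigator",
    "HypervInvestigator", "VMwareInvestigator", "PingInvestigator",
    "ManagementIPSysVMInvestigator",
]

def is_vm_alive(vm_name, investigators):
    """Ask each investigator in order; stop at the first True/False verdict."""
    for name in INVESTIGATOR_ORDER:
        check = investigators.get(name, lambda vm: None)  # missing => can't tell
        verdict = check(vm_name)
        if verdict is not None:
            return name, verdict
    return None, None  # nobody could decide; HA would retry later

# Example: SimpleInvestigator can't tell, KVMInvestigator says "not alive",
# so HA would go on to fence the VM and restart it elsewhere.
demo = {
    "SimpleInvestigator": lambda vm: None,
    "KVMInvestigator": lambda vm: False,
}
print(is_vm_alive("i-2-39-VM", demo))  # ('KVMInvestigator', False)
```

If a graceful agent stop never marks the host as down, no HA work items are scheduled and none of these investigators is ever consulted, which matches the behavior Milamber describes.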
>>>>> Regards,
>>>>> Somesh
>>>>>
>>>>> -----Original Message-----
>>>>> From: Luciano Castro [mailto:luciano.cas...@gmail.com]
>>>>> Sent: Friday, July 17, 2015 12:13 PM
>>>>> To: users@cloudstack.apache.org
>>>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>>>>
>>>>> No problem, Somesh, thanks for your help.
>>>>>
>>>>> Link to the log:
>>>>>
>>>>> https://dl.dropboxusercontent.com/u/6774061/management-server.log.2015-07-16.gz
>>>>>
>>>>> Luciano
>>>>>
>>>>> On Fri, Jul 17, 2015 at 12:00 PM, Somesh Naidu <somesh.na...@citrix.com> wrote:
>>>>>
>>>>>> How large are the management server logs dated 2015-07-16? I would like
>>>>>> to review the logs. All the information I need from that incident should
>>>>>> be in there, so I don't need any more testing.
>>>>>>
>>>>>> Regards,
>>>>>> Somesh
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Luciano Castro [mailto:luciano.cas...@gmail.com]
>>>>>> Sent: Friday, July 17, 2015 7:58 AM
>>>>>> To: users@cloudstack.apache.org
>>>>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>>>>>
>>>>>> Hi Somesh!
>>>>>>
>>>>>> [root@1q2 ~]# zgrep -i -E 'SimpleIvestigator|KVMInvestigator|PingInvestigator|ManagementIPSysVMInvestigator' /var/log/cloudstack/management/management-server.log.2015-07-16.gz | tail -5000 > /tmp/management.txt
>>>>>> [root@1q2 ~]# cat /tmp/management.txt
>>>>>> 2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [KVMInvestigator] in [Ha Investigators Registry]
>>>>>> 2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.RegistryLifecycle] (main:null) Registered com.cloud.ha.KVMInvestigator@57ceec9a
>>>>>> 2015-07-16 12:30:45,927 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [PingInvestigator] in [Ha Investigators Registry]
>>>>>> 2015-07-16 12:30:45,928 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [ManagementIPSysVMInvestigator] in [Ha Investigators Registry]
>>>>>> 2015-07-16 12:30:53,796 INFO  [o.a.c.s.l.r.DumpRegistry] (main:null) Registry [Ha Investigators Registry] contains [SimpleInvestigator, XenServerInvestigator, KVMInv
>>>>>>
>>>>>> I had searched this log before, but I thought there was nothing special in it.
>>>>>>
>>>>>> If you want to propose another test scenario to me, I can do it.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Thu, Jul 16, 2015 at 7:27 PM, Somesh Naidu <somesh.na...@citrix.com> wrote:
>>>>>>
>>>>>>> What about the other investigators, specifically KVMInvestigator and
>>>>>>> PingInvestigator? Do they report the VMs as alive=false too?
>>>>>>>
>>>>>>> Also, it is recommended that you look at the management-server.log instead
>>>>>>> of catalina.out (for one, the latter doesn't have timestamps).
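Following that advice to work from management-server.log: the investigator verdict lines there are regular enough to pull apart with a short script, for example to count verdicts per VM. This is a sketch that assumes only the log line shape shown in this thread:

```python
import re

# Matches lines like:
#   ... SimpleInvestigator found VM[User|i-2-39-VM]to be alive? false
# as they appear in the logs quoted in this thread (note the missing space
# after the closing bracket, which is how the log actually prints).
LINE_RE = re.compile(
    r"(\w+Investigator) found VM\[(\w+)\|([\w-]+)\]\s*to be alive\? (true|false)"
)

def parse_ha_verdicts(lines):
    """Return (investigator, vm_name, alive) tuples for matching log lines."""
    verdicts = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            investigator, _vm_type, vm_name, alive = m.groups()
            verdicts.append((investigator, vm_name, alive == "true"))
    return verdicts

sample = [
    "INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-9f79917e "
    "work-13) SimpleInvestigator found VM[User|i-2-39-VM]to be alive? false",
]
print(parse_ha_verdicts(sample))  # [('SimpleInvestigator', 'i-2-39-VM', False)]
```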
>>>>>>> Regards,
>>>>>>> Somesh
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Luciano Castro [mailto:luciano.cas...@gmail.com]
>>>>>>> Sent: Thursday, July 16, 2015 1:14 PM
>>>>>>> To: users@cloudstack.apache.org
>>>>>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>>>>>>
>>>>>>> Hi Somesh!
>>>>>>>
>>>>>>> Thanks for the help. I did it again, and I collected new logs:
>>>>>>>
>>>>>>> My vm_instance name is i-2-39-VM. There were some routers on KVM host 'A'
>>>>>>> (the one that I powered off now):
>>>>>>>
>>>>>>> [root@1q2 ~]# grep -i -E 'SimpleInvestigator.*false' /var/log/cloudstack/management/catalina.out
>>>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-e2f91c9c work-3) SimpleInvestigator found VM[DomainRouter|r-4-VM]to be alive? false
>>>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-729acf4f work-7) SimpleInvestigator found VM[User|i-23-33-VM]to be alive? false
>>>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-a66a4941 work-8) SimpleInvestigator found VM[DomainRouter|r-36-VM]to be alive? false
>>>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-5977245e work-10) SimpleInvestigator found VM[User|i-17-26-VM]to be alive? false
>>>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-c7f39be0 work-9) SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false
>>>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-ad4f5fda work-10) SimpleInvestigator found VM[DomainRouter|r-46-VM]to be alive? false
>>>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-0257f5af work-11) SimpleInvestigator found VM[User|i-4-52-VM]to be alive? false
>>>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-7ddff382 work-12) SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false
>>>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-9f79917e work-13) SimpleInvestigator found VM[User|i-2-39-VM]to be alive? false
>>>>>>>
>>>>>>> KVM host 'B' agent log (where the machine should migrate to):
>>>>>>>
>>>>>>> 2015-07-16 16:58:56,537 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Live migration of instance i-2-39-VM initiated
>>>>>>> 2015-07-16 16:58:57,540 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 1000ms
>>>>>>> 2015-07-16 16:58:58,541 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 2000ms
>>>>>>> 2015-07-16 16:58:59,542 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 3000ms
>>>>>>> 2015-07-16 16:59:00,543 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 4000ms
>>>>>>> 2015-07-16 16:59:01,245 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Migration thread for i-2-39-VM is done
>>>>>>>
>>>>>>> It said done for my i-2-39-VM instance, but I can't ping this instance.
>>>>>>>
>>>>>>> Luciano
>>>>>>>
>>>>>> --
>>>>>> Luciano Castro

--
Luciano Castro
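A closing note on those agent-side messages: the "Waiting for migration ... waited Nms" lines come from a per-second poll of the migration thread, and "Migration thread ... is done" only means that thread has exited, not that the guest is reachable on the network. That distinction is consistent with a migration reported as done while the instance still doesn't answer pings. Below is a rough, illustrative Python model of such a poll loop, not the actual LibvirtComputingResource code:

```python
import time

def wait_for_migration(is_done, poll_ms=1000, timeout_ms=60_000):
    """Poll until the migration thread reports done, loosely mirroring the
    agent's 'Waiting for migration ... waited Nms' log lines."""
    waited = 0
    while not is_done():
        if waited >= timeout_ms:
            raise TimeoutError("migration did not finish in time")
        time.sleep(poll_ms / 1000.0)
        waited += poll_ms
        print(f"Waiting for migration to complete, waited {waited}ms")
    return waited  # the thread is done; guest health is NOT verified here

def make_fake_done(finish_after_polls):
    """Stand-in for the real 'has the migration thread exited?' check."""
    polls = {"count": 0}
    def done():
        polls["count"] += 1
        return polls["count"] > finish_after_polls
    return done

# Fake migration that finishes after three polls (10 ms polls for the demo):
print("done after", wait_for_migration(make_fake_done(3), poll_ms=10), "ms")
```

Whether the restarted or migrated VM actually answers pings has to be checked separately, e.g. from inside the guest network, which is exactly the gap observed in this thread.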