GitHub user btzq added a comment to the discussion: HA not working in
cloudstack 4.19
Hey @Aashiqps, we are facing the same situation as well. Great to have found
another Linstor user running a disaggregated architecture.
At this point, we have managed to get Volume Snapshots working fine after the
latest fixes from the LINBIT side.
But we still have the failover issues, where we can't seem to get all the
Virtual Routers to start up successfully during a node failure (we simulated
one by pulling the power from the server). If a Virtual Router can't start up,
then the VMs in that network can't continue to start up either.
We also tested this using NFS, just to isolate the issue, and VM HA using NFS
works just fine.
When we go through the logs, we can't seem to identify what the problem is.
Our latest finding is that, according to the logs, the VR was successfully
migrated to the new host and its status transitioned to Running. However, the
ACS HighAvailabilityManager triggered a stop/reboot action on the router and
did not take any action to start it afterward.
```
2024-07-16 17:31:18,853 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-137:ctx-b555a5f3 job-385919/job-386245) (logid:1dc1e938) Run VM work job: com.cloud.vm.VmWorkStop for VM 54572, job origin: 385919
2024-07-16 17:31:18,854 DEBUG [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-137:ctx-b555a5f3 job-385919/job-386245 ctx-10b406cf) (logid:1dc1e938) Execute VM work job: com.cloud.vm.VmWorkStop{"cleanup":true,"userId":1,"accountId":1,"vmId":54572,"handlerName":"VirtualMachineManagerImpl"}
2024-07-16 17:31:18,867 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-137:ctx-b555a5f3 job-385919/job-386245 ctx-10b406cf) (logid:1dc1e938) VM instance {"id":54572,"instanceName":"r-54572-VM","type":"DomainRouter","uuid":"a0022aec-996e-490d-8a21-3eccd43c9e0b"} state transited from [Running] to [Stopping] with event [StopRequested]. VM's original host: Host {"id":129,"name":"n2ncloudmy1cp02","type":"Routing","uuid":"3bf16d9d-e561-4e59-b855-7256bee35c6f"}, new host: Host {"id":129,"name":"n2ncloudmy1cp02","type":"Routing","uuid":"3bf16d9d-e561-4e59-b855-7256bee35c6f"}, host before state transition: Host {"id":129,"name":"n2ncloudmy1cp02","type":"Routing","uuid":"3bf16d9d-e561-4e59-b855-7256bee35c6f"}
2024-07-16 17:31:37,947 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-42:ctx-cfc0e084 work-103139) (logid:fe9b0f9a) VM VM instance {"id":54572,"instanceName":"r-54572-VM","type":"DomainRouter","uuid":"a0022aec-996e-490d-8a21-3eccd43c9e0b"} is now no longer on host 129
2024-07-16 17:31:37,947 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-42:ctx-cfc0e084 work-103139) (logid:fe9b0f9a) Completed work HAWork[103139-HA-54572-Stopped-Investigating]. Took 1/10 attempts.
```
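One way we sanity-checked this was to pull out which `VmWork*` jobs the management server actually scheduled for the router, to confirm a `VmWorkStop` was dispatched with no matching `VmWorkStart` afterward. A minimal sketch (the helper name is mine, and the embedded log lines are abbreviated copies of the excerpt above):

```python
import re

# Two lines copied (abbreviated) from the management-server log excerpt above.
LOG = """\
2024-07-16 17:31:18,853 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-137:ctx-b555a5f3 job-385919/job-386245) (logid:1dc1e938) Run VM work job: com.cloud.vm.VmWorkStop for VM 54572, job origin: 385919
2024-07-16 17:31:37,947 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-42:ctx-cfc0e084 work-103139) (logid:fe9b0f9a) Completed work HAWork[103139-HA-54572-Stopped-Investigating]. Took 1/10 attempts.
"""

# Match the VM work-job class (VmWorkStop, VmWorkStart, ...) on each line, so
# we can list which lifecycle actions were actually scheduled for the VM.
WORK_RE = re.compile(r"com\.cloud\.vm\.(VmWork\w+)")

def scheduled_actions(log: str) -> list[str]:
    return [m.group(1) for line in log.splitlines() if (m := WORK_RE.search(line))]

actions = scheduled_actions(LOG)
print(actions)                   # → ['VmWorkStop']: only a stop was scheduled
print("VmWorkStart" in actions)  # → False: no start ever follows the stop
```

Running the same scan over the full log for `vmId 54572` shows the stop was the last work job the HA manager queued for the router.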
But there are a few things to take note of:
- There's no need to use IPMI OOB management. In fact, users are asked not to. This is
because only NFS/iSCSI storage is susceptible to split-brain, but with LINSTOR
the underlying technology is apparently different, which is why split-brain will not occur.
- The CloudStack Agent needs to be updated so it does not restart the server.
- The HA strategy in CloudStack + LINSTOR is to rely solely on VM HA (not Host HA).
More info here:
https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#ch-cloudstack:~:text=video%20here.-,14.9.%20High%20Availability%20and%20LINSTOR%20Volumes%20in%20CloudStack,-The%20CloudStack%20documentation
https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#ch-cloudstack:~:text=14.9.1.%20Explanation%20and%20Reasoning
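In practice, "VM HA only" comes down to giving the offerings `offerha=true` and leaving the Host HA framework alone. A hedged sketch using CloudMonkey (`cmk`); the offering name and sizes are placeholders, not from the LINSTOR guide:

```shell
# Compute offering with VM HA enabled (offerha=true); name/sizes are placeholders.
cmk create serviceoffering name=ha-small displaytext="HA small" \
    cpunumber=1 cpuspeed=1000 memory=1024 offerha=true

# Host HA (the IPMI/OOB fencing framework) is configured per host via the
# configureHAForHost / enableHAForHost APIs -- simply leave it unconfigured
# so that only VM HA reacts to a host failure.
```

With this, the HA workers restart guests on surviving hosts without ever trying to power-fence the failed node.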
I'm curious to know your progress and whether you managed to find a solution.
Happy to communicate and help each other out.
GitHub link:
https://github.com/apache/cloudstack/discussions/9362#discussioncomment-10070156