GitHub user akoskuczi-bw created a discussion: KVM cluster with NFS primary 
storage – VM HA not working when host is powered down

### Problem

In a KVM cluster with NFS primary storage, VM HA does not work when a host is 
powered down.  

- The host status transitions to `Down` and its HA state shows `Fenced`.  
- VMs from the powered-down host are not restarted on other available hosts in the cluster.  
- Both Host HA and VM HA are enabled.  
- Out-of-band management (OOB) driver: IPMI.  

### Expected behavior
VMs from the failed host should be restarted on other available hosts in the 
cluster.  

### Actual behavior
- Host goes to `Down` and HA state `Fenced`.  
- VMs are not started elsewhere.  
- Management server logs show a `NoTransitionException`.  

### Relevant log snippet
```
WARN  [o.a.c.h.HAManagerImpl] (BackgroundTaskPollManager-4:[ctx-c2bf501d]) (logid:96e12771) Unable to find next HA state for current HA state=[Fenced] for event=[Ineligible] for host Host {"id":4,"name":"csh-1-2.clab.run","type":"Routing","uuid":"f8f86177-f0e3-4994-8609-dd55e0e35a3e"} with id 4. com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new state from Fenced via Ineligible
        at com.cloud.utils.fsm.StateMachine2.getTransition(StateMachine2.java:108)
        at com.cloud.utils.fsm.StateMachine2.getNextState(StateMachine2.java:94)
        at org.apache.cloudstack.ha.HAManagerImpl.transitionHAState(HAManagerImpl.java:153)
        at org.apache.cloudstack.ha.HAManagerImpl.validateAndFindHAProvider(HAManagerImpl.java:233)
        at org.apache.cloudstack.ha.HAManagerImpl$HAManagerBgPollTask.runInContext(HAManagerImpl.java:665)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
```
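
For context on the exception itself: `StateMachine2.getNextState` looks up the transition registered for the current (state, event) pair and throws `NoTransitionException` when none exists, which is what happens here for `Fenced` + `Ineligible`. The sketch below is not the CloudStack code, just a minimal illustration of that mechanism with hypothetical state and event names.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal illustrative state machine, NOT the actual CloudStack StateMachine2 code.
// It mimics the mechanism behind the WARN above: transitions live in a table keyed
// by (current state, event); when no entry exists for (Fenced, Ineligible) the
// lookup fails and a NoTransitionException-style error is raised.
public class HaFsmSketch {

    // Hypothetical subsets of states and events, named only for illustration.
    enum HaState { AVAILABLE, SUSPECT, FENCED }
    enum HaEvent { HEALTH_CHECK_FAILED, INELIGIBLE }

    static class NoTransitionException extends Exception {
        NoTransitionException(String message) { super(message); }
    }

    private final Map<String, HaState> transitions = new HashMap<>();

    void addTransition(HaState from, HaEvent via, HaState to) {
        transitions.put(from + "/" + via, to);
    }

    HaState getNextState(HaState current, HaEvent event) throws NoTransitionException {
        HaState next = transitions.get(current + "/" + event);
        if (next == null) {
            // Same wording shape as the management-server log message.
            throw new NoTransitionException(
                    "Unable to transition to a new state from " + current + " via " + event);
        }
        return next;
    }

    public static void main(String[] args) throws Exception {
        HaFsmSketch fsm = new HaFsmSketch();
        fsm.addTransition(HaState.AVAILABLE, HaEvent.HEALTH_CHECK_FAILED, HaState.SUSPECT);
        // No transition is registered from FENCED via INELIGIBLE, so this throws,
        // analogous to what the background poll task hits and logs above.
        fsm.getNextState(HaState.FENCED, HaEvent.INELIGIBLE);
    }
}
```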





### Environment
- CloudStack version: 4.20.1.0 
- Hypervisor: KVM  
- Primary storage: NFS  
- HA settings: Host HA enabled, VM HA enabled, OOB driver = IPMI  

### The steps to reproduce the bug

1. Enable Host HA and VM HA in a KVM cluster (NFS primary storage).  
2. Power off a host that runs VMs.  
3. Observe host and VM states in the management server (for example by polling the API, as sketched below).  
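
One way to do step 3 outside the UI is to poll the `listHosts` and `listVirtualMachines` API commands and watch the host and VM states while the host is powered off. The snippet below is only a rough sketch: the endpoint URL, API key, and secret key are placeholders, and it uses the standard CloudStack request-signing scheme (sorted, lower-cased query string, HMAC-SHA1 with the secret key, Base64-encoded).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: endpoint, API key and secret key below are placeholders.
// listHosts and listVirtualMachines are standard CloudStack API commands.
public class HaStatePoller {

    static final String ENDPOINT = "http://mgmt.example.com:8080/client/api"; // placeholder
    static final String API_KEY = "YOUR_API_KEY";       // placeholder
    static final String SECRET_KEY = "YOUR_SECRET_KEY"; // placeholder

    public static void main(String[] args) throws Exception {
        // Step 2: power off the host, then watch its state here.
        System.out.println(call("listHosts", Map.of("type", "Routing")));
        // Step 3: check whether the HA-enabled VMs come back up on another host.
        System.out.println(call("listVirtualMachines", Map.of("listall", "true")));
    }

    static String call(String command, Map<String, String> extra) throws Exception {
        // Sorted parameter map, as required for the request signature.
        Map<String, String> params = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        params.putAll(extra);
        params.put("command", command);
        params.put("apiKey", API_KEY);
        params.put("response", "json");

        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (query.length() > 0) query.append('&');
            query.append(e.getKey()).append('=')
                 .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        // Sign the lower-cased query string and append the signature.
        String signature = sign(query.toString().toLowerCase());
        String url = ENDPOINT + "?" + query + "&signature="
                + URLEncoder.encode(signature, StandardCharsets.UTF_8);

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    static String sign(String data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(SECRET_KEY.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        return Base64.getEncoder()
                .encodeToString(mac.doFinal(data.getBytes(StandardCharsets.UTF_8)));
    }
}
```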

### What to do about it?

_No response_

GitHub link: https://github.com/apache/cloudstack/discussions/11674
