Hi,

I am facing the same issue mentioned here.
Re: KVM Host HA and power lost to host.
<http://mail-archives.apache.org/mod_mbox/cloudstack-users/201903.mbox/%3ccwlp123mb2497e9a14efb930067be397df8...@cwlp123mb2497.gbrp123.prod.outlook.com%3E>

I have configured everything required for HA, with KVM as the hypervisor. We
are using Dell PowerEdge R440 servers with redundant PSUs.
The issue described below occurs only on the Dell R440 servers. On other server
models (PowerEdge R730 & R230) with the same configuration, KVM HA works
fine.

*Testing Done:*
1. Powered off the host (the server that is hosting the VM).
2. The host stays in the *Disconnected* state and the VM never migrates.
(Ideally it should go into maintenance mode and the VMs should migrate to an
available host, as per ACS.)
3. When I checked the management server logs, I found this error message:
*"OOBM service is not configured or enabled for this host"*
4. When I clicked the Power ON menu in the iDRAC dashboard, the host went
into maintenance mode and VM fencing took place, but the server was still
not pingable and the power state in the iDRAC dashboard still showed *OFF*.
I then clicked Power ON again, and the power state showed *ON*.
5. We have tested the server with one PSU at a time, each on a different
power source, and we still see the above issue.
6. The Dell team has isolated the issue to ACS, since they were able to
execute the IPMI commands via the CLI.
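
For anyone who wants to repeat the CLI check from step 6, here is a minimal sketch of how the chassis power commands could be issued by hand with ipmitool. The exact commands the Dell team ran are not recorded here. The lanplus interface, the address and the credentials are assumptions and placeholders, and the function names are only illustrative. The cipher suite value 3 is taken from the log output.

```python
import subprocess

# Chassis power sub-commands accepted by "ipmitool chassis power".
CHASSIS_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}


def ipmi_chassis_command(host, user, password, action, cipher_suite=3):
    """Build the ipmitool argument list for one chassis power action.

    host/user/password are placeholders for the iDRAC address and login.
    """
    if action not in CHASSIS_ACTIONS:
        raise ValueError(f"unsupported chassis power action: {action}")
    return [
        "ipmitool",
        "-I", "lanplus",            # IPMI v2.0 over LAN (assumed)
        "-C", str(cipher_suite),    # cipher suite, 3 as seen in the logs
        "-H", host,
        "-U", user,
        "-P", password,
        "chassis", "power", action,
    ]


def run_chassis_command(host, user, password, action):
    """Run the command and return (exit code, combined output)."""
    cmd = ipmi_chassis_command(host, user, password, action)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, (result.stdout + result.stderr).strip()
```

Calling run_chassis_command(...) with "status" first and then with "reset" or "off", once while the host is powered on and once while it is powered off, should show whether the iDRAC rejects those commands only in a particular power state.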


*Management server logs:*

2020-02-25 20:25:11,173 DEBUG [o.a.c.o.OutOfBandManagementServiceImpl]
(pool-4-thread-6:ctx-c9e50e03) (logid:5b57b3b3) Out-of-band Management
action (RESET) on host (a295403a-d87e-4294-8c8a-f9085f36248b) failed with
error: Using best available cipher suite 3

Invalid completion code received: Invalid command

Set Chassis Power Control to reset failed: Command not supported in
present state

2020-02-25 20:25:11,178 WARN  [o.a.c.k.h.KVMHAProvider]
(pool-4-thread-6:ctx-c9e50e03) (logid:5b57b3b3) OOBM service is not
configured or enabled for this host host3.hyclon3.com error is
Out-of-band Management action (RESET) on host
(a295403a-d87e-4294-8c8a-f9085f36248b) failed with error: Using best
available cipher suite 3

Invalid completion code received: Invalid command

Set Chassis Power Control to reset failed: Command not supported in
present state

2020-02-25 20:25:11,178 WARN  [o.a.c.h.t.BaseHATask]
(pool-4-thread-5:null) (logid:5b57b3b3) Exception occurred while running
RecoveryTask on a resource:
org.apache.cloudstack.ha.provider.HARecoveryException:  OOBM service is not
configured or enabled for this host host3.hyclon3.com

org.apache.cloudstack.ha.provider.HARecoveryException:  OOBM service is
not configured or enabled for this host host3.hyclon3.com

Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band
Management action (RESET) on host (a295403a-d87e-4294-8c8a-f9085f36248b)
failed with error: Using best available cipher suite 3

Invalid completion code received: Invalid command

Set Chassis Power Control to reset failed: Command not supported in
present state

2020-02-25 20:25:11,182 WARN  [c.c.a.AlertManagerImpl]
(pool-4-thread-5:null) (logid:5b57b3b3) AlertType:: 30 | dataCenterId:: 1 |
podId:: 1 | clusterId:: null | message:: HA Recovery of host id=13, in dc
id=1 performed

2020-02-25 20:25:15,056 DEBUG [o.a.c.o.OutOfBandManagementServiceImpl]
(pool-5-thread-24:ctx-08e29baf) (logid:8e05151d) Out-of-band Management
action (OFF) on host (a295403a-d87e-4294-8c8a-f9085f36248b) failed with
error: Using best available cipher suite 3

Invalid completion code received: Invalid command

Set Chassis Power Control to down/off failed: Command not supported in
present state
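
To see at a glance which out-of-band actions were rejected, the log excerpt above can be summarized with a small script. This is only a rough sketch: the regular expression is written against the message wording shown in these excerpts, not against any documented CloudStack log format, and the function name is illustrative.

```python
import re
from collections import Counter

# Matches the failure message as it appears in the excerpts above.
OOBM_FAILURE = re.compile(
    r"Out-of-band Management action \((\w+)\) on host \(([0-9a-f-]+)\) failed"
)


def count_oobm_failures(log_text):
    """Count failed OOBM actions per (action, host UUID) pair."""
    # Drop emphasis asterisks added by mail clients and undo line wrapping.
    flat = " ".join(log_text.replace("*", "").split())
    return Counter(OOBM_FAILURE.findall(flat))


SAMPLE = """
2020-02-25 20:25:11,173 DEBUG [o.a.c.o.OutOfBandManagementServiceImpl]
(pool-4-thread-6:ctx-c9e50e03) (logid:5b57b3b3) Out-of-band Management
action (RESET) on host (a295403a-d87e-4294-8c8a-f9085f36248b) failed with
error: Using best available cipher suite 3
2020-02-25 20:25:15,056 DEBUG [o.a.c.o.OutOfBandManagementServiceImpl]
(pool-5-thread-24:ctx-08e29baf) (logid:8e05151d) Out-of-band Management
action (OFF) on host (a295403a-d87e-4294-8c8a-f9085f36248b) failed with
error: Using best available cipher suite 3
"""

for (action, host), n in count_oobm_failures(SAMPLE).items():
    print(f"{action} failed {n} time(s) on host {host}")
```

In the excerpts, both RESET and OFF fail on the same host with "Command not supported in present state".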


It would be a great help if anyone could help us resolve this issue.


Regards,

Mark
