Hi guys.

I've been experiencing weird handling of a VirtualDomain resource by the cluster. The cluster sometimes fails to report the real state of the VM, which occasionally causes trouble: for example, when the cluster thinks the VM is not running while it actually is, it starts it on another node, which corrupts the qcow image. Right now I'm looking at the opposite case: the cluster reports the VM as up and okay while it is not running on any node (the VM powered itself off from the inside).
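For context, the real state vs. the cluster's idea of it can be compared on each node with something like this (the hypervisor URI is the one from the resource config below):

-> $ pcs status resources | grep c8kubermaster1
-> $ virsh -c qemu:///system list --all | grep c8kubermaster1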
So I:

-> $ pcs resource refresh c8kubermaster1
Cleaned up c8kubermaster1 on swir
Cleaned up c8kubermaster1 on dzien
Waiting for 2 replies from the controller
... got reply
... got reply (done)
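As far as I understand, 'refresh' re-probes the resource on all nodes (on older pcs versions this was part of 'pcs resource cleanup'); if I'm not mistaken, the lower-level equivalent would be:

-> $ crm_resource --refresh --resource c8kubermaster1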

In the logs on the node where, according to the cluster, the VM is supposed to be running:
..
 notice: Requesting local execution of probe operation for c8kubermaster1 on swir
 notice: Result of probe operation for c8kubermaster1 on swir: ok
 notice: Requesting local execution of monitor operation for c8kubermaster1 on swir
 notice: Result of monitor operation for c8kubermaster1 on swir: ok

On the second node (this is a two-node cluster), the logs show:
..
 notice: State transition S_IDLE -> S_POLICY_ENGINE
 notice: Ignoring expired c8kubernode1_migrate_to_0 failure on dzien
 notice:  * Start      c8kubermaster1     (          swir )
 notice: Calculated transition 42, saving inputs in /var/lib/pacemaker/pengine/pe-input-2655.bz2
 notice: Initiating monitor operation c8kubermaster1_monitor_0 on swir
 notice: Initiating monitor operation c8kubermaster1_monitor_0 locally on dzien
 notice: Requesting local execution of probe operation for c8kubermaster1 on dzien
 notice: Result of probe operation for c8kubermaster1 on dzien: not running
 notice: Transition 42 aborted by operation c8kubermaster1_monitor_0 'modify' on swir: Event failed
 notice: Transition 42 action 11 (c8kubermaster1_monitor_0 on swir): expected 'not running' but got 'ok'
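To see what the resource agent itself returns outside of Pacemaker, one can run the monitor action by hand on each node. A minimal sketch, assuming the stock heartbeat agent path and the attributes from my config below:

-> $ OCF_ROOT=/usr/lib/ocf \
   OCF_RESKEY_config=/var/lib/pacemaker/conf.d/c8kubermaster1.xml \
   OCF_RESKEY_hypervisor=qemu:///system \
   /usr/lib/ocf/resource.d/heartbeat/VirtualDomain monitor; echo $?

Exit code 0 should mean running, 7 not running (OCF_SUCCESS / OCF_NOT_RUNNING).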

-> $ pcs resource config c8kubermaster1
 Resource: c8kubermaster1 (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/var/lib/pacemaker/conf.d/c8kubermaster1.xml hypervisor=qemu:///system migration_transport=ssh
  Meta Attrs: allow-migrate=true failure-timeout=120s
  Operations: migrate_from interval=0s timeout=180s (c8kubermaster1-migrate_from-interval-0s)
              migrate_to interval=0s timeout=180s (c8kubermaster1-migrate_to-interval-0s)
              monitor interval=30s (c8kubermaster1-monitor-interval-30s)
              start interval=0s timeout=90s (c8kubermaster1-start-interval-0s)
              stop interval=0s timeout=90s (c8kubermaster1-stop-interval-0s)
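Side note: I was also wondering whether tightening the monitor would at least catch the self-poweroff sooner, something like this (untested sketch):

-> $ pcs resource update c8kubermaster1 op monitor interval=10s timeout=60s

though that would not explain the stale 'ok' status above.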

Disabling and re-enabling the resource "fixes" the glitch, but naturally the obvious question is: why is this allowed to happen at all?
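For reference, the workaround amounts to:

-> $ pcs resource disable c8kubermaster1
-> $ pcs resource enable c8kubermaster1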
many thanks, L.