On 8/5/19 3:00 PM, Ulrich Windl wrote:
>>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 03.08.2019 at 18:17 in
> message <35a226a8-115b-4dc0-f505-dbd78cdd7...@gmail.com>:
>> I'm using sbd watchdog and stonith-watchdog-timeout without explicit
>> stonith agents (shared nothing cluster). How can I clean up failed
>> fencing action?
>>
>> Current DC: ha1 (version
>> 2.0.1+20190408.1b68da8e8-1.3-2.0.1+20190408.1b68da8e8) - partition with
>> quorum
>> Last updated: Sat Aug 3 19:10:12 2019
>> Last change: Sat Aug 3 19:04:56 2019 by hacluster via crmd on ha1
>>
>> 2 nodes configured
>> 7 resources configured
>>
>> Online: [ ha1 ha2 ]
>>
>> Active resources:
>>
>> A (ocf::heartbeat:Dummy): Started ha1
>> B (ocf::heartbeat:Dummy): Started ha1
>> C (ocf::heartbeat:Dummy): Started ha1
>> D (ocf::heartbeat:Dummy): Started ha1
>> E (ocf::heartbeat:Dummy): Started ha1
>> F (ocf::heartbeat:Dummy): Started ha1
>>
>> Failed Fencing Actions:
>> * reboot of ha2 failed: delegate=, client=pacemaker-controld.1910,
>> origin=ha1,
>> last-failed='Sat Aug 3 18:54:13 2019'
>>
>> crm_resource requires resource which does not exist.
> I'd say manual reboot of ha2 should clean up the situation ;-)
> But why did fencing fail?

Nope. At least with reasonably current Pacemaker versions (both 1.1.x and
2.x.x), the fencing history is inherited from the pre-existing nodes when
a node joins a cluster, so rebooting a single node won't purge the history.
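A deliberately simplified Python model shows why a reboot alone doesn't help. It illustrates the inheritance behaviour described above and is not Pacemaker's actual history-synchronization code. In this model every node keeps its own copy of the fencing history. A rebooted node loses its copy, and a (re)joining node merges in whatever the existing members still hold. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FencingEvent:
    """One recorded fencing operation (fields loosely mirror the crm_mon output above)."""
    action: str
    target: str
    origin: str
    state: str
    when: str


@dataclass
class ToyCluster:
    """Hypothetical model: each member node holds its own copy of the fencing history."""
    histories: dict = field(default_factory=dict)  # node name -> set of FencingEvent

    def record(self, event: FencingEvent) -> None:
        # In this model a fencing result is stored on every current member.
        for history in self.histories.values():
            history.add(event)

    def leave(self, node: str) -> None:
        # A reboot throws away the node's local (in-memory) copy...
        self.histories.pop(node, None)

    def join(self, node: str) -> None:
        # ...but on (re)join it inherits everything the existing members still hold.
        inherited = set()
        for history in self.histories.values():
            inherited |= history
        self.histories[node] = inherited

    def cleanup(self) -> None:
        # Stand-in for an explicit history cleanup: empties every member's copy.
        for history in self.histories.values():
            history.clear()


cluster = ToyCluster({"ha1": set(), "ha2": set()})
failed = FencingEvent("reboot", "ha2", "ha1", "failed", "Sat Aug 3 18:54:13 2019")
cluster.record(failed)

cluster.leave("ha2")  # the "manual reboot of ha2"
cluster.join("ha2")
print(failed in cluster.histories["ha2"])  # True: the entry came back from ha1

cluster.cleanup()
print(any(cluster.histories.values()))  # False
```

In the model, only an explicit cleanup makes the stale entry disappear. Rebooting the target node does not.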
The low-level command for handling the fencing history is stonith_admin:

 -H, --history=value
     Show last successful fencing operation for named node (or '*' for
     all nodes). Optional: --timeout, --cleanup, --quiet (show only the
     operation's epoch timestamp), --verbose (show all recorded and
     pending operations), --broadcast (update history from all nodes
     available).

As for high-level tooling, there is e.g. 'pcs stonith cleanup ...'.

Just to be on the safe side: you are using qdevice for quorum? (A 2-node
cluster with watchdog fencing can't work without a real source of quorum:
with only two votes, a split leaves no partition with a genuine majority,
so neither side can safely assume the other has self-fenced.)

I'm just wondering how watchdog fencing can go wrong. It basically just
waits stonith-watchdog-timeout seconds for the unseen node to have
committed suicide.

Klaus

>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
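The following Python sketch drives the low-level command from a script. It uses only options from the stonith_admin help text quoted above: --history with a node name or '*', plus --cleanup. The helper names are hypothetical. The command runs only if you call run() on a cluster node where stonith_admin is installed and on PATH.

```python
from __future__ import annotations

import shutil
import subprocess


def history_argv(node: str = "*", cleanup: bool = False) -> list[str]:
    """Build a stonith_admin command line that shows or cleans up fencing history.

    node is a cluster node name such as "ha2", or "*" for all nodes.
    """
    if not node:
        raise ValueError("node must be a node name or '*'")
    argv = ["stonith_admin", f"--history={node}"]
    if cleanup:
        argv.append("--cleanup")
    return argv


def run(argv: list[str]) -> str:
    """Run the command without a shell (so '*' is never glob-expanded) and return stdout."""
    # Assumption: stonith_admin is installed on this host.
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} not found in PATH")
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(history_argv("ha2", cleanup=True))
    # ['stonith_admin', '--history=ha2', '--cleanup']
```

A typical sequence would be to inspect the history with run(history_argv()) and then clear the stale entry with run(history_argv("ha2", cleanup=True)). Passing an argument list instead of a shell string avoids having to quote the '*'.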
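The quorum remark comes down to simple vote arithmetic. The sketch below applies a plain majority rule (strictly more than half of the expected votes) and assumes one vote per node plus one vote for a qdevice. It deliberately leaves out corosync's two_node option, which lets a lone node stay quorate without any independent arbiter. With watchdog-only fencing that is exactly what must be avoided, because after a split both nodes could consider themselves quorate. The function names are hypothetical.

```python
def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """Plain majority rule: strictly more than half of the expected votes."""
    return votes_present > expected_votes // 2


def partition_quorate(nodes: int, nodes_in_partition: int,
                      qdevice: bool = False, reaches_qdevice: bool = False) -> bool:
    """Hypothetical helper: one vote per node, plus one qdevice vote if configured.

    The qdevice vote only counts for a partition that can still reach it.
    """
    expected = nodes + (1 if qdevice else 0)
    present = nodes_in_partition + (1 if qdevice and reaches_qdevice else 0)
    return has_quorum(present, expected)


# Two nodes, link between them lost: each side holds 1 of 2 votes.
print(partition_quorate(2, 1))  # False
# Two nodes plus qdevice: the side that still reaches the qdevice holds 2 of 3 votes.
print(partition_quorate(2, 1, qdevice=True, reaches_qdevice=True))  # True
print(partition_quorate(2, 1, qdevice=True, reaches_qdevice=False))  # False
```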
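The description of watchdog fencing above reduces to a timing decision. Once a node is no longer seen, the survivor treats it as fenced only after stonith-watchdog-timeout has elapsed, on the assumption that the lost node's own watchdog has reset it by then. The sketch below assumes monotonic-clock seconds and uses hypothetical function names. It is not Pacemaker code.

```python
import time
from typing import Optional


def remaining_wait(lost_since: float, stonith_watchdog_timeout: float,
                   now: Optional[float] = None) -> float:
    """Seconds left before an unseen node may be assumed to have self-fenced.

    Hypothetical helper: times are time.monotonic() seconds, and
    stonith_watchdog_timeout stands for the cluster property of that name.
    """
    if stonith_watchdog_timeout <= 0:
        # Non-positive values have special meanings in Pacemaker; out of scope here.
        raise ValueError("this sketch expects a positive stonith-watchdog-timeout")
    if now is None:
        now = time.monotonic()
    return max(0.0, stonith_watchdog_timeout - (now - lost_since))


def watchdog_fence_complete(lost_since: float, stonith_watchdog_timeout: float,
                            now: Optional[float] = None) -> bool:
    """True once the full timeout has elapsed since the node was last seen."""
    return remaining_wait(lost_since, stonith_watchdog_timeout, now) == 0.0


print(watchdog_fence_complete(100.0, 10.0, now=105.0))  # False
print(watchdog_fence_complete(100.0, 10.0, now=110.0))  # True
```

The optional now parameter lets you test the decision without real waiting. In live use it falls back to time.monotonic().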