On Thu, 2021-01-28 at 11:23 +0100, Ulrich Windl wrote:
> Ken,
>
> thanks for analyzing the logs! See comments inline...
>
> >>> Ken Gaillot <kgail...@redhat.com> wrote on 27.01.2021 at 19:55 in message
> <644fc719a2e8870c332db859bcdef275d986249a.ca...@redhat.com>:
> > On Wed, 2021-01-27 at 12:36 +0100, Ulrich Windl wrote:
> > ...
> > > Jan 27 10:43:48 h16 pacemaker-execd[25960]: warning: prm_CFS_VMI_stop_0[11502] timed out after 90000ms
> > > Jan 27 10:43:48 h16 pacemaker-execd[25960]: notice: prm_CFS_VMI stop (call 129, PID 11502) exited with status 1 (execution time 90007ms, queue time 0ms)
> > > Jan 27 10:43:48 h16 pacemaker-controld[25963]: error: Result of stop operation for prm_CFS_VMI on h16: Timed Out
> >
> > This stop timeout is why h16 correctly needs to be fenced. The only
> > question is why the stop timed out.
>
> The resource is OCFS2, needing DLM. DLM in turn wants a quorum, right?
> So: No quorum, no action -> timeout. Is that right?
>
> ...
> > > Finally: ;-)
> > >
> > > Jan 27 11:35:14 h19 pacemaker-fenced[2099]: notice: Versions did not change in patch 0.250.39
> > > Jan 27 11:36:43 h19 pacemaker-fenced[2099]: notice: Operation 'reboot' targeting h18 on h16 for pacemaker-controld.7467@h16.46c6f6cc: OK
> > > Jan 27 11:36:43 h19 pacemaker-fenced[2099]: error: stonith_construct_reply: Triggered assert at fenced_commands.c:2363 : request != NULL
>
> You did not comment on that; is that expected behavior? ;-)

Sort of ;) This was changed to a more reasonable log warning in the
2.0.5 release:

Missing request information for client notifications for operation with result <N> (initiated before we came up?)

It can happen (and is perfectly OK) when a node is coming up while some
fencing operation is already in-flight. Ideally we'd synchronize
in-flight operation information when a node comes up, but that wouldn't
really change anything; it would just allow us to tell that situation
apart from an actual error when this message comes up.

> > > Jan 27 11:36:43 h19 pacemaker-fenced[2099]: warning: Can't create a sane reply
> > >
> > > Regards,
> > > Ulrich
-- 
Ken Gaillot <kgail...@redhat.com>