>>> Steffen Vinther Sørensen <svint...@gmail.com> wrote on 18.01.2021 at 10:00 in message
<CALhdMBi4Q=3xRU8yubFEkX2XgBZf1WLO5+LQzNqCo=co2e-...@mail.gmail.com>:
> Hi,
>
> I have persistent journal, but 'journalctl -b -1' was empty in this
> case, so it might not be optimally configured. And centralized logging
> is on the todo list
>
>
> btw. about the fencing, I have set ' HandlePowerKey=ignore' in
> /etc/systemd/logind.conf
> (for this hardware, I can find no bios settings on how to react to
> power key being pressed, so can not be set to instant-off)
>
> Now when a node is fenced it goes down more quickly, and its only
> journal output is:
> Jan 18 09:33:19 kvm03-node03 systemd-logind[4354]: Power key pressed.
> Jan 18 09:33:24 kvm03-node03 systemd-logind[4354]: Power key pressed.
>
> So it seems it needs to be pressed twice with 5 sec delay, and by
> looking at the hardware console, the system does not reboot before
> about 09.33.27 ( 8 secs totally)
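
To double-check which values are actually set, something like the following could be used. This is only a sketch: conf_value is a hypothetical helper (not part of systemd), it reads just the one file it is given and ignores any drop-ins under a *.conf.d directory, and the paths in the example calls are the usual defaults, which is an assumption. Storage and SyncIntervalSec are the journald.conf settings I would look at first, since (if I read journald.conf(5) correctly) the journal is only synced to disk periodically unless a message of priority CRIT or higher arrives.

```shell
#!/bin/sh
# Sketch only: conf_value is a hypothetical helper, not a systemd tool.
# It prints the value of the last uncommented "KEY=value" line in FILE.
# Lines starting with '#' (the shipped defaults) are skipped.
conf_value() {
    key=$1
    file=$2
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Example calls, assuming the default file locations:
#   conf_value HandlePowerKey  /etc/systemd/logind.conf
#   conf_value Storage         /etc/systemd/journald.conf
#   conf_value SyncIntervalSec /etc/systemd/journald.conf
```

An empty result means the key is not set explicitly in that file, so the compiled-in default (or a drop-in) applies.
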

I haven't looked into the IPMI fencing agent, but ipmitool can do:
chassis power on
chassis power off
chassis power cycle
chassis power reset

IMHO for fencing only "power off" and "power reset" (assuming a hardware
reset) make sense.
Also I don't know how it's implemented: My guess is that it directs the
power supply to switch off, and does _not_ simulate an ACPI power button
press...

Playing with the tool here (Dell server), I get:
h16:~ # ipmitool chassis power    ## only list commands available
chassis power Commands: status, on, off, cycle, reset, diag, soft
h16:~ # ipmitool chassis restart_cause
System restart cause: unknown

>
> When the node is back online, 'journalctl -b -1' only reports the first
> Jan 18 09:33:19 kvm03-node03 systemd-logind[4354]: Power key pressed.
>
> The second line was never written to persistent journal

What might help is running "journalctl -f" on a terminal. That way you see
the last messages received, even if they were not written to the filesystem
(I think), so they are still on screen once the host is down. Disk writes
frequently miss the last two or three seconds IMHO.

Regards,
Ulrich

>
>
>
> On Mon, Jan 18, 2021 at 8:49 AM Ulrich Windl
> <ulrich.wi...@rz.uni-regensburg.de> wrote:
>>
>> >>> Steffen Vinther Sørensen <svint...@gmail.com> wrote on 16.01.2021 at
>> 19:28 in message
>> <CALhdMBho79Kd7XjV2BvD+-J5i+94vKejnJYB5UEjG=w_hg1...@mail.gmail.com>:
>> > Hi and thank you for the insights
>>
>> Hi!
>> ...
>>
>> > I just did a test after the latest adjustments with colocations etc.
>> > trying to standby node02, ends up with node02 being fenced before
>> > migrations complete. Unfortunately logs from node02 was lost
>>
>> Don't you have a persistent journal on node2? Maybe it's a good idea to make
>> all nodes log to an external syslog server, at least until your problems are
>> fixed. That would also have the benefit that you get a better global insight of
>> the sequence of events...
>>
>> ...
>>
>> Regards,
>> Ulrich
>>
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/