On Thu, Feb 24, 2022 at 4:22 AM Ulrich Windl
<ulrich.wi...@rz.uni-regensburg.de> wrote:
>
> Hi!
>
> After reading about fence_kdump and fence_kdump_send I wonder:
> Does anybody use that in production?
Quite a lot of people, in fact.

> Having the networking and bonding in initrd does not sound like a good idea
> to me.
> Wouldn't it be easier to integrate that functionality into sbd?
> I mean: Let sbd wait for a "kdump-ed" message that initrd could send when
> kdump is complete.
> Basically that would be the same mechanism, but using storage instead of
> networking.
>
> If I get it right, the original fence_kdump would also introduce an extra
> fencing delay, and I wonder what happens with a hardware watchdog while a
> kdump is in progress...
>
> The background of all this is that our nodes kernel-panic, and support says
> the kdumps are all incomplete.
> The events are most likely:
> node1: panics (kdump)
> other_node: sees node1 had failed and fences it (via sbd).
>
> However sbd fencing won't work while kdump is executing (IMHO)
>
> So what happens most likely is that the watchdog terminates the kdump.
> In that case all the mess with fence_kdump won't help, right?

You can configure extra_modules in your /etc/kdump.conf file to include the
watchdog module, and then restart kdump.service. For example:

    # grep ^extra_modules /etc/kdump.conf
    extra_modules i6300esb

If you're not sure of the name of your watchdog module, wdctl can help you
find it. sbd needs to be stopped first, because it keeps the watchdog device
timer busy.

    # pcs cluster stop --all
    # wdctl | grep Identity
    Identity: i6300ESB timer [version 0]
    # lsmod | grep -i i6300ESB
    i6300esb 13566 0

If you're also using fence_sbd (poison-pill fencing via block device), then
you should be able to protect yourself from that during a dump by configuring
fencing levels so that fence_kdump is level 1 and fence_sbd is level 2.
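The extra_modules change described above can also be scripted so that it is idempotent. This is a minimal sketch, not part of kdump or pcs: the function name add_extra_module is hypothetical, it assumes GNU sed (for `sed -i`) and at most one extra_modules line in the file, and it takes the file path as a parameter so you can try it on a copy before touching /etc/kdump.conf.

```shell
# Hypothetical helper (not shipped with kdump): make sure a kernel module is
# listed on the extra_modules line of a kdump.conf-style file.
# Assumes GNU sed (-i without a suffix argument) and a single extra_modules line.
add_extra_module() {
    conf=$1
    mod=$2
    if grep -q '^extra_modules' "$conf"; then
        # Module already present as a whole word: nothing to do.
        if grep '^extra_modules' "$conf" | grep -qw -- "$mod"; then
            return 0
        fi
        # Append the module to the existing space-separated list.
        sed -i "s/^extra_modules.*/& $mod/" "$conf"
    else
        # No extra_modules directive yet: add one.
        printf 'extra_modules %s\n' "$mod" >> "$conf"
    fi
}
```

For example, `add_extra_module /etc/kdump.conf i6300esb`, followed by restarting kdump.service (e.g. `systemctl restart kdump`) so the kdump initramfs is rebuilt with the module.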
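The fencing-level setup from the last paragraph of the reply can be expressed with `pcs stonith level add <level> <target> <stonith id>`. The sketch below only prints the commands for review rather than running them; the function name print_fencing_levels and the device IDs used in the usage line are hypothetical placeholders, so substitute the stonith IDs and node names from your own cluster and check the syntax against your pcs version's man page.

```shell
# Hypothetical helper: print (not execute) pcs commands that put a
# fence_kdump device at level 1 and a fence_sbd device at level 2 for each node.
# Usage: print_fencing_levels <kdump stonith id> <sbd stonith id> <node>...
print_fencing_levels() {
    kdump_id=$1
    sbd_id=$2
    shift 2
    for node in "$@"; do
        echo "pcs stonith level add 1 $node $kdump_id"
        echo "pcs stonith level add 2 $node $sbd_id"
    done
}
```

For example, `print_fencing_levels kdump sbd node1 node2` prints four commands; after reviewing them, they can be pasted into a root shell or piped to `sh`.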
>
> Regards,
> Ulrich
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

--
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA