On Fri, Jun 3, 2022 at 10:19 AM Zoran Bošnjak <zoran.bosn...@via.si> wrote:
>
> Hi all,
> I would appreciate an advice about sbd fencing (without shared storage).
>
> I am using ubuntu 20.04., with default packages from the repository
> (pacemaker, corosync, fence-agents, ipmitool, pcs...).
>
> HW watchdog is present on servers. The first problem was to load/unload the
> watchdog module. For some reason the module is blacklisted on ubuntu, so I've
> created a service for this purpose.
>
> --- file: /etc/systemd/system/watchdog.service
> [Unit]
> Description=Load watchdog timer module
> After=syslog.target
>
> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/sbin/modprobe ipmi_watchdog
> ExecStop=/sbin/rmmod ipmi_watchdog
>
> [Install]
> WantedBy=multi-user.target
> ---
>
> Is this a proper way to load watchdog module under ubuntu?
>
> Anyway, once the module is loaded, the /dev/watchdog (which is required by
> 'sbd') is present.
> Next, the 'sbd' is installed by
>
> sudo apt install sbd
> (followed by one reboot to get the sbd active)
>
> The configuration of the 'sbd' is default. The sbd reacts to network failure
> as expected (reboots the server). However, when the 'sbd' is active, the
> server won't reboot normally any more. For example from the command line
> "sudo reboot", it gets stuck at the end of the reboot sequence. There is a
> message on the console:
>
> ... reboot progress
> [ OK ] Finished Reboot.
> [ OK ] Reached target Reboot.
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> ... it gets stuck at this point
>
> After some long timeout, it looks like the watchdog timer expires and server
> boots, but the failure indication remains on the front panel of the server.
> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
>
> My question is: How do I configure the system, to have the 'sbd' function
> present, but still be able to reboot the system normally.

Loading modules, depending on distribution and version, should probably
rather be done by editing /etc/modules or by putting a file under
/etc/modprobe.d/.
I guess that in your case stopping the unit won't work, as the watchdog
device is still held open by sbd.
In general I don't see why the watchdog module should be unloaded on
shutdown. So as a first try you might just remove that part (i.e. the
ExecStop line).

Klaus

>
> regards,
> Zoran
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
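
For illustration, here is a minimal sketch of the "first try" suggested above: the unit from the original message with only the unload step dropped. It assumes the same file path (/etc/systemd/system/watchdog.service) and the same module name (ipmi_watchdog) as in the question. Whether this change alone resolves the hang at reboot is not confirmed in this thread.

```ini
[Unit]
Description=Load watchdog timer module
After=syslog.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/modprobe ipmi_watchdog
# No ExecStop here: the module is left loaded at shutdown, because
# sbd may still hold /dev/watchdog open when this unit is stopped.

[Install]
WantedBy=multi-user.target
```

After editing the file, "sudo systemctl daemon-reload" makes systemd pick up the change.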