On Wed, Feb 16, 2022 at 4:59 PM Klaus Wenninger <kwenn...@redhat.com> wrote:
> On Wed, Feb 16, 2022 at 4:26 PM Klaus Wenninger <kwenn...@redhat.com> wrote:
>
>> On Wed, Feb 16, 2022 at 3:09 PM Ulrich Windl
>> <ulrich.wi...@rz.uni-regensburg.de> wrote:
>>
>>> Hi!
>>>
>>> When changing some FC cables I noticed that sbd complained 2 seconds
>>> after the connection went down (even though the device is multipathed
>>> and other paths were still up).
>>> I don't know of any sbd parameter set so low that sbd would panic after
>>> 2 seconds. Which parameter (if any) is responsible for that?
>>>
>>> In fact multipath takes up to 5 seconds to adjust paths.
>>>
>>> Here are some sample events (sbd-1.5.0+20210720.f4ca41f-3.6.1.x86_64
>>> from SLES15 SP3):
>>> Feb 14 13:01:36 h18 kernel: qla2xxx [0000:41:00.0]-500b:3: LOOP DOWN detected (2 7 0 0).
>>> Feb 14 13:01:38 h18 sbd[6621]: /dev/disk/by-id/dm-name-SBD_1-3P2: error: servant_md: slot read failed in servant.
>>> Feb 14 13:01:38 h18 sbd[6619]: /dev/disk/by-id/dm-name-SBD_1-3P1: error: servant_md: mbox read failed in servant.
>>> Feb 14 13:01:40 h18 sbd[6615]: warning: inquisitor_child: Servant /dev/disk/by-id/dm-name-SBD_1-3P1 is outdated (age: 11)
>>> Feb 14 13:01:40 h18 sbd[6615]: warning: inquisitor_child: Servant /dev/disk/by-id/dm-name-SBD_1-3P2 is outdated (age: 11)
>>> Feb 14 13:01:40 h18 sbd[6615]: warning: inquisitor_child: Majority of devices lost - surviving on pacemaker
>>> Feb 14 13:01:42 h18 kernel: sd 3:0:3:2: rejecting I/O to offline device
>>> Feb 14 13:01:42 h18 kernel: blk_update_request: I/O error, dev sdbt, sector 2048 op 0x0:(READ) flags 0x4200 phys_seg 1 prio class 1
>>> Feb 14 13:01:42 h18 kernel: device-mapper: multipath: 254:17: Failing path 68:112.
>>> Feb 14 13:01:42 h18 kernel: sd 3:0:1:2: rejecting I/O to offline device
>>>
>
> Sorry, I forgot to address the following.
>
> I guess your sbd package predates
> https://github.com/ClusterLabs/sbd/commit/9e6cbbad9e259de374cbf41b713419c342528db1
> and thus doesn't properly destroy the io-context using the aio API.
> This flaw has been in more or less forever. I actually found it due to a
> kernel issue that made all block I/O done the way sbd does it (aio + O_SYNC +
> O_DIRECT) time out. (I never successfully tracked it down to the real kernel
> issue while playing with kprobes, but it was gone with the next kernel update.)
> Without survival on pacemaker the node would have suicided after the
> msgwait timeout (probably 10 s in your case).
> It would be interesting to see what happens if you raise the msgwait timeout
> to a value that allows another read attempt.
> Does your setup actually recover? It might not, if it is missing the fix
> referenced above.

One more thing: even if it looks as if it recovers, there might be a leak of
kernel resources (maybe per process), so that issues only surface after the
timeout has been hit several times.

> Regards,
> Klaus
>
>>> Most puzzling is the fact that sbd reports a problem 4 seconds before
>>> the kernel reports an I/O error. I guess sbd "times out" the pending read.
>>>
>> Yep - that is timeout_io, defaulting to 3 s.
>> You can set it with the -I daemon start parameter.
>> Together with the rest of the default timeout scheme the 3 s do make sense.
>> Not sure, but if you increase it significantly you might have to adapt
>> other timeouts as well.
>> There are some checks regarding the relationship of the timeouts, but they
>> might not be exhaustive.
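To make the above a bit more concrete, here is a rough, untested sketch
(illustration only, not sbd's actual code) of a slot read done the way
discussed: libaio + O_SYNC + O_DIRECT, with io_getevents() bounded by a
timeout_io-style limit and io_destroy() cleaning up the context when the read
does not complete in time, which is the kind of cleanup the commit referenced
above is about. The names (timed_slot_read, TIMEOUT_IO) and the 512-byte
sector size are just assumptions for the example; build with -laio.

/* Illustration only - not sbd's actual code. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SECTOR_SIZE 512
#define TIMEOUT_IO  3                      /* sbd's default timeout_io (-I) is 3 s */

static int timed_slot_read(const char *dev, long long offset, void *buf)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    struct timespec to = { .tv_sec = TIMEOUT_IO, .tv_nsec = 0 };
    int fd, rc = -1;

    fd = open(dev, O_RDONLY | O_SYNC | O_DIRECT);
    if (fd < 0)
        return -1;

    if (io_setup(1, &ctx) < 0)
        goto out_close;

    io_prep_pread(&cb, fd, buf, SECTOR_SIZE, offset);
    if (io_submit(ctx, 1, cbs) != 1)
        goto out_destroy;

    /* wait at most TIMEOUT_IO seconds for the read to complete */
    if (io_getevents(ctx, 1, 1, &ev, &to) == 1 && (long)ev.res == SECTOR_SIZE)
        rc = 0;                            /* read succeeded within timeout_io */
    /* else: timed out or failed - fall through and destroy the context,
     * otherwise the stale context with its in-flight request is leaked */
out_destroy:
    io_destroy(ctx);
out_close:
    close(fd);
    return rc;
}

int main(int argc, char **argv)
{
    void *buf;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <device>\n", argv[0]);
        return 1;
    }
    if (posix_memalign(&buf, SECTOR_SIZE, SECTOR_SIZE))   /* O_DIRECT needs alignment */
        return 1;
    printf("read %s\n", timed_slot_read(argv[1], 0, buf) ? "failed/timed out" : "ok");
    free(buf);
    return 0;
}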
>>> The thing is: both SBD disks are on different storage systems, each
>>> connected via two separate FC fabrics, but still sbd panics when one
>>> cable is disconnected from the host.
>>> My guess is that if "surviving on pacemaker" had not happened, the node
>>> would have been fenced; is that right?
>>>
>>> The other thing I wonder about is the "outdated" age:
>>> How can the age be 11 (seconds) when the disk was disconnected 4 seconds
>>> ago?
>>> It seems the age here is "current_time - time_of_last_read" instead of
>>> "current_time - time_when_read_attempt_started".
>>>
>> Exactly! And that is the correct way to do it, as we need to record the
>> time passed since the last successful read.
>> There is no value in starting the clock when we start the read attempt, as
>> these attempts are not synced throughout the cluster.
>>
>> Regards,
>> Klaus
>>
>>> Regards,
>>> Ulrich
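P.S. In case the age question above is still confusing, a tiny illustration
(again not sbd's actual code, just the idea): the age is measured from the
last successful read, not from the start of the failing read attempt, so it
can be larger than the time since the cable was pulled. The struct and
function names are made up for the example.

#include <stdio.h>
#include <time.h>

struct servant_state {
    time_t last_successful_read;   /* updated only when a slot read succeeds */
};

static long servant_age(const struct servant_state *s, time_t now)
{
    return (long)(now - s->last_successful_read);
}

int main(void)
{
    time_t now = time(NULL);
    /* e.g. the last good read completed 11 s ago, although the path itself
     * only went down 4 s ago */
    struct servant_state s = { .last_successful_read = now - 11 };

    printf("age: %ld s\n", servant_age(&s, now));   /* prints 11, not 4 */
    return 0;
}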
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/