Oh well, pingd is interesting: my guess is that it was originally designed to check the connectivity of an interface by pinging some hosts, but some people seem to use it to check the reachability of a specific host. Regardless of the number of packets being sent, some non-binary behavior would be desirable: instead of setting the attribute to, for example, 0 or 100, the value could _range_ from 0 to 1000, indicating the quality of the reachability. As said before, maybe some moving average or exponential average.
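
To make the idea a bit more concrete, here is a minimal sketch of an exponential average that maps ping results to a 0..1000 score. It is not taken from pingd or ocf:pacemaker:ping; the function name update_quality and the parameters alpha and scale are made up for illustration.

```python
def update_quality(previous, replies, sent, alpha=0.3, scale=1000):
    """Blend the latest ping round into a smoothed 0..scale quality score.

    previous: last stored score, or None on the first run (hypothetical state)
    replies:  number of echo replies received in this round
    sent:     number of echo requests sent in this round
    alpha:    weight of the newest sample (assumed tunable, 0 < alpha <= 1)
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= replies <= sent:
        raise ValueError("replies must be between 0 and sent")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    # Quality of this single round, scaled to 0..scale.
    sample = scale * replies / sent

    # Nothing to smooth against on the first round.
    if previous is None:
        return round(sample)

    # Exponential average: new = alpha * sample + (1 - alpha) * old.
    return round(alpha * sample + (1 - alpha) * previous)


if __name__ == "__main__":
    score = None
    # Three good rounds, then two rounds with every reply lost.
    for replies in (5, 5, 5, 0, 0):
        score = update_quality(score, replies, sent=5, alpha=0.5)
        print(score)
```

With such a value a single lost round only lowers the score gradually, so a rule could compare it against a threshold instead of reacting to the first failure.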
When trying to find out more about pingd, I found this interesting thing in SLES15 SP2 (resource-agents-4.4.0+git57.70549516-3.36.1.x86_64): "crm ra info pingd" reports:
---
Monitors connectivity to specific hosts or IP addresses ("ping nodes") (deprecated) (ocf:heartbeat:pingd)

Deprecation warning: This agent is deprecated and may be removed from a future release. See the ocf:pacemaker:pingd resource agent for a supported alternative.
--
This is a pingd Resource Agent.
...
---

However, when I use the recommended "crm ra info ocf:pacemaker:pingd", I also get:
---
pingd resource agent (ocf:pacemaker:pingd)

This agent (ocf:pacemaker:pingd) is deprecated and broken, and has been replaced by the more reliable ocf:pacemaker:ping. It records (in the CIB) the current number of ping nodes (specified in the 'host_list' parameter) a cluster node can connect to.
---

The final ocf:pacemaker:ping still has the same poor description:
---
dampen (integer, [5s]): Dampening interval
    The time to wait (dampening) further changes occur
---
(IMHO "wait ... _for_ further changes _to_ occur" would be a halfway correct sentence)

Regards,
Ulrich

>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 15.10.2021 at 08:24 in message
<calrdao3c5dohx2kruqllwnqeo9ymod0ygdzsrt54o6xe1yz...@mail.gmail.com>:
> On Thu, Oct 14, 2021 at 10:51 PM martin doc <db1...@hotmail.com> wrote:
>
>> ------------------------------
>> *From: *Andrei Borzenkov <arvidj...@gmail.com>, Friday, 15 October 2021 4:59 AM
>> *...*
>> > Dampening defines delay before attributes are committed to CIB.
>> > Private attributes are never ever written into CIB, so dampening
>> > makes no sense here. Private attributes are managed by attrd
>> > itself and you see the latest value.
>> >
>> > If you change the value of a transient attribute (without the -p
>> > option) you will see different values reported by
>> >
>> > attrd_updater -n my_ping -Q
>> >
>> > and
>> >
>> > cibadmin -Q -A "//nvpair[@name='my_ping']"
>> >
>> > until the dampening timeout expires.
>> >
>> > This applies even to deleting the attribute.
>>
>> Ok, now I understand what the dampen function does.
>>
>> If I understand this correctly then this probably makes every documented
>> example of using ocf:pacemaker:ping with a colocation statement wrong
>> because the only way to see the effect of dampen is to use a rule that
>> references the value of pingd directly. That or the script for ping has a
>> major flaw with respect to dampen.
>>
>
> As we've already tried to explain, the purpose of dampening is not to
> implement any kind of resilience against the loss of a certain percentage
> of packets or anything similar.
>
> The basic idea is to have more than one ping host so that, given
> failure_score is low enough, there is gonna be a certain resilience
> against packet loss.
> If your number of ping hosts isn't large enough, you might play with
> adding them multiple times to get some kind of resilience.
> But I agree that this one-out-of-two behavior is probably too resilient
> for most cases, and thus there might be room for improvement.
> The main pain point here is that the ping RA allows us to configure the
> count of pings sent, but it just uses the exit value from ping, which
> already reports failure when one of the answers is missing.
> This is why with
> https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
> I chose to give both the number of packets sent and the number received
> that is necessary to be assumed alive. If we assume the latter, when not
> given at all, to be equal to the number of packets sent, we would preserve
> unchanged behavior for existing configurations.
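
The "sent versus required" rule described here could be sketched roughly as follows. This is not the code of fence_heuristics_ping.py or of the ping RA; the function host_is_alive and its parameter names are hypothetical and only illustrate the default that keeps existing configurations unchanged.

```python
def host_is_alive(received, sent, required=None):
    """Decide whether a ping host counts as reachable.

    received: number of echo replies that came back
    sent:     number of echo requests sent
    required: replies needed to count as alive (hypothetical parameter);
              when omitted it defaults to `sent`, i.e. every packet must
              be answered, which matches the strict behavior described.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    if required is None:
        required = sent  # unchanged behavior when the option is not given
    if not 0 < required <= sent:
        raise ValueError("required must be between 1 and sent")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return received >= required


if __name__ == "__main__":
    # One lost packet out of five: fails by default, passes with required=3.
    print(host_is_alive(4, 5))
    print(host_is_alive(4, 5, required=3))
```

The point of the default is that nobody has to touch an existing configuration; only those who want tolerance against single lost packets would set the lower threshold.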
>
> Klaus
>
>>
>> That is when I do this:
>>
>> pcs resource create myPing ocf:pacemaker:ping host_list=192.168.1.1 failure_score=1
>> pcs resource create database ocf:heartbeat:pgsql
>> pcs group add pgrp myPing database
>>
>> PCS will move everything to a new node if there is even 1 ping failure
>> because monitor in ping doesn't look at the dampened value, only at the
>> immediately returned value.
>>
>> The same is true with colocation statements - if a constraint is made with
>> a ping resource without using a rule that references pingd then the dampen
>> behaviour is ignored completely.
>>
>> Is the ping'er missing something that does this:
>>
>> score=`cibadmin -Q -A "//nvpair[@name='ping']" | sed -e 's/.*value="\([^"]*\)".*/\1/'`
>>
>> before it checks if $score is less than $OCF_RESKEY_failure_score?
>>
>> Thanks
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/