Hi! So maybe the original defective RA would be valuable for debugging the issue. I guess the RA was invalid in some way that wasn't detected or handled properly...
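In case it helps with checking the agent's meta-data, here is a minimal Python sketch, not anything taken from Pacemaker. It only lists parameter names that might clash with attribute names Pacemaker uses itself. The sample meta-data is made up, except for the <parameter name="id" unique="1" required="1"> element, which comes from the trace quoted below. The names SAMPLE_METADATA, SUSPECT_NAMES and suspect_parameters are hypothetical. Whether "id" really causes the restarts is still an open question in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down meta-data. Only the <parameter name="id" ...>
# element is taken from the quoted trace; the rest is invented for illustration.
SAMPLE_METADATA = """\
<resource-agent name="tunnel">
  <parameters>
    <parameter name="id" unique="1" required="1">
      <content type="integer"/>
    </parameter>
    <parameter name="name">
      <content type="string"/>
    </parameter>
  </parameters>
</resource-agent>
"""

# Assumption: parameter names that might collide with attributes the
# cluster manager sets on its own XML elements.
SUSPECT_NAMES = {"id"}


def suspect_parameters(metadata_xml, suspects=SUSPECT_NAMES):
    """Return the declared parameter names that appear in `suspects`."""
    root = ET.fromstring(metadata_xml)
    return [
        param.get("name")
        for param in root.iter("parameter")
        if param.get("name") in suspects
    ]


print(suspect_parameters(SAMPLE_METADATA))  # prints ['id']
```

Running the same function over both the old and the renamed ("tunnel_id") version of the agent's meta-data would at least confirm which parameter names each version declared.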

Regards,
Ulrich

>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 21.05.2019 at 09:13 in message <bd253405-e98c-251e-e908-1431d6d65...@gmail.com>:
> 21.05.2019 0:46, Ken Gaillot wrote:
>>>
>>>> From what's described here, the op-restart-digest is changing every
>>>> time, which means something is going wrong in the hash comparison
>>>> (since the definition is not really changing).
>>>>
>>>> The log that stands out to me is:
>>>>
>>>> trace May 18 23:02:49 calculate_xml_digest_v1(83):0:
>>>> digest:source <parameters id="0"/>
>>>>
>>>> The id is the resource name, which isn't "0". That leads me to:
>>>>
>>>> trace May 18 23:02:49 svc_read_output(87):0: Got 499 chars:
>>>> <parameter name="id" unique="1" required="1">
>>>>
>>>> which is the likely source of the problem. "id" is a pacemaker
>>>> property, not an OCF resource parameter. It shouldn't be in the
>>>> resource agent meta-data. Remove that, and I bet it will be OK.
>>>
>>> I renamed the parameter to "tunnel_id", redefined the resources and
>>> started them again.
>>>
>>>> BTW the "every 15 minutes" would be the cluster-recheck-interval
>>>> cluster property.
>>>
>>> I have waited more than half an hour and there is no more
>>> stopping/starting of the resources. :-) I hadn't thought that "id"
>>> is reserved as a parameter name.
>>
>> It isn't, by the OCF standard. :) This could be considered a pacemaker
>> bug; pacemaker should be able to distinguish its own "id" from an OCF
>> parameter "id", but it currently can't.
>>
>
> I'm really baffled by this explanation. I tried to create a resource
> with a unique "id" instance property and I do not observe this problem.
> No restarts.
>
> As none of the traces provided captures the moment of the
> restart-digest mismatch, I am also not sure where to look. I do not see
> "id" being treated specially in any way in the code.
>
> Somewhat interesting is that the restart digest source differs between
> the two traces:
>
> bor@bor-Latitude-E5450:~$ grep -w 'restart digest' /tmp/trace.log*
> /tmp/trace.log:trace May 18 23:02:49 append_restart_list(694):0:
> restart digest source <parameters id="0"/>
> /tmp/trace.log:trace May 18 23:02:50 append_restart_list(694):0:
> restart digest source <parameters id="1"/>
> /tmp/trace.log.2:trace May 20 13:56:16 append_restart_list(694):0:
> restart digest source <parameters name="eduroam IPv4 tunnel" id="0"/>
> /tmp/trace.log.2:trace May 20 13:56:17 append_restart_list(694):0:
> restart digest source <parameters name="eduroam IPv4 tunnel" id="0"/>
> /tmp/trace.log.2:trace May 20 13:56:18 append_restart_list(694):0:
> restart digest source <parameters name="Wigner guest IPv4 tunnel" id="1"/>
> bor@bor-Latitude-E5450:~$
>
> In one case it does not include the "name" parameter. Whether the
> configuration was changed in between is unknown; we have never seen the
> full RA metadata or the full resource definition for either case, so ...
>
> My hunch is that "id" is a red herring and something else changed when
> the resource definition was edited. If I'm wrong, I would appreciate a
> pointer to the code where "id" is mishandled.
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/