11.12.2017 23:06, Ken Gaillot wrote:
[...]
=====

* The first issue I found (and I expect it to be the cause of some
other issues) is that pacemaker_remote does not drop an old crmd's
connection after a new crmd connects.
As IPC proxy connections are kept in a hash table, there is a 50% chance
that remoted tries to reach the old crmd, e.g. to proxy checks of node
attributes when resources are reprobed.
That leads to timeouts of those resources' probes, with a consequent
reaction from the cluster.
A solution here could be to drop the old IPC proxy connection as soon as
a new one is established.

We can't drop connections from the pacemaker_remoted side because it
doesn't know anything about the cluster state (e.g. whether the cluster
connection resource is live-migrating).

Well, OK. But what happens when the fenced cluster node comes back and receives a TCP packet from the old connection? Yes, it sends an RST, which terminates the connection on the peer side, and then pacemaker_remoted should shut it down on the socket event.


However we can simply always use the most recently connected provider,
which I think solves the issue. See commit e9a7e3bb, one of a few
recent bugfixes in the master branch for pacemaker_remoted. It will
most likely not make it into 2.0 (which I'm trying to keep focused on
deprecated syntax removals), but should be in the next release after that.
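
To make the "most recently connected provider" idea concrete, here is a minimal sketch. It is not the code from commit e9a7e3bb; the class name, method names, and provider IDs are hypothetical and chosen only for illustration. The assumption is that providers are kept in connection order rather than in an unordered hash table, so the newest one is always picked and a stale one is never chosen at random.

```python
from typing import List, Optional


class ProxyProviders:
    """Hypothetical registry of IPC proxy providers, kept in connection order."""

    def __init__(self) -> None:
        self._providers: List[str] = []

    def connect(self, provider_id: str) -> None:
        # A reconnecting provider moves to the end, so it counts as the newest.
        if provider_id in self._providers:
            self._providers.remove(provider_id)
        self._providers.append(provider_id)

    def disconnect(self, provider_id: str) -> None:
        # Old providers are not dropped eagerly; they leave only when they
        # disconnect themselves (e.g. the socket is closed or reset).
        if provider_id in self._providers:
            self._providers.remove(provider_id)

    def current(self) -> Optional[str]:
        # Always use the most recently connected provider, if any.
        return self._providers[-1] if self._providers else None


providers = ProxyProviders()
providers.connect("crmd-old")
providers.connect("crmd-new")
print(providers.current())
```

With this ordering, the old connection can stay open (which matters during a live migration of the connection resource) while requests still go to the newest crmd. If the newest provider disconnects, the previous one becomes current again.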

Will definitely try it. All stakeholders have already been notified that we need another round of testing on all available hardware :) We will test as soon as it becomes free.

I will return to this as soon as I have some results.

Thank you,
Vladislav

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
