Michal Koutný wrote:
On 02/18/2016 10:40 AM, Christine Caulfield wrote:
I definitely remember looking into this, or something very like it, ages
ago. I can't find anything in the commit logs for either corosync or
cman that looks relevant though. If you're seeing it on recent builds
then it's obviously still a problem anyway and we ought to look into it!
Thanks for your replies.
So far this has happened only once and we've only done a "post mortem";
alas, no reproducer is available. If I have time, I'll try to reproduce it
Ok. I actually tried to reproduce it and had no success
(current master). The steps I used:
- 2 nodes, token set to 30 sec
- execute cpgbench on node2
- pause node1 corosync (ctrl+z), kill node1 corosync (kill -9 %1)
- wait until corosync on node2 moves into "entering GATHER
state from..."
- execute corosync on node1
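
For anyone who wants to script the pause-and-kill step on node1, here is a
minimal Python sketch. It is not something I ran against corosync: the
function name pause_then_kill and the pause_seconds parameter are made up
for illustration, and it uses SIGSTOP as an approximation of ctrl+z (the
terminal actually sends SIGTSTP).

```python
import signal
import subprocess
import time


def pause_then_kill(argv, pause_seconds=0.0):
    """Start argv, suspend it, then kill it; return its exit status.

    Hypothetical helper: SIGSTOP stands in for ctrl+z and SIGKILL
    for `kill -9 %1` from the steps above.
    """
    proc = subprocess.Popen(argv)
    # Suspend the process. SIGSTOP cannot be caught or ignored.
    proc.send_signal(signal.SIGSTOP)
    # Optionally keep it suspended for a while before killing it.
    time.sleep(pause_seconds)
    # SIGKILL is delivered even though the process is stopped.
    proc.send_signal(signal.SIGKILL)
    proc.wait()
    # On POSIX a negative value is the number of the terminating signal.
    return proc.returncode
```

On node1 this could be called as pause_then_kill(["corosync", "-f"]),
assuming corosync is on the PATH and is run in the foreground. Waiting for
the GATHER message on node2 and restarting corosync on node1 are still
done by hand.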
Basically, during recovery the new node's trans list was never sent (and/or
was ignored by node2).
I'm going to try testing v1.4.7, but it's also possible that the bug was fixed
by other commits (my top candidates are cfbb021e130337603fe5b545d1e377296ecb92ea,
4ee84c51fa73c4ec7cbee922111a140a3aaf75df,
f135b680967aaef1d466f40170c75ae3e470e147).
Regards,
Honza
locally and check whether it exists in the current version.
Michal
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org