Michal,
Hello. The subject is just a hypothesis that I'd like to confirm/discuss here. TL;DR The token timeout shouldn't be greater than the reboot cycle, is that correct?
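For reference, the timeout in question is the token key in the totem section of corosync.conf; a minimal sketch (the 30000 ms value matches what is mentioned later in this thread):

    totem {
        version: 2
        # Token loss is declared after this many milliseconds without
        # the token.  The hypothesis here: keep this below the time a
        # node needs to reboot and restart corosync.
        token: 30000
    }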
Actually, when I was reading your mail it was like "this sounds so familiar, we solved this problem a looong time ago". Chrissie, are you able to remember/find the patch solving this problem?
The situation with the assert is as follows:

node01:
  18:08:02  scheduling node01 for STONITH
  [ gap ]
  18:08:26  start of log after reboot
  18:08:36  corosync started

node02:
  18:08:02  it starts queuing outgoing messages (it has no token)
  18:08:32  timeout, it realizes the token is lost in the OPERATIONAL state
  18:08:32  it enters the gather -> commit -> recovery process
  18:08:37  it receives the commit token for the 2nd time and wants to enter the recovery state
  18:08:37  it thinks that node01 is transitioning from the old ring configuration to the new one and wants to retransmit old messages with seq numbers greater than node01's aru -- the problem is that the difference is too large and 'assert(range < QUEUE_RTR_ITEMS_SIZE_MAX)' in memb_state_recovery_enter fails

IOW, node01 shouldn't be considered as transitioning from the old ring to the new ring merely because it is present in both configurations, because it was restarted(!) in between. In the end, its new aru is zero; however, the seq number before the token loss might have been arbitrarily high, in particular greater than QUEUE_RTR_ITEMS_SIZE_MAX, and thus the assertion fails.
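To make the failure concrete, here is a minimal standalone sketch of the check as I understand it -- not the verbatim upstream code. The QUEUE_RTR_ITEMS_SIZE_MAX value below is an assumed placeholder, and the aru values are taken from the fplay output in [2]:

    /* Sketch of the failing check in memb_state_recovery_enter().
     * Not the verbatim totemsrp.c code; the constant is a placeholder. */
    #include <assert.h>
    #include <stdio.h>

    #define QUEUE_RTR_ITEMS_SIZE_MAX 16384  /* assumed size, for illustration */

    int main(void)
    {
        /* node02's state saved at token loss (from [2]) */
        unsigned int old_ring_high_seq_received = 0x12a9b4;

        /* lowest aru among members treated as transitional:
         * node01 rejoined after its reboot with aru 0 */
        unsigned int low_ring_aru = 0x0;

        /* number of old-ring messages recovery would queue for retransmission */
        unsigned int range = old_ring_high_seq_received - low_ring_aru;
        printf("range = 0x%x (%u)\n", range, range);

        /* 0x12a9b4 (~1.2M) is far larger than the retransmit queue, so this fires */
        assert(range < QUEUE_RTR_ITEMS_SIZE_MAX);
        return 0;
    }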
Yep, sounds very reasonable.
I'm referring to the Shift_to_recovery routine as described in paper [1] on page 21. There are two types of nodes:
- my_new_memb (they are brand new in the ring),
- my_trans_memb (they transition from the old ring).

I've looked more into the paper than into the code, so my reasoning might not fit the reality. This happened with corosync-1.4.7; however, the relevant code of memb_state_recovery_enter in upstream is still the same.

Below is the last output [2] from corosync-fplay on the node which failed the assertion. Does this make sense to anyone?

FTR, the totem:token value is set to 30000 ms; perhaps decreasing it below the reboot cycle would prevent the issue.
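Roughly, as I read p.21 of [1], the membership split works like the sketch below. This follows the paper rather than totemsrp.c, and the struct and member_of() helper are mine, purely for illustration:

    #include <stdbool.h>

    struct node { unsigned int id; };

    /* true if id appears in the given member list (illustrative helper) */
    static bool member_of(unsigned int id, const struct node *list, int n)
    {
        for (int i = 0; i < n; i++)
            if (list[i].id == id)
                return true;
        return false;
    }

    /* Every member of the new ring that was also in the old ring goes
     * into my_trans_memb.  Note there is no check that the node kept
     * its old-ring state: a node restarted between the two rings (like
     * node01 here) is still classified as transitioning. */
    static void shift_to_recovery_split(
        const struct node *old_ring, int n_old,
        const struct node *new_ring, int n_new,
        struct node *my_trans_memb, int *n_trans,
        struct node *my_new_memb, int *n_new_memb)
    {
        *n_trans = *n_new_memb = 0;
        for (int i = 0; i < n_new; i++) {
            if (member_of(new_ring[i].id, old_ring, n_old))
                my_trans_memb[(*n_trans)++] = new_ring[i];
            else
                my_new_memb[(*n_new_memb)++] = new_ring[i];
        }
    }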
Highly probable. Also, did you find any difference between the case where the node was fenced (so corosync wasn't terminated cleanly) and the case where corosync was properly terminated?
Do you have any reproducer for this problem? It's definitely something worth fixing.
Regards,
  Honza
Thanks,
Michal

[1] http://corosync.github.com/corosync/doc/tocssrp95.ps.gz

[2] corosync-fplay output

rec=[5444857] time=[2015-10-24 18:08:02] Tracing(1) Messsage=Delivering MCAST message with seq 12a9b4 to pending delivery queue
rec=[5444858] time=[2015-10-24 18:08:02] Tracing(1) Messsage=releasing messages up to and including 12a9b3
rec=[5444859] time=[2015-10-24 18:08:02] Tracing(1) Messsage=releasing messages up to and including 12a9b4
rec=[5444860] time=[2015-10-24 18:08:04] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444861] time=[2015-10-24 18:08:06] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444862] time=[2015-10-24 18:08:07] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444863] time=[2015-10-24 18:08:09] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444864] time=[2015-10-24 18:08:10] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444865] time=[2015-10-24 18:08:12] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444866] time=[2015-10-24 18:08:13] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444867] time=[2015-10-24 18:08:15] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444868] time=[2015-10-24 18:08:16] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444869] time=[2015-10-24 18:08:18] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444870] time=[2015-10-24 18:08:20] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444871] time=[2015-10-24 18:08:21] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444872] time=[2015-10-24 18:08:23] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444873] time=[2015-10-24 18:08:24] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444874] time=[2015-10-24 18:08:26] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444875] time=[2015-10-24 18:08:27] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444876] time=[2015-10-24 18:08:30] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444877] time=[2015-10-24 18:08:31] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444878] time=[2015-10-24 18:08:32] Log Message=The token was lost in the OPERATIONAL state.
rec=[5444879] time=[2015-10-24 18:08:32] Log Message=A processor failed, forming new configuration.
rec=[5444880] time=[2015-10-24 18:08:32] Log Message=Receive multicast socket recv buffer size (262142 bytes).
rec=[5444881] time=[2015-10-24 18:08:32] Log Message=Transmit multicast socket send buffer size (262142 bytes).
rec=[5444882] time=[2015-10-24 18:08:32] Log Message=Local receive multicast loop socket recv buffer size (262142 bytes).
rec=[5444883] time=[2015-10-24 18:08:32] Log Message=Local transmit multicast loop socket send buffer size (262142 bytes).
rec=[5444884] time=[2015-10-24 18:08:32] Log Message=entering GATHER state from 2.
rec=[5444885] time=[2015-10-24 18:08:33] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444886] time=[2015-10-24 18:08:34] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444887] time=[2015-10-24 18:08:36] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444888] time=[2015-10-24 18:08:37] Log Message=got commit token
rec=[5444889] time=[2015-10-24 18:08:37] Log Message=Saving state aru 12a9b4 high seq received 12a9b4
rec=[5444890] time=[2015-10-24 18:08:37] Log Message=Storing new sequence id for ring 6a4
rec=[5444891] time=[2015-10-24 18:08:37] Log Message=entering COMMIT state.
rec=[5444892] time=[2015-10-24 18:08:37] Log Message=got commit token
rec=[5444893] time=[2015-10-24 18:08:37] Log Message=entering RECOVERY state.
rec=[5444894] time=[2015-10-24 18:08:37] Log Message=TRANS [0] member 10.10.102.65: (node1, rebooted node)
rec=[5444895] time=[2015-10-24 18:08:37] Log Message=TRANS [1] member 10.10.102.66: (node2)
rec=[5444896] time=[2015-10-24 18:08:37] Log Message=position [0] member 10.10.102.65:
rec=[5444897] time=[2015-10-24 18:08:37] Log Message=previous ring seq 6a0 rep 10.10.102.65
rec=[5444898] time=[2015-10-24 18:08:37] Log Message=aru 0 high delivered 0 received flag 0
rec=[5444899] time=[2015-10-24 18:08:37] Log Message=position [1] member 10.10.102.66:
rec=[5444900] time=[2015-10-24 18:08:37] Log Message=previous ring seq 6a0 rep 10.10.102.65
rec=[5444901] time=[2015-10-24 18:08:37] Log Message=aru 12a9b4 high delivered 12a9b4 received flag 1