Michal,
Hello. The subject is just a hypothesis that I'd like to confirm/discuss here. TL;DR The token timeout shouldn't be greater than the reboot cycle, is that correct?
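For reference, the timeout in question is the token key in the totem section of corosync.conf; a minimal sketch (the 30000 ms value matches what is mentioned later in this thread):

    totem {
        version: 2
        # Token loss is declared after this many milliseconds without
        # the token.  The hypothesis here: keep this below the time a
        # node needs to reboot and restart corosync.
        token: 30000
    }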
Actually, when I was reading your mail it was like "this sounds so familiar, we solved this problem a looong time ago". Chrissie, are you able to remember/find the patch solving this problem?
The situation with the assert is as follows:

node01:
  18:08:02  scheduling node01 for STONITH
  [ gap ]
  18:08:26  start of log after reboot
  18:08:36  corosync started

node02:
  18:08:02  it starts queuing outgoing messages (it has no token)
  18:08:32  timeout, it realizes the token is lost in the OPERATIONAL state
  18:08:32  it enters the gather -> commit -> recovery process
  18:08:37  it receives the commit token for the 2nd time and wants to enter the recovery state
  18:08:37  it thinks that node01 is transitioning from the old ring configuration to the new one and wants to retransmit old messages with seq numbers greater than node01's aru -- the problem is that the difference is too large and 'assert(range < QUEUE_RTR_ITEMS_SIZE_MAX)' in memb_state_recovery_enter fails

IOW, node01 shouldn't be considered as transitioning from the old ring to the new ring merely because it is present in both configurations, because it was restarted(!) in between. In the end, its new aru is zero; however, the seq number before the token loss might have been arbitrarily high, in particular greater than QUEUE_RTR_ITEMS_SIZE_MAX, and thus the assertion fails.
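To make the failure concrete, here is a minimal standalone sketch of the check as I understand it -- not the verbatim upstream code. The QUEUE_RTR_ITEMS_SIZE_MAX value below is an assumed placeholder, and the aru values are taken from the fplay output in [2]:

    /* Sketch of the failing check in memb_state_recovery_enter().
     * Not the verbatim totemsrp.c code; the constant is a placeholder. */
    #include <assert.h>
    #include <stdio.h>

    #define QUEUE_RTR_ITEMS_SIZE_MAX 16384  /* assumed size, for illustration */

    int main(void)
    {
        /* node02's state saved at token loss (from [2]) */
        unsigned int old_ring_high_seq_received = 0x12a9b4;

        /* lowest aru among members treated as transitional:
         * node01 rejoined after its reboot with aru 0 */
        unsigned int low_ring_aru = 0x0;

        /* number of old-ring messages recovery would queue for retransmission */
        unsigned int range = old_ring_high_seq_received - low_ring_aru;
        printf("range = 0x%x (%u)\n", range, range);

        /* 0x12a9b4 (~1.2M) is far larger than the retransmit queue, so this fires */
        assert(range < QUEUE_RTR_ITEMS_SIZE_MAX);
        return 0;
    }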
Yep, sounds very reasonable.
I'm referring to the Shift_to_recovery routine as described in paper [1] on page 21. There are two types of nodes:
- my_new_memb (they are brand new in the ring),
- my_trans_memb (they transition from the old ring).

I've looked more into the paper than into the code, so my reasoning might not fit the reality. This happened with corosync-1.4.7; however, the relevant code of memb_state_recovery_enter in upstream is still the same.

Below is the last output [2] from corosync-fplay on the node which failed the assertion. Does this make sense to anyone?

FTR, the totem:token value is set to 30000 ms; perhaps decreasing it below the reboot cycle would prevent the issue.
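Roughly, as I read p.21 of [1], the membership split works like the sketch below. This follows the paper rather than totemsrp.c, and the struct and member_of() helper are mine, purely for illustration:

    #include <stdbool.h>

    struct node { unsigned int id; };

    /* true if id appears in the given member list (illustrative helper) */
    static bool member_of(unsigned int id, const struct node *list, int n)
    {
        for (int i = 0; i < n; i++)
            if (list[i].id == id)
                return true;
        return false;
    }

    /* Every member of the new ring that was also in the old ring goes
     * into my_trans_memb.  Note there is no check that the node kept
     * its old-ring state: a node restarted between the two rings (like
     * node01 here) is still classified as transitioning. */
    static void shift_to_recovery_split(
        const struct node *old_ring, int n_old,
        const struct node *new_ring, int n_new,
        struct node *my_trans_memb, int *n_trans,
        struct node *my_new_memb, int *n_new_memb)
    {
        *n_trans = *n_new_memb = 0;
        for (int i = 0; i < n_new; i++) {
            if (member_of(new_ring[i].id, old_ring, n_old))
                my_trans_memb[(*n_trans)++] = new_ring[i];
            else
                my_new_memb[(*n_new_memb)++] = new_ring[i];
        }
    }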
Highly probable. Also, did you find any difference between the case where the node was fenced (so corosync wasn't terminated cleanly) and the case where corosync was properly terminated?
Do you have any reproducer for this problem? It's definitely something worth fixing.
Regards,
  Honza
Thanks,
Michal

[1] http://corosync.github.com/corosync/doc/tocssrp95.ps.gz

[2] corosync-fplay output

rec=[5444857] time=[2015-10-24 18:08:02] Tracing(1) Messsage=Delivering MCAST message with seq 12a9b4 to pending delivery queue
rec=[5444858] time=[2015-10-24 18:08:02] Tracing(1) Messsage=releasing messages up to and including 12a9b3
rec=[5444859] time=[2015-10-24 18:08:02] Tracing(1) Messsage=releasing messages up to and including 12a9b4
rec=[5444860] time=[2015-10-24 18:08:04] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444861] time=[2015-10-24 18:08:06] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444862] time=[2015-10-24 18:08:07] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444863] time=[2015-10-24 18:08:09] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444864] time=[2015-10-24 18:08:10] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444865] time=[2015-10-24 18:08:12] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444866] time=[2015-10-24 18:08:13] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444867] time=[2015-10-24 18:08:15] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444868] time=[2015-10-24 18:08:16] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444869] time=[2015-10-24 18:08:18] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444870] time=[2015-10-24 18:08:20] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444871] time=[2015-10-24 18:08:21] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444872] time=[2015-10-24 18:08:23] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444873] time=[2015-10-24 18:08:24] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444874] time=[2015-10-24 18:08:26] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444875] time=[2015-10-24 18:08:27] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444876] time=[2015-10-24 18:08:30] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444877] time=[2015-10-24 18:08:31] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444878] time=[2015-10-24 18:08:32] Log Message=The token was lost in the OPERATIONAL state.
rec=[5444879] time=[2015-10-24 18:08:32] Log Message=A processor failed, forming new configuration.
rec=[5444880] time=[2015-10-24 18:08:32] Log Message=Receive multicast socket recv buffer size (262142 bytes).
rec=[5444881] time=[2015-10-24 18:08:32] Log Message=Transmit multicast socket send buffer size (262142 bytes).
rec=[5444882] time=[2015-10-24 18:08:32] Log Message=Local receive multicast loop socket recv buffer size (262142 bytes).
rec=[5444883] time=[2015-10-24 18:08:32] Log Message=Local transmit multicast loop socket send buffer size (262142 bytes).
rec=[5444884] time=[2015-10-24 18:08:32] Log Message=entering GATHER state from 2.
rec=[5444885] time=[2015-10-24 18:08:33] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444886] time=[2015-10-24 18:08:34] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444887] time=[2015-10-24 18:08:36] Tracing(1) Messsage=mcasted message added to pending queue
rec=[5444888] time=[2015-10-24 18:08:37] Log Message=got commit token
rec=[5444889] time=[2015-10-24 18:08:37] Log Message=Saving state aru 12a9b4 high seq received 12a9b4
rec=[5444890] time=[2015-10-24 18:08:37] Log Message=Storing new sequence id for ring 6a4
rec=[5444891] time=[2015-10-24 18:08:37] Log Message=entering COMMIT state.
rec=[5444892] time=[2015-10-24 18:08:37] Log Message=got commit token
rec=[5444893] time=[2015-10-24 18:08:37] Log Message=entering RECOVERY state.
rec=[5444894] time=[2015-10-24 18:08:37] Log Message=TRANS [0] member 10.10.102.65: (node1, rebooted node)
rec=[5444895] time=[2015-10-24 18:08:37] Log Message=TRANS [1] member 10.10.102.66: (node2)
rec=[5444896] time=[2015-10-24 18:08:37] Log Message=position [0] member 10.10.102.65:
rec=[5444897] time=[2015-10-24 18:08:37] Log Message=previous ring seq 6a0 rep 10.10.102.65
rec=[5444898] time=[2015-10-24 18:08:37] Log Message=aru 0 high delivered 0 received flag 0
rec=[5444899] time=[2015-10-24 18:08:37] Log Message=position [1] member 10.10.102.66:
rec=[5444900] time=[2015-10-24 18:08:37] Log Message=previous ring seq 6a0 rep 10.10.102.65
rec=[5444901] time=[2015-10-24 18:08:37] Log Message=aru 12a9b4 high delivered 12a9b4 received flag 1