Resurrecting a really old thread in case anyone had similar questions.
This arrived at a crazy busy time and got neglected.

The "finalization timer" is a timeout in the DC election process. The
value is the join-finalization-timeout cluster property (formerly crmd-
finalization-timeout), which defaults to 30 minutes.

The whole election process is undocumented and quite arcane. It would
be nice to document it but that's a bigger project than there is time
for at the moment.

The controller (crmd) is implemented as a finite state machine, meaning
various inputs move it from one state to another according to fixed
rules. The finalization timer is started when the "finalize join" state
is reached, and stopped whenever that state is left. This state is
achieved once the (possibly newly elected) DC has received join
requests from all nodes, and is left once the DC has sync'd the CIB to
those nodes, ack'd their join requests, and received confirmations back
from them.

Obviously it's too late to look at this particular incident, but if it
can be reliably reproduced, I can take a new look.

On Sun, 2018-10-28 at 23:32 +0530, Prasad Nagaraj wrote:
> Hi :
> 
> I came across  a strange situation in my cluster few days back. I was
> trying to replace one of the nodes in my 3 node cluster and I removed
> and added that node back following due process as per 
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_removing_a_corosync_node.html
> . Updated corosync.conf on all nodes and restarted corosync and
> pacemaker on all the nodes. The cluster didnt elect a DC and also
> didnt report any activities for almost 28 mins as seen from below
> logs. Then I saw this message: 
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_timer_popped:    Finalization Timer (I_ELECTION) just popped
> (1800000ms)
> after which I could see further activities happening including DC
> election.
> 
> I was not able to understand or identify any reasons for this
> behavior and also there is absolutely no documentation on this
> Finalization timer and what it means. Appreciate any help in terms of
> explaining what exactly this timer means and what could be reasons
> for this behavior. I have pasted a snippet of logs during the time
> here. I do have more logs and also logs from other nodes that I can
> share if required.
> 
> Thanks in advance for the help!
> Prasad
> 
> 
> Oct 22 22:07:46 [76412] vm85c4465533        cib:     info:
> cib_process_request: Completed cib_modify operation for section
> nodes: OK (rc=0, origin=vm46890219c5/crm_attribute/4,
> version=0.100.0)
> Oct 22 22:07:46 [76412] vm85c4465533        cib:     info:
> cib_file_write_with_digest:  Reading cluster configuration file
> /var/lib/pacemaker/cib/cib.QC3Kay (digest:
> /var/lib/pacemaker/cib/cib.c74mjr)
> Oct 22 22:07:47 [76412] vm85c4465533        cib:     info:
> cib_file_backup:     Archived previous version as
> /var/lib/pacemaker/cib/cib-96.raw
> Oct 22 22:07:47 [76412] vm85c4465533        cib:     info:
> cib_file_write_with_digest:  Wrote version 0.100.0 of the CIB to disk
> (digest: 3ab4bffdc9c372985cfe50ad3100131d)
> Oct 22 22:07:47 [76412] vm85c4465533        cib:     info:
> cib_file_write_with_digest:  Reading cluster configuration file
> /var/lib/pacemaker/cib/cib.7NqOEy (digest:
> /var/lib/pacemaker/cib/cib.gamHhs)
> Oct 22 22:07:48 [76412] vm85c4465533        cib:     info:
> cib_perform_op:      Diff: --- 0.100.0 2
> Oct 22 22:07:48 [76412] vm85c4465533        cib:     info:
> cib_perform_op:      Diff: +++ 0.101.0 (null)
> Oct 22 22:07:48 [76412] vm85c4465533        cib:     info:
> cib_perform_op:      +  /cib:  @epoch=101
> Oct 22 22:07:48 [76412] vm85c4465533        cib:     info:
> cib_perform_op:      ++ /cib/configuration/constraints: 
> <rsc_location id="ms_mysql_member453" rsc="ms_mysql" score="0"
> node="vm46890219c5"/>
> Oct 22 22:07:48 [76412] vm85c4465533        cib:     info:
> cib_process_request: Completed cib_apply_diff operation for section
> 'all': OK (rc=0, origin=local/cibadmin/2, version=0.101.0)
> Oct 22 22:07:48 [76412] vm85c4465533        cib:     info:
> cib_file_backup:     Archived previous version as
> /var/lib/pacemaker/cib/cib-97.raw
> Oct 22 22:07:48 [76412] vm85c4465533        cib:     info:
> cib_file_write_with_digest:  Wrote version 0.101.0 of the CIB to disk
> (digest: 7ffa91a8ca752581d4c1df13d287e467)
> Oct 22 22:07:48 [76412] vm85c4465533        cib:     info:
> cib_file_write_with_digest:  Reading cluster configuration file
> /var/lib/pacemaker/cib/cib.JkMV2C (digest:
> /var/lib/pacemaker/cib/cib.mLy23A)
> Oct 22 22:07:49 [76412] vm85c4465533        cib:     info:
> cib_perform_op:      Diff: --- 0.101.0 2
> Oct 22 22:07:49 [76412] vm85c4465533        cib:     info:
> cib_perform_op:      Diff: +++ 0.102.0 (null)
> Oct 22 22:07:49 [76412] vm85c4465533        cib:     info:
> cib_perform_op:      +  /cib:  @epoch=102
> Oct 22 22:07:49 [76412] vm85c4465533        cib:     info:
> cib_perform_op:      + 
> /cib/configuration/resources/master[@id='ms_mysql']/meta_attributes[@
> id='ms_mysql-meta_attributes']/nvpair[@id='ms_mysql-meta_attributes-
> maintenance']:  @value=false
> Oct 22 22:07:49 [76412] vm85c4465533        cib:     info:
> cib_process_request: Completed cib_modify operation for section
> resources: OK (rc=0, origin=local/crm_resource/6, version=0.102.0)
> Oct 22 22:07:49 [76412] vm85c4465533        cib:     info:
> cib_file_backup:     Archived previous version as
> /var/lib/pacemaker/cib/cib-98.raw
> Oct 22 22:07:49 [76412] vm85c4465533        cib:     info:
> cib_file_write_with_digest:  Wrote version 0.102.0 of the CIB to disk
> (digest: dd6263dc226d652721b09ec702e37742)
> Oct 22 22:07:49 [76412] vm85c4465533        cib:     info:
> cib_file_write_with_digest:  Reading cluster configuration file
> /var/lib/pacemaker/cib/cib.DOunDE (digest:
> /var/lib/pacemaker/cib/cib.MYBrjE)
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_timer_popped:    Finalization Timer (I_ELECTION) just popped
> (1800000ms)
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> do_state_transition: State transition S_FINALIZE_JOIN -> S_ELECTION [
> input=I_ELECTION cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> update_dc:   Unset DC. Was vm85c4465533
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> election_complete:   Election election-0 complete
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> election_timeout_popped:     Election failed: Declaring ourselves the
> winner
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info: do_log:   
>   FSA: Input I_ELECTION_DC from election_timeout_popped() received in
> state S_ELECTION
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:   notice:
> do_state_transition: State transition S_ELECTION -> S_INTEGRATION [
> input=I_ELECTION_DC cause=C_TIMER_POPPED
> origin=election_timeout_popped ]
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> do_dc_takeover:      Taking over DC status for this partition
> Oct 22 22:35:14 [76412] vm85c4465533        cib:     info:
> cib_process_request: Completed cib_master operation for section
> 'all': OK (rc=0, origin=local/crmd/86, version=0.102.0)
> Oct 22 22:35:14 [76412] vm85c4465533        cib:     info:
> cib_process_request: Completed cib_modify operation for section cib:
> OK (rc=0, origin=local/crmd/87, version=0.102.0)
> Oct 22 22:35:14 [76412] vm85c4465533        cib:     info:
> cib_process_request: Completed cib_modify operation for section
> crm_config: OK (rc=0, origin=local/crmd/89, version=0.102.0)
> Oct 22 22:35:14 [76412] vm85c4465533        cib:     info:
> cib_process_request: Completed cib_modify operation for section
> crm_config: OK (rc=0, origin=local/crmd/91, version=0.102.0)
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        initialize_join: Node
> vm46890219c5[218109612] - join-2 phase 4 -> 0
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        initialize_join: Node
> vm85c4465533[83891884] - join-2 phase 4 -> 0
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        initialize_join: Node
> vm2ad44008dc[251664044] - join-2 phase 3 -> 0
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> join_make_offer:     join-2: Sending offer to vm46890219c5
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        join_make_offer: Node
> vm46890219c5[218109612] - join-2 phase 0 -> 1
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> join_make_offer:     join-2: Sending offer to vm85c4465533
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        join_make_offer: Node
> vm85c4465533[83891884] - join-2 phase 0 -> 1
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> join_make_offer:     join-2: Sending offer to vm2ad44008dc
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        join_make_offer: Node
> vm2ad44008dc[251664044] - join-2 phase 0 -> 1
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> do_dc_join_offer_all:        join-2: Waiting on 3 outstanding join
> acks
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:  warning: do_log:   
>   FSA: Input I_ELECTION_DC from do_election_check() received in state
> S_INTEGRATION
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        initialize_join: Node
> vm46890219c5[218109612] - join-3 phase 1 -> 0
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        initialize_join: Node
> vm85c4465533[83891884] - join-3 phase 1 -> 0
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        initialize_join: Node
> vm2ad44008dc[251664044] - join-3 phase 1 -> 0
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> join_make_offer:     join-3: Sending offer to vm46890219c5
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        join_make_offer: Node
> vm46890219c5[218109612] - join-3 phase 0 -> 1
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> join_make_offer:     join-3: Sending offer to vm85c4465533
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        join_make_offer: Node
> vm85c4465533[83891884] - join-3 phase 0 -> 1
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> join_make_offer:     join-3: Sending offer to vm2ad44008dc
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crm_update_peer_join:        join_make_offer: Node
> vm2ad44008dc[251664044] - join-3 phase 0 -> 1
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> do_dc_join_offer_all:        join-3: Waiting on 3 outstanding join
> acks
> Oct 22 22:35:14 [76412] vm85c4465533        cib:     info:
> cib_process_request: Completed cib_modify operation for section
> crm_config: OK (rc=0, origin=local/crmd/93, version=0.102.0)
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> plugin_handle_membership:    Membership 52: quorum retained
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crmd_cs_dispatch:    Setting expected votes to 3
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> plugin_handle_membership:    Membership 52: quorum retained
> Oct 22 22:35:14 [76412] vm85c4465533        cib:     info:
> cib_process_request: Completed cib_modify operation for section
> crm_config: OK (rc=0, origin=local/crmd/96, version=0.102.0)
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> crmd_cs_dispatch:    Setting expected votes to 3
> Oct 22 22:35:14 [76417] vm85c4465533       crmd:     info:
> update_dc:   Set DC to vm85c4465533 (3.0.10)
-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Reply via email to