Hi.

We occasionally hit a problem in our two-node cluster on CentOS 7. Call the nodes node-2 and node-3. When the problem occurs, node-3 reports both nodes as OFFLINE, while node-2 reports only node-3 as OFFLINE.
When that happens, the log messages shown below are written repeatedly on node-2, and the log file (/var/log/cluster/corosync.log) grows to hundreds of megabytes in a short time. The log messages on node-3 are different.

The erroneous state is temporarily resolved if the OS on node-2 is restarted. Restarting the OS on node-3, on the other hand, leaves the cluster in the same state.

I've searched the mailing list archive and found a post (Mon Oct 1 01:27:39 CEST 2012) about the "Discarding update with feature set" problem. According to that post, our problem may be solved by removing /var/lib/pacemaker/crm/cib.* on node-2.

What I want to know is whether it is safe to remove those files on just one of the nodes. If there is another way to solve the problem, I'd like to hear about it.

Thanks.

----- from corosync.log -----
cib: error: cib_perform_op: Discarding update with feature set '3.0.11' greater than our own '3.0.10'
cib: error: cib_process_request: Completed cib_replace operation for section 'all': Protocol not supported (rc=-93, origin=node-3/crmd/12708, version=0.83.30)
crmd: error: finalize_sync_callback: Sync from node-3 failed: Protocol not supported
crmd: info: register_fsa_error_adv: Resetting the current action list
crmd: warning: do_log: Input I_ELECTION_DC received in state S_FINALIZE_JOIN from finalize_sync_callback
crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=finalize_sync_callback
crmd: info: crm_update_peer_join: initialize_join: Node node-2[1] - join-6329 phase 2 -> 0
crmd: info: crm_update_peer_join: initialize_join: Node node-3[2] - join-6329 phase 2 -> 0
crmd: info: update_dc: Unset DC. Was node-2
crmd: info: join_make_offer: join-6329: Sending offer to node-2
crmd: info: crm_update_peer_join: join_make_offer: Node node-2[1] - join-6329 phase 0 -> 1
crmd: info: join_make_offer: join-6329: Sending offer to node-3
crmd: info: crm_update_peer_join: join_make_offer: Node node-3[2] - join-6329 phase 0 -> 1
crmd: info: do_dc_join_offer_all: join-6329: Waiting on 2 outstanding join acks
crmd: info: update_dc: Set DC to node-2 (3.0.10)
crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node node-2[1] - join-6329 phase 1 -> 2
crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node node-3[2] - join-6329 phase 1 -> 2
crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN | input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state
crmd: info: crmd_join_phase_log: join-6329: node-2=integrated
crmd: info: crmd_join_phase_log: join-6329: node-3=integrated
crmd: notice: do_dc_join_finalize: Syncing the Cluster Information Base from node-3 to rest of cluster | join-6329
crmd: notice: do_dc_join_finalize: Requested version <generation_tuple crm_feature_set="3.0.11" validate-with="pacemaker-2.5" epoch="84" num_updates="1" admin_epoch="0" cib-last-written="Thu May 11 08:05:45 2017" update-origin="node-2" update-client="crm_resource" update-user="root" have-quorum="1"/>
cib: info: cib_process_request: Forwarding cib_sync operation for section 'all' to node-3 (origin=local/crmd/12710)
cib: info: cib_process_replace: Digest matched on replace from node-3: 85a19c7927c54ccb15794f2720e07ce1
cib: info: cib_process_replace: Replaced 0.83.30 with 0.84.1 from node-3
cib: info: __xml_diff_object: Moved node_state@crmd (3 -> 2)
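
For reference, here is a minimal Python sketch of how the first cib error in the excerpt could be picked out of a log line and its two feature-set strings compared component by component. It only illustrates the mismatch that node-2 reports ('3.0.11' offered, '3.0.10' supported locally). The names FEATURE_SET_RE, parse_version and find_feature_set_mismatch are hypothetical helpers made up for this example, and this is not Pacemaker's own comparison code.

```python
import re

# Pattern for the cib error line quoted in the log excerpt.
# (Hypothetical helper; the message wording is taken from that excerpt.)
FEATURE_SET_RE = re.compile(
    r"Discarding update with feature set '([\d.]+)' greater than our own '([\d.]+)'"
)


def parse_version(text):
    """Turn '3.0.11' into (3, 0, 11) so that components compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_feature_set_mismatch(line):
    """Return (offered, local) version tuples if the line reports a mismatch, else None."""
    match = FEATURE_SET_RE.search(line)
    if match is None:
        return None
    return parse_version(match.group(1)), parse_version(match.group(2))


sample = (
    "cib: error: cib_perform_op: Discarding update with feature set "
    "'3.0.11' greater than our own '3.0.10'"
)
offered, local = find_feature_set_mismatch(sample)
print(offered, local, offered > local)
```

Comparing integer tuples rather than raw strings matters here: as plain strings, '3.0.9' would sort after '3.0.10', which would give the wrong answer for feature sets like these.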