Hi folks, I am using pacemaker-1.1.10 with heartbeat-3.0.5 on Ubuntu Trusty & have hit a bug in CRMD/CCM. The bug exists in mainline pacemaker as well.
The bug can easily be exposed in a 2-node cluster by adding a sleep(5) within do_ha_control(), just before it calls crm_cluster_connect(cluster), & rebooting one of the nodes. The rebooted node then remains in the pending state & never joins the cluster.

The bug is this: cib/ccm on the incoming node connect to the heartbeat cluster first, & crmd on that node connects to cib/ccm. Because of this, the peer gets membership events & considers crmd on the incoming node online, even though that crmd is not yet connected to the heartbeat cluster. So the join-offers sent by the node remaining as DC are lost & the election doesn't progress. Later, when crmd on the incoming node actually connects to the heartbeat cluster, it generates node events that crmd on the DC wrongly treats as offline events, & the DC removes the incoming node from its membership. From this point on, all messages from the peer are completely ignored.

Can you please suggest the best way to fix this problem? Would making crmd connect to heartbeat first, & only then to cib/ccm, help?

Here are detailed logs of the problem.

node1 is coming up & starts heartbeat:

Nov 26 19:07:08 node1 heartbeat: [2937]: info: Configuration validated. Starting heartbeat 3.0.5

As part of this, the running node2/DC gets a link-up event & also a crm event that the peer's heartbeat is online:

Nov 26 19:07:08 node2 heartbeat: [3393]: info: Link node1:eth0 up.
Nov 26 19:07:08 node2 heartbeat: [3393]: info: Status update for node node1: status init
Nov 26 19:07:08 node2 crmd[3523]: notice: crmd_ha_status_callback: Status update: Node node1 now has status [init]
Nov 26 19:07:08 node2 crmd[3523]: info: crm_update_peer_proc: crmd_ha_status_callback: Node node1[0] - heartbeat is now online
Nov 26 19:07:08 node2 crmd[3523]: crit: peer_update_callback: Client node1/peer now has status [offline] (DC=true)
Nov 26 19:07:08 node2 crmd[3523]: crit: peer_update_callback: No change 1 1000001 4000200
Nov 26 19:07:08 node2 crmd[3523]: notice: crmd_ha_status_callback: Status update: Node node1 now has status [up]

Meanwhile, crmd/cib on node1 start up & crmd connects to cib:

Nov 26 19:07:09 node1 crmd[2969]: notice: main: CRM Git Version: 42f2063
Nov 26 19:07:09 node1 crmd[2969]: debug: crmd_init: Starting crmd
Nov 26 19:07:09 node1 cib[2965]: info: crm_get_peer: Node 0 has uuid 00000432-0432-0000-2b91-000000000000
Nov 26 19:07:09 node1 cib[2965]: info: register_heartbeat_conn: Hostname: node1
Nov 26 19:07:09 node1 cib[2965]: info: register_heartbeat_conn: UUID: 00000432-0432-0000-2b91-000000000000
Nov 26 19:07:09 node1 cib[2965]: info: ccm_connect: Registering with CCM...
Nov 26 19:07:09 node1 cib[2965]: debug: ccm_connect: CCM Activation passed... all set to go!
Nov 26 19:07:09 node1 cib[2965]: info: cib_init: Requesting the list of configured nodes
Nov 26 19:07:10 node1 crmd[2969]: debug: cib_native_signon_raw: Connection to CIB successful
Nov 26 19:07:10 node1 cib[2965]: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/2, version=1.3.0)
Nov 26 19:07:10 node1 cib[2965]: info: crm_client_new: Connecting 0x7fcbfab10480 for uid=0 gid=0 pid=2967 id=276f6f8e-14cc-426e-bf63-b79674f1bfaa
Nov 26 19:07:10 node1 cib[2965]: debug: handle_new_connection: IPC credentials authenticated (2965-2967-12)
Nov 26 19:07:10 node1 cib[2965]: debug: qb_ipcs_shm_connect: connecting to client [2967]
Nov 26 19:07:10 node1 cib[2965]: debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096

The DC's cib is notified & marks the peer's cib online:

Nov 26 19:07:09 node2 cib[3519]: info: cib_client_status_callback: Status update: Client node1/cib now has status [join]
Nov 26 19:07:09 node2 crmd[3523]: notice: crmd_ha_status_callback: Status update: Node node1 now has status [active]
Nov 26 19:07:09 node2 crmd[3523]: notice: crmd_client_status_callback: Status update: Client node1/crmd now has status [join] (DC=true)
Nov 26 19:07:09 node2 cib[3519]: info: crm_get_peer: Created entry 5886699f-30f8-4b7d-9d38-4e661d99b2d9/0x7fa985e33b70 for node node1/0 (2 total)
Nov 26 19:07:09 node2 crmd[3523]: notice: crmd_client_status_callback: Status update: Client node1/crmd now has status [leave] (DC=true)
Nov 26 19:07:10 node2 cib[3519]: info: crm_get_peer: Node 0 has uuid 00000432-0432-0000-2b91-000000000000
Nov 26 19:07:10 node2 cib[3519]: info: crm_update_peer_proc: cib_client_status_callback: Node node1[0] - cib is now online
Nov 26 19:07:10 node2 cib[3519]: info: cib_client_status_callback: Status update: Client node1/cib now has status [leave]
Nov 26 19:07:10 node2 cib[3519]: info: crm_update_peer_proc: cib_client_status_callback: Node node1[0] - cib is now offline
Nov 26 19:07:10 node2 cib[3519]: info: cib_client_status_callback: Status update: Client node1/cib now has status [join]
Nov 26 19:07:10 node2 cib[3519]: info: crm_update_peer_proc: cib_client_status_callback: Node node1[0] - cib is now online
Nov 26 19:07:10 node2 cib[3519]: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/58, version=1.3.31)
Nov 26 19:07:10 node2 cib[3519]: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/59, version=1.3.31)
Nov 26 19:07:10 node2 cib[3519]: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/60, version=1.3.31)

As part of CCM processing, the DC considers the peer node a member & marks its crmd online:

Nov 26 19:07:12 node2 ccm: [3518]: debug: quorum plugin: majority
Nov 26 19:07:12 node2 ccm: [3518]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
Nov 26 19:07:12 node2 ccm: [3518]: debug: total_node_count=2, total_quorum_votes=200
Nov 26 19:07:12 node2 crmd[3523]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=2)
Nov 26 19:07:12 node2 cib[3519]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=2)
Nov 26 19:07:12 node2 crmd[3523]: info: ccm_event_detail: NEW MEMBERSHIP: trans=2, nodes=2, new=1, lost=0 n_idx=0, new_idx=2, old_idx=4
Nov 26 19:07:12 node2 crmd[3523]: info: ccm_event_detail: #011CURRENT: node2 [nodeid=1, born=1]
Nov 26 19:07:12 node2 cib[3519]: notice: crm_update_peer_state: crm_update_ccm_node: Node node1[0] - state is now member (was (null))
Nov 26 19:07:12 node2 crmd[3523]: info: ccm_event_detail: #011CURRENT: node1 [nodeid=0, born=2]
Nov 26 19:07:12 node2 crmd[3523]: info: ccm_event_detail: #011NEW: node1 [nodeid=0, born=2]
Nov 26 19:07:12 node2 crmd[3523]: notice: crm_update_peer_state: crm_update_ccm_node: Node node1[0] - state is now member (was lost)
Nov 26 19:07:12 node2 cib[3519]: info: crm_update_peer_proc: crm_update_ccm_node: Node node1[0] - heartbeat is now online
Nov 26 19:07:12 node2 cib[3519]: info: crm_update_peer_proc: crm_update_ccm_node: Node node1[0] - crmd is now online
Nov 26 19:07:12 node2 crmd[3523]: crit: peer_update_callback: node1 is now member (was lost)
Nov 26 19:07:12 node2 crmd[3523]: crit: peer_update_callback: Alive=0, appear=1, down=(nil)
Nov 26 19:07:12 node2 crmd[3523]: crit: peer_update_callback: Other (nil)
Nov 26 19:07:12 node2 crmd[3523]: info: crm_update_peer_proc: crm_update_ccm_node: Node node1[0] - crmd is now online
Nov 26 19:07:12 node2 cib[3519]: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/61, version=1.3.31)
Nov 26 19:07:12 node2 crmd[3523]: crit: peer_update_callback: Client node1/peer now has status [online] (DC=true)
Nov 26 19:07:12 node2 crmd[3523]: crit: peer_update_callback: Alive=1, appear=1, down=(nil)
Nov 26 19:07:12 node2 crmd[3523]: crit: peer_update_callback: Other (nil)

Subsequently this triggers the join processing, but since the peer node's crmd has not yet connected to the heartbeat cluster, it cannot receive the join-offers sent by the DC:

Nov 26 19:07:12 node2 crmd[3523]: debug: post_cache_update: Updated cache after membership event 2.
Nov 26 19:07:12 node2 crmd[3523]: debug: post_cache_update: post_cache_update added action A_ELECTION_CHECK to the FSA
Nov 26 19:07:12 node2 crmd[3523]: crit: s_crmd_fsa: Processing I_NODE_JOIN: [ state=S_IDLE cause=C_FSA_INTERNAL origin=peer_update_callback ]
Nov 26 19:07:12 node2 crmd[3523]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
Nov 26 19:07:12 node2 crmd[3523]: debug: crm_timer_start: Started Integration Timer (I_INTEGRATED:180000ms), src=95
Nov 26 19:07:12 node2 crmd[3523]: debug: do_election_check: Ignore election check: we not in an election
Nov 26 19:07:12 node2 crmd[3523]: crit: do_dc_join_offer_one: An unknown node joined - (re-)offer to any unconfirmed nodes
Nov 26 19:07:12 node2 crmd[3523]: crit: join_make_offer: about to make an offer to node2
Nov 26 19:07:12 node2 crmd[3523]: crit: join_make_offer: Making join offers based on membership 2
Nov 26 19:07:12 node2 crmd[3523]: crit: join_make_offer: Skipping node2: already known 4
Nov 26 19:07:12 node2 crmd[3523]: crit: join_make_offer: about to make an offer to node1
Nov 26 19:07:12 node2 crmd[3523]: crit: join_make_offer: join-1: Sending offer to node1
Nov 26 19:07:12 node2 crmd[3523]: info: crm_update_peer_join: join_make_offer: Node node1[0] - join-1 phase 0 -> 1
Nov 26 19:07:12 node2 crmd[3523]: debug: check_join_state: Invoked by do_dc_join_offer_one in state: S_INTEGRATION
Nov 26 19:07:12 node2 crmd[3523]: debug: do_te_invoke: Halting the transition: inactive
Nov 26 19:07:12 node2 crmd[3523]: info: abort_transition_graph: do_te_invoke:158 - Triggered transition abort (complete=1) : Peer Halt
Nov 26 19:07:12 node2 crmd[3523]: crit: s_crmd_fsa: Processing I_PE_CALC: [ state=S_INTEGRATION cause=C_FSA_INTERNAL origin=abort_transition_graph ]

Later, crmd on the node that is coming up connects to heartbeat:

Nov 26 19:07:15 node1 crmd[2969]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
Nov 26 19:07:15 node1 crmd[2969]: debug: register_heartbeat_conn: Signing in with Heartbeat
Nov 26 19:07:15 node1 crmd[2969]: info: crm_get_peer: Created entry 75b26770-ac15-461c-889d-190e6c8139ac/0x7f2fd194f870 for node node1/0 (1 total)
Nov 26 19:07:15 node1 crmd[2969]: crit: peer_update_callback: node1 is now (null)
Nov 26 19:07:15 node1 crmd[2969]: info: crm_get_peer: Node 0 has uuid 00000432-0432-0000-2b91-000000000000
Nov 26 19:07:15 node1 crmd[2969]: info: register_heartbeat_conn: Hostname: node1
Nov 26 19:07:15 node1 crmd[2969]: info: register_heartbeat_conn: UUID: 00000432-0432-0000-2b91-000000000000

At this point heartbeat delivers a notification to crmd on the DC, & while handling it, crmd wrongly moves the incoming node's status to offline & removes it from membership:

Nov 26 19:07:15 node2 crmd[3523]: notice: crmd_client_status_callback: Status update: Client node1/crmd now has status [join] (DC=true)
Nov 26 19:07:15 node2 crmd[3523]: info: crm_update_peer_proc: crmd_client_status_callback: Node node1[0] - crmd is now join
Nov 26 19:07:15 node2 crmd[3523]: crit: peer_update_callback: Client node1/peer now has status [offline] (DC=true)
Nov 26 19:07:15 node2 crmd[3523]: warning: match_down_event: No match for shutdown action on 00000432-0432-0000-2b91-000000000000
Nov 26 19:07:15 node2 crmd[3523]: crit: peer_update_callback: Alive=0, appear=0, down=(nil)
Nov 26 19:07:15 node2 crmd[3523]: notice: peer_update_callback: Stonith/shutdown of node1 not matched
Nov 26 19:07:15 node2 crmd[3523]: info: crm_update_peer_join: peer_update_callback: Node node1[0] - join-1 phase 1 -> 0
Nov 26 19:07:15 node2 crmd[3523]: debug: check_join_state: Invoked by peer_update_callback in state: S_INTEGRATION
Nov 26 19:07:15 node2 crmd[3523]: debug: check_join_state: join-1: Integration of 0 peers complete: peer_update_callback
Nov 26 19:07:15 node2 crmd[3523]: info: abort_transition_graph: peer_update_callback:214 - Triggered transition abort (complete=1) : Node failure

So, bottom line: due to a timing issue when crmd connects to the heartbeat cluster, the DC wrongly marks the incoming node as not a member. From this point onwards, any message from the incoming node is ignored, like:

Nov 26 19:07:18 node2 crmd[3523]: debug: crmd_ha_msg_callback: Ignoring HA message (op=noop) from node1: not in our membership list (size=1)
Nov 26 19:07:19 node2 crmd[3523]: warning: crmd_ha_msg_callback: Ignoring HA message (op=join_announce) from node1: not in our membership list (size=1)

--Shyam

On Mon, Nov 23, 2015 at 9:30 PM, Shyam <shyam.kaus...@gmail.com> wrote:

> One note on this.
>
> This problem doesn't happen if
>
> Nov 19 08:36:30 node1 crmd[3298]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [join] (DC=true)
> Nov 19 08:36:30 node1 crmd[3298]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [leave] (DC=true)
> Nov 19 08:36:31 node1 crmd[3298]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [join] (DC=true)
>
> the join happens before
>
> Nov 19 08:36:34 node1 crmd[3298]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
>
> I am not sure why crmd join/leave & then join happens. With the previous version of heartbeat/pacemaker this doesn't happen.
>
> Here are more logs from when the problem doesn't happen:
>
> Nov 19 08:36:29 node1 heartbeat: [3143]: info: Link node2:eth0 up.
> Nov 19 08:36:29 node1 heartbeat: [3143]: info: Status update for node node2: status init
> Nov 19 08:36:29 node1 crmd[3298]: notice: crmd_ha_status_callback: Status update: Node node2 now has status [init]
> Nov 19 08:36:29 node1 crmd[3298]: notice: crmd_ha_status_callback: Status update: Node node2 now has status [up]
> Nov 19 08:36:30 node1 heartbeat: [3143]: debug: get_delnodelist: delnodelist=
> Nov 19 08:36:30 node1 heartbeat: [3143]: info: Status update for node node2: status active
> Nov 19 08:36:30 node1 crmd[3298]: notice: crmd_ha_status_callback: Status update: Node node2 now has status [active]
> Nov 19 08:36:30 node1 crmd[3298]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [join] (DC=true)
> Nov 19 08:36:30 node1 crmd[3298]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [leave] (DC=true)
> Nov 19 08:36:31 node1 crmd[3298]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [join] (DC=true)
> Nov 19 08:36:34 node1 ccm: [3293]: debug: quorum plugin: majority
> Nov 19 08:36:34 node1 ccm: [3293]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
> Nov 19 08:36:34 node1 ccm: [3293]: debug: total_node_count=2, total_quorum_votes=200
> Nov 19 08:36:34 node1 cib[3294]: notice: crm_update_peer_state: crm_update_ccm_node: Node node2[1] - state is now member (was (null))
> Nov 19 08:36:34 node1 crmd[3298]: notice: crm_update_peer_state: crm_update_ccm_node: Node node2[1] - state is now member (was lost)
> Nov 19 08:36:34 node1 crmd[3298]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
>
> After the above, resources are started alright:
>
> Nov 19 08:36:40 node1 attrd[3297]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Nov 19 08:36:40 node1 attrd[3297]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Nov 19 08:36:40 node1 attrd[3297]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-MYSQL:0 (100)
> Nov 19 08:36:41 node1 pengine[3299]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Nov 19 08:36:41 node1 pengine[3299]: notice: LogActions: Start IPADDR:1#011(node2)
>
> Any help/pointers greatly appreciated.
>
> Thanks.
>
> --Shyam
>
> On Mon, Nov 23, 2015 at 12:14 PM, Shyam <shyam.kaus...@gmail.com> wrote:
>
>> Hi All,
>>
>> I need help with a timing issue that I hit when using pacemaker-1.1.10 with heartbeat-3.0.5 on Ubuntu Trusty.
>>
>> As seen in the logs below, the issue appears to be with the membership that CRMD looks at: it wrongly moves to the INTEGRATION phase when the peer CRMD hasn't fully joined the cluster yet.
>>
>> We have a 2-node cluster & occasionally, when one of the nodes comes up after a reboot, it remains in the S_PENDING state until heartbeat/pacemaker on that node is restarted. This happens periodically, not always.
>>
>> The logs below show the problem clearly.
>>
>> node1 has been running & sees node2 coming up:
>>
>> Nov 20 09:09:10 node1 crmd[3905]: notice: crmd_ha_status_callback: Status update: Node node2 now has status [init]
>> Nov 20 09:09:10 node1 crmd[3905]: notice: crmd_ha_status_callback: Status update: Node node2 now has status [up]
>> Nov 20 09:09:10 node1 heartbeat: [2943]: info: Link node2:eth0 up.
>> Nov 20 09:09:10 node1 heartbeat: [2943]: info: Status update for node node2: status init
>> Nov 20 09:09:10 node1 heartbeat: [2943]: info: Status update for node node2: status up
>> Nov 20 09:09:10 node1 heartbeat: [2943]: debug: get_delnodelist: delnodelist=
>> Nov 20 09:09:10 node1 heartbeat: [2943]: info: all clients are now paused
>> Nov 20 09:09:10 node1 crmd[3905]: notice: crmd_ha_status_callback: Status update: Node node2 now has status [active]
>> Nov 20 09:09:10 node1 crmd[3905]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [join] (DC=true)
>> Nov 20 09:09:10 node1 heartbeat: [2943]: info: Status update for node node2: status active
>> Nov 20 09:09:10 node1 crmd[3905]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [leave] (DC=true)
>> Nov 20 09:09:11 node1 heartbeat: [2943]: info: all clients are now resumed
>>
>> As can be seen above, the peer's CRMD registered with heartbeat & then de-registered (not sure why):
>>
>> Nov 20 09:09:10 node1 crmd[3905]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [join] (DC=true)
>> Nov 20 09:09:10 node1 crmd[3905]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [leave] (DC=true)
>>
>> While this is the state, CRMD on node1 moves to the INTEGRATION phase & there are join requests flowing through, but I presume they cannot be handled by the peer:
>>
>> Nov 20 09:09:13 node1 cib[3901]: notice: crm_update_peer_state: crm_update_ccm_node: Node node2[0] - state is now member (was (null))
>> Nov 20 09:09:13 node1 crmd[3905]: notice: crm_update_peer_state: crm_update_ccm_node: Node node2[0] - state is now member (was lost)
>> Nov 20 09:09:13 node1 crmd[3905]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
>> Nov 20 09:09:13 node1 ccm: [3900]: debug: quorum plugin: majority
>> Nov 20 09:09:13 node1 ccm: [3900]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
>> Nov 20 09:09:13 node1 ccm: [3900]: debug: total_node_count=2, total_quorum_votes=200
>> Nov 20 09:09:13 node1 crmd[3905]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [join] (DC=true)
>> Nov 20 09:09:13 node1 crmd[3905]: warning: match_down_event: No match for shutdown action on 00000432-0432-0000-2b91-000000000000
>> Nov 20 09:09:13 node1 crmd[3905]: notice: peer_update_callback: Stonith/shutdown of node2 not matched
>> Nov 20 09:09:13 node1 attrd[3904]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
>> Nov 20 09:09:13 node1 attrd[3904]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
>> Nov 20 09:09:13 node1 attrd[3904]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-MYSQL:0 (100)
>> Nov 20 09:09:13 node1 pengine[3906]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> Nov 20 09:09:13 node1 pengine[3906]: warning: custom_action: Action IPADDR:0_monitor_0 on node2 is unrunnable (pending)
>>
>> As can be seen, we had
>>
>> Nov 20 09:09:13 node1 crmd[3905]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL
>>
>> & only then does CRMD on the peer seem to be alright:
>>
>> Nov 20 09:09:13 node1 crmd[3905]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [join] (DC=true)
>>
>> So pengine decided it could not run any resources & we are stuck in this state.
>> While node2 has this trace:
>>
>> Nov 20 09:09:13 node2 cib[2964]: notice: cib_server_process_diff: Not applying diff 1.3.33 -> 1.3.34 (sync in progress)
>> Nov 20 09:09:13 node2 cib[2964]: notice: crm_update_peer_state: crm_update_ccm_node: Node node1[1] - state is now member (was (null))
>> Nov 20 09:09:13 node2 cib[2964]: notice: crm_update_peer_state: crm_update_ccm_node: Node node2[0] - state is now member (was (null))
>> Nov 20 09:09:13 node2 crmd[2968]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
>> Nov 20 09:09:14 node2 heartbeat: [2938]: info: the send queue length from heartbeat to client crmd is set to 1024
>> Nov 20 09:09:15 node2 crmd[2968]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [join] (DC=false)
>> Nov 20 09:09:15 node2 crmd[2968]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [online] (DC=false)
>> Nov 20 09:09:15 node2 crmd[2968]: notice: crmd_client_status_callback: Status update: Client node1/crmd now has status [online] (DC=false)
>> Nov 20 09:09:16 node2 crmd[2968]: notice: crm_update_peer_state: crm_update_ccm_node: Node node1[1] - state is now member (was (null))
>> Nov 20 09:09:16 node2 crmd[2968]: notice: crm_update_peer_state: crm_update_ccm_node: Node node2[0] - state is now member (was (null))
>> Nov 20 09:09:16 node2 crmd[2968]: notice: do_started: The local CRM is operational
>> Nov 20 09:09:16 node2 crmd[2968]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
>> Nov 20 09:09:27 node2 crmd[2968]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
>>
>> i.e. it moves to the S_PENDING state & keeps receiving I_DC_TIMEOUT.
>>
>> I tried using the latest heartbeat, but that doesn't seem to be the problem.
>> Can anyone suggest whether this issue has already been fixed in the latest pacemaker, or any other way to debug it?
>>
>> If I enable a higher debug level (in both heartbeat & pacemaker), the problem doesn't show up. Any help/pointers on how to go forward are greatly appreciated.
>>
>> Thanks!
>>
>> --Shyam
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org