Hi, I can’t get my cluster running. I had problems with an OCFS2 volume, and both nodes were fenced. When I now run “systemctl start pacemaker.service”, crm_mon shows both nodes as UNCLEAN for a few seconds, and then Pacemaker stops. I try to confirm the fencing with “stonith_admin -C”, but it doesn’t work. Maybe the time window is too short, since Pacemaker only stays up for a few seconds.
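For clarity, this is roughly what I run (a sketch with my node name taken from the log below; -C is the short form of --confirm in stonith_admin):

```shell
systemctl start pacemaker.service
# within the few seconds pacemaker stays up, acknowledge the fencing manually;
# the argument is the node the cluster should treat as safely off/down
# (ha-idg-1 is the local node here, as seen in the log)
stonith_admin --confirm ha-idg-1
```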
Here is the log: Mar 10 19:36:24 [31037] ha-idg-1 corosync notice [MAIN ] Corosync Cluster Engine ('2.3.6'): started and ready to provide service. Mar 10 19:36:24 [31037] ha-idg-1 corosync info [MAIN ] Corosync built-in features: debug testagents augeas systemd pie relro bindnow Mar 10 19:36:24 [31037] ha-idg-1 corosync notice [TOTEM ] Initializing transport (UDP/IP Multicast). Mar 10 19:36:24 [31037] ha-idg-1 corosync notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1 Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [TOTEM ] The network interface [192.168.100.10] is now up. Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine loaded: corosync configuration map access [0] Mar 10 19:36:25 [31037] ha-idg-1 corosync info [QB ] server name: cmap Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine loaded: corosync configuration service [1] Mar 10 19:36:25 [31037] ha-idg-1 corosync info [QB ] server name: cfg Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2] Mar 10 19:36:25 [31037] ha-idg-1 corosync info [QB ] server name: cpg Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine loaded: corosync profile loading service [4] Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [QUORUM] Using quorum provider corosync_votequorum Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [QUORUM] This node is within the primary component and will provide service. 
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [QUORUM] Members[0]: Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5] Mar 10 19:36:25 [31037] ha-idg-1 corosync info [QB ] server name: votequorum Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3] Mar 10 19:36:25 [31037] ha-idg-1 corosync info [QB ] server name: quorum Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [TOTEM ] A new membership (192.168.100.10:2340) was formed. Members joined: 1084777482 Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [QUORUM] Members[1]: 1084777482 Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [MAIN ] Completed service synchronization, ready to provide service. Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: notice: main: Starting Pacemaker 1.1.24+20210811.f5abda0ee-3.27.1 | build=1.1.24+20210811.f5abda0ee features: generated-manpages agent-manp ages ncurses libqb-logging libqb-ipc lha-fencing systemd nagios corosync-native atomic-attrd snmp libesmtp acls cibsecrets Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: main: Maximum core file size is: 18446744073709551615 Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: pcmk__ipc_is_authentic_process_active: Could not connect to lrmd IPC: Connection refused Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: pcmk__ipc_is_authentic_process_active: Could not connect to cib_ro IPC: Connection refused Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: pcmk__ipc_is_authentic_process_active: Could not connect to crmd IPC: Connection refused Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: pcmk__ipc_is_authentic_process_active: Could not connect to attrd IPC: Connection refused Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: pcmk__ipc_is_authentic_process_active: Could not connect to pengine IPC: Connection 
refused Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: pcmk__ipc_is_authentic_process_active: Could not connect to stonith-ng IPC: Connection refused Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: corosync_node_name: Unable to get node name for nodeid 1084777482 Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084777482 Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: crm_get_peer: Created entry 3c2499de-58a8-44f7-bf1e-03ff1fbec774/0x1456550 for node (null)/1084777482 (1 total) Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: crm_get_peer: Node 1084777482 has uuid 1084777482 Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: notice: cluster_connect_quorum: Quorum acquired Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: corosync_node_name: Unable to get node name for nodeid 1084777482 Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: notice: get_node_name: Defaulting to uname -n for the local corosync node name Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: crm_get_peer: Node 1084777482 is now known as ha-idg-1 Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child: Using uid=90 and group=90 for process cib Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child: Forked child 31045 for process cib Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child: Forked child 31046 for process stonith-ng Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child: Forked child 31047 for process lrmd Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child: Using uid=90 and group=90 for process attrd Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child: Forked child 31048 for process attrd Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child: Using uid=90 and group=90 for process pengine Mar 10 19:36:25 
[31044] ha-idg-1 pacemakerd: info: start_child: Forked child 31049 for process pengine Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child: Using uid=90 and group=90 for process crmd Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child: Forked child 31050 for process crmd Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: main: Starting mainloop Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: pcmk_quorum_notification: Quorum retained | membership=2340 members=1 Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: notice: crm_update_peer_state_iter: Node ha-idg-1 state is now member | nodeid=1084777482 previous=unknown source=pcmk_quorum_notification Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: pcmk_cpg_membership: Group pacemakerd event 0: node 1084777482 pid 31044 joined via cpg_join Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: pcmk_cpg_membership: Group pacemakerd event 0: ha-idg-1 (node 1084777482 pid 31044) is member Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node Mar 10 19:36:25 [31045] ha-idg-1 cib: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores Mar 10 19:36:25 [31049] ha-idg-1 pengine: info: crm_log_init: Changed active directory to 
/var/lib/pacemaker/cores Mar 10 19:36:25 [31049] ha-idg-1 pengine: info: qb_ipcs_us_publish: server name: pengine Mar 10 19:36:25 [31045] ha-idg-1 cib: info: get_cluster_type: Verifying cluster type: 'corosync' Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores Mar 10 19:36:25 [31045] ha-idg-1 cib: info: get_cluster_type: Assuming an active 'corosync' cluster Mar 10 19:36:25 [31049] ha-idg-1 pengine: info: main: Starting pengine Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: main: Starting up Mar 10 19:36:25 [31045] ha-idg-1 cib: info: retrieveCib: Reading cluster configuration file /var/lib/pacemaker/cib/cib.xml (digest: /var/lib/pacemaker/cib/cib.xml.sig) Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: get_cluster_type: Verifying cluster type: 'corosync' Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: get_cluster_type: Assuming an active 'corosync' cluster Mar 10 19:36:25 [31048] ha-idg-1 attrd: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: get_cluster_type: Verifying cluster type: 'corosync' Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: get_cluster_type: Assuming an active 'corosync' cluster Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync Mar 10 19:36:25 [31047] ha-idg-1 lrmd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores Mar 10 19:36:25 [31047] ha-idg-1 lrmd: info: qb_ipcs_us_publish: server name: lrmd Mar 10 19:36:25 [31047] ha-idg-1 lrmd: info: main: Starting Mar 10 19:36:25 [31050] ha-idg-1 crmd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores Mar 10 19:36:25 [31050] ha-idg-1 crmd: info: main: CRM Git Version: 1.1.24+20210811.f5abda0ee-3.27.1 (1.1.24+20210811.f5abda0ee) Mar 10 
19:36:25 [31050] ha-idg-1 crmd: info: get_cluster_type: Verifying cluster type: 'corosync'
Mar 10 19:36:25 [31050] ha-idg-1 crmd: info: get_cluster_type: Assuming an active 'corosync' cluster
Mar 10 19:36:25 [31050] ha-idg-1 crmd: warning: log_deprecation_warnings: Compile-time support for crm_mon SNMP options is deprecated and will be removed in a future release (configure alerts instead)
Mar 10 19:36:25 [31050] ha-idg-1 crmd: warning: log_deprecation_warnings: Compile-time support for crm_mon SMTP options is deprecated and will be removed in a future release (configure alerts instead)
Mar 10 19:36:25 [31050] ha-idg-1 crmd: info: do_log: Input I_STARTUP received in state S_STARTING from crmd_init
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: validate_with_relaxng: Creating RNG parser context
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: corosync_node_name: Unable to get node name for nodeid 1084777482 ⇐========= this happens quite often
Mar 10 19:36:25 [31048] ha-idg-1 attrd: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: crm_get_peer: Created entry c1bd522c-34da-49b3-97cb-22fd4580959b/0x109e210 for node (null)/1084777482 (1 total)
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: crm_get_peer: Node 1084777482 has uuid 1084777482
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:25 [31048] ha-idg-1 attrd: notice: crm_update_peer_state_iter: Node (null) state is now member | nodeid=1084777482 previous=unknown source=crm_update_peer_proc
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: init_cs_connection_once: Connection to 'corosync': established
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: corosync_node_name: Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084777482
Mar
10 19:36:25 [31046] ha-idg-1 stonith-ng: info: crm_get_peer: Created entry 1d232d33-d274-415d-be94-765dc1b4e1e4/0x9478d0 for node (null)/1084777482 (1 total) Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: crm_get_peer: Node 1084777482 has uuid 1084777482 Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: notice: crm_update_peer_state_iter: Node (null) state is now member | nodeid=1084777482 previous=unknown source=crm_update_peer_proc Mar 10 19:36:25 [31045] ha-idg-1 cib: info: startCib: CIB Initialization completed successfully Mar 10 19:36:25 [31045] ha-idg-1 cib: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: corosync_node_name: Unable to get node name for nodeid 1084777482 Mar 10 19:36:25 [31048] ha-idg-1 attrd: notice: get_node_name: Defaulting to uname -n for the local corosync node name Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: crm_get_peer: Node 1084777482 is now known as ha-idg-1 Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: corosync_node_name: Unable to get node name for nodeid 1084777482 Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: notice: get_node_name: Defaulting to uname -n for the local corosync node name Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: init_cs_connection_once: Connection to 'corosync': established Mar 10 19:36:25 [31045] ha-idg-1 cib: info: corosync_node_name: Unable to get node name for nodeid 1084777482 Mar 10 19:36:25 [31045] ha-idg-1 cib: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084777482 Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: main: Cluster connection active Mar 10 19:36:25 [31045] ha-idg-1 cib: info: crm_get_peer: Created entry 7c2b1d3d-0ab6-4fa6-887c-5d01e5927a67/0x147af10 for node (null)/1084777482 (1 total) Mar 10 19:36:25 [31045] ha-idg-1 cib: info: 
crm_get_peer: Node 1084777482 has uuid 1084777482 Mar 10 19:36:25 [31045] ha-idg-1 cib: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online Mar 10 19:36:25 [31045] ha-idg-1 cib: notice: crm_update_peer_state_iter: Node (null) state is now member | nodeid=1084777482 previous=unknown source=crm_update_peer_proc Mar 10 19:36:25 [31045] ha-idg-1 cib: info: init_cs_connection_once: Connection to 'corosync': established Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: corosync_node_name: Unable to get node name for nodeid 1084777482 Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: notice: get_node_name: Defaulting to uname -n for the local corosync node name Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: crm_get_peer: Node 1084777482 is now known as ha-idg-1 Mar 10 19:36:25 [31045] ha-idg-1 cib: info: corosync_node_name: Unable to get node name for nodeid 1084777482 Mar 10 19:36:25 [31045] ha-idg-1 cib: notice: get_node_name: Defaulting to uname -n for the local corosync node name Mar 10 19:36:25 [31045] ha-idg-1 cib: info: crm_get_peer: Node 1084777482 is now known as ha-idg-1 Mar 10 19:36:25 [31045] ha-idg-1 cib: info: qb_ipcs_us_publish: server name: cib_ro Mar 10 19:36:25 [31045] ha-idg-1 cib: info: qb_ipcs_us_publish: server name: cib_rw Mar 10 19:36:25 [31045] ha-idg-1 cib: info: qb_ipcs_us_publish: server name: cib_shm Mar 10 19:36:25 [31045] ha-idg-1 cib: info: cib_init: Starting cib mainloop Mar 10 19:36:25 [31045] ha-idg-1 cib: info: pcmk_cpg_membership: Group cib event 0: node 1084777482 pid 31045 joined via cpg_join Mar 10 19:36:25 [31045] ha-idg-1 cib: info: pcmk_cpg_membership: Group cib event 0: ha-idg-1 (node 1084777482 pid 31045) is member Mar 10 19:36:25 [31045] ha-idg-1 cib: info: cib_file_backup: Archived previous version as /var/lib/pacemaker/cib/cib-34.raw Mar 10 19:36:25 [31045] ha-idg-1 cib: info: cib_file_write_with_digest: Wrote version 7.29548.0 of the CIB to disk (digest: 
03b4ec65319cef255d43fc1ec9d285a5)
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.MBy2v0 (digest: /var/lib/pacemaker/cib/cib.nDn0X9)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_cib_control: CIB connection established
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: corosync_node_name: Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: crm_get_peer: Created entry 873262c1-ede0-4ba7-97e6-53ead0a6d7b0/0x1613910 for node (null)/1084777482 (1 total)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: crm_get_peer: Node 1084777482 has uuid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: corosync_node_name: Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: get_node_name: Defaulting to uname -n for the local corosync node name
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: init_cs_connection_once: Connection to 'corosync': established
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: corosync_node_name: Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: get_node_name: Defaulting to uname -n for the local corosync node name
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: crm_get_peer: Node 1084777482 is now known as ha-idg-1
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: peer_update_callback: Cluster node ha-idg-1 is now in unknown state ⇐===== is that the problem?
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: attrd_erase_attrs: Clearing transient attributes from CIB | xpath=//node_state[@uname='ha-idg-1']/transient_attributes Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: attrd_start_election_if_needed: Starting an election to determine the writer Mar 10 19:36:26 [31045] ha-idg-1 cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='ha-idg-1']/transient_attributes to all (origin=local/attrd/2) Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: corosync_node_name: Unable to get node name for nodeid 1084777482 Mar 10 19:36:26 [31048] ha-idg-1 attrd: notice: get_node_name: Defaulting to uname -n for the local corosync node name Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: main: CIB connection active Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: qb_ipcs_us_publish: server name: attrd Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: main: Accepting attribute updates Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: pcmk_cpg_membership: Group attrd event 0: node 1084777482 pid 31048 joined via cpg_join Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: pcmk_cpg_membership: Group attrd event 0: ha-idg-1 (node 1084777482 pid 31048) is member Mar 10 19:36:26 [31045] ha-idg-1 cib: info: corosync_node_name: Unable to get node name for nodeid 1084777482 Mar 10 19:36:26 [31045] ha-idg-1 cib: notice: get_node_name: Defaulting to uname -n for the local corosync node name Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: election_check: election-attrd won by local node Mar 10 19:36:26 [31048] ha-idg-1 attrd: notice: attrd_declare_winner: Recorded local node as attribute writer (was unset) Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: attrd_peer_update: Setting #attrd-protocol[ha-idg-1]: (null) -> 2 from ha-idg-1 Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: write_attribute: Processed 1 private change for #attrd-protocol, id=n/a, set=n/a Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: setup_cib: Watching for stonith 
topology changes Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: qb_ipcs_us_publish: server name: stonith-ng Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: main: Starting stonith-ng mainloop Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: pcmk_cpg_membership: Group stonith-ng event 0: node 1084777482 pid 31046 joined via cpg_join Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: pcmk_cpg_membership: Group stonith-ng event 0: ha-idg-1 (node 1084777482 pid 31046) is member Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: cluster_connect_quorum: Quorum acquired Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: init_cib_cache_cb: Updating device list from the cib: init Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: cib_devices_update: Updating devices to version 7.29548.0 Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore Mar 10 19:36:26 [31045] ha-idg-1 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='ha-idg-1']/transient_attributes: OK (rc=0, origin=ha-idg-1/attrd/2, version=7.29548.0) Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_ha_control: Connected to the cluster Mar 10 19:36:26 [31045] ha-idg-1 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to all (origin=local/crmd/3) Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: lrmd_ipc_connect: Connecting to lrmd Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_lrm_control: LRM connection established Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_started: Delaying start, no membership data (0000000000100000) Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: pcmk_quorum_notification: Quorum retained | membership=2340 members=1 Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: crm_update_peer_state_iter: Node ha-idg-1 state is now member | nodeid=1084777482 previous=unknown source=pcmk_quorum_notification Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: peer_update_callback: Cluster node ha-idg-1 
is now member (was in unknown state)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_started: Delaying start, Config not read (0000000000000040)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: pcmk_cpg_membership: Group crmd event 0: node 1084777482 pid 31050 joined via cpg_join
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: pcmk_cpg_membership: Group crmd event 0: ha-idg-1 (node 1084777482 pid 31050) is member
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_started: Delaying start, Config not read (0000000000000040)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_started: Delaying start, Config not read (0000000000000040)
Mar 10 19:36:26 [31045] ha-idg-1 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=ha-idg-1/crmd/3, version=7.29548.0)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: qb_ipcs_us_publish: server name: crmd
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: do_started: The local CRM is operational ⇐============================ looks pretty good
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_log: Input I_PENDING received in state S_STARTING from do_started
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: do_state_transition: State transition S_STARTING -> S_PENDING | input=I_PENDING cause=C_FSA_INTERNAL origin=do_started
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: action_synced_wait: Managed fence_ilo2_metadata_1 process 31052 exited with rc=0
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: stonith_device_register: Added 'fence_ilo_ha-idg-2' to the device list (1 active devices)
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: action_synced_wait: Managed fence_ilo4_metadata_1 process 31054 exited with rc=0
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: stonith_device_register: Added 'fence_ilo_ha-idg-1' to the device list (2 active devices)
Mar 10 19:36:28 [31050] ha-idg-1 crmd: info: te_trigger_stonith_history_sync: Fence history will be synchronized cluster-wide within 30 seconds
Mar 10 19:36:28
[31050] ha-idg-1 crmd: notice: te_connect_stonith: Fencer successfully connected
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: notice: handle_request: Received manual confirmation that ha-idg-1 is fenced <===================== seems to be my "stonith_admin -C"
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: notice: initiate_remote_stonith_op: Initiating manual confirmation for ha-idg-1: 23926653-7baa-44b8-ade3-5ee8468f3db6
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: notice: stonith_manual_ack: Injecting manual confirmation that ha-idg-1 is safely off/down
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: notice: remote_op_done: Operation 'off' targeting ha-idg-1 on a human for stonith_admin.31555@ha-idg-1.23926653: OK
Mar 10 19:36:34 [31050] ha-idg-1 crmd: info: exec_alert_list: Sending fencing alert via smtp_alert to informatic....@helmholtz-muenchen.de
Mar 10 19:36:34 [31047] ha-idg-1 lrmd: info: process_lrmd_alert_exec: Executing alert smtp_alert for 6bb5a831-e90c-4b0b-8783-0092a26a1e6c
Mar 10 19:36:34 [31050] ha-idg-1 crmd: crit: tengine_stonith_notify: We were allegedly just fenced by a human for ha-idg-1! <===================== what does that mean? I didn't fence it
Mar 10 19:36:34 [31050] ha-idg-1 crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: warning: pcmk_child_exit: Shutting cluster down because crmd[31050] had fatal failure <======================= ???
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: pcmk_shutdown_worker: Shutting down Pacemaker Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: stop_child: Stopping pengine | sent signal 15 to process 31049 Mar 10 19:36:34 [31049] ha-idg-1 pengine: notice: crm_signal_dispatch: Caught 'Terminated' signal | 15 (invoking handler) Mar 10 19:36:34 [31049] ha-idg-1 pengine: info: qb_ipcs_us_withdraw: withdrawing server sockets Mar 10 19:36:34 [31049] ha-idg-1 pengine: info: crm_xml_cleanup: Cleaning up memory from libxml2 Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: pcmk_child_exit: pengine[31049] exited with status 0 (OK) Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: stop_child: Stopping attrd | sent signal 15 to process 31048 Mar 10 19:36:34 [31048] ha-idg-1 attrd: notice: crm_signal_dispatch: Caught 'Terminated' signal | 15 (invoking handler) Mar 10 19:36:34 [31048] ha-idg-1 attrd: info: main: Shutting down attribute manager Mar 10 19:36:34 [31048] ha-idg-1 attrd: info: qb_ipcs_us_withdraw: withdrawing server sockets Mar 10 19:36:34 [31048] ha-idg-1 attrd: info: attrd_cib_destroy_cb: Connection disconnection complete Mar 10 19:36:34 [31048] ha-idg-1 attrd: info: crm_xml_cleanup: Cleaning up memory from libxml2 Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: pcmk_child_exit: attrd[31048] exited with status 0 (OK) Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: stop_child: Stopping lrmd | sent signal 15 to process 31047 Mar 10 19:36:34 [31047] ha-idg-1 lrmd: notice: crm_signal_dispatch: Caught 'Terminated' signal | 15 (invoking handler) Mar 10 19:36:34 [31047] ha-idg-1 lrmd: info: lrmd_exit: Terminating with 0 clients Mar 10 19:36:34 [31047] ha-idg-1 lrmd: info: qb_ipcs_us_withdraw: withdrawing server sockets Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node 
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node Mar 10 19:36:34 [31047] ha-idg-1 lrmd: info: crm_xml_cleanup: Cleaning up memory from libxml2 Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: pcmk_child_exit: lrmd[31047] exited with status 0 (OK) Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: stop_child: Stopping stonith-ng | sent signal 15 to process 31046 Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: notice: crm_signal_dispatch: Caught 'Terminated' signal | 15 (invoking handler) Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: info: stonith_shutdown: Terminating with 3 clients Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: info: cib_connection_destroy: Connection to the CIB closed. Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: info: qb_ipcs_us_withdraw: withdrawing server sockets Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2 Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: pcmk_child_exit: stonith-ng[31046] exited with status 0 (OK) Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: stop_child: Stopping cib | sent signal 15 to process 31045 Mar 10 19:36:34 [31045] ha-idg-1 cib: notice: crm_signal_dispatch: Caught 'Terminated' signal | 15 (invoking handler) Mar 10 19:36:34 [31045] ha-idg-1 cib: info: cib_shutdown: Disconnected 0 clients Mar 10 19:36:34 [31045] ha-idg-1 cib: info: cib_shutdown: All clients disconnected (0) Mar 10 19:36:34 [31045] ha-idg-1 cib: info: terminate_cib: initiate_exit: Exiting from mainloop... 
Mar 10 19:36:34 [31045] ha-idg-1 cib: info: crm_cluster_disconnect: Disconnecting from cluster infrastructure: corosync Mar 10 19:36:34 [31045] ha-idg-1 cib: info: terminate_cs_connection: Disconnecting from Corosync Mar 10 19:36:34 [31045] ha-idg-1 cib: info: terminate_cs_connection: No Quorum connection Mar 10 19:36:34 [31045] ha-idg-1 cib: notice: terminate_cs_connection: Disconnected from Corosync Mar 10 19:36:34 [31045] ha-idg-1 cib: info: crm_cluster_disconnect: Disconnected from corosync Mar 10 19:36:34 [31045] ha-idg-1 cib: info: crm_cluster_disconnect: Disconnecting from cluster infrastructure: corosync Mar 10 19:36:34 [31045] ha-idg-1 cib: info: terminate_cs_connection: Disconnecting from Corosync Mar 10 19:36:34 [31045] ha-idg-1 cib: info: cluster_disconnect_cpg: No CPG connection Mar 10 19:36:34 [31045] ha-idg-1 cib: info: terminate_cs_connection: No Quorum connection Mar 10 19:36:34 [31045] ha-idg-1 cib: notice: terminate_cs_connection: Disconnected from Corosync Mar 10 19:36:34 [31045] ha-idg-1 cib: info: crm_cluster_disconnect: Disconnected from corosync Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node Mar 10 19:36:34 [31045] ha-idg-1 cib: info: qb_ipcs_us_withdraw: withdrawing server sockets Mar 10 19:36:34 [31045] ha-idg-1 cib: info: qb_ipcs_us_withdraw: withdrawing server sockets Mar 10 19:36:34 [31045] ha-idg-1 cib: info: qb_ipcs_us_withdraw: withdrawing server sockets Mar 10 19:36:34 [31045] ha-idg-1 cib: info: crm_xml_cleanup: Cleaning up memory from libxml2 Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: pcmk_child_exit: cib[31045] exited with status 0 (OK) Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: pcmk_shutdown_worker: Attempting to inhibit respawning after fatal error Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: pcmk_exit_with_cluster: Asking Corosync to shut 
down
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [CFG ] Node 1084777482 was shut down by sysadmin
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Unloading all Corosync service engines.
Mar 10 19:36:34 [31037] ha-idg-1 corosync info [QB ] withdrawing server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine unloaded: corosync vote quorum service v1.0
Mar 10 19:36:34 [31037] ha-idg-1 corosync info [QB ] withdrawing server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine unloaded: corosync configuration map access
Mar 10 19:36:34 [31037] ha-idg-1 corosync info [QB ] withdrawing server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine unloaded: corosync configuration service
Mar 10 19:36:34 [31037] ha-idg-1 corosync info [QB ] withdrawing server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Mar 10 19:36:34 [31037] ha-idg-1 corosync info [QB ] withdrawing server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine unloaded: corosync profile loading service
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [MAIN ] Corosync Cluster Engine exiting normally

Bernd

--
Bernd Lentes
System Administrator
Institute for Metabolism and Cell Death (MCD)
Building 25 - office 122
HelmholtzZentrum München
bernd.len...@helmholtz-muenchen.de
phone: +49 89 3187 1241 / +49 89 3187 49123
fax: +49 89 3187 2294
https://www.helmholtz-munich.de/en/mcd