Hi!

I have a problem that is making no progress with SUSE support, so I want to 
ask here:
We have a three-node cluster that has had a stable configuration for a long 
time. When we updated one node (stop the cluster node, update the software, 
reboot, start the cluster software during boot), the node failed to join the 
cluster (some CPG problem, it seems). While this was happening, the other two 
nodes refused to provide cLVM/OCFS2 service, and the software relying on it 
(Xen VMs) appeared to hang. I had to emergency-reboot the node that seemed to 
trigger the problem, and on the second boot everything was fine...
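For reference, the per-node update procedure is roughly the following (command 
names are from memory for our SLES 11 HAE setup, so treat this as a sketch, 
not a verified transcript):

```
# Stop the cluster stack on this node (classic openais plugin stack)
rcopenais stop

# Update the software, then reboot; openais is enabled in the boot
# runlevels, so the cluster stack is started again during boot
zypper update
reboot
```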

More than three months have passed since then, and today I was hit by the same 
problem again.

Has anyone else seen this problem? The cluster uses UDPU, since multicast 
proved to be even worse.
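For context, the transport part of our corosync.conf looks roughly like this 
(abbreviated and reconstructed from memory, so treat it as a sketch rather 
than the live file; the second ring is configured the same way on the 
10.2.2.0 network):

```
totem {
    version: 2
    transport: udpu
    rrp_mode: passive

    interface {
        ringnumber: 0
        bindnetaddr: 172.20.16.0
        mcastport: 5405
        member {
            memberaddr: 172.20.16.1
        }
        member {
            memberaddr: 172.20.16.5
        }
        member {
            memberaddr: 172.20.16.10
        }
    }
}
```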

For reference, here is the status (when the cluster is OK):
Last updated: Mon May 29 09:23:52 2017
Last change: Tue May 23 16:05:16 2017 by root via cibadmin on h01
Stack: classic openais (with plugin)
Current DC: h05 - partition with quorum
Version: 1.1.12-f47ea56
3 Nodes configured, 3 expected votes
46 Resources configured


Online: [ h01 h05 h10 ]

Full list of resources:

 prm_stonith_sbd        (stonith:external/sbd): Started h05
 Clone Set: cln_DLM [prm_DLM]
     Started: [ h01 h05 h10 ]
 Clone Set: cln_cLVMd [prm_cLVMd]
     Started: [ h01 h05 h10 ]
 Clone Set: cln_LVM_CFS_VMs [prm_LVM_CFS_VMs]
     Started: [ h01 h05 h10 ]
 Clone Set: cln_O2CB [prm_O2CB]
     Started: [ h01 h05 h10 ]
 Clone Set: cln_CFS_VMs_fs [prm_CFS_VMs_fs]
     Started: [ h01 h05 h10 ]
 prm_xen_v01    (ocf::heartbeat:Xen):   Started h01
 prm_xen_v07    (ocf::heartbeat:Xen):   Started h01
 prm_xen_v05    (ocf::heartbeat:Xen):   Started h01
 prm_xen_v03    (ocf::heartbeat:Xen):   Started h01
 prm_xen_v04    (ocf::heartbeat:Xen):   Started h05
 prm_xen_v02    (ocf::heartbeat:Xen):   Started h01
 prm_xen_v06    (ocf::heartbeat:Xen):   Started h05
 prm_xen_v08    (ocf::heartbeat:Xen):   Started h05
 prm_xen_v09    (ocf::heartbeat:Xen):   Started h01
 Clone Set: cln_iotw_xen-LV [prm_iotw_xen-LV]
     Started: [ h01 h05 h10 ]
 Clone Set: cln_iotw_xen-FS [prm_iotw_xen-FS]
     Started: [ h01 h05 h10 ]
 Clone Set: cln_iotw_xen-LV-L1 [prm_iotw_xen-LV-L1]
     Started: [ h01 h05 h10 ]
 Clone Set: cln_iotw_xen-LV-L2 [prm_iotw_xen-LV-L2]
     Started: [ h01 h05 h10 ]
 Clone Set: cln_iotw_xen-ML [prm_iotw_xen-ML]
     Started: [ h01 h05 h10 ]
 Clone Set: cln_iotw_xen-ML-L1 [prm_iotw_xen-ML-L1]
     Started: [ h01 h05 h10 ]
 Clone Set: cln_iotw_xen-ML-L2 [prm_iotw_xen-ML-L2]
     Started: [ h01 h05 h10 ]

Node Attributes:
* Node h01:
    + xen_migration_ip                  : 172.20.78.1
* Node h05:
    + xen_migration_ip                  : 172.20.78.5
* Node h10:
    + xen_migration_ip                  : 172.20.78.10

Migration summary:
* Node h01:
* Node h10:
* Node h05:

This is where the node should have joined the cluster (node 1 of 3):
May 29 08:30:07 h01 corosync[10873]:  [CLM   ] CLM CONFIGURATION CHANGE
May 29 08:30:07 h01 corosync[10873]:  [CLM   ] New Configuration:
May 29 08:30:07 h01 corosync[10873]:  [CLM   ]  r(0) ip(172.20.16.1) r(1) 
ip(10.2.2.1)
May 29 08:30:07 h01 corosync[10873]:  [CLM   ]  r(0) ip(172.20.16.5) r(1) 
ip(10.2.2.5)
May 29 08:30:07 h01 corosync[10873]:  [CLM   ] Members Left:
May 29 08:30:07 h01 corosync[10873]:  [CLM   ] Members Joined:
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] notice: pcmk_peer_update: 
Transitional membership event on ring 3444: memb=2, new=0, lost=0
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] info: pcmk_peer_update: memb: 
h01 739512321
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] info: pcmk_peer_update: memb: 
h05 739512325
May 29 08:30:07 h01 corosync[10873]:  [CLM   ] CLM CONFIGURATION CHANGE
May 29 08:30:07 h01 corosync[10873]:  [CLM   ] New Configuration:
May 29 08:30:07 h01 corosync[10873]:  [CLM   ]  r(0) ip(172.20.16.1) r(1) 
ip(10.2.2.1)
May 29 08:30:07 h01 corosync[10873]:  [CLM   ]  r(0) ip(172.20.16.5) r(1) 
ip(10.2.2.5)
May 29 08:30:07 h01 cluster-dlm[11711]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h01 cluster-dlm[11711]: update_cluster: Processing membership 
3444
May 29 08:30:07 h01 ocfs2_controld[12424]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h01 corosync[10873]:  [CLM   ]  r(0) ip(172.20.16.10) r(1) 
ip(10.2.2.10)
May 29 08:30:07 h01 crmd[10883]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h01 cluster-dlm[11711]: dlm_process_node: Skipped active node 
739512325: born-on=3420, last-seen=3444, this-event=3444, last-event=3440
May 29 08:30:07 h01 corosync[10873]:  [CLM   ] Members Left:
May 29 08:30:07 h01 corosync[10873]:  [CLM   ] Members Joined:
May 29 08:30:07 h01 corosync[10873]:  [CLM   ]  r(0) ip(172.20.16.10) r(1) 
ip(10.2.2.10)
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] notice: pcmk_peer_update: Stable 
membership event on ring 3444: memb=3, new=1, lost=0
May 29 08:30:07 h01 cluster-dlm[11711]: dlm_process_node: Adding address 
ip(172.20.16.10) to configfs for node 739512330
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] info: update_member: Node 
739512330/h10 is now: member
May 29 08:30:07 h01 cluster-dlm[11711]: add_configfs_node: set_configfs_node 
739512330 172.20.16.10 local 0
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] info: pcmk_peer_update: NEW:  
h10 739512330
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
h01 739512321
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
h05 739512325
May 29 08:30:07 h01 cib[10878]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
h10 739512330
May 29 08:30:07 h01 cluster-dlm[11711]: dlm_process_node: Adding address 
ip(10.2.2.10) to configfs for node 739512330
May 29 08:30:07 h01 cluster-dlm[11711]: add_configfs_node: set_configfs_node 
739512330 10.2.2.10 local 0
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] info: send_member_notification: 
Sending membership update 3444 to 5 children
May 29 08:30:07 h01 cluster-dlm[11711]: create_path: 
/sys/kernel/config/dlm/cluster/comms/739512330: mkdir failed: 17
May 29 08:30:07 h01 cib[10878]:   notice: crm_update_peer_state: 
cib_peer_update_callback: Node h10[739512330] - state is now lost (was member)
May 29 08:30:07 h01 stonith-ng[10879]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h01 cib[10878]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h01 cluster-dlm[11711]: dlm_process_node: Added active node 
739512330: born-on=3436, last-seen=3444, this-event=3444, last-event=3440
May 29 08:30:07 h01 corosync[10873]:  [TOTEM ] A processor joined or left the 
membership and a new membership was formed.
May 29 08:30:07 h01 stonith-ng[10879]:   notice: crm_update_peer_state: 
st_peer_update_callback: Node h10[739512330] - state is now lost (was member)
May 29 08:30:07 h01 cluster-dlm[11711]: dlm_process_node: Skipped active node 
739512321: born-on=3428, last-seen=3444, this-event=3444, last-event=3440
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] info: update_member: 0x6aa5b0 
Node 739512330 (h10) born on: 3444
May 29 08:30:07 h01 stonith-ng[10879]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] info: update_member: Node h10 
now has process list: 00000000000000000000000000151312 (1381138)
May 29 08:30:07 h01 corosync[10873]:  [pcmk  ] info: send_member_notification: 
Sending membership update 3444 to 5 children
May 29 08:30:08 h01 cmirrord[12926]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:30:08 h01 cmirrord[12926]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:30:08 h01 cmirrord[12926]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
[...and so on...]

On the DC-node:
May 29 08:30:07 h05 corosync[24255]:  [CLM   ] CLM CONFIGURATION CHANGE
May 29 08:30:07 h05 corosync[24255]:  [CLM   ] New Configuration:
May 29 08:30:07 h05 corosync[24255]:  [CLM   ]  r(0) ip(172.20.16.1) r(1) 
ip(10.2.2.1)
May 29 08:30:07 h05 corosync[24255]:  [CLM   ]  r(0) ip(172.20.16.5) r(1) 
ip(10.2.2.5)
May 29 08:30:07 h05 corosync[24255]:  [CLM   ] Members Left:
May 29 08:30:07 h05 corosync[24255]:  [CLM   ] Members Joined:
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] notice: pcmk_peer_update: 
Transitional membership event on ring 3444: memb=2, new=0, lost=0
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] info: pcmk_peer_update: memb: 
h01 739512321
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] info: pcmk_peer_update: memb: 
h05 739512325
May 29 08:30:07 h05 corosync[24255]:  [CLM   ] CLM CONFIGURATION CHANGE
May 29 08:30:07 h05 corosync[24255]:  [CLM   ] New Configuration:
May 29 08:30:07 h05 corosync[24255]:  [CLM   ]  r(0) ip(172.20.16.1) r(1) 
ip(10.2.2.1)
May 29 08:30:07 h05 corosync[24255]:  [CLM   ]  r(0) ip(172.20.16.5) r(1) 
ip(10.2.2.5)
May 29 08:30:07 h05 corosync[24255]:  [CLM   ]  r(0) ip(172.20.16.10) r(1) 
ip(10.2.2.10)
May 29 08:30:07 h05 corosync[24255]:  [CLM   ] Members Left:
May 29 08:30:07 h05 corosync[24255]:  [CLM   ] Members Joined:
May 29 08:30:07 h05 corosync[24255]:  [CLM   ]  r(0) ip(172.20.16.10) r(1) 
ip(10.2.2.10)
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] notice: pcmk_peer_update: Stable 
membership event on ring 3444: memb=3, new=1, lost=0
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] info: update_member: Node 
739512330/h10 is now: member
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] info: pcmk_peer_update: NEW:  
h10 739512330
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
h01 739512321
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
h05 739512325
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
h10 739512330
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] info: send_member_notification: 
Sending membership update 3444 to 5 children
May 29 08:30:07 h05 cib[24260]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h05 corosync[24255]:  [TOTEM ] A processor joined or left the 
membership and a new membership was formed.
May 29 08:30:07 h05 cluster-dlm[24772]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h05 ocfs2_controld[24901]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h05 cluster-dlm[24772]: update_cluster: Processing membership 
3444
May 29 08:30:07 h05 cluster-dlm[24772]: dlm_process_node: Skipped active node 
739512325: born-on=3420, last-seen=3444, this-event=3444, last-event=3440
May 29 08:30:07 h05 crmd[24265]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h05 cluster-dlm[24772]: dlm_process_node: Skipped active node 
739512321: born-on=3428, last-seen=3444, this-event=3444, last-event=3440
May 29 08:30:07 h05 stonith-ng[24261]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h05 cluster-dlm[24772]: dlm_process_node: Adding address 
ip(172.20.16.10) to configfs for node 739512330
May 29 08:30:07 h05 cluster-dlm[24772]: add_configfs_node: set_configfs_node 
739512330 172.20.16.10 local 0
May 29 08:30:07 h05 cluster-dlm[24772]: dlm_process_node: Adding address 
ip(10.2.2.10) to configfs for node 739512330
May 29 08:30:07 h05 cluster-dlm[24772]: add_configfs_node: set_configfs_node 
739512330 10.2.2.10 local 0
May 29 08:30:07 h05 cluster-dlm[24772]: create_path: 
/sys/kernel/config/dlm/cluster/comms/739512330: mkdir failed: 17
May 29 08:30:07 h05 cluster-dlm[24772]: dlm_process_node: Added active node 
739512330: born-on=3436, last-seen=3444, this-event=3444, last-event=3440
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] info: update_member: 0x6b59c0 
Node 739512330 (h10) born on: 3444
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] info: update_member: Node h10 
now has process list: 00000000000000000000000000151312 (1381138)
May 29 08:30:07 h05 corosync[24255]:  [pcmk  ] info: send_member_notification: 
Sending membership update 3444 to 5 children
May 29 08:30:07 h05 stonith-ng[24261]:   notice: crm_update_peer_state: 
st_peer_update_callback: Node h10[739512330] - state is now lost (was member)
May 29 08:30:07 h05 stonith-ng[24261]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h05 cib[24260]:   notice: crm_update_peer_state: 
cib_peer_update_callback: Node h10[739512330] - state is now lost (was member)
May 29 08:30:07 h05 cib[24260]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h05 crmd[24265]:   notice: do_state_transition: State 
transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL 
origin=peer_update_callback ]
May 29 08:30:07 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:30:07 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:30:07 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:30:07 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:30:07 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
[...and so on...]

On the rebooted node:
May 29 08:30:07 h10 corosync[17254]:  [TOTEM ] adding new UDPU member 
{172.20.16.1}
May 29 08:30:07 h10 corosync[17254]:  [TOTEM ] adding new UDPU member 
{172.20.16.5}
May 29 08:30:07 h10 corosync[17254]:  [TOTEM ] adding new UDPU member 
{172.20.16.10}
May 29 08:30:07 h10 corosync[17254]:  [TOTEM ] The network interface 
[10.2.2.10] is now up.
May 29 08:30:07 h10 corosync[17254]:  [TOTEM ] adding new UDPU member {10.2.2.1}
May 29 08:30:07 h10 corosync[17254]:  [TOTEM ] adding new UDPU member {10.2.2.5}
May 29 08:30:07 h10 corosync[17254]:  [TOTEM ] adding new UDPU member 
{10.2.2.10}
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] CLM CONFIGURATION CHANGE
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] New Configuration:
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] Members Left:
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] Members Joined:
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] notice: pcmk_peer_update: 
Transitional membership event on ring 3440: memb=0, new=0, lost=0
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] CLM CONFIGURATION CHANGE
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] New Configuration:
May 29 08:30:07 h10 corosync[17254]:  [CLM   ]  r(0) ip(172.20.16.10) r(1) 
ip(10.2.2.10)
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] Members Left:
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] Members Joined:
May 29 08:30:07 h10 corosync[17254]:  [CLM   ]  r(0) ip(172.20.16.10) r(1) 
ip(10.2.2.10)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] notice: pcmk_peer_update: Stable 
membership event on ring 3440: memb=1, new=1, lost=0
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_peer_update: NEW:  
h10 739512330
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
h10 739512330
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: Node h10 
now has process list: 00000000000000000000000000151312 (1381138)
May 29 08:30:07 h10 corosync[17254]:  [TOTEM ] A processor joined or left the 
membership and a new membership was formed.
May 29 08:30:07 h10 corosync[17254]:  [CPG   ] chosen downlist: sender r(0) 
ip(172.20.16.10) r(1) ip(10.2.2.10) ; members(old:0 left:0)
May 29 08:30:07 h10 corosync[17254]:  [MAIN  ] Completed service 
synchronization, ready to provide service.
May 29 08:30:07 h10 attrd[17262]:   notice: crm_add_logfile: Additional logging 
available in /var/log/pacemaker.log
May 29 08:30:07 h10 attrd[17262]:   notice: crm_cluster_connect: Connecting to 
cluster infrastructure: classic openais (with plugin)
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] CLM CONFIGURATION CHANGE
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] New Configuration:
May 29 08:30:07 h10 corosync[17254]:  [CLM   ]  r(0) ip(172.20.16.10) r(1) 
ip(10.2.2.10)
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] Members Left:
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] Members Joined:
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] notice: pcmk_peer_update: 
Transitional membership event on ring 3444: memb=1, new=0, lost=0
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_peer_update: memb: 
h10 739512330
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] CLM CONFIGURATION CHANGE
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] New Configuration:
May 29 08:30:07 h10 corosync[17254]:  [CLM   ]  r(0) ip(172.20.16.1) r(1) 
ip(10.2.2.1)
May 29 08:30:07 h10 corosync[17254]:  [CLM   ]  r(0) ip(172.20.16.5) r(1) 
ip(10.2.2.5)
May 29 08:30:07 h10 corosync[17254]:  [CLM   ]  r(0) ip(172.20.16.10) r(1) 
ip(10.2.2.10)
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] Members Left:
May 29 08:30:07 h10 corosync[17254]:  [CLM   ] Members Joined:
May 29 08:30:07 h10 corosync[17254]:  [CLM   ]  r(0) ip(172.20.16.1) r(1) 
ip(10.2.2.1)
May 29 08:30:07 h10 corosync[17254]:  [CLM   ]  r(0) ip(172.20.16.5) r(1) 
ip(10.2.2.5)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] notice: pcmk_peer_update: Stable 
membership event on ring 3444: memb=3, new=2, lost=0
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: Creating 
entry for node 739512321 born on 3444
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: Node 
739512321/unknown is now: member
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_peer_update: NEW:  
.pending. 739512321
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: Creating 
entry for node 739512325 born on 3444
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: Node 
739512325/unknown is now: member
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_peer_update: NEW:  
.pending. 739512325
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
.pending. 739512321
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
.pending. 739512325
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
h10 739512330
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: send_member_notification: 
Sending membership update 3444 to 0 children
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: 0x6950d0 
Node 739512330 ((null)) born on: 3444
May 29 08:30:07 h10 corosync[17254]:  [TOTEM ] A processor joined or left the 
membership and a new membership was formed.
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: 
0x7fad3c001140 Node 739512325 (h05) born on: 3420
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: 
0x7fad3c001140 Node 739512325 now known as h05 (was: (null))
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: Node h05 
now has process list: 00000000000000000000000000151312 (1381138)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: Node h05 
now has 1 quorum votes (was 0)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: send_member_notification: 
Sending membership update 3444 to 0 children
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.stonith-ng failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.crmd failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: 0x6a4e80 
Node 739512321 (h01) born on: 3428
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: 0x6a4e80 
Node 739512321 now known as h01 (was: (null))
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: Node h01 
now has process list: 00000000000000000000000000151312 (1381138)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_member: Node h01 
now has 1 quorum votes (was 0)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: update_expected_votes: 
Expected quorum votes 2 -> 3
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: send_member_notification: 
Sending membership update 3444 to 0 children
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.crmd failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.cib failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.crmd failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 cib[17259]:   notice: crm_add_logfile: Additional logging 
available in /var/log/pacemaker.log
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.stonith-ng failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.stonith-ng failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.stonith-ng failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.stonith-ng failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.stonith-ng failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 pengine[17263]:   notice: crm_add_logfile: Additional 
logging available in /var/log/pacemaker.log
May 29 08:30:07 h10 stonith-ng[17260]:   notice: crm_add_logfile: Additional 
logging available in /var/log/pacemaker.log
May 29 08:30:07 h10 attrd[17262]:   notice: get_node_name: Defaulting to uname 
-n for the local classic openais (with plugin) node name
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x6a6cb0 for attrd/17262
May 29 08:30:07 h10 stonith-ng[17260]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.cib failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.crmd failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 attrd[17262]:   notice: get_node_name: Defaulting to uname 
-n for the local classic openais (with plugin) node name
May 29 08:30:07 h10 attrd[17262]:   notice: main: Starting mainloop...
May 29 08:30:07 h10 stonith-ng[17260]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x6b1260 for stonith-ng/17260
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_ipc: Sending 
membership update 3444 to stonith-ng
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.cib failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 mgmtd: [17265]: info: Pacemaker-mgmt Git Version: 236fee1
May 29 08:30:07 h10 mgmtd: [17265]: WARN: Core dumps could be lost if multiple 
dumps occur.
May 29 08:30:07 h10 mgmtd: [17265]: WARN: Consider setting non-default value in 
/proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
May 29 08:30:07 h10 mgmtd: [17265]: WARN: Consider setting 
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
May 29 08:30:07 h10 lrmd[17261]:   notice: crm_add_logfile: Additional logging 
available in /var/log/pacemaker.log
May 29 08:30:07 h10 mgmtd: [17265]: notice: connect to lrmd: 0, rc=-107
May 29 08:30:07 h10 stonith-ng[17260]:   notice: crm_update_peer_state: 
st_peer_update_callback: Node h10[739512330] - state is now lost (was (null))
May 29 08:30:07 h10 stonith-ng[17260]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
May 29 08:30:07 h10 crmd[17264]:   notice: crm_add_logfile: Additional logging 
available in /var/log/pacemaker.log
May 29 08:30:07 h10 crmd[17264]:   notice: main: CRM Git Version: f47ea56
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.cib failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] WARN: route_ais_message: Sending 
message to local.crmd failed: ipc delivery failed (rc=-2)
May 29 08:30:07 h10 cib[17259]:   notice: crm_cluster_connect: Connecting to 
cluster infrastructure: classic openais (with plugin)
May 29 08:30:07 h10 cib[17259]:   notice: get_node_name: Defaulting to uname -n 
for the local classic openais (with plugin) node name
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x6b5df0 for cib/17259
May 29 08:30:07 h10 corosync[17254]:  [pcmk  ] info: pcmk_ipc: Sending 
membership update 3444 to cib
May 29 08:30:07 h10 cib[17259]:   notice: crm_update_peer_state: 
cib_peer_update_callback: Node h10[739512330] - state is now lost (was (null))
May 29 08:30:07 h10 cib[17259]:   notice: get_node_name: Defaulting to uname -n 
for the local classic openais (with plugin) node name
May 29 08:30:07 h10 cib[17259]:   notice: plugin_handle_membership: Membership 
3444: quorum acquired
May 29 08:30:07 h10 cib[17259]:   notice: crm_update_peer_state: 
cib_peer_update_callback: Node h01[739512321] - state is now lost (was (null))
May 29 08:30:07 h10 cib[17259]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h01[739512321] - state is now member (was lost)
May 29 08:30:07 h10 cib[17259]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:07 h10 cib[17259]:   notice: crm_update_peer_state: 
cib_peer_update_callback: Node h05[739512325] - state is now lost (was (null))
May 29 08:30:07 h10 cib[17259]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h05[739512325] - state is now member (was lost)
May 29 08:30:08 h10 stonith-ng[17260]:   notice: setup_cib: Watching for 
stonith topology changes
May 29 08:30:08 h10 stonith-ng[17260]:   notice: plugin_handle_membership: 
Membership 3444: quorum acquired
May 29 08:30:08 h10 stonith-ng[17260]:   notice: crm_update_peer_state: 
st_peer_update_callback: Node h01[739512321] - state is now lost (was (null))
May 29 08:30:08 h10 stonith-ng[17260]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h01[739512321] - state is now member (was lost)
May 29 08:30:08 h10 stonith-ng[17260]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
May 29 08:30:08 h10 stonith-ng[17260]:   notice: crm_update_peer_state: 
st_peer_update_callback: Node h05[739512325] - state is now lost (was (null))
May 29 08:30:08 h10 stonith-ng[17260]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h05[739512325] - state is now member (was lost)
May 29 08:30:08 h10 crmd[17264]:   notice: crm_cluster_connect: Connecting to 
cluster infrastructure: classic openais (with plugin)
May 29 08:30:08 h10 crmd[17264]:   notice: get_node_name: Defaulting to uname 
-n for the local classic openais (with plugin) node name
May 29 08:30:08 h10 corosync[17254]:  [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x6ba170 for crmd/17264
May 29 08:30:08 h10 corosync[17254]:  [pcmk  ] info: pcmk_ipc: Sending 
membership update 3444 to crmd
May 29 08:30:08 h10 crmd[17264]:   notice: get_node_name: Defaulting to uname 
-n for the local classic openais (with plugin) node name
May 29 08:30:08 h10 crmd[17264]:   notice: plugin_handle_membership: Membership 
3444: quorum acquired
May 29 08:30:08 h10 crmd[17264]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h01[739512321] - state is now member (was (null))
May 29 08:30:08 h10 crmd[17264]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now member (was (null))
May 29 08:30:08 h10 crmd[17264]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h05[739512325] - state is now member (was (null))
May 29 08:30:08 h10 crmd[17264]:   notice: do_started: The local CRM is 
operational
May 29 08:30:08 h10 crmd[17264]:   notice: do_state_transition: State 
transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL 
origin=do_started ]
May 29 08:30:08 h10 mgmtd: [17265]: info: Started.
May 29 08:30:08 h10 SuSEfirewall2: Setting up rules from 
/etc/sysconfig/SuSEfirewall2 ...
May 29 08:30:08 h10 SuSEfirewall2: using default zone 'ext' for interface eth0
May 29 08:30:08 h10 SuSEfirewall2: using default zone 'ext' for interface eth1
May 29 08:30:08 h10 SuSEfirewall2: using default zone 'ext' for interface eth2
May 29 08:30:08 h10 SuSEfirewall2: using default zone 'ext' for interface eth3
May 29 08:30:09 h10 SuSEfirewall2: using default zone 'ext' for interface eth4
May 29 08:30:09 h10 SuSEfirewall2: using default zone 'ext' for interface eth5
May 29 08:30:10 h10 crmd[17264]:   notice: do_state_transition: State 
transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE 
origin=do_cl_join_finalize_respond ]
May 29 08:30:10 h10 attrd[17262]:   notice: attrd_trigger_update: Sending flush 
op to all hosts for: shutdown (0)
May 29 08:30:10 h10 attrd[17262]:   notice: attrd_local_callback: Sending full 
refresh (origin=crmd)
May 29 08:30:10 h10 attrd[17262]:   notice: attrd_trigger_update: Sending flush 
op to all hosts for: shutdown (0)
May 29 08:30:11 h10 SuSEfirewall2: batch committing...
May 29 08:30:11 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_stonith_sbd_monitor_0: not running (node=h10, call=6, rc=7, cib-update=13, 
confirmed=true)
May 29 08:30:11 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_DLM_monitor_0: not running (node=h10, call=11, rc=7, cib-update=14, 
confirmed=true)
May 29 08:30:11 h10 SuSEfirewall2: Firewall rules successfully set
May 29 08:30:11 h10 LVM(prm_LVM_CFS_VMs)[18708]: WARNING: LVM Volume CFS_VMs is 
not available (stopped)
May 29 08:30:11 h10 o2cb(prm_O2CB)[18738]: INFO: configfs not loaded
May 29 08:30:11 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_cLVMd_monitor_0: not running (node=h10, call=16, rc=7, cib-update=17, 
confirmed=true)
May 29 08:30:11 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_LVM_CFS_VMs_monitor_0: not running (node=h10, call=21, rc=7, cib-update=18, 
confirmed=true)
May 29 08:30:11 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_O2CB_monitor_0: not running (node=h10, call=26, rc=7, cib-update=19, 
confirmed=true)
May 29 08:30:11 h10 Filesystem(prm_CFS_VMs_fs)[18753]: WARNING: Couldn't find 
device [/dev/CFS_VMs/xen]. Expected /dev/??? to exist
May 29 08:30:11 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_CFS_VMs_fs_monitor_0: not running (node=h10, call=31, rc=7, cib-update=23, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_xen_v07_monitor_0: not running (node=h10, call=39, rc=7, cib-update=25, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_xen_v05_monitor_0: not running (node=h10, call=43, rc=7, cib-update=26, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_xen_v01_monitor_0: not running (node=h10, call=35, rc=7, cib-update=29, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_xen_v03_monitor_0: not running (node=h10, call=47, rc=7, cib-update=30, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_xen_v02_monitor_0: not running (node=h10, call=55, rc=7, cib-update=33, 
confirmed=true)
May 29 08:30:12 h10 attrd[17262]:   notice: attrd_perform_update: Sent update 
3: shutdown=0
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_xen_v04_monitor_0: not running (node=h10, call=51, rc=7, cib-update=34, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_xen_v09_monitor_0: not running (node=h10, call=59, rc=7, cib-update=36, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_xen_v06_monitor_0: not running (node=h10, call=63, rc=7, cib-update=37, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_xen_v08_monitor_0: not running (node=h10, call=67, rc=7, cib-update=38, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_iotw_xen-LV_monitor_0: not running (node=h10, call=72, rc=7, cib-update=43, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_iotw_xen-FS_monitor_0: not running (node=h10, call=77, rc=7, cib-update=44, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_iotw_xen-LV-L1_monitor_0: not running (node=h10, call=82, rc=7, 
cib-update=45, confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_iotw_xen-LV-L2_monitor_0: not running (node=h10, call=87, rc=7, 
cib-update=46, confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_iotw_xen-ML-L1_monitor_0: not running (node=h10, call=98, rc=7, 
cib-update=51, confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_iotw_xen-ML_monitor_0: not running (node=h10, call=93, rc=7, cib-update=52, 
confirmed=true)
May 29 08:30:12 h10 crmd[17264]:   notice: process_lrm_event: Operation 
prm_iotw_xen-ML-L2_monitor_0: not running (node=h10, call=103, rc=7, 
cib-update=53, confirmed=true)
May 29 08:30:12 h10 attrd[17262]:   notice: attrd_trigger_update: Sending flush 
op to all hosts for: probe_complete (true)
May 29 08:30:12 h10 attrd[17262]:   notice: attrd_perform_update: Sent update 
7: probe_complete=true
May 29 08:30:12 h10 kernel: [  194.613196] sctp: Hash tables configured 
(established 65536 bind 65536)
May 29 08:30:12 h10 kernel: [  194.626425] DLM installed
May 29 08:30:12 h10 cluster-dlm[19180]: main: dlm_controld master started
May 29 08:30:12 h10 cluster-dlm[19180]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
May 29 08:30:12 h10 cluster-dlm[19180]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
May 29 08:30:12 h10 cluster-dlm[19180]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
May 29 08:30:12 h10 corosync[17254]:  [pcmk  ] info: pcmk_notify: Enabling node 
notifications for child 19180 (0x6be4f0)
May 29 08:30:12 h10 cluster-dlm[19180]: setup_misc_devices: found 
/dev/misc/dlm-control minor 54
May 29 08:30:12 h10 cluster-dlm[19180]: setup_misc_devices: found 
/dev/misc/dlm-monitor minor 53
May 29 08:30:12 h10 cluster-dlm[19180]: setup_misc_devices: found 
/dev/misc/dlm_plock minor 52
May 29 08:30:12 h10 cluster-dlm[19180]: setup_monitor: /dev/misc/dlm-monitor fd 
11
May 29 08:30:12 h10 cluster-dlm[19180]: update_comms_nodes: 
/sys/kernel/config/dlm/cluster/comms: opendir failed: 2
May 29 08:30:12 h10 cluster-dlm[19180]: clear_configfs_spaces: 
/sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
May 29 08:30:12 h10 cluster-dlm[19180]: detect_protocol: totem/rrp_mode = 
'passive'
May 29 08:30:12 h10 cluster-dlm[19180]: set_configfs_protocol: set protocol 1
May 29 08:30:12 h10 cluster-dlm[19180]: set_proc_rmem: set 
/proc/sys/net/core/rmem_default 4194304
May 29 08:30:12 h10 cluster-dlm[19180]: set_proc_rmem: set 
/proc/sys/net/core/rmem_max 4194304
May 29 08:30:22 h10 cluster-dlm[19180]: setup_cpg_daemon: daemon cpg_join error 
retrying
May 29 08:30:32 h10 cluster-dlm[19180]: setup_cpg_daemon: daemon cpg_join error 
retrying
May 29 08:30:33 h10 login[18729]: FAILED LOGIN 1 FROM /dev/tty1 FOR root, User 
not known to the underlying authentication module
May 29 08:30:42 h10 cluster-dlm[19180]: setup_cpg_daemon: daemon cpg_join error 
retrying
May 29 08:30:52 h10 cluster-dlm[19180]: setup_cpg_daemon: daemon cpg_join error 
retrying
[here I decided to shut down openais]
May 29 08:30:53 h10 sshd[19608]: Accepted keyboard-interactive/pam for root 
from 172.20.64.67 port 57512 ssh2
May 29 08:31:02 h10 cluster-dlm[19180]: setup_cpg_daemon: daemon cpg_join error 
retrying
May 29 08:31:12 h10 corosync[17254]:  [SERV  ] Unloading all Corosync service 
engines.
May 29 08:31:12 h10 corosync[17254]:  [pcmk  ] notice: pcmk_shutdown: Shuting 
down Pacemaker
[...]
[as this seemed to hang, I tried a reboot via Ctrl+Alt+Del]
May 29 08:32:12 h10 cluster-dlm[19180]: setup_cpg_daemon: daemon cpg_join error 
retrying
May 29 08:32:22 h10 cluster-dlm[19180]: setup_cpg_daemon: daemon cpg_join error 
retrying
May 29 08:32:32 h10 cluster-dlm[19180]: setup_cpg_daemon: daemon cpg_join error 
retrying
May 29 08:32:42 h10 corosync[17254]:  [pcmk  ] notice: pcmk_shutdown: Still 
waiting for crmd (pid=17264, seq=6) to terminate...
May 29 08:32:42 h10 cluster-dlm[19180]: setup_cpg_daemon: daemon cpg_join error 
retrying
May 29 08:32:51 h10 shutdown[22154]: shutting down for system reboot
May 29 08:32:51 h10 init: Switching to runlevel: 6
[...]
May 29 08:33:32 h10 cluster-dlm[19180]: setup_cpg_daemon: daemon cpg_join error 
retrying
May 29 08:33:42 h10 corosync[17254]:  [pcmk  ] notice: pcmk_shutdown: Still 
waiting for crmd (pid=17264, seq=6) to terminate...
May 29 08:33:42 h10 lrmd[17261]:  warning: child_timeout_callback: 
prm_DLM_stop_0 process (PID 20736) timed out
May 29 08:33:42 h10 lrmd[17261]:  warning: operation_finished: 
prm_DLM_stop_0:20736 - timed out after 120000ms
May 29 08:33:42 h10 crmd[17264]:    error: process_lrm_event: Operation 
prm_DLM_stop_0: Timed Out (node=h10, call=104, timeout=120000ms)
May 29 08:33:42 h10 attrd[17262]:   notice: attrd_cs_dispatch: Update relayed 
from h05
May 29 08:33:42 h10 attrd[17262]:   notice: attrd_cs_dispatch: Update relayed 
from h05
May 29 08:33:42 h10 attrd[17262]:   notice: attrd_trigger_update: Sending flush 
op to all hosts for: last-failure-prm_DLM (1496039622)
May 29 08:33:42 h10 attrd[17262]:   notice: attrd_perform_update: Sent update 
19: last-failure-prm_DLM=1496039622
May 29 08:33:42 h10 attrd[17262]:   notice: attrd_cs_dispatch: Update relayed 
from h05
May 29 08:33:42 h10 attrd[17262]:   notice: attrd_cs_dispatch: Update relayed 
from h05
May 29 08:33:42 h10 attrd[17262]:   notice: attrd_trigger_update: Sending flush 
op to all hosts for: last-failure-prm_DLM (1496039622)
May 29 08:33:42 h10 attrd[17262]:   notice: attrd_perform_update: Sent update 
21: last-failure-prm_DLM=1496039622
May 29 08:33:42 h10 cluster-dlm[19180]: setup_cpg_daemon: daemon cpg_join error 
retrying
[here fencing kicked in]

On the DC (h05):
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 pengine[24264]:  warning: unpack_rsc_op_failure: Processing 
failed op stop for prm_DLM:1 on h10: unknown error (1)
May 29 08:33:42 h05 pengine[24264]:  warning: unpack_rsc_op_failure: Processing 
failed op stop for prm_DLM:1 on h10: unknown error (1)
May 29 08:33:42 h05 pengine[24264]:  warning: pe_fence_node: Node h10 will be 
fenced because of resource failure(s)
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
[...]
May 29 08:33:42 h05 pengine[24264]:  warning: stage6: Scheduling Node h10 for 
STONITH
May 29 08:33:42 h05 pengine[24264]:   notice: native_stop_constraints: Stop of 
failed resource prm_DLM:1 is implicit after h10 is fenced
[...]
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 crmd[24265]:   notice: do_te_invoke: Processing graph 1810 
(ref=pe_calc-dc-1496039622-1956) derived from 
/var/lib/pacemaker/pengine/pe-warn-30.bz2
May 29 08:33:42 h05 crmd[24265]:   notice: te_fence_node: Executing reboot 
fencing operation (160) on h10 (timeout=240000)
May 29 08:33:42 h05 stonith-ng[24261]:   notice: handle_request: Client 
crmd.24265.56d8e329 wants to fence (reboot) 'h10' with device '(any)'
May 29 08:33:42 h05 stonith-ng[24261]:   notice: initiate_remote_stonith_op: 
Initiating remote operation reboot for h10: 
c1f89edc-fe28-47b5-9890-1c3e893ced7f (0)
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 pengine[24264]:  warning: process_pe_message: Calculated 
Transition 1810: /var/lib/pacemaker/pengine/pe-warn-30.bz2
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
[...]
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 sbd: [24857]: info: Delivery process handling 
/dev/disk/by-id/dm-name-SBD_1-E3
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 sbd: [24857]: info: Writing reset to node slot h10
May 29 08:33:42 h05 sbd: [24857]: info: Messaging delay: 180
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 sbd: [24858]: info: Delivery process handling 
/dev/disk/by-id/dm-name-SBD_1-E4
May 29 08:33:42 h05 sbd: [24858]: info: Writing reset to node slot h10
May 29 08:33:42 h05 sbd: [24858]: info: Messaging delay: 180
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:42 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
[...]
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 corosync[24255]:  [CLM   ] CLM CONFIGURATION CHANGE
May 29 08:33:54 h05 corosync[24255]:  [CLM   ] New Configuration:
May 29 08:33:54 h05 corosync[24255]:  [CLM   ]  r(0) ip(172.20.16.1) r(1) 
ip(10.2.2.1)
May 29 08:33:54 h05 corosync[24255]:  [CLM   ]  r(0) ip(172.20.16.5) r(1) 
ip(10.2.2.5)
May 29 08:33:54 h05 corosync[24255]:  [CLM   ] Members Left:
May 29 08:33:54 h05 corosync[24255]:  [CLM   ]  r(0) ip(172.20.16.10) r(1) 
ip(10.2.2.10)
May 29 08:33:54 h05 corosync[24255]:  [CLM   ] Members Joined:
May 29 08:33:54 h05 corosync[24255]:  [pcmk  ] notice: pcmk_peer_update: 
Transitional membership event on ring 3448: memb=2, new=0, lost=1
May 29 08:33:54 h05 corosync[24255]:  [pcmk  ] info: pcmk_peer_update: memb: 
h01 739512321
May 29 08:33:54 h05 corosync[24255]:  [pcmk  ] info: pcmk_peer_update: memb: 
h05 739512325
May 29 08:33:54 h05 corosync[24255]:  [pcmk  ] info: pcmk_peer_update: lost: 
h10 739512330
May 29 08:33:54 h05 corosync[24255]:  [CLM   ] CLM CONFIGURATION CHANGE
May 29 08:33:54 h05 corosync[24255]:  [CLM   ] New Configuration:
May 29 08:33:54 h05 corosync[24255]:  [CLM   ]  r(0) ip(172.20.16.1) r(1) 
ip(10.2.2.1)
May 29 08:33:54 h05 corosync[24255]:  [CLM   ]  r(0) ip(172.20.16.5) r(1) 
ip(10.2.2.5)
May 29 08:33:54 h05 ocfs2_controld[24901]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now lost (was member)
May 29 08:33:54 h05 cluster-dlm[24772]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now lost (was member)
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 cluster-dlm[24772]: update_cluster: Processing membership 
3448
May 29 08:33:54 h05 cib[24260]:   notice: crm_update_peer_state: 
cib_peer_update_callback: Node h10[739512330] - state is now lost (was member)
May 29 08:33:54 h05 corosync[24255]:  [CLM   ] Members Left:
May 29 08:33:54 h05 crmd[24265]:   notice: peer_update_callback: Node return 
implies stonith of h10 (action 160) completed
May 29 08:33:54 h05 cluster-dlm[24772]: dlm_process_node: Skipped active node 
739512325: born-on=3420, last-seen=3448, this-event=3448, last-event=3444
May 29 08:33:54 h05 stonith-ng[24261]:   notice: crm_update_peer_state: 
st_peer_update_callback: Node h10[739512330] - state is now lost (was member)
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 corosync[24255]:  [CLM   ] Members Joined:
May 29 08:33:54 h05 crmd[24265]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node h10[739512330] - state is now lost (was member)
May 29 08:33:54 h05 cluster-dlm[24772]: dlm_process_node: Skipped active node 
739512321: born-on=3428, last-seen=3448, this-event=3448, last-event=3444
May 29 08:33:54 h05 corosync[24255]:  [pcmk  ] notice: pcmk_peer_update: Stable 
membership event on ring 3448: memb=2, new=0, lost=0
May 29 08:33:54 h05 cluster-dlm[24772]: del_configfs_node: del_configfs_node 
rmdir "/sys/kernel/config/dlm/cluster/comms/739512330"
May 29 08:33:54 h05 corosync[24255]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
h01 739512321
May 29 08:33:54 h05 corosync[24255]:  [pcmk  ] info: pcmk_peer_update: MEMB: 
h05 739512325
May 29 08:33:54 h05 cluster-dlm[24772]: dlm_process_node: Removed inactive node 
739512330: born-on=3444, last-seen=3444, this-event=3448, last-event=3444
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 corosync[24255]:  [pcmk  ] info: ais_mark_unseen_peer_dead: 
Node h10 was not seen in the previous transition
May 29 08:33:54 h05 corosync[24255]:  [pcmk  ] info: update_member: Node 
739512330/h10 is now: lost
May 29 08:33:54 h05 corosync[24255]:  [pcmk  ] info: send_member_notification: 
Sending membership update 3448 to 5 children
[??? why 5 children ???]
May 29 08:33:54 h05 corosync[24255]:  [TOTEM ] A processor joined or left the 
membership and a new membership was formed.
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 kernel: [1622638.010108] dlm: closing connection to node 
739512330
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 corosync[24255]:  [CPG   ] chosen downlist: sender r(0) 
ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:2 left:0)
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 cmirrord[25158]: [yEa32lLX]  Retry #1 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
May 29 08:33:54 h05 corosync[24255]:  [MAIN  ] Completed service 
synchronization, ready to provide service.
May 29 08:33:54 h05 crmd[24265]:   notice: abort_transition_graph: Transition 
aborted: Peer Halt (source=do_te_invoke:168, 0)
May 29 08:33:54 h05 crmd[24265]:   notice: abort_transition_graph: Transition 
aborted by deletion of transient_attributes[@id='h10']: Transient attribute 
change (cib=1.150.115, source=te_update_diff:395, pa
th=/cib/status/node_state[@id='h10']/transient_attributes[@id='h10'], 0)
May 29 08:33:54 h05 crmd[24265]:   notice: crm_update_quorum: Updating quorum 
status to true (call=2080)
May 29 08:33:54 h05 attrd[24263]:   notice: attrd_local_callback: Sending full 
refresh (origin=crmd)
May 29 08:33:54 h05 attrd[24263]:   notice: attrd_trigger_update: Sending flush 
op to all hosts for: shutdown (0)
May 29 08:33:54 h05 attrd[24263]:   notice: attrd_trigger_update: Sending flush 
op to all hosts for: probe_complete (true)
[...from here the system behaved sanely again...]

Regards,
Ulrich



_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
