Thanks again for the quick response, and pardon the delay in replying. A 
colleague of mine and I have been trying some different things today.

From the reboot on Friday, further below are the logs from corosync.log, 
covering the time from the reboot command to the constraint being added.

I am not able to perform a "pcs cluster cib-upgrade". The version of pcs that I 
have does not have that option (only "cluster cib [filename]" and "cluster 
cib-push <filename>"). My versions at the time of these logs were:

[root@g5se-f3efce Packages]# pcs --version
0.9.90
[root@g5se-f3efce Packages]# pacemakerd --version
Pacemaker 1.1.11
Written by Andrew Beekhof
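
(As an aside, I believe the same schema upgrade can be triggered with the 
lower-level tool directly, for example:

cibadmin --upgrade --force

...though I haven't tried that here, so treat it as a guess on my part rather 
than something verified.)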

I think you're right that we had a script banning the ClusterIP. It is called 
from a message daemon that we created, which acts as middleware between the 
cluster software and our application. The daemon has an exit handler that 
calls a script which runs:

pcs resource ban ClusterIP $host        # where $host is set via host=`hostname`

...because we normally try to push the cluster IP to the other side (though in 
this case we have just one node), but right after that the script calls:

pcs resource clear ClusterIP


...but for some reason that does not seem to result in the constraint being 
removed (see even further below, where I show a /var/log/messages snippet with 
both the constraint addition and removal; that run was with an earlier 
pacemaker version, 1.1.10-1.el6_4.4). I guess with the earlier pcs or 
pacemaker versions these logs went to messages rather than to corosync.log as 
they do today.
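
To make the ban/clear sequence above concrete, the relevant part of the exit 
handler's script is roughly this (a simplified sketch, not the exact script):

host=`hostname`
# push the cluster IP away from this node before we go down
pcs resource ban ClusterIP $host
# then drop the ban again so it should not persist
pcs resource clear ClusterIP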

I am in a bit of a conundrum: if I upgrade pcs to 0.9.149 (retrieved from 
github.com and "make install"'ed, because 0.9.139 had a pcs issue with 
one-node clusters), which has the cib-upgrade option, then manually removing 
the ClusterIP constraint causes a problem for our message daemon: it thinks 
neither side of the cluster is active. That is something to look at on our 
end, but it seems the removal of the constraint affects our daemon under the 
new pcs. For the time being, I've rolled back pcs to the 0.9.90 version above.
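
(For reference, by manually removing the ClusterIP constraint I mean deleting 
the ban constraint by its id, along the lines of:

pcs constraint remove cli-ban-ClusterIP-on-g5se-f3efce

...with the id taken from "pcs constraint --full" -- if I have the syntax 
right; I'm going from memory here.)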

One other thing to mention: the timing of pacemaker's start may have been 
affected by what I found was a change to its init script header (made by 
either our daemon or our application installation script) from 90 1 to 70 20. 
So in /etc/rc3.d there is S70pacemaker rather than S90pacemaker. I am not a 
Linux expert by any means; I guess that may affect startup, but I'm not sure 
about shutdown.
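
For reference, the header I mean is the chkconfig line near the top of 
/etc/init.d/pacemaker, which on our systems now reads roughly like this (from 
memory; the exact comment block varies by distribution):

# chkconfig: - 70 20
# description: Pacemaker Cluster Manager

As I understand it, the first number is the start priority (the S## symlink in 
/etc/rc3.d) and the second is the stop priority (the K## symlink), with lower 
numbers running earlier in each direction, so the change makes pacemaker start 
earlier on boot and get stopped at a different point during shutdown.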

Corosync logs from the time the reboot was issued to the constraint being added:

Feb 19 15:22:22 [1997] g5se-f3efce      attrd:   notice: attrd_trigger_update:  
Sending flush op to all hosts for: standby (true)
Feb 19 15:22:22 [1997] g5se-f3efce      attrd:   notice: attrd_perform_update:  
Sent update 24: standby=true
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Forwarding cib_modify operation for section status to master 
(origin=local/attrd/24)
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: --- 0.291.2 2
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: +++ 0.291.3 (null)
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
+  /cib:  @num_updates=3
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
++ 
/cib/status/node_state[@id='g5se-f3efce']/transient_attributes[@id='g5se-f3efce']/instance_attributes[@id='status-g5se-f3efce']:
  <nvpair id="status-g5se-f3efce-standby" name="standby" value="true"/>
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: 
abort_transition_graph:        Transition aborted by 
status-g5se-f3efce-standby, standby=true: Transient attribute change (create 
cib=0.291.3, source=te_update_diff:391, 
path=/cib/status/node_state[@id='g5se-f3efce']/transient_attributes[@id='g5se-f3efce']/instance_attributes[@id='status-g5se-f3efce'],
 1)
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:   notice: do_state_transition:   
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Completed cib_modify operation for section status: OK (rc=0, 
origin=g5se-f3efce/attrd/24, version=0.291.3)
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:   notice: update_validation:     
pacemaker-1.2-style configuration is also valid for pacemaker-1.3
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: update_validation:     
Transformation upgrade-1.3.xsl successful
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: update_validation:     
Transformed the configuration from pacemaker-1.2 to pacemaker-2.0
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: cli_config_update:     
Your configuration was internally updated to the latest version (pacemaker-2.0)
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:   notice: unpack_config:         
On loss of CCM Quorum: Ignore
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: unpack_status:         
Node g5se-f3efce is in standby-mode
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: 
determine_online_status:       Node g5se-f3efce is standby
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: native_print:  
sw-ready-g5se-f3efce    (ocf::pacemaker:GBmon): Started g5se-f3efce
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: native_print:  
meta-data       (ocf::pacemaker:GBmon): Started g5se-f3efce
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: native_print:  netmon  
(ocf::heartbeat:ethmonitor):    Started g5se-f3efce
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: native_print:  
ClusterIP       (ocf::heartbeat:IPaddr2):       Started g5se-f3efce
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: native_color:  
Resource sw-ready-g5se-f3efce cannot run anywhere
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: native_color:  
Resource meta-data cannot run anywhere
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: native_color:  
Resource netmon cannot run anywhere
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:     info: native_color:  
Resource ClusterIP cannot run anywhere
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:   notice: LogActions:    Stop    
sw-ready-g5se-f3efce    (g5se-f3efce)
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:   notice: LogActions:    Stop    
meta-data       (g5se-f3efce)
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:   notice: LogActions:    Stop    
netmon  (g5se-f3efce)
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:   notice: LogActions:    Stop    
ClusterIP       (g5se-f3efce)
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: do_state_transition:   
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: do_te_invoke:  
Processing graph 8 (ref=pe_calc-dc-1455920542-41) derived from 
/var/lib/pacemaker/pengine/pe-input-641.bz2
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:   notice: te_rsc_command:        
Initiating action 8: stop sw-ready-g5se-f3efce_stop_0 on g5se-f3efce (local)
Feb 19 15:22:22 [1996] g5se-f3efce       lrmd:     info: 
cancel_recurring_action:       Cancelling operation 
sw-ready-g5se-f3efce_monitor_10000
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: do_lrm_rsc_op:         
Performing key=8:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed 
op=sw-ready-g5se-f3efce_stop_0
Feb 19 15:22:22 [1996] g5se-f3efce       lrmd:     info: log_execute:   
executing - rsc:sw-ready-g5se-f3efce action:stop call_id:31
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:   notice: te_rsc_command:        
Initiating action 9: stop meta-data_stop_0 on g5se-f3efce (local)
Feb 19 15:22:22 [1996] g5se-f3efce       lrmd:     info: 
cancel_recurring_action:       Cancelling operation meta-data_monitor_60000
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: do_lrm_rsc_op:         
Performing key=9:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed op=meta-data_stop_0
Feb 19 15:22:22 [1996] g5se-f3efce       lrmd:     info: log_execute:   
executing - rsc:meta-data action:stop call_id:33
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:   notice: te_rsc_command:        
Initiating action 10: stop netmon_stop_0 on g5se-f3efce (local)
Feb 19 15:22:22 [1996] g5se-f3efce       lrmd:     info: 
cancel_recurring_action:       Cancelling operation netmon_monitor_10000
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: do_lrm_rsc_op:         
Performing key=10:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed op=netmon_stop_0
Feb 19 15:22:22 [1996] g5se-f3efce       lrmd:     info: log_execute:   
executing - rsc:netmon action:stop call_id:35
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:   notice: te_rsc_command:        
Initiating action 11: stop ClusterIP_stop_0 on g5se-f3efce (local)
Feb 19 15:22:22 [1996] g5se-f3efce       lrmd:     info: 
cancel_recurring_action:       Cancelling operation ClusterIP_monitor_30000
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: do_lrm_rsc_op:         
Performing key=11:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed op=ClusterIP_stop_0
Feb 19 15:22:22 [1996] g5se-f3efce       lrmd:     info: log_execute:   
executing - rsc:ClusterIP action:stop call_id:37
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: process_lrm_event:     
Operation sw-ready-g5se-f3efce_monitor_10000: Cancelled (node=g5se-f3efce, 
call=29, confirmed=true)
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: process_lrm_event:     
Operation meta-data_monitor_60000: Cancelled (node=g5se-f3efce, call=21, 
confirmed=true)
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: process_lrm_event:     
Operation netmon_monitor_10000: Cancelled (node=g5se-f3efce, call=23, 
confirmed=true)
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: process_lrm_event:     
Operation ClusterIP_monitor_30000: Cancelled (node=g5se-f3efce, call=25, 
confirmed=true)
Feb 19 15:22:22 [1998] g5se-f3efce    pengine:   notice: process_pe_message:    
Calculated Transition 8: /var/lib/pacemaker/pengine/pe-input-641.bz2
Feb 19 15:22:22 [1996] g5se-f3efce       lrmd:     info: log_finished:  
finished - rsc:sw-ready-g5se-f3efce action:stop call_id:31 pid:6013 exit-code:0 
exec-time:56ms queue-time:0ms
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:   notice: process_lrm_event:     
Operation sw-ready-g5se-f3efce_stop_0: ok (node=g5se-f3efce, call=31, rc=0, 
cib-update=72, confirmed=true)
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Forwarding cib_modify operation for section status to master 
(origin=local/crmd/72)
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: --- 0.291.3 2
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: +++ 0.291.4 (null)
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
+  /cib:  @num_updates=4
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
+  
/cib/status/node_state[@id='g5se-f3efce']/lrm[@id='g5se-f3efce']/lrm_resources/lrm_resource[@id='sw-ready-g5se-f3efce']/lrm_rsc_op[@id='sw-ready-g5se-f3efce_last_0']:
  @operation_key=sw-ready-g5se-f3efce_stop_0, @operation=stop, 
@transition-key=8:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed, 
@transition-magic=0:0;8:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed, @call-id=31, 
@last-run=1455920542, @last-rc-change=1455920542, @exec-time=56
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Completed cib_modify operation for section status: OK (rc=0, 
origin=g5se-f3efce/crmd/72, version=0.291.4)
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: match_graph_event:     
Action sw-ready-g5se-f3efce_stop_0 (8) confirmed on g5se-f3efce (rc=0)
Feb 19 15:22:22 [1996] g5se-f3efce       lrmd:     info: log_finished:  
finished - rsc:meta-data action:stop call_id:33 pid:6014 exit-code:0 
exec-time:72ms queue-time:0ms
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:   notice: process_lrm_event:     
Operation meta-data_stop_0: ok (node=g5se-f3efce, call=33, rc=0, cib-update=73, 
confirmed=true)
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Forwarding cib_modify operation for section status to master 
(origin=local/crmd/73)
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: --- 0.291.4 2
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: +++ 0.291.5 (null)
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
+  /cib:  @num_updates=5
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
+  
/cib/status/node_state[@id='g5se-f3efce']/lrm[@id='g5se-f3efce']/lrm_resources/lrm_resource[@id='meta-data']/lrm_rsc_op[@id='meta-data_last_0']:
  @operation_key=meta-data_stop_0, @operation=stop, 
@crm-debug-origin=do_update_resource, 
@transition-key=9:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed, 
@transition-magic=0:0;9:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed, @call-id=33, 
@last-run=1455920542, @last-rc-change=1455920542, @exec-time=
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Completed cib_modify operation for section status: OK (rc=0, 
origin=g5se-f3efce/crmd/73, version=0.291.5)
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:     info: match_graph_event:     
Action meta-data_stop_0 (9) confirmed on g5se-f3efce (rc=0)
Feb 19 15:22:22 [1997] g5se-f3efce      attrd:   notice: attrd_trigger_update:  
Sending flush op to all hosts for: ethmonitor-eth0 (<null>)
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Forwarding cib_delete operation for section status to master 
(origin=local/attrd/26)
Feb 19 15:22:22 [1997] g5se-f3efce      attrd:   notice: attrd_perform_update:  
Sent delete 26: node=g5se-f3efce, attr=ethmonitor-eth0, id=<n/a>, set=(null), 
section=status
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: --- 0.291.5 2
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: +++ 0.291.6 (null)
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
-- 
/cib/status/node_state[@id='g5se-f3efce']/transient_attributes[@id='g5se-f3efce']/instance_attributes[@id='status-g5se-f3efce']/nvpair[@id='status-g5se-f3efce-ethmonitor-eth0']
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
+  /cib:  @num_updates=6
Feb 19 15:22:22 [1996] g5se-f3efce       lrmd:     info: log_finished:  
finished - rsc:netmon action:stop call_id:35 pid:6015 exit-code:0 
exec-time:99ms queue-time:0ms
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Completed cib_delete operation for section status: OK (rc=0, 
origin=g5se-f3efce/attrd/26, version=0.291.6)
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:   notice: 
abort_transition_graph:        Transition aborted by deletion of 
nvpair[@id='status-g5se-f3efce-ethmonitor-eth0']: Transient attribute change 
(cib=0.291.6, source=te_update_diff:391, 
path=/cib/status/node_state[@id='g5se-f3efce']/transient_attributes[@id='g5se-f3efce']/instance_attributes[@id='status-g5se-f3efce']/nvpair[@id='status-g5se-f3efce-ethmonitor-eth0'],
 0)
Feb 19 15:22:22 [1999] g5se-f3efce       crmd:   notice: process_lrm_event:     
Operation netmon_stop_0: ok (node=g5se-f3efce, call=35, rc=0, cib-update=74, 
confirmed=true)
Feb 19 15:22:22 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Forwarding cib_modify operation for section status to master 
(origin=local/crmd/74)
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: --- 0.291.6 2
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: +++ 0.291.7 (null)
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
+  /cib:  @num_updates=7
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
+  
/cib/status/node_state[@id='g5se-f3efce']/lrm[@id='g5se-f3efce']/lrm_resources/lrm_resource[@id='netmon']/lrm_rsc_op[@id='netmon_last_0']:
  @operation_key=netmon_stop_0, @operation=stop, 
@crm-debug-origin=do_update_resource, 
@transition-key=10:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed, 
@transition-magic=0:0;10:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed, @call-id=35, 
@last-run=1455920542, @last-rc-change=1455920542, @exec-time=99
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Completed cib_modify operation for section status: OK (rc=0, 
origin=g5se-f3efce/crmd/74, version=0.291.7)
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: match_graph_event:     
Action netmon_stop_0 (10) confirmed on g5se-f3efce (rc=0)
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Forwarding cib_delete operation for section constraints to master 
(origin=local/crm_resource/3)
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Completed cib_delete operation for section constraints: OK (rc=0, 
origin=g5se-f3efce/crm_resource/3, version=0.291.7)
IPaddr2[6016]:  2016/02/19_15:22:23 INFO: IP status = ok, IP_CIP=
Feb 19 15:22:23 [1996] g5se-f3efce       lrmd:     info: log_finished:  
finished - rsc:ClusterIP action:stop call_id:37 pid:6016 exit-code:0 
exec-time:127ms queue-time:0ms
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:   notice: process_lrm_event:     
Operation ClusterIP_stop_0: ok (node=g5se-f3efce, call=37, rc=0, cib-update=75, 
confirmed=true)
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Forwarding cib_modify operation for section status to master 
(origin=local/crmd/75)
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: --- 0.291.7 2
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: +++ 0.291.8 (null)
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
+  /cib:  @num_updates=8
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
+  
/cib/status/node_state[@id='g5se-f3efce']/lrm[@id='g5se-f3efce']/lrm_resources/lrm_resource[@id='ClusterIP']/lrm_rsc_op[@id='ClusterIP_last_0']:
  @operation_key=ClusterIP_stop_0, @operation=stop, 
@transition-key=11:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed, 
@transition-magic=0:0;11:8:0:b7b85b39-a745-4cd7-abc4-059a684da6ed, @call-id=37, 
@last-run=1455920542, @last-rc-change=1455920542, @exec-time=127
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Completed cib_modify operation for section status: OK (rc=0, 
origin=g5se-f3efce/crmd/75, version=0.291.8)
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: match_graph_event:     
Action ClusterIP_stop_0 (11) confirmed on g5se-f3efce (rc=0)
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:   notice: run_graph:     
Transition 8 (Complete=4, Pending=0, Fired=0, Skipped=1, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-641.bz2): Stopped
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: do_state_transition:   
State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:   notice: update_validation:     
pacemaker-1.2-style configuration is also valid for pacemaker-1.3
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: update_validation:     
Transformation upgrade-1.3.xsl successful
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: update_validation:     
Transformed the configuration from pacemaker-1.2 to pacemaker-2.0
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: cli_config_update:     
Your configuration was internally updated to the latest version (pacemaker-2.0)
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:   notice: unpack_config:         
On loss of CCM Quorum: Ignore
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: unpack_status:         
Node g5se-f3efce is in standby-mode
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: 
determine_online_status:       Node g5se-f3efce is standby
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: native_print:  
sw-ready-g5se-f3efce    (ocf::pacemaker:GBmon): Stopped
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: native_print:  
meta-data       (ocf::pacemaker:GBmon): Stopped
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: native_print:  netmon  
(ocf::heartbeat:ethmonitor):    Stopped
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: native_print:  
ClusterIP       (ocf::heartbeat:IPaddr2):       Stopped
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: native_color:  
Resource sw-ready-g5se-f3efce cannot run anywhere
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: native_color:  
Resource meta-data cannot run anywhere
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: native_color:  
Resource netmon cannot run anywhere
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: native_color:  
Resource ClusterIP cannot run anywhere
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: LogActions:    Leave   
sw-ready-g5se-f3efce    (Stopped)
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: LogActions:    Leave   
meta-data       (Stopped)
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: LogActions:    Leave   
netmon  (Stopped)
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: LogActions:    Leave   
ClusterIP       (Stopped)
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: do_state_transition:   
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: do_te_invoke:  
Processing graph 9 (ref=pe_calc-dc-1455920543-46) derived from 
/var/lib/pacemaker/pengine/pe-input-642.bz2
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:   notice: run_graph:     
Transition 9 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-642.bz2): Complete
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: do_log:        FSA: 
Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:   notice: do_state_transition:   
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:   notice: process_pe_message:    
Calculated Transition 9: /var/lib/pacemaker/pengine/pe-input-642.bz2
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Forwarding cib_modify operation for section constraints to master 
(origin=local/crm_resource/3)
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: --- 0.291.8 2
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
Diff: +++ 0.292.0 (null)
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
+  /cib:  @epoch=292, @num_updates=0
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:        
++ /cib/configuration/constraints:  <rsc_location 
id="cli-ban-ClusterIP-on-g5se-f3efce" rsc="ClusterIP" role="Started" 
node="g5se-f3efce" score="-INFINITY"/>
Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_process_request:   
Completed cib_modify operation for section constraints: OK (rc=0, 
origin=g5se-f3efce/crm_resource/3, version=0.292.0)
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: 
abort_transition_graph:        Transition aborted by 
rsc_location.cli-ban-ClusterIP-on-g5se-f3efce 'create': Non-status change 
(cib=0.292.0, source=te_update_diff:383, path=/cib/configuration/constraints, 1)
Feb 19 15:22:23 [1999] g5se-f3efce       crmd:   notice: do_state_transition:   
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:   notice: update_validation:     
pacemaker-1.2-style configuration is also valid for pacemaker-1.3
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: update_validation:     
Transformation upgrade-1.3.xsl successful
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: update_validation:     
Transformed the configuration from pacemaker-1.2 to pacemaker-2.0
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: cli_config_update:     
Your configuration was internally updated to the latest version (pacemaker-2.0)
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:   notice: unpack_config:         
On loss of CCM Quorum: Ignore
Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: unpack_status:         
Node g5se-f3efce is in standby-mode

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

/var/log/messages snippet showing, toward the bottom, the addition and removal 
of the constraint (this is with pcs 0.9.90 and pacemaker 1.1.10-1.el6_4.4):

Feb 21 23:10:38 g5se-dea2b1 azMD[1584]: Sending INIT message to partner. Count 
21
Feb 21 23:10:41 g5se-dea2b1 init: tty (/dev/tty1) main process (1732) killed by 
TERM signal
Feb 21 23:10:41 g5se-dea2b1 init: tty (/dev/tty2) main process (1734) killed by 
TERM signal
Feb 21 23:10:41 g5se-dea2b1 init: tty (/dev/tty3) main process (1736) killed by 
TERM signal
Feb 21 23:10:41 g5se-dea2b1 init: tty (/dev/tty4) main process (1738) killed by 
TERM signal
Feb 21 23:10:41 g5se-dea2b1 init: tty (/dev/tty5) main process (1740) killed by 
TERM signal
Feb 21 23:10:41 g5se-dea2b1 init: tty (/dev/tty6) main process (1742) killed by 
TERM signal
Feb 21 23:10:41 g5se-dea2b1 avahi-daemon[1473]: Got SIGTERM, quitting.
Feb 21 23:10:41 g5se-dea2b1 avahi-daemon[1473]: Leaving mDNS multicast group on 
interface eth0.IPv4 with address 172.20.240.124.
Feb 21 23:10:42 g5se-dea2b1 azMD[1584]: [azIntTrmHandler] Int Trm handler 15
Feb 21 23:10:42 g5se-dea2b1 azMD[1584]: [azExitHandler] exit handler
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: --- 0.66.3
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: +++ 0.67.1 
44c794d4381e36ea4f5d51d0dd7fde1d
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: -- <cib 
admin_epoch="0" epoch="66" num_updates="3"/>
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: ++           <nvpair 
id="sw-ready-g5se-dea2b1-meta_attributes-target-role" name="target-role" 
value="Stopped"/>
Feb 21 23:10:42 g5se-dea2b1 stonith-ng[1558]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Feb 21 23:10:42 g5se-dea2b1 pengine[1561]:   notice: unpack_config: On loss of 
CCM Quorum: Ignore
Feb 21 23:10:42 g5se-dea2b1 pengine[1561]:   notice: LogActions: Stop    
sw-ready-g5se-dea2b1#011(g5se-dea2b1)
Feb 21 23:10:42 g5se-dea2b1 pengine[1561]:   notice: process_pe_message: 
Calculated Transition 32: /var/lib/pacemaker/pengine/pe-input-134.bz2
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: te_rsc_command: Initiating 
action 10: stop sw-ready-g5se-dea2b1_stop_0 on g5se-dea2b1 (local)
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: process_lrm_event: LRM 
operation sw-ready-g5se-dea2b1_stop_0 (call=48, rc=0, cib-update=67, 
confirmed=true) ok
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: run_graph: Transition 32 
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-134.bz2): Complete
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 21 23:10:42 g5se-dea2b1 stonith-ng[1558]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: --- 0.69.3
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: +++ 0.70.1 
216351853e036a12a96b442b30522287
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: -- <cib 
admin_epoch="0" epoch="69" num_updates="3"/>
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: ++       
<rsc_location id="cli-ban-ClusterIP-on-g5se-dea2b1" rsc="ClusterIP" 
role="Started" node="g5se-dea2b1" score="-INFINITY"/>
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
Feb 21 23:10:42 g5se-dea2b1 pengine[1561]:   notice: unpack_config: On loss of 
CCM Quorum: Ignore
Feb 21 23:10:42 g5se-dea2b1 pengine[1561]:   notice: LogActions: Stop    
ClusterIP#011(g5se-dea2b1)
Feb 21 23:10:42 g5se-dea2b1 pengine[1561]:   notice: process_pe_message: 
Calculated Transition 35: /var/lib/pacemaker/pengine/pe-input-137.bz2
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: te_rsc_command: Initiating 
action 7: stop ClusterIP_stop_0 on g5se-dea2b1 (local)
Feb 21 23:10:42 g5se-dea2b1 IPaddr2[13237]: INFO: IP status = ok, IP_CIP=
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: process_lrm_event: LRM 
operation ClusterIP_stop_0 (call=64, rc=0, cib-update=74, confirmed=true) ok
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: run_graph: Transition 35 
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-137.bz2): Complete
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
Feb 21 23:10:42 g5se-dea2b1 stonith-ng[1558]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: --- 0.70.2
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: +++ 0.71.1 
453ef48657244dc188b444348eb547ed
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: --       
<rsc_location id="cli-ban-ClusterIP-on-g5se-dea2b1" rsc="ClusterIP" 
role="Started" node="g5se-dea2b1" score="-INFINITY"/>
Feb 21 23:10:42 g5se-dea2b1 cib[1557]:   notice: cib:diff: ++ <cib epoch="71" 
num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" 
cib-last-written="Sun Feb 21 23:10:42 2016" update-origin="g5se-dea2b1" 
update-client="crm_resource" crm_feature_set="3.0.7" have-quorum="1" 
dc-uuid="g5se-dea2b1"/>
Feb 21 23:10:42 g5se-dea2b1 pengine[1561]:   notice: unpack_config: On loss of 
CCM Quorum: Ignore
Feb 21 23:10:42 g5se-dea2b1 pengine[1561]:   notice: LogActions: Start   
ClusterIP#011(g5se-dea2b1)
Feb 21 23:10:42 g5se-dea2b1 pengine[1561]:   notice: process_pe_message: 
Calculated Transition 36: /var/lib/pacemaker/pengine/pe-input-138.bz2
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: te_rsc_command: Initiating 
action 6: start ClusterIP_start_0 on g5se-dea2b1 (local)
Feb 21 23:10:42 g5se-dea2b1 azMD[1584]: [azExitHandler] exit handler
Feb 21 23:10:42 g5se-dea2b1 IPaddr2[13282]: INFO: ip -f inet addr add 
172.20.240.123/24 brd 172.20.240.255 dev eth0
Feb 21 23:10:42 g5se-dea2b1 IPaddr2[13282]: INFO: ip link set eth0 up
Feb 21 23:10:42 g5se-dea2b1 IPaddr2[13282]: INFO: /usr/lib64/heartbeat/send_arp 
-i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp-172.20.240.123 eth0 
172.20.240.123 auto not_used not_used
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: process_lrm_event: LRM 
operation ClusterIP_start_0 (call=68, rc=0, cib-update=76, confirmed=true) ok
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: te_rsc_command: Initiating 
action 7: monitor ClusterIP_monitor_30000 on g5se-dea2b1 (local)
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: process_lrm_event: LRM 
operation ClusterIP_monitor_30000 (call=71, rc=0, cib-update=77, 
confirmed=false) ok
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: run_graph: Transition 36 
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-138.bz2): Complete
Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
Feb 21 23:10:43 g5se-dea2b1 stonith-ng[1558]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: --- 0.71.3
Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: +++ 0.72.1 
4e5a3b6259a59f84bcfec6d0f16ad3ba
Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: -- <cib 
admin_epoch="0" epoch="71" num_updates="3"/>
Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: ++       
<rsc_location id="cli-ban-ClusterIP-on-g5se-dea2b1" rsc="ClusterIP" 
role="Started" node="g5se-dea2b1" score="-INFINITY"/>
Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: unpack_config: On loss of 
CCM Quorum: Ignore
Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: LogActions: Stop    
ClusterIP#011(g5se-dea2b1)
Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: process_pe_message: 
Calculated Transition 37: /var/lib/pacemaker/pengine/pe-input-139.bz2
Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: te_rsc_command: Initiating 
action 7: stop ClusterIP_stop_0 on g5se-dea2b1 (local)
Feb 21 23:10:43 g5se-dea2b1 IPaddr2[13372]: INFO: IP status = ok, IP_CIP=
Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: process_lrm_event: LRM 
operation ClusterIP_stop_0 (call=75, rc=0, cib-update=79, confirmed=true) ok
Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: run_graph: Transition 37 
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-139.bz2): Complete
Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 21 23:10:43 g5se-dea2b1 stonith-ng[1558]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: --- 0.72.2
Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: +++ 0.73.1 
93f902fd51a6750b828144d42f8c7a6e
Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: --       
<rsc_location id="cli-ban-ClusterIP-on-g5se-dea2b1" rsc="ClusterIP" 
role="Started" node="g5se-dea2b1" score="-INFINITY"/>
Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: ++ <cib epoch="73" 
num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" 
cib-last-written="Sun Feb 21 23:10:43 2016" update-origin="g5se-dea2b1" 
update-client="crm_resource" crm_feature_set="3.0.7" have-quorum="1" 
dc-uuid="g5se-dea2b1"/>
Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: unpack_config: On loss of 
CCM Quorum: Ignore
Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: LogActions: Start   
ClusterIP#011(g5se-dea2b1)
Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: process_pe_message: 
Calculated Transition 38: /var/lib/pacemaker/pengine/pe-input-140.bz2
Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: te_rsc_command: Initiating 
action 6: start ClusterIP_start_0 on g5se-dea2b1 (local)



-----Original Message-----
From: users-requ...@clusterlabs.org [mailto:users-requ...@clusterlabs.org] 
Sent: Monday, February 22, 2016 11:42 AM
To: users@clusterlabs.org
Subject: Users Digest, Vol 13, Issue 44

Send Users mailing list submissions to
        users@clusterlabs.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://clusterlabs.org/mailman/listinfo/users
or, via email, send a message with subject or body 'help' to
        users-requ...@clusterlabs.org

You can reach the person managing the list at
        users-ow...@clusterlabs.org

When replying, please edit your Subject line so it is more specific than "Re: 
Contents of Users digest..."


Today's Topics:

   1. Re: fencing by node name or by node ID (Ken Gaillot)
   2. Re: ClusterIP location constraint reappears after reboot
      (Ken Gaillot)


----------------------------------------------------------------------

Message: 1
Date: Mon, 22 Feb 2016 11:10:57 -0600
From: Ken Gaillot <kgail...@redhat.com>
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] fencing by node name or by node ID
Message-ID: <56cb4121.7000...@redhat.com>
Content-Type: text/plain; charset=windows-1252

On 02/21/2016 06:19 PM, Ferenc Wágner wrote:
> Hi,
> 
> Last night a node in our cluster (Corosync 2.3.5, Pacemaker 1.1.14) 
> experienced some failure and fell out of the cluster:
> 
> Feb 21 22:11:12 vhbl06 corosync[3603]:   [TOTEM ] A new membership 
> (10.0.6.9:612) was formed. Members left: 167773709
> Feb 21 22:11:12 vhbl06 corosync[3603]:   [TOTEM ] Failed to receive the leave 
> message. failed: 167773709
> Feb 21 22:11:12 vhbl06 attrd[8307]:   notice: crm_update_peer_proc: Node 
> vhbl07[167773709] - state is now lost (was member)
> Feb 21 22:11:12 vhbl06 cib[8304]:   notice: crm_update_peer_proc: Node 
> vhbl07[167773709] - state is now lost (was member)
> Feb 21 22:11:12 vhbl06 attrd[8307]:   notice: Removing all vhbl07 attributes 
> for attrd_peer_change_cb
> Feb 21 22:11:12 vhbl06 cib[8304]:   notice: Removing vhbl07/167773709 from 
> the membership list
> Feb 21 22:11:12 vhbl06 cib[8304]:   notice: Purged 1 peers with id=167773709 
> and/or uname=vhbl07 from the membership cache
> Feb 21 22:11:12 vhbl06 attrd[8307]:   notice: Lost attribute writer vhbl07
> Feb 21 22:11:12 vhbl06 attrd[8307]:   notice: Removing vhbl07/167773709 from 
> the membership list
> Feb 21 22:11:12 vhbl06 stonith-ng[8305]:   notice: crm_update_peer_proc: Node 
> vhbl07[167773709] - state is now lost (was member)
> Feb 21 22:11:12 vhbl06 attrd[8307]:   notice: Purged 1 peers with 
> id=167773709 and/or uname=vhbl07 from the membership cache
> Feb 21 22:11:12 vhbl06 crmd[8309]:   notice: Our peer on the DC (vhbl07) is 
> dead
> Feb 21 22:11:12 vhbl06 stonith-ng[8305]:   notice: Removing vhbl07/167773709 
> from the membership list
> Feb 21 22:11:12 vhbl06 stonith-ng[8305]:   notice: Purged 1 peers with 
> id=167773709 and/or uname=vhbl07 from the membership cache
> Feb 21 22:11:12 vhbl06 crmd[8309]:   notice: State transition S_NOT_DC -> 
> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK 
> origin=peer_update_callback ]
> Feb 21 22:11:12 vhbl06 corosync[3603]:   [QUORUM] Members[4]: 167773705 
> 167773706 167773707 167773708
> Feb 21 22:11:12 vhbl06 corosync[3603]:   [MAIN  ] Completed service 
> synchronization, ready to provide service.
> Feb 21 22:11:12 vhbl06 crmd[8309]:   notice: crm_reap_unseen_nodes: Node 
> vhbl07[167773709] - state is now lost (was member)
> Feb 21 22:11:12 vhbl06 pacemakerd[8261]:   notice: crm_reap_unseen_nodes: 
> Node vhbl07[167773709] - state is now lost (was member)
> Feb 21 22:11:12 vhbl06 kernel: [343490.563365] dlm: closing connection to 
> node 167773709
> Feb 21 22:11:12 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl05 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:12 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl07 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:12 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl01 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:12 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl02 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:12 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl03 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:12 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl04 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:12 vhbl06 stonith-ng[8305]:   notice: Operation reboot of 
> 167773709 by <no-one> for stonith-api.20937@vhbl03.9c470723: No such device
> Feb 21 22:11:12 vhbl06 crmd[8309]:   notice: Peer 167773709 was not 
> terminated (reboot) by <anyone> for vhbl03: No such device 
> (ref=9c470723-d318-4c7e-a705-ce9ee5c7ffe5) by client stonith-api.20937
> Feb 21 22:11:12 vhbl06 dlm_controld[3641]: 343352 tell corosync to remove 
> nodeid 167773705 from cluster
> Feb 21 22:11:15 vhbl06 corosync[3603]:   [TOTEM ] A processor failed, forming 
> new configuration.
> Feb 21 22:11:19 vhbl06 corosync[3603]:   [TOTEM ] A new membership 
> (10.0.6.10:616) was formed. Members left: 167773705
> Feb 21 22:11:19 vhbl06 corosync[3603]:   [TOTEM ] Failed to receive the leave 
> message. failed: 167773705
> 
> However, no fencing agent reported ability to fence the failing node 
> (vhbl07), because stonith-ng wasn't looking it up by name, but by 
> numeric ID (at least that's what the logs suggest to me), and the 
> pcmk_host_list attributes contained strings like vhbl07.
> 
> 1. Was it dlm_controld who requested the fencing?
> 
>    I suspect it because of the "dlm: closing connection to node
>    167773709" kernel message right before the stonith-ng logs.  And
>    dlm_controld really hasn't got anything to use but the corosync node
>    ID.

Not based on this; dlm would print messages about fencing, with 
"dlm_controld.*fence request".

However it looks like these logs are not from the DC, which will say what 
process requested the fencing. It may be DLM or something else.
Also, DLM on any node might initiate fencing, so it's worth looking at all the 
nodes' logs around this time.

> 2. Shouldn't some component translate between node IDs and node names?
>    Is this a configuration error in our setup?  Should I include both in
>    pcmk_host_list?

Yes, stonithd's create_remote_stonith_op() function will do the translation if 
the st_opt_cs_nodeid call option is set in the request XML. If that fails, 
you'll see a "Could not expand nodeid" warning in the log. That option is set 
by the kick() stonith API used by DLM, so it should happen automatically.

I'm not sure why it appears not to have worked here; logs from other nodes 
might help. Do corosync and pacemaker know the same node names?
That would be necessary to get the node name from corosync.

Have you tested fencing vhbl07 from the command line with stonith_admin to make 
sure fencing is configured correctly?

> 3. After the failed fence, why was 167773705 (vhbl03) removed from the
>    cluster?  Because it was chosen to execute the fencing operation, but
>    failed?

dlm_controld explicitly requested it. I'm not familiar enough with DLM to know 
why. It doesn't sound like a good idea to me.

> The logs continue like this:
> 
> Feb 21 22:11:19 vhbl06 attrd[8307]:   notice: crm_update_peer_proc: Node 
> vhbl03[167773705] - state is now lost (was member)
> Feb 21 22:11:19 vhbl06 attrd[8307]:   notice: Removing all vhbl03 attributes 
> for attrd_peer_change_cb
> Feb 21 22:11:19 vhbl06 attrd[8307]:   notice: Removing vhbl03/167773705 from 
> the membership list
> Feb 21 22:11:19 vhbl06 attrd[8307]:   notice: Purged 1 peers with 
> id=167773705 and/or uname=vhbl03 from the membership cache
> Feb 21 22:11:19 vhbl06 corosync[3603]:   [QUORUM] Members[3]: 167773706 
> 167773707 167773708
> Feb 21 22:11:19 vhbl06 corosync[3603]:   [MAIN  ] Completed service 
> synchronization, ready to provide service.
> Feb 21 22:11:19 vhbl06 crmd[8309]:   notice: crm_reap_unseen_nodes: Node 
> vhbl03[167773705] - state is now lost (was member)
> Feb 21 22:11:19 vhbl06 crmd[8309]:   notice: State transition S_ELECTION -> 
> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED 
> origin=election_timeout_popped ]
> Feb 21 22:11:19 vhbl06 pacemakerd[8261]:   notice: crm_reap_unseen_nodes: 
> Node vhbl03[167773705] - state is now lost (was member)
> Feb 21 22:11:19 vhbl06 cib[8304]:   notice: crm_update_peer_proc: Node 
> vhbl03[167773705] - state is now lost (was member)
> Feb 21 22:11:19 vhbl06 cib[8304]:   notice: Removing vhbl03/167773705 from 
> the membership list
> Feb 21 22:11:19 vhbl06 cib[8304]:   notice: Purged 1 peers with id=167773705 
> and/or uname=vhbl03 from the membership cache
> Feb 21 22:11:19 vhbl06 stonith-ng[8305]:   notice: crm_update_peer_proc: Node 
> vhbl03[167773705] - state is now lost (was member)
> Feb 21 22:11:19 vhbl06 stonith-ng[8305]:   notice: Removing vhbl03/167773705 
> from the membership list
> Feb 21 22:11:19 vhbl06 stonith-ng[8305]:   notice: Purged 1 peers with 
> id=167773705 and/or uname=vhbl03 from the membership cache
> Feb 21 22:11:19 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl05 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:19 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl07 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:19 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl01 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:19 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl02 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:19 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl03 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:19 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl04 can not 
> fence (reboot) 167773709: static-list
> Feb 21 22:11:19 vhbl06 kernel: [343497.392381] dlm: closing connection 
> to node 167773705
> 
> 4. Why can't I see any action above to fence 167773705 (vhbl03)?

Only the DC and the node that executes the fence will have those logs.
The other nodes will just have the query results ("can/can not fence") and the 
final stonith result.

> Feb 21 22:11:19 vhbl06 crmd[8309]:  warning: FSA: Input I_ELECTION_DC from 
> do_election_check() received in state S_INTEGRATION
> Feb 21 22:11:19 vhbl06 stonith-ng[8305]:   notice: Operation reboot of 
> 167773709 by <no-one> for stonith-api.17462@vhbl04.0cd1625d: No such device
> Feb 21 22:11:19 vhbl06 crmd[8309]:   notice: Peer 167773709 was not 
> terminated (reboot) by <anyone> for vhbl04: No such device 
> (ref=0cd1625d-a61e-4f94-930d-bb80a10b89da) by client stonith-api.17462
> Feb 21 22:11:19 vhbl06 dlm_controld[3641]: 343359 tell corosync to remove 
> nodeid 167773706 from cluster
> Feb 21 22:11:22 vhbl06 corosync[3603]:   [TOTEM ] A processor failed, forming 
> new configuration.
> Feb 21 22:11:26 vhbl06 corosync[3603]:   [TOTEM ] A new membership 
> (10.0.6.11:620) was formed. Members left: 167773706
> Feb 21 22:11:26 vhbl06 corosync[3603]:   [TOTEM ] Failed to receive the leave 
> message. failed: 167773706
> 
> Looks like vhbl04 took over the job of fencing vhbl07 from vhbl03, and 
> of course failed the exact same way.  So it was expelled, too.
> 
> Feb 21 22:11:26 vhbl06 attrd[8307]:   notice: crm_update_peer_proc: Node 
> vhbl04[167773706] - state is now lost (was member)
> Feb 21 22:11:26 vhbl06 cib[8304]:   notice: crm_update_peer_proc: Node 
> vhbl04[167773706] - state is now lost (was member)
> Feb 21 22:11:26 vhbl06 attrd[8307]:   notice: Removing all vhbl04 attributes 
> for attrd_peer_change_cb
> Feb 21 22:11:26 vhbl06 cib[8304]:   notice: Removing vhbl04/167773706 from 
> the membership list
> Feb 21 22:11:26 vhbl06 cib[8304]:   notice: Purged 1 peers with id=167773706 
> and/or uname=vhbl04 from the membership cache
> Feb 21 22:11:26 vhbl06 attrd[8307]:   notice: Removing vhbl04/167773706 from 
> the membership list
> Feb 21 22:11:26 vhbl06 attrd[8307]:   notice: Purged 1 peers with 
> id=167773706 and/or uname=vhbl04 from the membership cache
> Feb 21 22:11:26 vhbl06 stonith-ng[8305]:   notice: crm_update_peer_proc: Node 
> vhbl04[167773706] - state is now lost (was member)
> Feb 21 22:11:26 vhbl06 stonith-ng[8305]:   notice: Removing vhbl04/167773706 
> from the membership list
> Feb 21 22:11:26 vhbl06 crmd[8309]:  warning: No match for shutdown action on 
> 167773706
> Feb 21 22:11:26 vhbl06 stonith-ng[8305]:   notice: Purged 1 peers with 
> id=167773706 and/or uname=vhbl04 from the membership cache
> Feb 21 22:11:26 vhbl06 crmd[8309]:   notice: Stonith/shutdown of vhbl04 not 
> matched
> Feb 21 22:11:26 vhbl06 corosync[3603]:   [QUORUM] This node is within the 
> non-primary component and will NOT provide any services.
> Feb 21 22:11:26 vhbl06 pacemakerd[8261]:   notice: Membership 620: quorum 
> lost (2)
> Feb 21 22:11:26 vhbl06 crmd[8309]:   notice: Membership 620: quorum lost (2)
> Feb 21 22:11:26 vhbl06 corosync[3603]:   [QUORUM] Members[2]: 167773707 
> 167773708
> 
> That, finally, was enough to lose quorum and paralize the cluster.
> Later, vhbl07 was rebooted by the hardware watchdog and came back for 
> a cold welcome:
> 
> Feb 21 22:24:53 vhbl06 corosync[3603]:   [TOTEM ] A new membership 
> (10.0.6.12:628) was formed. Members joined: 167773709
> Feb 21 22:24:53 vhbl06 corosync[3603]:   [QUORUM] Members[2]: 167773708 
> 167773709
> Feb 21 22:24:53 vhbl06 corosync[3603]:   [MAIN  ] Completed service 
> synchronization, ready to provide service.
> Feb 21 22:24:53 vhbl06 crmd[8309]:   notice: pcmk_quorum_notification: Node 
> vhbl07[167773709] - state is now member (was lost)
> Feb 21 22:24:53 vhbl06 pacemakerd[8261]:   notice: pcmk_quorum_notification: 
> Node vhbl07[167773709] - state is now member (was lost)
> Feb 21 22:24:53 vhbl06 dlm_controld[3641]: 344173 daemon joined 
> 167773709 needs fencing Feb 21 22:25:47 vhbl06 dlm_controld[3641]: 344226 
> clvmd wait for quorum
> Feb 21 22:29:26 vhbl06 crmd[8309]:   notice: State transition S_IDLE -> 
> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
> origin=crm_timer_popped ]
> Feb 21 22:29:27 vhbl06 pengine[8308]:   notice: We do not have quorum - 
> fencing and resource management disabled
> Feb 21 22:29:27 vhbl06 pengine[8308]:  warning: Node vhbl04 is unclean 
> because the node is no longer part of the cluster Feb 21 22:29:27 
> vhbl06 pengine[8308]:  warning: Node vhbl04 is unclean Feb 21 22:29:27 
> vhbl06 pengine[8308]:  warning: Node vhbl05 is unclean because the 
> node is no longer part of the cluster Feb 21 22:29:27 vhbl06 
> pengine[8308]:  warning: Node vhbl05 is unclean Feb 21 22:29:27 vhbl06 
> pengine[8308]:  warning: Node vhbl07 is unclean because our peer 
> process is no longer available Feb 21 22:29:27 vhbl06 pengine[8308]:  
> warning: Node vhbl07 is unclean Feb 21 22:29:27 vhbl06 pengine[8308]:  
> warning: Node vhbl03 is unclean because vm-niifdc is thought to be 
> active there Feb 21 22:29:27 vhbl06 pengine[8308]:  warning: Action 
> vm-dogwood_stop_0 on vhbl03 is unrunnable (offline) Feb 21 22:29:27 vhbl06 
> pengine[8308]:  warning: Action vm-niifidp_stop_0 on vhbl03 is unrunnable 
> (offline) [...] Feb 21 22:29:27 vhbl06 pengine[8308]:  warning: Node vhbl03 
> is unclean!
> Feb 21 22:29:27 vhbl06 pengine[8308]:  warning: Node vhbl04 is unclean!
> Feb 21 22:29:27 vhbl06 pengine[8308]:  warning: Node vhbl05 is unclean!
> Feb 21 22:29:27 vhbl06 pengine[8308]:   notice: We can fence vhbl07 without 
> quorum because they're in our membership
> Feb 21 22:29:27 vhbl06 pengine[8308]:  warning: Scheduling Node vhbl07 for 
> STONITH
> Feb 21 22:29:27 vhbl06 pengine[8308]:   notice: Cannot fence unclean nodes 
> until quorum is attained (or no-quorum-policy is set to ignore)
> [...]
> Feb 21 22:29:27 vhbl06 crmd[8309]:   notice: Executing reboot fencing 
> operation (212) on vhbl07 (timeout=60000)
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: Client crmd.8309.09cea2e7 
> wants to fence (reboot) 'vhbl07' with device '(any)'
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: Initiating remote 
> operation reboot for vhbl07: 31b2023d-3fc5-419e-8490-91eb81254497 (0)
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl05 can not 
> fence (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl07 can fence 
> (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl01 can not 
> fence (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl02 can not 
> fence (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl03 can not 
> fence (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl04 can not 
> fence (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl05 can not 
> fence (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl07 can fence 
> (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl01 can not 
> fence (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl02 can not 
> fence (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl03 can not 
> fence (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 stonith-ng[8305]:   notice: fencing-vhbl04 can not 
> fence (reboot) vhbl07: static-list
> Feb 21 22:29:27 vhbl06 dlm_controld[3641]: 344447 daemon remove 
> 167773709 already needs fencing Feb 21 22:29:27 vhbl06 dlm_controld[3641]: 
> 344447 tell corosync to remove nodeid 167773709 from cluster
> Feb 21 22:29:30 vhbl06 corosync[3603]:   [TOTEM ] A processor failed, forming 
> new configuration.
> Feb 21 22:29:34 vhbl06 corosync[3603]:   [TOTEM ] A new membership 
> (10.0.6.12:632) was formed. Members left: 167773709
> Feb 21 22:29:34 vhbl06 corosync[3603]:   [TOTEM ] Failed to receive the leave 
> message. failed: 167773709
> Feb 21 22:29:34 vhbl06 corosync[3603]:   [QUORUM] Members[1]: 167773708
> Feb 21 22:29:34 vhbl06 corosync[3603]:   [MAIN  ] Completed service 
> synchronization, ready to provide service.
> Feb 21 22:29:34 vhbl06 pacemakerd[8261]:   notice: crm_reap_unseen_nodes: 
> Node vhbl07[167773709] - state is now lost (was member)
> Feb 21 22:29:34 vhbl06 crmd[8309]:   notice: crm_reap_unseen_nodes: Node 
> vhbl07[167773709] - state is now lost (was member)
> Feb 21 22:29:34 vhbl06 kernel: [344592.424938] dlm: closing connection to 
> node 167773709
> Feb 21 22:29:42 vhbl06 stonith-ng[8305]:   notice: Operation 'reboot' [5533] 
> (call 2 from crmd.8309) for host 'vhbl07' with device 'fencing-vhbl07' 
> returned: 0 (OK)
> Feb 21 22:29:42 vhbl06 stonith-ng[8305]:   notice: Operation reboot of vhbl07 
> by vhbl06 for crmd.8309@vhbl06.31b2023d: OK
> Feb 21 22:29:42 vhbl06 crmd[8309]:   notice: Stonith operation 
> 2/212:1:0:d06e9743-b452-4b6a-b3a9-d352a4454269: OK (0)
> Feb 21 22:29:42 vhbl06 crmd[8309]:   notice: Peer vhbl07 was terminated 
> (reboot) by vhbl06 for vhbl06: OK (ref=31b2023d-3fc5-419e-8490-91eb81254497) 
> by client crmd.8309
> 
> That is, fencing by node name worked all right.
> 
> I wonder if I understood the issue right and what would be the best 
> way to avoid it in the future.  Please advise.
> 




------------------------------

Message: 2
Date: Mon, 22 Feb 2016 11:39:03 -0600
From: Ken Gaillot <kgail...@redhat.com>
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] ClusterIP location constraint reappears
        after reboot
Message-ID: <56cb47b7.3060...@redhat.com>
Content-Type: text/plain; charset=windows-1252

On 02/22/2016 07:26 AM, Jeremy Matthews wrote:
> Thank you, Ken Gaillot, for your response. Sorry for the delayed followup, 
> but I have looked and looked at the scripts. There are a couple of scripts 
> that have a pcs resource ban command, but they are not executed at the time 
> of shutdown which is when I've discovered that the constraint is put back in. 
> Our application software did not change on the system. We just updated pcs 
> and pacemaker (and dependencies). I had to rollback pcs because it has an 
> issue. 
> 
> Below is from /var/log/cluster/corosync.log. Any clues here as to why the 
> constraint might have been added? In my other system without the pacemaker 
> update, there is not the addition of the constraint. 

It might help to see the entire log from the time you issued the reboot command 
to when the constraint was added.

Notice in the cib logs it says "origin=local/crm_resource". That means that 
crm_resource was what originally added the constraint (pcs resource ban calls 
crm_resource).

I'd be curious whether this makes a difference: after removing the constraint, 
run "pcs cluster cib-upgrade". It shouldn't, but it's the only thing I can 
think of to try.

CIB schema versions change when new features are added that require new CIB 
syntax. pcs should automatically run cib-upgrade if you ever use a newer 
feature than your current CIB version supports. You don't really need to run 
cib-upgrade explicitly, but it doesn't hurt, and it will get rid of those 
"Transformed the configuration" messages.

> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: do_state_transition: 
>   State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ 
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: do_te_invoke:  
> Processing graph 9 (ref=pe_calc-dc-1455920543-46) derived from 
> /var/lib/pacemaker/pengine/pe-input-642.bz2
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:   notice: run_graph:     
> Transition 9 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pacemaker/pengine/pe-input-642.bz2): Complete
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: do_log:        FSA: 
> Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:   notice: do_state_transition: 
>   State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:   notice: process_pe_message:  
>   Calculated Transition 9: /var/lib/pacemaker/pengine/pe-input-642.bz2
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_process_request: 
>   Forwarding cib_modify operation for section constraints to master 
> (origin=local/crm_resource/3)
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:      
>   Diff: --- 0.291.8 2
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:      
>   Diff: +++ 0.292.0 (null)
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:      
>   +  /cib:  @epoch=292, @num_updates=0
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:      
>   ++ /cib/configuration/constraints:  <rsc_location 
> id="cli-ban-ClusterIP-on-g5se-f3efce" rsc="ClusterIP" role="Started" 
> node="g5se-f3efce" score="-INFINITY"/>
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_process_request: 
>   Completed cib_modify operation for section constraints: OK (rc=0, 
> origin=g5se-f3efce/crm_resource/3, version=0.292.0)
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: 
> abort_transition_graph:        Transition aborted by 
> rsc_location.cli-ban-ClusterIP-on-g5se-f3efce 'create': Non-status change 
> (cib=0.292.0, source=te_update_diff:383, path=/cib/configuration/constraints, 
> 1)
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:   notice: do_state_transition: 
>   State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:   notice: update_validation:   
>   pacemaker-1.2-style configuration is also valid for pacemaker-1.3
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: update_validation:   
>   Transformation upgrade-1.3.xsl successful
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: update_validation:   
>   Transformed the configuration from pacemaker-1.2 to pacemaker-2.0
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: cli_config_update:   
>   Your configuration was internally updated to the latest version 
> (pacemaker-2.0)
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:   notice: unpack_config:       
>   On loss of CCM Quorum: Ignore
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: unpack_status:       
>   Node g5se-f3efce is in standby-mode
> 
> I'm not sure how much of this needs to be included; my original email and Ken 
> Gaillot's response to it are embedded below. 
> 
> Message: 3
> Date: Thu, 18 Feb 2016 13:37:31 -0600
> From: Ken Gaillot <kgail...@redhat.com>
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] ClusterIP location constraint reappears
>       after reboot
> Message-ID: <56c61d7b.9090...@redhat.com>
> Content-Type: text/plain; charset=windows-1252
> 
> On 02/18/2016 01:07 PM, Jeremy Matthews wrote:
>> Hi,
>>
>> We're having an issue with our cluster where after a reboot of our system a 
>> location constraint reappears for the ClusterIP. This causes a problem, 
>> because we have a daemon that checks the cluster state and waits until the 
>> ClusterIP is started before it kicks off our application. We didn't have 
>> this issue when using an earlier version of pacemaker. Here is the 
>> constraint as shown by pcs:
>>
>> [root@g5se-f3efce cib]# pcs constraint
>> Location Constraints:
>>   Resource: ClusterIP
>>     Disabled on: g5se-f3efce (role: Started)
>> Ordering Constraints:
>> Colocation Constraints:
>>
>> ...and here is our cluster status with the ClusterIP being Stopped:
>>
>> [root@g5se-f3efce cib]# pcs status
>> Cluster name: cl-g5se-f3efce
>> Last updated: Thu Feb 18 11:36:01 2016
>> Last change: Thu Feb 18 10:48:33 2016 via crm_resource on g5se-f3efce
>> Stack: cman
>> Current DC: g5se-f3efce - partition with quorum
>> Version: 1.1.11-97629de
>> 1 Nodes configured
>> 4 Resources configured
>>
>>
>> Online: [ g5se-f3efce ]
>>
>> Full list of resources:
>>
>> sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
>> meta-data      (ocf::pacemaker:GBmon): Started g5se-f3efce
>> netmon (ocf::heartbeat:ethmonitor):    Started g5se-f3efce
>> ClusterIP      (ocf::heartbeat:IPaddr2):       Stopped
>>
>>
>> The cluster really just has one node at this time.
>>
>> I retrieve the constraint ID, remove the constraint, verify that ClusterIP 
>> is started,  and then reboot:
>>
>> [root@g5se-f3efce cib]# pcs constraint ref ClusterIP
>> Resource: ClusterIP
>>   cli-ban-ClusterIP-on-g5se-f3efce
>> [root@g5se-f3efce cib]# pcs constraint remove 
>> cli-ban-ClusterIP-on-g5se-f3efce
>>
>> [root@g5se-f3efce cib]# pcs status
>> Cluster name: cl-g5se-f3efce
>> Last updated: Thu Feb 18 11:45:09 2016
>> Last change: Thu Feb 18 11:44:53 2016 via crm_resource on g5se-f3efce
>> Stack: cman
>> Current DC: g5se-f3efce - partition with quorum
>> Version: 1.1.11-97629de
>> 1 Nodes configured
>> 4 Resources configured
>>
>>
>> Online: [ g5se-f3efce ]
>>
>> Full list of resources:
>>
>> sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
>> meta-data      (ocf::pacemaker:GBmon): Started g5se-f3efce
>> netmon (ocf::heartbeat:ethmonitor):    Started g5se-f3efce
>> ClusterIP      (ocf::heartbeat:IPaddr2):       Started g5se-f3efce
>>
>>
>> [root@g5se-f3efce cib]# reboot
>>
>> ....after reboot, log in, and the constraint is back and ClusterIP has not 
>> started.
>>
>>
>> I have noticed in /var/lib/pacemaker/cib that the cib-x.raw files get 
>> created when there are changes to the cib (cib.xml). After a reboot, I see 
>> the constraint being added in a diff between .raw files:
>>
>> [root@g5se-f3efce cib]# diff cib-7.raw cib-8.raw
>> 1c1
>> < <cib epoch="239" num_updates="0" admin_epoch="0" 
>> validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:44:53 
>> 2016" update-origin="g5se-f3efce" update-client="crm_resource"
>> crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
>> ---
>>> <cib epoch="240" num_updates="0" admin_epoch="0" 
>>> validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:46:49 
>>> 2016" update-origin="g5se-f3efce" update-client="crm_resource"
>>> crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
>> 50c50,52
>> <     <constraints/>
>> ---
>>>     <constraints>
>>>       <rsc_location id="cli-ban-ClusterIP-on-g5se-f3efce" rsc="ClusterIP" 
>>> role="Started" node="g5se-f3efce" score="-INFINITY"/>
>>>     </constraints>
>>
>>
>> I have also looked in /var/log/cluster/corosync.log and seen logs where it 
>> seems the cib is getting updated. I'm not sure if the constraint is being 
>> put back in at shutdown or at start up. I just don't understand why it's 
>> being put back in. I don't think our daemon code or other scripts are doing 
>> this,  but it is something I could verify.
> 
> I would look at any scripts running around that time first. Constraints that 
> start with "cli-" were created by one of the CLI tools, so something must be 
> calling it. The most likely candidates are pcs resource move/ban or 
> crm_resource -M/--move/-B/--ban.
> 
>> ********************************
>>
>> From "yum info pacemaker", my current version is:
>>
>> Name        : pacemaker
>> Arch        : x86_64
>> Version     : 1.1.12
>> Release     : 8.el6_7.2
>>
>> My earlier version was:
>>
>> Name        : pacemaker
>> Arch        : x86_64
>> Version     : 1.1.10
>> Release     : 1.el6_4.4
>>
>> I'm still using an earlier version of pcs, because the new one seems to have 
>> issues with Python:
>>
>> Name        : pcs
>> Arch        : noarch
>> Version     : 0.9.90
>> Release     : 1.0.1.el6.centos
>>
>> *******************************
>>
>> If anyone has ideas on the cause or thoughts on this, anything would be 
>> greatly appreciated.
>>
>> Thanks!
>>
>>
>>
>> Jeremy Matthews
> 
> -----Original Message-----
> From: users-requ...@clusterlabs.org 
> [mailto:users-requ...@clusterlabs.org]
> Sent: Friday, February 19, 2016 2:21 AM
> To: users@clusterlabs.org
> Subject: Users Digest, Vol 13, Issue 35
> 
> Send Users mailing list submissions to
>       users@clusterlabs.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>       http://clusterlabs.org/mailman/listinfo/users
> or, via email, send a message with subject or body 'help' to
>       users-requ...@clusterlabs.org
> 
> You can reach the person managing the list at
>       users-ow...@clusterlabs.org
> 
> When replying, please edit your Subject line so it is more specific than "Re: 
> Contents of Users digest..."
> 
> 
> Today's Topics:
> 
>    1. Re: Too quick node reboot leads to failed corosync assert on
>       other node(s) (Michal Koutný)
>    2. ClusterIP location constraint reappears after reboot
>       (Jeremy Matthews)
>    3. Re: ClusterIP location constraint reappears after reboot
>       (Ken Gaillot)
>    4. Re: Too quick node reboot leads to failed corosync assert on
>       other node(s) (Jan Friesse)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 18 Feb 2016 17:32:48 +0100
> From: Michal Koutný <mkou...@suse.com>
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] Too quick node reboot leads to failed
>       corosync assert on other node(s)
> Message-ID: <56c5f230.6020...@suse.com>
> Content-Type: text/plain; charset="windows-1252"
> 
> On 02/18/2016 10:40 AM, Christine Caulfield wrote:
>> I definitely remember looking into this, or something very like it, 
>> ages ago. I can't find anything in the commit logs for either 
>> corosync or cman that looks relevant though. If you're seeing it on 
>> recent builds then it's obviously still a problem anyway and we ought to 
>> look into it!
> Thanks for your replies.
> 
> So far this has happened only once, and we have only done a "post mortem"; 
> alas, no reproducer is available. If I have time, I'll try to reproduce it 
> locally and check whether it exists in the current version.
> 
> Michal
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Fri, 19 Feb 2016 09:18:22 +0100
> From: Jan Friesse <jfrie...@redhat.com>
> To: Cluster Labs - All topics related to open-source clustering
>       welcomed        <users@clusterlabs.org>
> Subject: Re: [ClusterLabs] Too quick node reboot leads to failed
>       corosync assert on other node(s)
> Message-ID: <56c6cfce.5060...@redhat.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> Michal Koutný wrote:
>> On 02/18/2016 10:40 AM, Christine Caulfield wrote:
>>> I definitely remember looking into this, or something very like it, 
>>> ages ago. I can't find anything in the commit logs for either 
>>> corosync or cman that looks relevant though. If you're seeing it on 
>>> recent builds then it's obviously still a problem anyway and we ought to 
>>> look into it!
>> Thanks for your replies.
>>
>> So far this happened only once and we've done only "post mortem", 
>> alas no available reproducer. If I have time, I'll try to reproduce 
>> it
> 
> Ok. Actually I was trying to reproduce and was really not successful (current 
> master). Steps I've used:
> - 2 nodes, token set to 30 sec
> - execute cpgbench on node2
> - pause node1 corosync (ctrl+z), kill node1 corosync (kill -9 %1)
> - wait until corosync on node2 move into "entering GATHER state from..."
> - execute corosync on node1
> 
> Basically, during recovery the new node trans list was never sent (and/or was 
> ignored by node2).
> 
> I'm going to try testing v1.4.7, but it's also possible that the bug is fixed 
> by other commits (my favorites are 
> cfbb021e130337603fe5b545d1e377296ecb92ea,
> 4ee84c51fa73c4ec7cbee922111a140a3aaf75df,
> f135b680967aaef1d466f40170c75ae3e470e147).
> 
> Regards,
>    Honza
> 
>> locally and check whether it exists in the current version.
>>
>> Michal




------------------------------

_______________________________________________
Users mailing list
Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users


End of Users Digest, Vol 13, Issue 44
*************************************


_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
