On 02/22/2016 07:26 AM, Jeremy Matthews wrote:
> Thank you, Ken Gaillot, for your response. Sorry for the delayed follow-up,
> but I have looked and looked at the scripts. There are a couple of scripts
> that have a pcs resource ban command, but they are not executed at shutdown,
> which is when I've discovered that the constraint is put back in. Our
> application software did not change on the system. We just updated pcs and
> pacemaker (and dependencies). I had to roll back pcs because it has an
> issue.
> 
> Below is from /var/log/cluster/corosync.log. Any clues here as to why the
> constraint might have been added? On my other system, which does not have the
> pacemaker update, the constraint is not added.

It might help to see the entire log from the time you issued the reboot
command to when the constraint was added.

Notice that the cib logs say "origin=local/crm_resource". That means
crm_resource is what originally added the constraint (pcs resource
ban calls crm_resource).
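
If it helps to track down the caller, something along these lines should
flush out whatever could be re-adding the ban (the paths are just examples;
point it at wherever your scripts and init hooks actually live):

  # look for anything that could create a cli-ban-* constraint
  grep -rn -e 'pcs resource ban' -e 'pcs resource move' \
      -e 'crm_resource.*--ban' -e 'crm_resource.*-B' \
      /etc/init.d /etc/rc.d /usr/local/bin 2>/dev/null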

I'd be curious whether this makes a difference: after removing the
constraint, run "pcs cib-upgrade". It shouldn't, but it's the only thing
I can think of to try.
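
Roughly, using the names from your output (just a sketch; I haven't run
this against your configuration):

  pcs constraint remove cli-ban-ClusterIP-on-g5se-f3efce
  pcs cib-upgrade
  pcs constraint ref ClusterIP   # confirm the ban is gone
  reboot

Then see whether the cli-ban constraint comes back after the node is up.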

CIB schema versions change when new features are added that require new
CIB syntax. pcs should automatically run cib-upgrade if you ever use a
newer feature than your current CIB version supports. You don't really
need to run cib-upgrade explicitly, but it doesn't hurt, and it will get rid
of those "Transformed the configuration" messages.

> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: do_state_transition: 
>   State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ 
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: do_te_invoke:  
> Processing graph 9 (ref=pe_calc-dc-1455920543-46) derived from 
> /var/lib/pacemaker/pengine/pe-input-642.bz2
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:   notice: run_graph:     
> Transition 9 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pacemaker/pengine/pe-input-642.bz2): Complete
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: do_log:        FSA: 
> Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:   notice: do_state_transition: 
>   State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:   notice: process_pe_message:  
>   Calculated Transition 9: /var/lib/pacemaker/pengine/pe-input-642.bz2
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_process_request: 
>   Forwarding cib_modify operation for section constraints to master 
> (origin=local/crm_resource/3)
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:      
>   Diff: --- 0.291.8 2
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:      
>   Diff: +++ 0.292.0 (null)
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:      
>   +  /cib:  @epoch=292, @num_updates=0
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_perform_op:      
>   ++ /cib/configuration/constraints:  <rsc_location 
> id="cli-ban-ClusterIP-on-g5se-f3efce" rsc="ClusterIP" role="Started" 
> node="g5se-f3efce" score="-INFINITY"/>
> Feb 19 15:22:23 [1994] g5se-f3efce        cib:     info: cib_process_request: 
>   Completed cib_modify operation for section constraints: OK (rc=0, 
> origin=g5se-f3efce/crm_resource/3, version=0.292.0)
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:     info: 
> abort_transition_graph:        Transition aborted by 
> rsc_location.cli-ban-ClusterIP-on-g5se-f3efce 'create': Non-status change 
> (cib=0.292.0, source=te_update_diff:383, path=/cib/configuration/constraints, 
> 1)
> Feb 19 15:22:23 [1999] g5se-f3efce       crmd:   notice: do_state_transition: 
>   State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:   notice: update_validation:   
>   pacemaker-1.2-style configuration is also valid for pacemaker-1.3
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: update_validation:   
>   Transformation upgrade-1.3.xsl successful
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: update_validation:   
>   Transformed the configuration from pacemaker-1.2 to pacemaker-2.0
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: cli_config_update:   
>   Your configuration was internally updated to the latest version 
> (pacemaker-2.0)
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:   notice: unpack_config:       
>   On loss of CCM Quorum: Ignore
> Feb 19 15:22:23 [1998] g5se-f3efce    pengine:     info: unpack_status:       
>   Node g5se-f3efce is in standby-mode
> 
> I'm not sure how much of the thread needs to be included; my original email,
> with Ken Gaillot's response embedded in it, is below.
> 
> Message: 3
> Date: Thu, 18 Feb 2016 13:37:31 -0600
> From: Ken Gaillot <kgail...@redhat.com>
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] ClusterIP location constraint reappears
>       after reboot
> Message-ID: <56c61d7b.9090...@redhat.com>
> Content-Type: text/plain; charset=windows-1252
> 
> On 02/18/2016 01:07 PM, Jeremy Matthews wrote:
>> Hi,
>>
>> We're having an issue with our cluster where after a reboot of our system a 
>> location constraint reappears for the ClusterIP. This causes a problem, 
>> because we have a daemon that checks the cluster state and waits until the 
>> ClusterIP is started before it kicks off our application. We didn't have 
>> this issue when using an earlier version of pacemaker. Here is the 
>> constraint as shown by pcs:
>>
>> [root@g5se-f3efce cib]# pcs constraint
>> Location Constraints:
>>   Resource: ClusterIP
>>     Disabled on: g5se-f3efce (role: Started)
>> Ordering Constraints:
>> Colocation Constraints:
>>
>> ...and here is our cluster status with the ClusterIP being Stopped:
>>
>> [root@g5se-f3efce cib]# pcs status
>> Cluster name: cl-g5se-f3efce
>> Last updated: Thu Feb 18 11:36:01 2016
>> Last change: Thu Feb 18 10:48:33 2016 via crm_resource on g5se-f3efce
>> Stack: cman
>> Current DC: g5se-f3efce - partition with quorum
>> Version: 1.1.11-97629de
>> 1 Nodes configured
>> 4 Resources configured
>>
>>
>> Online: [ g5se-f3efce ]
>>
>> Full list of resources:
>>
>> sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
>> meta-data      (ocf::pacemaker:GBmon): Started g5se-f3efce
>> netmon (ocf::heartbeat:ethmonitor):    Started g5se-f3efce
>> ClusterIP      (ocf::heartbeat:IPaddr2):       Stopped
>>
>>
>> The cluster really just has one node at this time.
>>
>> I retrieve the constraint ID, remove the constraint, verify that ClusterIP 
>> is started,  and then reboot:
>>
>> [root@g5se-f3efce cib]# pcs constraint ref ClusterIP
>> Resource: ClusterIP
>>   cli-ban-ClusterIP-on-g5se-f3efce
>> [root@g5se-f3efce cib]# pcs constraint remove 
>> cli-ban-ClusterIP-on-g5se-f3efce
>>
>> [root@g5se-f3efce cib]# pcs status
>> Cluster name: cl-g5se-f3efce
>> Last updated: Thu Feb 18 11:45:09 2016
>> Last change: Thu Feb 18 11:44:53 2016 via crm_resource on g5se-f3efce
>> Stack: cman
>> Current DC: g5se-f3efce - partition with quorum
>> Version: 1.1.11-97629de
>> 1 Nodes configured
>> 4 Resources configured
>>
>>
>> Online: [ g5se-f3efce ]
>>
>> Full list of resources:
>>
>> sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
>> meta-data      (ocf::pacemaker:GBmon): Started g5se-f3efce
>> netmon (ocf::heartbeat:ethmonitor):    Started g5se-f3efce
>> ClusterIP      (ocf::heartbeat:IPaddr2):       Started g5se-f3efce
>>
>>
>> [root@g5se-f3efce cib]# reboot
>>
>> ....after reboot, log in, and the constraint is back and ClusterIP has not 
>> started.
>>
>>
>> I have noticed in /var/lib/pacemaker/cib that the cib-x.raw files get 
>> created when there are changes to the cib (cib.xml). After a reboot, I see 
>> the constraint being added in a diff between .raw files:
>>
>> [root@g5se-f3efce cib]# diff cib-7.raw cib-8.raw
>> 1c1
>> < <cib epoch="239" num_updates="0" admin_epoch="0" 
>> validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:44:53 
>> 2016" update-origin="g5se-f3efce" update-client="crm_resource" 
>> crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
>> ---
>>> <cib epoch="240" num_updates="0" admin_epoch="0" 
>>> validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:46:49 
>>> 2016" update-origin="g5se-f3efce" update-client="crm_resource" 
>>> crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
>> 50c50,52
>> <     <constraints/>
>> ---
>>>     <constraints>
>>>       <rsc_location id="cli-ban-ClusterIP-on-g5se-f3efce" rsc="ClusterIP" 
>>> role="Started" node="g5se-f3efce" score="-INFINITY"/>
>>>     </constraints>
>>
>>
>> I have also looked in /var/log/cluster/corosync.log and seen logs where it 
>> seems the cib is getting updated. I'm not sure if the constraint is being 
>> put back in at shutdown or at start up. I just don't understand why it's 
>> being put back in. I don't think our daemon code or other scripts are doing 
>> this,  but it is something I could verify.
> 
> I would look at any scripts running around that time first. Constraints that 
> start with "cli-" were created by one of the CLI tools, so something must be 
> calling it. The most likely candidates are pcs resource move/ban or 
> crm_resource -M/--move/-B/--ban.
> 
>> ********************************
>>
>> From "yum info pacemaker", my current version is:
>>
>> Name        : pacemaker
>> Arch        : x86_64
>> Version     : 1.1.12
>> Release     : 8.el6_7.2
>>
>> My earlier version was:
>>
>> Name        : pacemaker
>> Arch        : x86_64
>> Version     : 1.1.10
>> Release     : 1.el6_4.4
>>
>> I'm still using an earlier version of pcs, because the new one seems to have 
>> issues with python:
>>
>> Name        : pcs
>> Arch        : noarch
>> Version     : 0.9.90
>> Release     : 1.0.1.el6.centos
>>
>> *******************************
>>
>> If anyone has ideas on the cause or thoughts on this, anything would be 
>> greatly appreciated.
>>
>> Thanks!
>>
>>
>>
>> Jeremy Matthews
> 

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
