Hi Ken,

Thanks for your response. I've now corrected the constraint order, but the behaviour is still the same: the IP does not fail over (after the first time) unless I issue a "pcs resource cleanup" on dirsrv-daemon.
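For reference, the workaround currently needed after restarting a failed instance looks something like this (a sketch using the resource name from this thread; the failcount subcommand is available in recent pcs versions):

```shell
# Inspect the failure history Pacemaker still holds for the clone
pcs resource failcount show dirsrv-daemon

# Clear the recorded failure and re-probe the resource; until this is
# done, the old monitor failure can keep the node ineligible
# (migration-threshold=1), so the IP will not move back there
pcs resource cleanup dirsrv-daemon
```

These commands only make sense when run on a node of the running cluster, so no output is shown here.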
Also, I'm not sure why you advise against using is-managed=false in production. We are trying to use Pacemaker purely to fail over on detection of a failure, not to control starting or stopping of the instances. It is essential that in normal operation we have both instances up, as we are using multi-master replication (MMR).

Thanks,
Bernie

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com]
Sent: 10 March 2016 15:01
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Floating IP failing over but not failing back with active/active LDAP (dirsrv)

On 03/10/2016 08:48 AM, Bernie Jones wrote:
> A bit more info..
>
> If, after I restart the failed dirsrv instance, I then perform a
> "pcs resource cleanup dirsrv-daemon" to clear the FAIL messages, then
> the failover works OK.
>
> So it's as if the cleanup is changing the status in some way.
>
> From: Bernie Jones [mailto:ber...@securityconsulting.ltd.uk]
> Sent: 10 March 2016 08:47
> To: 'Cluster Labs - All topics related to open-source clustering welcomed'
> Subject: [ClusterLabs] Floating IP failing over but not failing back
> with active/active LDAP (dirsrv)
>
> Hi all, could you advise please?
>
> I'm trying to configure a floating IP with an active/active deployment
> of 389 Directory Server. I don't want Pacemaker to manage LDAP, but just
> to monitor and switch the IP as required to provide resilience. I've
> seen some other similar threads and based my solution on those.
>
> I've amended the OCF resource agent for slapd to work with 389 DS and
> this tests out OK (dirsrv).
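As an aside on the is-managed question above: when the intent is only to take Pacemaker's hands off a service temporarily, the usual pattern is to toggle management around a maintenance window rather than set is-managed=false permanently. A sketch, using the clone name from this thread:

```shell
# Before maintenance: Pacemaker keeps monitoring, but will not
# start or stop the service
pcs resource unmanage dirsrv-daemon-clone

# ... work on the directory server instances ...

# After maintenance: return start/stop control to the cluster
pcs resource manage dirsrv-daemon-clone
```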
> I've then created my resources as below:
>
> pcs resource create dirsrv-ip ocf:heartbeat:IPaddr2 ip="192.168.26.100" \
>     cidr_netmask="32" op monitor timeout="20s" interval="5s" \
>     op start interval="0" timeout="20" op stop interval="0" timeout="20"
>
> pcs resource create dirsrv-daemon ocf:heartbeat:dirsrv \
>     op monitor interval="10" timeout="5" op start interval="0" timeout="5" \
>     op stop interval="0" timeout="5" meta "is-managed=false"

is-managed=false means the cluster will not try to start or stop the service.
It should never be used in regular production, only when doing maintenance on
the service.

> pcs resource clone dirsrv-daemon meta globally-unique="false" \
>     interleave="true" target-role="Started" "master-max=2"
>
> pcs constraint colocation add dirsrv-daemon-clone with dirsrv-ip score=INFINITY

This constraint means that dirsrv is only allowed to run where dirsrv-ip is. I
suspect you want the reverse, dirsrv-ip with dirsrv-daemon-clone, which means
keep the IP with a working dirsrv instance.

> pcs property set no-quorum-policy=ignore

If you're using corosync 2, you generally don't need or want this. Instead,
ensure corosync.conf has two_node: 1 (which will be done automatically if you
used pcs cluster setup).
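The constraint fix suggested above would look something like this (a sketch; the remove step assumes the original constraint was added exactly as shown in the thread):

```shell
# Drop the reversed constraint (dirsrv pinned to the IP's node)
pcs constraint colocation remove dirsrv-daemon-clone dirsrv-ip

# Add it the right way round: keep the IP on a node with a
# working dirsrv instance
pcs constraint colocation add dirsrv-ip with dirsrv-daemon-clone score=INFINITY
```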
> pcs resource defaults migration-threshold=1
>
> pcs property set stonith-enabled=false
>
> On startup all looks well:
>
> ______________________________________________________________________
>
> Last updated: Thu Mar 10 08:28:03 2016
> Last change: Thu Mar 10 08:26:14 2016
> Stack: cman
> Current DC: ga2.idam.com - partition with quorum
> Version: 1.1.11-97629de
> 2 Nodes configured
> 3 Resources configured
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
> dirsrv-ip      (ocf::heartbeat:IPaddr2):       Started ga1.idam.com
> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>      dirsrv-daemon     (ocf::heartbeat:dirsrv):        Started ga2.idam.com (unmanaged)
>      dirsrv-daemon     (ocf::heartbeat:dirsrv):        Started ga1.idam.com (unmanaged)
>
> ______________________________________________________________________
>
> Stop dirsrv on ga1:
>
> Last updated: Thu Mar 10 08:28:43 2016
> Last change: Thu Mar 10 08:26:14 2016
> Stack: cman
> Current DC: ga2.idam.com - partition with quorum
> Version: 1.1.11-97629de
> 2 Nodes configured
> 3 Resources configured
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
> dirsrv-ip      (ocf::heartbeat:IPaddr2):       Started ga2.idam.com
> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>      dirsrv-daemon     (ocf::heartbeat:dirsrv):        Started ga2.idam.com (unmanaged)
>      dirsrv-daemon     (ocf::heartbeat:dirsrv):        FAILED ga1.idam.com (unmanaged)
>
> Failed actions:
>     dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
>     status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms
>
> IP fails over to ga2 OK:
>
> ______________________________________________________________________
>
> Restart dirsrv on ga1:
>
> Last updated: Thu Mar 10 08:30:01 2016
> Last change: Thu Mar 10 08:26:14 2016
> Stack: cman
> Current DC: ga2.idam.com - partition with quorum
> Version: 1.1.11-97629de
> 2 Nodes configured
> 3 Resources configured
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
> dirsrv-ip      (ocf::heartbeat:IPaddr2):       Started ga2.idam.com
> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>      dirsrv-daemon     (ocf::heartbeat:dirsrv):        Started ga2.idam.com (unmanaged)
>      dirsrv-daemon     (ocf::heartbeat:dirsrv):        Started ga1.idam.com (unmanaged)
>
> Failed actions:
>     dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
>     status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms
>
> ______________________________________________________________________
>
> Stop dirsrv on ga2:
>
> Last updated: Thu Mar 10 08:31:14 2016
> Last change: Thu Mar 10 08:26:14 2016
> Stack: cman
> Current DC: ga2.idam.com - partition with quorum
> Version: 1.1.11-97629de
> 2 Nodes configured
> 3 Resources configured
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
> dirsrv-ip      (ocf::heartbeat:IPaddr2):       Started ga2.idam.com
> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>      dirsrv-daemon     (ocf::heartbeat:dirsrv):        FAILED ga2.idam.com (unmanaged)
>      dirsrv-daemon     (ocf::heartbeat:dirsrv):        Started ga1.idam.com (unmanaged)
>
> Failed actions:
>     dirsrv-daemon_monitor_10000 on ga2.idam.com 'not running' (7): call=11,
>     status=complete, last-rc-change='Thu Mar 10 08:31:12 2016', queued=0ms, exec=0ms
>     dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
>     status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms
>
> But the IP stays on the failed node.
>
> Looking in the logs, it seems the cluster is not aware that ga1 is
> available, even though the status output shows it is.
>
> If I repeat the tests with ga2 started up first, the behaviour is
> similar, i.e. the IP fails over to ga1 but not back to ga2.
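One plausible explanation, given migration-threshold=1 above: a single monitor failure leaves a failure record that keeps the clone instance (and, via the colocation, the IP) away from that node until the record is cleared, and restarting the unmanaged dirsrv by hand does not clear it. A way to inspect this, sketched with the names from the thread (crm_failcount option spelling varies slightly between Pacemaker versions):

```shell
# Query the failcount still recorded against ga1
crm_failcount -G -r dirsrv-daemon -N ga1.idam.com

# Clearing the record is what makes ga1 eligible for the IP again
pcs resource cleanup dirsrv-daemon
```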
> Many thanks,
> Bernie

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org