Re: " Does slave have master score? Your logs show only one node with master. To select another node as new master it needs non-zero master score as well."
Yes, it does. I included the corosync.log file from the DC node earlier, but the corosync log file from the other node shows messages like the following:

Aug 13 05:59:50 [16795] mgraid-16201289RN00023-1 pengine: debug: master_color: SS16201289RN00023:1 master score: 500

Regards,
Michael

-----Original Message-----
From: Users <users-boun...@clusterlabs.org> On Behalf Of users-requ...@clusterlabs.org
Sent: Monday, August 12, 2019 9:38 PM
To: users@clusterlabs.org
Subject: [EXTERNAL] Users Digest, Vol 55, Issue 24

Send Users mailing list submissions to users@clusterlabs.org

To subscribe or unsubscribe via the World Wide Web, visit https://lists.clusterlabs.org/mailman/listinfo/users or, via email, send a message with subject or body 'help' to users-requ...@clusterlabs.org

You can reach the person managing the list at users-ow...@clusterlabs.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of Users digest..."

Today's Topics:

   1. Re: [EXTERNAL] Users Digest, Vol 55, Issue 19 (Andrei Borzenkov)

----------------------------------------------------------------------

Message: 1
Date: Tue, 13 Aug 2019 07:37:47 +0300
From: Andrei Borzenkov <arvidj...@gmail.com>
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Cc: Venkata Reddy Chappavarapu <venkata.chappavar...@harmonicinc.com>
Subject: Re: [ClusterLabs] [EXTERNAL] Users Digest, Vol 55, Issue 19
Message-ID: <f7b3f2d1-95e8-4b65-aa72-4074b5944...@gmail.com>
Content-Type: text/plain; charset="utf-8"

Sent from my iPhone

> On 13 Aug 2019, at 0:17, Michael Powell <michael.pow...@harmonicinc.com> wrote:
>
> Yes, I have tried that. I used crm_resource --meta -p resource-stickiness -v 0 -r SS16201289RN00023 to disable resource stickiness and then kill -9 <pid> to kill the application associated with the master resource.
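The per-node master scores discussed above can also be inspected from the command line. A sketch, assuming Pacemaker 1.1 tooling and the usual master-<resource> transient node attribute convention; node and resource names are the ones from this thread:

```shell
# Show the scores the policy engine computes against the live CIB,
# including master (promotion) scores per node:
crm_simulate -sL | grep -i master

# Query one node's master score directly; master scores are stored as
# transient (reboot-lifetime) node attributes named "master-<resource>":
crm_attribute -N mgraid-16201289RN00023-1 -l reboot \
    -n master-SS16201289RN00023 -G -Q
```

Depending on whether the clone is globally unique, the attribute may carry an instance suffix (e.g. master-SS16201289RN00023:1); crm_simulate's score output shows the exact names in use.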
> The results are the same: the slave resource remains a slave while the failed resource is restarted and becomes master again.

Does slave have master score? Your logs show only one node with master. To select another node as new master it needs non-zero master score as well.

> One approach that seems to work is to run crm_resource -M -r ms-SS16201289RN00023 -H mgraid-16201289RN00023-1 to move the resource to the other node (assuming that the master is running on node mgraid-16201289RN00023-0). My original understanding was that this would "restart" the resource on the destination node, but that was apparently a misunderstanding. I can change our scripts to use this approach, but a) I thought that maintaining the approach of demoting the master resource and promoting the slave to master was more generic, and b) I am unsure of any potential side effects of moving the resource. Given what I'm trying to accomplish, is this in fact the preferred approach?
>
> Regards,
> Michael

-----Original Message-----
From: Users <users-boun...@clusterlabs.org> On Behalf Of users-requ...@clusterlabs.org
Sent: Monday, August 12, 2019 1:10 PM
To: users@clusterlabs.org
Subject: [EXTERNAL] Users Digest, Vol 55, Issue 19

Today's Topics:

   1. why is node fenced ? (Lentes, Bernd)
   2. Postgres HA - pacemaker RA do not support auto failback (Shital A)
   3. Re: why is node fenced ? (Chris Walker)
   4.
      Re: Master/slave failover does not work as expected (Andrei Borzenkov)

----------------------------------------------------------------------

Message: 1
Date: Mon, 12 Aug 2019 18:09:24 +0200 (CEST)
From: "Lentes, Bernd" <bernd.len...@helmholtz-muenchen.de>
To: Pacemaker ML <users@clusterlabs.org>
Subject: [ClusterLabs] why is node fenced ?
Message-ID: <546330844.1686419.1565626164456.javamail.zim...@helmholtz-muenchen.de>
Content-Type: text/plain; charset=utf-8

Hi,

last Friday (9th of August) I had to install patches on my two-node cluster. I put one of the nodes (ha-idg-2) into standby (crm node standby ha-idg-2), patched it, rebooted, started the cluster (systemctl start pacemaker) again, put the node back online, and everything was fine.

Then I wanted to do the same procedure with the other node (ha-idg-1). I put it in standby, patched it, rebooted, and started pacemaker again. But then ha-idg-1 fenced ha-idg-2; it said the node is unclean. I know that nodes which are unclean need to be shut down, that's logical.

But I don't know where the conclusion that the node is unclean comes from, or why it is unclean; I searched the logs and didn't find any hint.

I put the syslog and the pacemaker log on a Seafile share; I'd be very thankful if you'd have a look:
https://hmgubox.helmholtz-muenchen.de/d/53a10960932445fb9cfe/

Here is the CLI history of the commands:

17:03:04 crm node standby ha-idg-2
17:07:15 zypper up (install updates on ha-idg-2)
17:17:30 systemctl reboot
17:25:21 systemctl start pacemaker.service
17:25:47 crm node online ha-idg-2
17:26:35 crm node standby ha-idg-1
17:30:21 zypper up (install updates on ha-idg-1)
17:37:32 systemctl reboot
17:43:04 systemctl start pacemaker.service
17:44:00 ha-idg-1 is fenced

Thanks.
Bernd

OS is SLES 12 SP4, pacemaker 1.1.19, corosync 2.3.6-9.13.1

--

Bernd Lentes
Systemadministration
Institut für Entwicklungsgenetik
Gebäude 35.34 - Raum 208
HelmholtzZentrum münchen
bernd.len...@helmholtz-muenchen.de
phone: +49 89 3187 1241
phone: +49 89 3187 3827
fax: +49 89 3187 2294
http://www.helmholtz-muenchen.de/idg

Perfekt ist wer keine Fehler macht
Also sind Tote perfekt

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671

------------------------------

Message: 2
Date: Mon, 12 Aug 2019 12:24:02 +0530
From: Shital A <brightuser2...@gmail.com>
To: pgsql-gene...@postgresql.com, Users@clusterlabs.org
Subject: [ClusterLabs] Postgres HA - pacemaker RA do not support auto failback
Message-ID: <camp7vw_kf2em_buh_fpbznc9z6pvvx+7rxjymhfmcozxuwg...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

Postgres version: 9.6
OS: RHEL 7.6

We are working on an HA setup for a Postgres cluster of two nodes in active-passive mode.

Installed:
Pacemaker 1.1.19
Corosync 2.4.3

The pacemaker agent with this installation doesn't support automatic failback. What I mean by that is explained below:
1. The cluster is set up as A - B, with A as master.
2. Kill services on A; node B will come up as master.
3. When node A is ready to rejoin the cluster, we have to delete the lock file it creates on one of the nodes and execute the cleanup command to get the node back as standby.

Step 3 is manual, so HA is not achieved in the real sense.

Please help to check:

1. Is there any version of the resource agent which supports automatic failback, avoiding the generation of the lock file and having to delete it?

2. If there is no such support and we need this functionality, do we have to modify the existing code?

How can this be achieved? Please suggest.

Thanks.

------------------------------

Message: 3
Date: Mon, 12 Aug 2019 17:47:02 +0000
From: Chris Walker <cwal...@cray.com>
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Subject: Re: [ClusterLabs] why is node fenced ?
Message-ID: <eafef777-5a49-4c06-a2f6-8711f528b...@cray.com>
Content-Type: text/plain; charset="utf-8"

When ha-idg-1 started Pacemaker around 17:43, it did not see ha-idg-2; for example,

Aug 09 17:43:05 [6318] ha-idg-1 pacemakerd: info: pcmk_quorum_notification: Quorum retained | membership=1320 members=1

After ~20s (the dc-deadtime parameter), ha-idg-2 is marked 'unclean' and STONITHed as part of startup fencing.

There is nothing in ha-idg-2's HA logs around 17:43 indicating that it saw ha-idg-1 either, so it appears that there was no communication at all between the two nodes.

I'm not sure exactly why the nodes did not see one another, but there are indications of network issues around this time:

2019-08-09T17:42:16.427947+02:00 ha-idg-2 kernel: [ 1229.245533] bond1: now running without any active interface!

so perhaps that's related.

HTH,
Chris

On 8/12/19, 12:09 PM, "Users on behalf of Lentes, Bernd" <users-boun...@clusterlabs.org on behalf of bernd.len...@helmholtz-muenchen.de> wrote:

> Hi,
> last Friday (9th of August) I had to install patches on my two-node cluster.
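Chris's dc-deadtime explanation suggests one mitigation: give a slow-booting peer more time before startup fencing declares it unclean. A sketch in crmsh syntax (matching the SLES environment in this thread); the 120s value is illustrative, not a recommendation from the thread:

```shell
# dc-deadtime is how long the cluster waits to hear from a peer at
# startup before it is considered unclean and startup fencing applies
# (the default is 20s, matching the ~20s observed above):
crm configure property dc-deadtime=120s

# Confirm the property took effect:
crm configure show | grep dc-deadtime
```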
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

------------------------------

Message: 4
Date: Mon, 12 Aug 2019 23:09:31 +0300
From: Andrei Borzenkov <arvidj...@gmail.com>
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Cc: Venkata Reddy Chappavarapu <venkata.chappavar...@harmonicinc.com>
Subject: Re: [ClusterLabs] Master/slave failover does not work as expected
Message-ID: <CAA91j0WxSxt_eVmUvXgJ_0goBkBw69r3o-VesRvGc6atg6o=j...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Mon, Aug 12, 2019 at 4:12 PM Michael Powell <michael.pow...@harmonicinc.com> wrote:

> At 07:44:49, the ss agent discovers that the master instance has failed on node *mgraid?-0* as a result of a failed *ssadm* request in response to an *ss_monitor()* operation. It issues a *crm_master -Q -D* command with the intent of demoting the master and promoting the slave, on the other node, to master. The *ss_demote()* function finds that the application is no longer running and returns *OCF_NOT_RUNNING* (7). In the older product, this was sufficient to promote the other instance to master, but in the current product, that does not happen. Currently, the failed application is restarted, as expected, and is promoted to master, but this takes tens of seconds.

Did you try to disable resource stickiness for this ms?
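A closing note on the crm_resource -M workaround raised earlier in the thread: -M does not restart anything; it injects a location constraint preferring the target node, and that constraint persists until it is explicitly removed. A sketch using the Pacemaker 1.1 CLI, with the resource and node names from this thread:

```shell
# Move the master to the named node; under the hood this adds a
# "cli-prefer-..." location constraint pinning the resource there:
crm_resource -M -r ms-SS16201289RN00023 -H mgraid-16201289RN00023-1

# Once the move completes, remove the constraint so the cluster is
# again free to place the resource (-U / --un-move in Pacemaker 1.1):
crm_resource -U -r ms-SS16201289RN00023
```

Leaving the cli-prefer constraint in place is the main side effect to be aware of: it silently biases all future placement decisions toward the chosen node.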
------------------------------

End of Users Digest, Vol 55, Issue 19
*************************************

------------------------------

End of Users Digest, Vol 55, Issue 24
*************************************