Re: [ClusterLabs] Pacemaker Shutdown

2020-07-22 Thread Harvey Shepherd
...then you could fence it from node 2. Others on the list may think of something I haven't considered here. On Wed, Jul 22, 2020 at 2:43 PM Harvey Shepherd <harvey.sheph...@aviatnet.com> wrote: Thanks for your response, Reid. What you say makes sense, and under normal circumstances...

Re: [ClusterLabs] Pacemaker Shutdown

2020-07-22 Thread Harvey Shepherd
...Cluster Labs - All topics related to open-source clustering welcomed Subject: EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown On Tue, Jul 21, 2020 at 11:42 PM Harvey Shepherd <harvey.sheph...@aviatnet.com> wrote: Hi All, I'm running Pacemaker 2.0.3 on a two-node cluster...

[ClusterLabs] Pacemaker Shutdown

2020-07-21 Thread Harvey Shepherd
Hi All, I'm running Pacemaker 2.0.3 on a two-node cluster, controlling 40+ resources, which are a mixture of clones and other resources that are colocated with the master instance of certain clones. I've noticed that if I terminate Pacemaker on the node that is hosting the master instances of th...

Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-12 Thread Harvey Shepherd
...'help' to users-requ...@clusterlabs.org. You can reach the person managing the list at users-ow...@clusterlabs.org. When replying, please edit your Subject line so it is more specific than "Re: Contents...

Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-12 Thread Harvey Shepherd
I've been experiencing exactly the same issue. Pacemaker prioritises restarting the failed resource over maintaining a master instance. In my case, I used crm_simulate to analyse the actions planned and taken by Pacemaker during resource recovery. It showed that the system did plan to fail over th...

Re: [ClusterLabs] Problems with master/slave failovers

2019-07-04 Thread Harvey Shepherd
...Cluster Labs - All topics related to open-source clustering welcomed Subject: EXTERNAL: Re: [ClusterLabs] Problems with master/slave failovers On Wed, Jul 3, 2019 at 12:59 AM Ken Gaillot wrote: > On Mon, 2019-07-01 at 23:30 +, Harvey Shepherd wrote: >> The "transition summary...

Re: [ClusterLabs] Strange monitor return code log for LSB resource

2019-07-04 Thread Harvey Shepherd
...tools installed or in use (it's a very isolated system with very few resources installed), although I was using crm_mon at the time. Does that run crm_resource under the hood? Regards, Harvey From: Harvey Shepherd Sent: Wednesday, 26 June 2019 9:26 a.m.

Re: [ClusterLabs] Problems with master/slave failovers

2019-07-01 Thread Harvey Shepherd
...before promoting the slave. From: Users on behalf of Andrei Borzenkov Sent: Tuesday, 2 July 2019 3:42 p.m. To: users@clusterlabs.org Subject: EXTERNAL: Re: [ClusterLabs] Problems with master/slave failovers 02.07.2019 2:30, Harvey Shepherd wrote: >...

Re: [ClusterLabs] Problems with master/slave failovers

2019-07-01 Thread Harvey Shepherd
> The "transition summary" is just a resource-by-resource list, not the order things will be done. The "executing cluster transition" section is the order things are being done. Thanks, Ken. I think that's where the problem originates. If you look at the "executing cluster transition" section...
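Since only the "executing cluster transition" section reflects the order in which actions are carried out, it can help to pull that section out of crm_simulate output on its own. A minimal sketch, assuming the section starts with a heading line "Executing cluster transition:" and ends at the next blank line (as in Pacemaker 2.0-era output; wording and capitalisation may differ between versions, so the match is case-insensitive). The function name executing_transition is hypothetical.

```sh
#!/bin/sh
# Print only the "Executing cluster transition" section of crm_simulate
# output read from stdin.
# Assumption: the section begins with a heading line matching
# "executing cluster transition:" (case-insensitive) and ends at the
# next blank line.
executing_transition() {
    awk '
        tolower($0) ~ /^executing cluster transition:/ { in_section = 1; next }
        in_section && /^[ \t]*$/                       { exit }
        in_section                                     { print }
    '
}
```

For example, `crm_simulate --simulate --live-check | executing_transition` should list the planned actions in execution order, which can then be compared against the unordered transition summary. `--simulate` and `--live-check` are standard crm_simulate options, but it is worth confirming them with `crm_simulate --help` on the installed version.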

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-30 Thread Harvey Shepherd
...then-action="start". As I mentioned in my last message, I have trouble using first-action="promote" because some of the dependents are clone resources. I just tried it again, and with this setting the dependent clones only start on the master node. What I really need is a f...
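The ordering under discussion is expressed in the CIB as an rsc_order constraint carrying first-action and then-action attributes. A minimal sketch that prints such a constraint, assuming hypothetical resource IDs (king-clone, dependent-clone); the id naming scheme and the helper name order_constraint_xml are illustrative and not taken from the thread.

```sh
#!/bin/sh
# Print an rsc_order constraint in CIB XML.
#   $1 = resource that acts first,  $2 = its action (e.g. promote)
#   $3 = dependent resource,        $4 = its action (e.g. start)
# The id naming scheme below is a hypothetical convention.
order_constraint_xml() {
    first=$1 first_action=$2 then=$3 then_action=$4
    printf '<rsc_order id="order-%s-%s-then-%s-%s" first="%s" first-action="%s" then="%s" then-action="%s"/>\n' \
        "$first" "$first_action" "$then" "$then_action" \
        "$first" "$first_action" "$then" "$then_action"
}

# Hypothetical example: promote the king clone before starting a dependent clone
order_constraint_xml king-clone promote dependent-clone start
```

The printed element could then be loaded with `cibadmin --create --scope constraints --xml-text '...'`, after checking it against the real resource IDs. Printing rather than applying keeps the sketch side-effect free.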

Re: [ClusterLabs] EXTERNAL: Re: Problems with master/slave failovers

2019-06-29 Thread Harvey Shepherd
...d let you know how it goes. Thanks, Harvey On 30 Jun 2019 5:14 pm, Andrei Borzenkov wrote: 28.06.2019 9:45, Andrei Borzenkov wrote: > On Fri, Jun 28, 2019 at 7:24 AM Harvey Shepherd wrote: >> Hi All, >> I'm running Pacemaker 2.0.2 on a two-node...

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-28 Thread Harvey Shepherd
...master/slave failovers 29.06.2019 6:01, Harvey Shepherd wrote: > As you can see, it eventually gives up on the transition attempt and starts a new one. Eventually the failed king resource master has had time to come back online, and it then just promotes it again and forgets about trying...

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-28 Thread Harvey Shepherd
...Cluster Labs - All topics related to open-source clustering welcomed Subject: EXTERNAL: Re: [ClusterLabs] Problems with master/slave failovers On Fri, 2019-06-28 at 07:36 +, Harvey Shepherd wrote: > Thanks for your reply, Andrei. Whilst I understand what you say about the difficulties of diagnosing...

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-28 Thread Harvey Shepherd
...re-promote it to master rather than failing over. Could these transitions be aborted because they are taking too long to complete? If so, is there a configuration option I can set to increase the timeout? Thanks, Harvey From: Users on behalf of Harvey Shepherd...

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-28 Thread Harvey Shepherd
...help. On 28 Jun 2019 6:46 pm, Andrei Borzenkov wrote: On Fri, Jun 28, 2019 at 7:24 AM Harvey Shepherd wrote: > Hi All, > I'm running Pacemaker 2.0.2 on a two-node cluster. It runs one master/slave resource (I'll refer to it as the king resource) and about...

[ClusterLabs] Problems with master/slave failovers

2019-06-27 Thread Harvey Shepherd
Hi All, I'm running Pacemaker 2.0.2 on a two-node cluster. It runs one master/slave resource (I'll refer to it as the king resource) and about 20 other resources, which are a mixture of:
- resources that only run on the king resource master node (colocation constraint with a score of INFINITY...
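A resource that may only run where the king resource is master corresponds to a mandatory colocation constraint (score INFINITY) with with-rsc-role="Master". A minimal sketch that prints one, assuming hypothetical IDs app and king-clone; the helper name colocate_with_master_xml and the id scheme are illustrative, not taken from the thread.

```sh
#!/bin/sh
# Print an rsc_colocation constraint that keeps resource $1 on the node where
# resource $2 runs in the Master role (mandatory, score INFINITY).
# The id naming scheme below is a hypothetical convention.
colocate_with_master_xml() {
    rsc=$1 with_rsc=$2
    printf '<rsc_colocation id="colocate-%s-with-%s-master" rsc="%s" with-rsc="%s" with-rsc-role="Master" score="INFINITY"/>\n' \
        "$rsc" "$with_rsc" "$rsc" "$with_rsc"
}

# Hypothetical example: "app" may only run where king-clone is master
colocate_with_master_xml app king-clone
```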

Re: [ClusterLabs] EXTERNAL: Re: Strange monitor return code log for LSB resource

2019-06-25 Thread Harvey Shepherd
...From: Users on behalf of Andrei Borzenkov Sent: Wednesday, 26 June 2019 4:47 a.m. To: users@clusterlabs.org Subject: EXTERNAL: Re: [ClusterLabs] Strange monitor return code log for LSB resource 25.06.2019 16:53, Harvey Shepherd wrote: > Hi All, > I have a two-node cluster run...

[ClusterLabs] Strange monitor return code log for LSB resource

2019-06-25 Thread Harvey Shepherd
...LSB in the CIB. 2. Why does the status operation always return 0 (running) while the monitor operation always returns 7 (not running)? 3. Why is fail-count not being incremented even though failures are being logged? I would really appreciate any pointers anyone could give me. Perhaps...
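Some background for question 2: for an LSB resource, Pacemaker implements monitor by running the init script's status action and translating its LSB exit code into an OCF one. The LSB specification defines status codes 0 (running), 1 (dead, pid file exists), 2 (dead, lock file exists), 3 (not running) and 4 (unknown), while OCF uses 0 (OCF_SUCCESS) for running and 7 (OCF_NOT_RUNNING) for stopped. The sketch below is a simplified illustration of such a translation, not Pacemaker's actual code; the function name lsb_status_to_ocf is hypothetical, and every code outside 0 to 3 is simply treated as a generic error (1).

```sh
#!/bin/sh
# Simplified LSB "status" -> OCF "monitor" exit-code translation.
# Illustrative only: this is not Pacemaker's implementation.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

lsb_status_to_ocf() {
    case $1 in
        0)     echo "$OCF_SUCCESS" ;;      # program is running
        1|2|3) echo "$OCF_NOT_RUNNING" ;;  # dead with stale pid/lock file, or not running
        *)     echo "$OCF_ERR_GENERIC" ;;  # unknown status or unexpected code
    esac
}

# Example: an init script whose status action exits 3
lsb_status_to_ocf 3   # prints 7
```

Under this mapping, a status action that really exits 0 would surface as monitor result 0, so a logged 7 suggests the script returned one of the "not running" codes at the moment the monitor ran.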

Re: [ClusterLabs] EXTERNAL: Re: Pacemaker not reacting as I would expect when two resources fail at the same time

2019-06-09 Thread Harvey Shepherd
...manager recover.
        ocf_log err "Unexpected error, cannot promote"
        exit $rc
        ;;
    esac
    return $OCF_SUCCESS
}
main_demote() {
    main_start_backup
    return $OCF_SUCCESS
}
Thanks again for any help you can provide. Regards, Harvey
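The agent fragment in this message follows a common OCF pattern: promote inspects the current state and treats anything unexpected as an error. A self-contained sketch of that pattern, assuming hypothetical helpers: main_monitor here only reads a SIM_STATE variable, and ocf_log plus the OCF_* constants are local stand-ins for what ocf-shellfuncs normally provides. Unlike the fragment, it returns OCF_ERR_GENERIC on an unexpected state instead of exiting with the monitor's code, so the function can be exercised outside a running agent.

```sh
#!/bin/sh
# Stand-ins for values normally provided by ocf-shellfuncs (assumption).
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"
: "${OCF_RUNNING_MASTER:=8}"

# Minimal stand-in for ocf_log so the sketch runs outside a cluster.
ocf_log() {
    level=$1; shift
    echo "$level: $*" >&2
}

# Hypothetical monitor: reports the simulated state held in SIM_STATE.
main_monitor() {
    case "${SIM_STATE:-stopped}" in
        master) return "$OCF_RUNNING_MASTER" ;;
        slave)  return "$OCF_SUCCESS" ;;
        *)      return "$OCF_NOT_RUNNING" ;;
    esac
}

main_promote() {
    main_monitor
    rc=$?
    case $rc in
        "$OCF_RUNNING_MASTER")
            : # already master: promote should be idempotent
            ;;
        "$OCF_SUCCESS")
            SIM_STATE=master # hypothetical switch from slave to master
            ;;
        *)
            ocf_log err "Unexpected error, cannot promote"
            return "$OCF_ERR_GENERIC"
            ;;
    esac
    return "$OCF_SUCCESS"
}

# Example: promote a running slave
SIM_STATE=slave
main_promote && echo "promoted, state now: $SIM_STATE"
```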

Re: [ClusterLabs] EXTERNAL: Re: Pacemaker not reacting as I would expect when two resources fail at the same time

2019-06-07 Thread Harvey Shepherd
From: Users on behalf of Ken Gaillot Sent: Saturday, 1 June 2019 5:40 a.m. To: Cluster Labs - All topics related to open-source clustering welcomed Subject: EXTERNAL: Re: [ClusterLabs] Pacemaker not reacting as I would expect when two resources fail at the same time...

[ClusterLabs] Pacemaker not reacting as I would expect when two resources fail at the same time

2019-05-30 Thread Harvey Shepherd
Hi All, I'm running Pacemaker 2.0.1 on a cluster containing two nodes: one master and one slave. I have a main master/slave resource (m_main_system), a group of resources that run in active-active mode (active_active, i.e. they run on both nodes), and a group that runs in active-disabled mode (snm...