Re: [ClusterLabs] two virtual domains start and stop every 15 minutes

2019-06-14 Ken Gaillot
On Fri, 2019-06-14 at 18:27 +0200, Lentes, Bernd wrote:
> Hi,
>
> I had that problem once already, but it's still not clear to me what
> really happens.
> I had this problem some days ago:
> I have a 2-node cluster with several virtual domains as resources. I
> put one node (ha-idg-2) into

[ClusterLabs] two virtual domains start and stop every 15 minutes

2019-06-14 Lentes, Bernd
Hi,

I had that problem once already, but it's still not clear to me what really happens. I had this problem some days ago: I have a 2-node cluster with several virtual domains as resources. I put one node (ha-idg-2) into standby, and two running virtual domains were migrated to the other node

Re: [ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Tiemen Ruiten
Right, so I may have given up too quickly. I turned maintenance mode back on and promoted ph-sql-04 manually. Unfortunately, I no longer have the logs of ph-sql-03 because I reinitialized it. You mention that the demote timeout should be start timeout + stop timeout. Start/stop are 60s, so that
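The rule of thumb mentioned in this message (demote timeout = start timeout + stop timeout) works out as follows for the 60s start/stop timeouts. This is a minimal sketch. The function name is hypothetical, and the rule presumably reflects that demoting a PAF primary means stopping PostgreSQL and starting it again as a standby, so demote can take as long as a stop plus a start.

```python
# Hypothetical helper illustrating the rule of thumb from the thread:
# a demote may involve a full stop followed by a full start, so its
# timeout should cover both.
def recommended_demote_timeout(start_timeout_s: int, stop_timeout_s: int) -> int:
    """Return a demote timeout in seconds equal to start + stop timeouts."""
    return start_timeout_s + stop_timeout_s


# Start and stop timeouts as stated in the message: 60s each.
demote_s = recommended_demote_timeout(start_timeout_s=60, stop_timeout_s=60)
print(f"demote timeout: {demote_s}s")  # demote timeout: 120s
```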

Re: [ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Tiemen Ruiten
I've crossposted the question about checkpoints taking a long time to pgsql-general as well :)

On Fri, 14 Jun 2019 at 15:05, Tiemen Ruiten wrote:
> Current size of the database is around 600GB uncompressed (LZ4 compression
> is enabled on the ZFS dataset).
>
> On Fri, 14 Jun 2019 at 14:59,

Re: [ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Jehan-Guillaume de Rorthais
On Fri, 14 Jun 2019 13:18:09 +0200 Tiemen Ruiten wrote:
> Thank you, useful advice!
>
> Logs are attached, they cover the period between when I set
> maintenance-mode=false till after the node fencing.

Switchover started @ 09:51:43

In fact, the action that timed out was the demote action, not

Re: [ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Tiemen Ruiten
Current size of the database is around 600GB uncompressed (LZ4 compression is enabled on the ZFS dataset).

On Fri, 14 Jun 2019 at 14:59, Tiemen Ruiten wrote:
> Hi, yes I'm also puzzled by this. The cluster is certainly not
> underpowered, running on baremetal with 8x SSD in ZFS stripe of

Re: [ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Tiemen Ruiten
Hi, yes, I'm also puzzled by this. The cluster is certainly not underpowered: it runs on bare metal with 8x SSD in a ZFS stripe of mirrors and 128 GB RAM, and shared_buffers is set to 8GB. Other related settings:

wal_buffers = 128MB
checkpoint_timeout = 60min
max_wal_size = 8GB
min_wal_size = 1GB
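For scale, here is a back-of-the-envelope sketch of the sustained write rate needed if the checkpoint that runs during the demote had to flush all of shared_buffers (8GB) within the 60s timeout mentioned in the thread. The worst case (every buffer dirty) and the helper name are assumptions for illustration. The sketch also ignores WAL writes and the fsync of data already handed to ZFS, so it is only a rough lower bound on the I/O involved.

```python
# Hypothetical worst-case estimate: assume every page in shared_buffers is
# dirty when the checkpoint starts. WAL writes and fsync of data already
# written to ZFS/the OS are deliberately ignored.
SHARED_BUFFERS_MIB = 8 * 1024  # shared_buffers = 8GB, from the message above
DEMOTE_TIMEOUT_S = 60          # the 60s timeout mentioned in the thread


def min_write_rate_mib_s(dirty_mib: float, timeout_s: float) -> float:
    """Sustained write rate (MiB/s) needed to flush dirty_mib within timeout_s."""
    return dirty_mib / timeout_s


rate = min_write_rate_mib_s(SHARED_BUFFERS_MIB, DEMOTE_TIMEOUT_S)
print(f"need at least {rate:.1f} MiB/s to flush {SHARED_BUFFERS_MIB} MiB in {DEMOTE_TIMEOUT_S}s")
# need at least 136.5 MiB/s to flush 8192 MiB in 60s
```

On an 8x SSD array, a rate of that order is modest, which is consistent with the thread's surprise that the checkpoint took longer than 60s.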

Re: [ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Adrien Nayrat
On 6/14/19 12:27 PM, Tiemen Ruiten wrote:
> This took longer than the configured timeout of 60s (checkpoint hadn't
> completed yet) and the node was fenced.

It's surprising that the checkpoint took longer than 60s. What is the size of your shared_buffers? What kind of hardware do you use (baremetal,

Re: [ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Jehan-Guillaume de Rorthais
Hi,

On Fri, 14 Jun 2019 12:27:12 +0200 Tiemen Ruiten wrote:
> I setup a new 3-node PostgreSQL cluster with HA managed by PAF. Nodes are
> named ph-sql-03, ph-sql-04, ph-sql-05. Archive mode is on and writing
> archive files to an NFS share that's mounted on all nodes using pgBackRest.
>
> What

[ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Tiemen Ruiten
Hello,

I set up a new 3-node PostgreSQL cluster with HA managed by PAF. Nodes are named ph-sql-03, ph-sql-04 and ph-sql-05. Archive mode is on, and pgBackRest writes the archive files to an NFS share that's mounted on all nodes.

What I did:
- Create a Pacemaker cluster (cib.xml is attached).
- Set

[ClusterLabs] resource-agents v4.3.0 rc1

2019-06-14 Oyvind Albrigtsen
ClusterLabs is happy to announce resource-agents v4.3.0 rc1.

Source code is available at:
https://github.com/ClusterLabs/resource-agents/releases/tag/v4.3.0rc1

The most significant enhancements in this release are:
- new resource agents:
  - dovecot
  - vdo-vol
- bugfixes and enhancements:
  -