Re: [ClusterLabs] Add Resource Environment Variables

2019-07-15 Thread Tiemen Ruiten
> > On Jul 15, 2019 4:53:47 PM, Tiemen Ruiten wrote: > > You could just export the variables in .pgsql_profile in the home > directory of the user running PostgreSQL (usually /var/lib/pgsql). This is > what I have in there for oracle_fdw: > > export PATH=$PATH:/usr/pgsql-11

Re: [ClusterLabs] Add Resource Environment Variables

2019-07-15 Thread Tiemen Ruiten
You could just export the variables in .pgsql_profile in the home directory of the user running PostgreSQL (usually /var/lib/pgsql). This is what I have in there for oracle_fdw: export PATH=$PATH:/usr/pgsql-11/bin > ORACLE_HOME=/usr/lib/oracle/12.1/client64 > export
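A minimal sketch of such a profile, using only the two values visible in the quoted message. The message is cut off after the last `export`, so nothing further is guessed here. On PGDG RPM installs the postgres user's `~/.bash_profile` typically sources `~/.pgsql_profile`. Whether the cluster's resource agent starts PostgreSQL through a login shell that reads this file is not established in this snippet.

```shell
# /var/lib/pgsql/.pgsql_profile (home directory of the user running PostgreSQL)
# Only the values visible in the quoted message; the rest of the original is cut off.
export PATH=$PATH:/usr/pgsql-11/bin
export ORACLE_HOME=/usr/lib/oracle/12.1/client64
```

Running `sudo -iu postgres env` shows whether a login shell for that user actually picks the variables up.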

Re: [ClusterLabs] PAF fails to promote slave: Can not get current node LSN location

2019-07-10 Thread Tiemen Ruiten
On Wed, Jul 10, 2019 at 2:47 PM Jehan-Guillaume de Rorthais wrote: > > > > I double-checked monitoring data: there was approximately one minute of > > replication lag on one slave and two minutes of replication lag on the > > other slave when the original issue occurred. > > what lag? current
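"What lag?" can mean time or WAL bytes. On PostgreSQL 10 and later, `pg_stat_replication` reports both (`replay_lag` as an interval, `replay_lsn` as a WAL position), and `pg_wal_lsn_diff()` gives the byte distance between two LSNs. The sketch below reproduces that byte computation in bash for LSNs copied from logs or monitoring. It assumes the values fit in bash's signed 64-bit arithmetic, and `lsn_diff` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Byte distance between two PostgreSQL LSNs written as "HI/LO" (two hex halves),
# i.e. the same quantity pg_wal_lsn_diff() returns.
lsn_diff() {
  local a_hi=${1%/*} a_lo=${1#*/} b_hi=${2%/*} b_lo=${2#*/}
  # Combine each pair of halves into one 64-bit integer, then subtract.
  echo $(( ((16#$a_hi << 32) | 16#$a_lo) - ((16#$b_hi << 32) | 16#$b_lo) ))
}
```

For example, `lsn_diff 16/B374D900 16/B374D848` prints `184`: the second position is 184 bytes of WAL behind the first.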

Re: [ClusterLabs] PAF fails to promote slave: Can not get current node LSN location

2019-07-09 Thread Tiemen Ruiten
On Tue, Jul 9, 2019 at 4:21 PM Jehan-Guillaume de Rorthais wrote: > On Tue, 9 Jul 2019 13:22:06 +0200 > Tiemen Ruiten wrote: > > > On Mon, Jul 8, 2019 at 10:01 PM Jehan-Guillaume de Rorthais < > j...@dalibo.com> > ... > > > I dig in xlog.c today. Maybe

Re: [ClusterLabs] PAF fails to promote slave: Can not get current node LSN location

2019-07-09 Thread Tiemen Ruiten
On Mon, Jul 8, 2019 at 10:01 PM Jehan-Guillaume de Rorthais wrote: > I should have step up to this thread, sorry :) > Really appreciate all the assistance so far. > The real problem is not how much xact you will lost during failover, but > how we > can choose the best standby to elect. This
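Choosing the best standby to elect usually means picking the one whose WAL position is furthest ahead. The sketch below shows only that comparison. It is a simplified illustration, not PAF's actual election code. The input format (`<node> <HI/LO>` per line, e.g. values gathered from `pg_last_wal_receive_lsn()` on each standby) and the function name `best_standby` are assumptions.

```shell
#!/usr/bin/env bash
# Print the node with the highest LSN. Input lines: "<node> <HI/LO>" (hypothetical format).
best_standby() {
  local node lsn
  while read -r node lsn; do
    # Zero-pad both hex halves to 8 digits so a byte-wise string sort orders LSNs numerically.
    printf '%08X%08X %s\n' "$((16#${lsn%/*}))" "$((16#${lsn#*/}))" "$node"
  done | LC_ALL=C sort | tail -n 1 | cut -d' ' -f2
}
```

Padding to a fixed width avoids converting the whole LSN into one integer, and `LC_ALL=C` keeps `sort` from applying locale-specific collation to the hex digits.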

Re: [ClusterLabs] PAF fails to promote slave: Can not get current node LSN location

2019-07-08 Thread Tiemen Ruiten
On Mon, Jul 8, 2019 at 4:59 PM Jehan-Guillaume de Rorthais wrote: > On Mon, 8 Jul 2019 13:56:49 +0200 > Tiemen Ruiten wrote: > > > Thank you for the clear explanation and advice. > > > > Hardware is adequate: 8x SSD and 20 cores per node, but I should note >

Re: [ClusterLabs] PAF fails to promote slave: Can not get current node LSN location

2019-07-08 Thread Tiemen Ruiten
200 > Tiemen Ruiten wrote: > > > On Fri, Jul 5, 2019 at 5:09 PM Jehan-Guillaume de Rorthais < > j...@dalibo.com> > > wrote: > > > > > It seems to me the problem comes from here: > > > > > > Jul 03 19:31:38 [30151] ph-sql-03.prod.ams.i.

Re: [ClusterLabs] PAF fails to promote slave: Can not get current node LSN location

2019-07-06 Thread Tiemen Ruiten
On Fri, Jul 5, 2019 at 5:09 PM Jehan-Guillaume de Rorthais wrote: > It seems to me the problem comes from here: > > Jul 03 19:31:38 [30151] ph-sql-03.prod.ams.i.rdmedia.com crmd: > notice: > te_rsc_command: Initiating notify operation > pgsqld_pre_notify_promote_0 on

Re: [ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Thread Tiemen Ruiten
that would mean 120s for demote timeout? Or 30s for start/stop? On Fri, 14 Jun 2019 at 15:55, Jehan-Guillaume de Rorthais wrote: > On Fri, 14 Jun 2019 13:18:09 +0200 > Tiemen Ruiten wrote: > > > Thank you, useful advice! > > > > Logs are attached, they cover th

Re: [ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Thread Tiemen Ruiten
I've crossposted the question about checkpoints taking a long time to pgsql-general as well :) On Fri, 14 Jun 2019 at 15:05, Tiemen Ruiten wrote: > Current size of the database is around 600GB uncompressed (LZ4 compression > is enabled on the ZFS dataset). > > On Fri, 14 Jun 2

Re: [ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Thread Tiemen Ruiten
Current size of the database is around 600GB uncompressed (LZ4 compression is enabled on the ZFS dataset). On Fri, 14 Jun 2019 at 14:59, Tiemen Ruiten wrote: > Hi, yes I'm also puzzled by this. The cluster is certainly not > underpowered, running on baremetal with 8x SSD in ZFS

Re: [ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Thread Tiemen Ruiten
checkpoint_completion_target = 0.9
I wonder if checkpoint_timeout should be lowered? On Fri, 14 Jun 2019 at 14:49, Adrien Nayrat wrote: > On 6/14/19 12:27 PM, Tiemen Ruiten wrote: > > This took longer than the configured timeout of 60s (checkpoint hadn't completed > > yet) and t
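`checkpoint_completion_target` is a fraction of the interval between checkpoints, so a timed checkpoint's writes are paced over roughly `checkpoint_timeout × checkpoint_completion_target`. The thread does not quote `checkpoint_timeout`. The value of 900 s (15 min) used below is an assumption, chosen only because it matches the observed checkpoint duration. The target is passed in percent so bash integer arithmetic can be used.

```shell
#!/usr/bin/env bash
# Approximate window over which a timed checkpoint's writes are spread:
# checkpoint_timeout * checkpoint_completion_target (target given in percent).
checkpoint_spread_seconds() {
  local timeout_s=$1 target_pct=$2
  echo $(( timeout_s * target_pct / 100 ))
}

checkpoint_spread_seconds 900 90   # assumed 15min timeout, target 0.9
```

Lowering `checkpoint_timeout` shortens that window and leaves fewer dirty buffers per checkpoint, at the cost of more frequent checkpoints and more full-page writes.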

[ClusterLabs] PostgreSQL PAF failover issue

2019-06-14 Thread Tiemen Ruiten
...checkpoints can take up to 15 minutes to complete on this cluster. So is 20 minutes reasonable? Any other operations I should increase the timeouts for? Why didn't pacemaker elect and promote one of the other nodes? -- Tiemen Ruiten Infrastructure Engine
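One way to arrive at the 20 minutes asked about is the longest observed checkpoint (15 min) plus a safety margin. The sketch below derives that number and prints the `pcs resource update ... op ...` commands that would apply it to the `pgsqld` primitive seen in the logs. By default it does not touch the cluster: `DRY_RUN=1` only prints the commands. The 5-minute margin is hypothetical. Applying the value to `stop` and `demote` assumes that these are the operations that wait for the shutdown checkpoint and that each has a single op entry on `pgsqld`. Which other operations need raising is the open question in the message.

```shell
#!/usr/bin/env bash
set -eu

max_checkpoint_s=${MAX_CHECKPOINT_S:-900}  # "up to 15 minutes", from the message
margin_s=${MARGIN_S:-300}                  # hypothetical 5-minute safety margin
timeout_s=$(( max_checkpoint_s + margin_s ))

# Print the command when DRY_RUN=1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

for action in stop demote; do
  run pcs resource update pgsqld op "$action" timeout="${timeout_s}s"
done
```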

Re: [ClusterLabs] can't create master/slave resource

2017-09-20 Thread Tiemen Ruiten
won't be possible to >> master/slave systemd resources as it is not supported anyway. >> > > https://bugzilla.redhat.com/show_bug.cgi?id=1493416 > > > >> Regards, >> Tomas >> >> [1]: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-sin

[ClusterLabs] can't create master/slave resource

2017-09-19 Thread Tiemen Ruiten
...
pacemaker-libs-1.1.16-12.el7_4.2.x86_64
pacemaker-cluster-libs-1.1.16-12.el7_4.2.x86_64
pacemaker-1.1.16-12.el7_4.2.x86_64
pacemaker-cli-1.1.16-12.el7_4.2.x86_64
corosynclib-2.4.0-9.el7_4.2.x86_64
corosync-2.4.0-9.el7_4.2.x86_64
Am I doing something wrong? -- Tiemen Ruiten Systems Engineer R