We tried to fix the LDAP issue with the nss_initgroups_ignoreusers option in
nslcd.conf for the postgres and hacluster users, so the cluster shouldn't
contact the LDAP server every 15 seconds when it checks psql as the postgres user:
/usr/lib/postgresql/9.5/bin/pg_isready -h /var/run/postgresql/ -p 5432
We have two
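For reference, the option mentioned above is set like this, a minimal sketch assuming a stock nslcd setup (verify against nslcd.conf(5) on your system):

```
# /etc/nslcd.conf (fragment)
# Skip LDAP lookups for the supplementary groups of these local users,
# so the cluster's periodic monitor operations don't touch the LDAP server:
nss_initgroups_ignoreusers postgres,hacluster
```

nslcd must be restarted (e.g. `systemctl restart nslcd`) for the change to take effect.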
On Wed, 10 Jul 2019 17:25:57 +0200
Danka Ivanovic wrote:
...
> I know starting the master database with systemctl should be avoided, but I
> didn't find a way to start it with pacemaker. I will test again, but I am
> out of ideas.
Put the cluster in debug mode and provide the full logs +
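A hedged sketch of turning on Pacemaker debug logging, assuming a Debian-style layout (adjust the path for your distribution):

```
# /etc/default/pacemaker (or /etc/sysconfig/pacemaker on RH-like systems)
PCMK_debug=yes
# Optionally keep the detailed output in a dedicated log file:
PCMK_logfile=/var/log/pacemaker.log
```

The cluster services must be restarted for these settings to apply.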
On Wed, 10 Jul 2019 16:34:17 +0200
Danka Ivanovic wrote:
> Hi, Thank you all for responding so quickly. Part of the corosync.log file is
> attached. The cluster failure occurred at 09:16 AM yesterday.
> Debug mode is turned on in the corosync configuration, but I didn't turn it on
> in the pacemaker config. I
On Wed, 10 Jul 2019 12:53:59 +0300
Andrei Borzenkov wrote:
> On Wed, Jul 10, 2019 at 12:42 PM Jehan-Guillaume de Rorthais
> wrote:
>
> >
> > > > Jul 09 09:16:32 [2679] postgres1 lrmd: debug: child_kill_helper: Kill pid 12735's group
> > > > Jul 09 09:16:34 [2679] postgres1
On Wed, Jul 10, 2019 at 12:42 PM Jehan-Guillaume de Rorthais
wrote:
>
> > > Jul 09 09:16:32 [2679] postgres1 lrmd: debug: child_kill_helper: Kill pid 12735's group
> > > Jul 09 09:16:34 [2679] postgres1 lrmd: warning: child_timeout_callback: PGSQL_monitor_15000
On Wed, Jul 10, 2019 at 12:42 PM Jehan-Guillaume de Rorthais
wrote:
>
> > P.S. crm_resource is called by resource agent (pgsqlms). And it shows
> > result of original resource probing which makes it confusing. At least
> > it explains where these logs entries come from.
>
> Not sure I understand
On Tue, 9 Jul 2019 19:57:06 +0300
Andrei Borzenkov wrote:
> 09.07.2019 13:08, Danka Ivanović wrote:
> > Hi, I didn't manage to start the master with postgres, even after I increased
> > the start timeout. I checked the executable paths and start options.
We would need many more logs from this failure...
>
09.07.2019 13:08, Danka Ivanović wrote:
> Hi, I didn't manage to start the master with postgres, even after I increased
> the start timeout. I checked the executable paths and start options.
> When the cluster is running with a manually started master and the slave
> started over pacemaker, everything works ok. Today we
On Thu, 2019-05-16 at 10:20 +0200, Jehan-Guillaume de Rorthais wrote:
> On Wed, 15 May 2019 16:53:48 -0500
> Ken Gaillot wrote:
>
> > On Wed, 2019-05-15 at 11:50 +0200, Jehan-Guillaume de Rorthais
> > wrote:
> > > On Mon, 29 Apr 2019 19:59:49 +0300
> > > Andrei Borzenkov wrote:
> > >
> > > >
On Wed, 15 May 2019 16:53:48 -0500
Ken Gaillot wrote:
> On Wed, 2019-05-15 at 11:50 +0200, Jehan-Guillaume de Rorthais wrote:
> > On Mon, 29 Apr 2019 19:59:49 +0300
> > Andrei Borzenkov wrote:
> >
> > > 29.04.2019 18:05, Ken Gaillot wrote:
> > > > >
> > > > > > Why does not it check
On Wed, 2019-05-15 at 11:50 +0200, Jehan-Guillaume de Rorthais wrote:
> On Mon, 29 Apr 2019 19:59:49 +0300
> Andrei Borzenkov wrote:
>
> > 29.04.2019 18:05, Ken Gaillot wrote:
> > > >
> > > > > Why does not it check OCF_RESKEY_CRM_meta_notify?
> > > >
> > > > I was just not aware of this
On Mon, 29 Apr 2019 19:59:49 +0300
Andrei Borzenkov wrote:
> 29.04.2019 18:05, Ken Gaillot wrote:
> >>
> >>> Why does not it check OCF_RESKEY_CRM_meta_notify?
> >>
> >> I was just not aware of this env variable. Sadly, it is not
> >> documented
> >> anywhere :(
> >
> > It's not a
On Tue, 30 Apr 2019 17:28:44 +0200
Danka Ivanović wrote:
> Hi, I tried new clean config with upgraded postgres and corosync and
> pacemaker packages.
In this attempt, your PostgreSQL resource timed out while starting up:
Apr 30 15:09:43 [13342] master lrmd:debug:
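As a hedged illustration in crm shell syntax, the start timeout can be raised on the primitive itself (120s is an arbitrary example value, not a recommendation from the thread):

```
primitive PGSQL pgsqlms \
    params pgdata="/var/lib/postgresql/9.5/main" \
    op start timeout=120s \
    op monitor interval=15s
```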
29.04.2019 18:05, Ken Gaillot wrote:
>>
>>> Why does not it check OCF_RESKEY_CRM_meta_notify?
>>
>> I was just not aware of this env variable. Sadly, it is not
>> documented
>> anywhere :(
>
> It's not a Pacemaker-created value like the other notify variables --
> all user-specified
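Since the variable is not documented, a resource agent can only probe it defensively. A minimal sketch in POSIX shell (`check_notify` is an illustrative helper name, not part of pgsqlms):

```shell
# Illustrative helper (not part of pgsqlms): decide whether clone
# notifications are enabled by inspecting the user-specified meta attribute.
check_notify() {
    if [ "${OCF_RESKEY_CRM_meta_notify:-false}" = "true" ]; then
        echo "notify enabled"
    else
        echo "notify disabled"
    fi
}

# Simulate the environment Pacemaker sets when notify=true is configured:
OCF_RESKEY_CRM_meta_notify=true
check_notify   # prints "notify enabled"
```

The `:-false` default keeps the check safe on Pacemaker versions or call paths where the variable is absent.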
On Sun, 2019-04-28 at 00:27 +0200, Jehan-Guillaume de Rorthais wrote:
> On Sat, 27 Apr 2019 09:15:29 +0300
> Andrei Borzenkov wrote:
>
> > 27.04.2019 1:04, Danka Ivanović wrote:
> > > Hi, here is a complete cluster configuration:
> > >
> > > node 1: master
> > > node 2: secondary
> > >
On Sat, 27 Apr 2019 09:15:29 +0300
Andrei Borzenkov wrote:
> 27.04.2019 1:04, Danka Ivanović wrote:
> > Hi, here is a complete cluster configuration:
> >
> > node 1: master
> > node 2: secondary
> > primitive AWSVIP awsvip \
> > params secondary_private_ip=10.x.x.x api_delay=5
> >
27.04.2019 1:04, Danka Ivanović wrote:
> Hi, here is a complete cluster configuration:
>
> node 1: master
> node 2: secondary
> primitive AWSVIP awsvip \
> params secondary_private_ip=10.x.x.x api_delay=5
> primitive PGSQL pgsqlms \
> params pgdata="/var/lib/postgresql/9.5/main"
>
Hi,
On Thu, 25 Apr 2019 18:57:55 +0200
Danka Ivanović wrote:
> Apr 25 16:39:50 [4213] master lrmd: notice:
> operation_finished: PGSQL_monitor_0:5849:stderr [ ocf-exit-reason:You
> must set meta parameter notify=true for your master resource ]
Resource agent pgsqlms refuses to start
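The fix the agent is asking for, sketched in crm shell syntax (the multi-state resource name `pgsql-ha` is an assumption, not taken from the thread):

```
ms pgsql-ha PGSQL \
    meta notify=true
```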
Hi,
Here are the logs from when pacemaker fails to start the postgres service on
the master. It manages to start only the postgres slave.
I tried different configurations with the pgsqlms and pgsql resource agents.
These errors are from when I use the pgsqlms agent, whose configuration I sent
in the first mail:
Apr 25 16:40:23
Hi,
It seems that an LDAP timeout caused the cluster failure. The cluster checks
status every 15s on the master and 16s on the slave. The cluster needs the
postgres user for authentication, but the user is first looked up on the LDAP
server and only then locally on the host. When the connection to the LDAP
server was interrupted, the cluster couldn't
On Fri, 19 Apr 2019 17:26:14 +0200
Danka Ivanović wrote:
...
> Should I change any of those timeout parameters in order to avoid timeout?
You can try to raise the timeout, indeed. But as long as we don't know **why**
your VMs froze for some time, it is difficult to guess how high these
Here is the command output from crm configure show:
node 1: master \
attributes master-PGSQL=1001
node 2: secondary \
attributes master-PGSQL=1000
primitive AWSVIP awsvip \
params secondary_private_ip=10.x.x.x api_delay=5
primitive PGSQL pgsqlms \
params pgdata="/var/lib/postgresql/9.5/main"
Thanks for the clarification about failure-timeout, migration threshold and
pacemaker.
Instances are hosted on AWS cloud, and they are in the same security groups
and availability zones.
I don't have information about the hardware that hosts those VMs, since they
are not dedicated. The UTC timezone is
On Fri, 19 Apr 2019 11:08:33 +0200
Danka Ivanović wrote:
> Hi,
> Thank you for your response.
>
> Ok, it seems that the fencing resources and the secondary timed out at the
> same time, together with LDAP.
> I understand that because of "migration-threshold=1", the standby tried to
> recover just once and
Hi,
Thank you for your response.
Ok, it seems that the fencing resources and the secondary timed out at the
same time, together with LDAP.
I understand that because of "migration-threshold=1", the standby tried to
recover just once and then was stopped. Is this ok, or should the threshold
be increased?
On Thu, 18 Apr 2019 14:19:44 +0200
Danka Ivanović wrote:
It seems you had a timeout for both fencing resources and your standby at the
same time here:
> Apr 17 10:03:34 master pengine[12480]: warning: Processing failed op
> monitor for fencing-secondary on master: unknown error (1)
> Apr 17
Hi,
Can you help me with troubleshooting a postgres pacemaker cluster failure?
Today the cluster failed without promoting the secondary to master. At the
same time, an LDAP timeout appeared.
Here are the logs; the master was stopped by pacemaker at 10:03:40 AM UTC.
Thank you in advance.
corosync.log
Apr 17