Public bug reported:

We were debugging an unexpected failover of a PostgreSQL 9.3 Pacemaker
cluster running on Ubuntu 14.04 LTS at a client. As a timeline, the client
put the second node (h1db2) back into the cluster at around 9:10 AM, and
the unexpected failover occurred at 10:32 AM.

We could not determine exactly what led to the failover, but two
problems were apparent from the logs:

1. The monitor action on the standby reported that no master was running:

May  4 09:10:59 h1db2 pgsql(pgsql)[2460]: INFO: Master does not exist.
[Message repeats 175 times]
May  4 10:32:04 h1db2 pgsql(pgsql)[2611]: INFO: Master does not exist.
[...]
May  4 10:32:04 h1db2 pgsql(pgsql)[2611]: INFO: I have a master right.

At this point, Pacemaker decided to promote h1db2.

2. Between 9:10 and 10:32, the score of the standby was -INFINITY.
At 10:32 it was then set to the same score as the master (1000), while
it should have been 100, the score for standbys.
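For anyone wanting to check their own logs for the symptom in item 1, a minimal Python sketch follows. It assumes the syslog line format shown in the excerpts above (timestamp, host, `pgsql(<resource>)[<pid>]: LEVEL: message`); the names `LINE_RE` and `summarize` are hypothetical helpers, not part of the resource agent or the attached patch. It counts each distinct pgsql RA message per host and records when it was first and last seen. It only counts lines that appear literally in the input, so if syslog has collapsed repeats into a "repeated N times" line, those are not expanded.

```python
import re
from collections import Counter

# Assumed format (taken from the excerpts in this report), e.g.:
# May  4 10:32:04 h1db2 pgsql(pgsql)[2611]: INFO: Master does not exist.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) pgsql\((?P<rsc>[^)]+)\)\[(?P<pid>\d+)\]: "
    r"(?P<level>[A-Z]+): (?P<msg>.*)$"
)


def summarize(lines):
    """Return {(host, message): (count, first_ts, last_ts)} for pgsql RA lines."""
    counts = Counter()
    first = {}
    last = {}
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines not emitted by the pgsql RA
        key = (m["host"], m["msg"])
        counts[key] += 1
        first.setdefault(key, m["ts"])
        last[key] = m["ts"]
    return {k: (counts[k], first[k], last[k]) for k in counts}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python summarize_pgsql.py < /var/log/syslog
    for (host, msg), (n, t0, t1) in sorted(summarize(sys.stdin).items()):
        print(f"{host}: {n}x {msg!r} ({t0} .. {t1})")
```

A long run of "Master does not exist." on the standby while a master is in fact running is the pattern we saw before the unwanted promotion.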

We traced both problems back to bugs in the version of the pgsql
resource agent shipped in trusty. They are caused by output changes in
newer Pacemaker versions (including the one in trusty) and have since
been fixed upstream.

The following git commits from https://github.com/ClusterLabs/resource-agents
are relevant:

https://github.com/ClusterLabs/resource-agents/commit/78ddf466e413d0c1f18f7610cfbd63968b012ce0
fixes the first issue.

https://github.com/ClusterLabs/resource-agents/commit/956244dd05f69bdad979b252a3e359855b88e6bd
fixes the second issue.

However, several other intermediate commits are required on top of the
version in trusty, so the full list we are using is:

956244dd05f69bdad979b252a3e359855b88e6bd
b7911abce27889becc8a4637e003bfcf5ef1b15e (adjusted)
ffc9c6444996144076ef2b4bc79a38569e05250a
404d205636ad02e09ddffdb9710dd660b8171c6b
ff9f0ed32e64f9be9e57dc712ec241231b04d917
78ddf466e413d0c1f18f7610cfbd63968b012ce0

b7911abce27 needs to be adjusted because it uses a function
(exec_with_retry) that is not yet available in the trusty version, but
that call (and its first argument) can safely be removed. Commits 3-5
all modify the same line (as does the last one), so they do not make
the final patch any larger.

The attached patch makes the pgsql resource agent work much better for
us. Would it be possible to apply it to the resource-agents package in
trusty?

** Affects: resource-agents (Ubuntu)
     Importance: Undecided
         Status: New

** Patch added: "Proposed patch"
   https://bugs.launchpad.net/bugs/1688613/+attachment/4872336/+files/pgsql.diff

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1688613

Title:
  pgsql RA has problems with pacemaker version


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
