On Mon, 02 Apr 2018 14:05:09 -0600
Casey & Gina <caseyandg...@icloud.com> wrote:

[...]
> Now, if I restart the second node, and execute `pcs cluster start` once it's
> back up, it fails to start the resource and shows me this in the `pcs status`
> output:
>
> ------
> * postgresql-10-main_start_0 on d-gp2-dbp62-2 'unknown error' (1): call=11,
> status=complete, exitreason='Instance "postgresql-10-main" failed to start
> (rc: 1)', last-rc-change='Mon Apr 2 19:50:40 2018', queued=0ms, exec=228ms
> ------

When the cluster was down on "d-gp2-dbp62-2", was PostgreSQL stopped as well on
this node?

[...]
> It tells me to examine the PostgreSQL log output, so I look there, but I
> don't see anything logged at all since the server shutdown.

I suppose that is because either:

* the error comes from pg_ctl, before it was able to actually start PostgreSQL,
* or the error was raised before PostgreSQL was able to set up its logging.

In both cases, something is failing at a very early stage.

One thing comes to mind: did you set up "systemd-tmpfiles" as explained at the
end of the following chapter?

  https://clusterlabs.github.io/PAF/Quick_Start-Debian-9-pcs.html#postgresql-and-cluster-stack-installation

[...]
> Now for the weird part, which is the workaround I inadvertently discovered
> when trying to figure out what was going wrong. First, I do a `pcs cluster
> stop` again. Then I start up the service using systemd, and it starts up
> just fine; so I do that and then shut it back down using `service
> postgresql@10-main start` and `service postgresql@10-main stop`, which should
> put it back in the original state. Now, when I issue `pcs cluster start`,
> everything comes up just fine as expected with no errors!?!

Systemd uses the PostgreSQL wrappers (see: man postgresql-common) to start your
cluster, and a bunch of other actions take place there. So you cannot compare
how Debian's PostgreSQL wrapper behaves with how pgsqlms starts your cluster.

> I have also seen this "unknown error" come up at other undesirable times,
> like when doing a manual failover using `pcs cluster stop` on the primary or
> a `pcs resource move --master ...` command, however once the workaround is
> applied to the node having issues, it works perfectly fine until it's
> rebooted.
>
> Can anyone explain what is happening here, and how I can fix it properly?

Make sure systemd sees PostgreSQL as stopped (and disable it there).

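As a rough sketch of that check, something like the following could be run on the failing node. It assumes the systemd unit is named "postgresql@10-main" (taken from your `service` calls above). The helper name check_states is purely illustrative. It only reads the state reported by `systemctl is-active` and `systemctl is-enabled`, and changes nothing by itself:

```shell
#!/bin/sh
# Assumption: unit name taken from the `service postgresql@10-main ...` calls
# quoted above; override with UNIT=... if yours differs.
UNIT="${UNIT:-postgresql@10-main}"

# check_states is a hypothetical helper, not part of PAF or pcs.
# $1: output of `systemctl is-active`, $2: output of `systemctl is-enabled`.
# Prints what still has to be done and returns non-zero if anything does.
check_states() {
    rc=0
    case "$1" in
        active|activating|reloading)
            echo "stop needed: unit is $1"
            rc=1
            ;;
    esac
    case "$2" in
        enabled|enabled-runtime)
            echo "disable needed: unit is $2"
            rc=1
            ;;
    esac
    if [ "$rc" -eq 0 ]; then
        echo "ok: systemd will not start PostgreSQL"
    fi
    return "$rc"
}

# Only query systemd when systemctl is actually available on this host.
if command -v systemctl >/dev/null 2>&1; then
    active=$(systemctl is-active "$UNIT" 2>/dev/null || true)
    enabled=$(systemctl is-enabled "$UNIT" 2>/dev/null || true)
    check_states "$active" "$enabled" || true
fi
```

If it reports that a stop or a disable is needed, the usual `systemctl stop` and `systemctl disable` on that unit would be the next step, with Pacemaker stopped on the node while you do it.
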
Try to start your PostgreSQL using this command:

  sudo -iu postgres /usr/lib/postgresql/10/bin/pg_ctl --pgdata /var/lib/postgresql/10/main \
      -w --timeout 60 start

Report here any errors you find. If it starts, report that as well, but stop it
afterwards using:

  sudo -iu postgres /usr/lib/postgresql/10/bin/pg_ctl --pgdata /var/lib/postgresql/10/main \
      -w --timeout 60 -m fast stop