On Wed, 10 Jul 2019 17:25:57 +0200
Danka Ivanovic <danka.ivano...@sbgenomics.com> wrote:

...
> I know it should be avoided starting master database with systemctl, but I
> didn't find a way to start it with pacemaker. I will test again, but I am
> out of ideas.

Put the cluster in debug mode and provide the full logs + pacemaker conf +
pgsql confs. It will certainly help us understand.

> On Wed, Jul 10, 2019 at 4:57 PM Jehan-Guillaume de Rorthais <j...@dalibo.com>
> wrote:
>
> > On Wed, 10 Jul 2019 16:34:17 +0200
> > Danka Ivanovic <danka.ivano...@sbgenomics.com> wrote:
> >
> > > Hi, Thank you all for responding so quickly. Part of corosync.log file is
> > > attached. Cluster failure occurred at 09:16 AM yesterday.
> > > Debug mode is turned on in corosync configuration, but I didn't turn it on
> > > in pacemaker config. I will test that.
> >
> > There's really nothing interesting in there sadly. It could even be that
> > pgsqlms hadn't been called at all and the action timed out...
> >
> > > Postgres log is also attached.
> >
> > Nothing really relevant there either.
> >
> > > Several times cluster failed because of ldap time out, even if I tried to
> > > disable ldap searching for local postgres user,
> >
> > This is really annoying. IIRC, this was already happening last time. Fix this
> > first if you haven't yet?
> >
> > ...
> > > From syslog it looks like postgres systemd process was
> > > stopped,
> >
> > Again, systemd shouldn't take part in anything in your cluster with regard
> > to PostgreSQL.
> > If Pacemaker manages PostgreSQL, systemd should have nothing to do with it.
> >
> > If you really need to start/stop it by hand (I really discourage you from
> > doing so), do it using pg_ctl. And make sure to unmanage the Pacemaker
> > resource beforehand.
> >
> > > > On Tue, 9 Jul 2019 19:57:06 +0300
> > > > Andrei Borzenkov <arvidj...@gmail.com> wrote:
> > > >
> > > > > 09.07.2019 13:08, Danka Ivanović wrote:
> > > > > > Hi I didn't manage to start master with postgres, even if I increased
> > > > > > start timeout. I checked executable paths and start options.
> > > >
> > > > We would require many more logs from this failure...
> > > >
> > > > > > When cluster is running with manually started master and slave
> > > > > > started over pacemaker, everything works ok.
> > > >
> > > > Logs from this scenario might be interesting as well to check and compare.
> > > >
> > > > > > Today we had failover again.
> > > > > > I cannot find reason from the logs, can you help me with debugging?
> > > > > > Thanks.
> > > >
> > > > logs logs logs please.
> > > >
> > > > > > Jul 09 09:16:32 [2679] postgres1 lrmd: debug: child_kill_helper: Kill pid 12735's group
> > > > > > Jul 09 09:16:34 [2679] postgres1 lrmd: warning: child_timeout_callback: PGSQL_monitor_15000 process (PID 12735) timed out
> > > > >
> > > > > You probably want to enable debug output in resource agent. As far as I
> > > > > can tell, this requires HA_debug=1 in environment of resource agent, but
> > > > > for the life of me I cannot find where it is possible to set it.
> > > > >
> > > > > Probably setting it directly in resource agent for debugging is the
> > > > > simplest way.
> > > >
> > > > I usually set this in "/etc/sysconfig/pacemaker". Never tried to add it
> > > > to pgsqlms, interesting.
> > > >
> > > > > P.S. crm_resource is called by resource agent (pgsqlms). And it shows
> > > > > result of original resource probing which makes it confusing. At least
> > > > > it explains where these log entries come from.
> > > >
> > > > Not sure I understand what you mean :/
> > > >
> >
> > --
> > Jehan-Guillaume de Rorthais
> > Dalibo

--
Jehan-Guillaume de Rorthais
Dalibo
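
When collecting the "full logs" asked for above, it can help to first locate every operation that lrmd reported as timed out, so the surrounding log window can be attached. The following is only a minimal sketch: it assumes log lines shaped like the single `child_timeout_callback` entry quoted in this thread, and the names `OpTimeout` and `find_timeouts` are hypothetical helpers, not part of Pacemaker or pgsqlms.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class OpTimeout(NamedTuple):
    """One timed-out resource operation, as reported by lrmd."""
    timestamp: str
    resource: str
    action: str
    interval_ms: int
    pid: int


# Assumed line shape (taken from the one example quoted in the thread):
#   Jul 09 09:16:34 [2679] postgres1 lrmd: warning: child_timeout_callback:
#   PGSQL_monitor_15000 process (PID 12735) timed out
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2})\s.*?"
    r"child_timeout_callback:\s+"
    r"(?P<res>\S+)_(?P<action>[a-z]+)_(?P<interval>\d+)\s+"
    r"process \(PID (?P<pid>\d+)\) timed out"
)


def find_timeouts(lines: Iterable[str]) -> Iterator[OpTimeout]:
    """Yield an OpTimeout for every log line that reports an operation timeout."""
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            yield OpTimeout(
                timestamp=match["ts"],
                resource=match["res"],
                action=match["action"],
                interval_ms=int(match["interval"]),
                pid=int(match["pid"]),
            )


if __name__ == "__main__":
    # The two log lines quoted earlier in this thread.
    sample = [
        "Jul 09 09:16:32 [2679] postgres1 lrmd: debug: "
        "child_kill_helper: Kill pid 12735's group",
        "Jul 09 09:16:34 [2679] postgres1 lrmd: warning: "
        "child_timeout_callback: PGSQL_monitor_15000 process (PID 12735) timed out",
    ]
    for hit in find_timeouts(sample):
        print(hit)
```

To run it against a real file, pass an open file object instead of the sample list, for example `find_timeouts(open(path))` where `path` points at the corosync/pacemaker log on the node; the regular expression may need adjusting if the local log format differs.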