Hello everybody,

I have a 2-node cluster with a clone resource "postgres-ms". We are running the following versions of pacemaker/corosync:

d19-25-left.lab.archivas.com ~ # rpm -qa | grep "pacemaker\|corosync"
pacemaker-cluster-libs-2.0.5-9.el8.x86_64
pacemaker-libs-2.0.5-9.el8.x86_64
pacemaker-cli-2.0.5-9.el8.x86_64
corosynclib-3.1.0-5.el8.x86_64
pacemaker-schemas-2.0.5-9.el8.noarch
corosync-3.1.0-5.el8.x86_64
pacemaker-2.0.5-9.el8.x86_64
There are a couple of issues that may be related:

1. The following messages from pacemaker-controld appear in the logs:

   Jul 2 14:59:27 d19-25-right pacemaker-controld[1489734]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
   Jul 2 14:59:27 d19-25-right pacemaker-controld[1489734]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)

2. ocf:heartbeat:pgsql-rhino does not receive any "notify" operations, which causes multiple problems with postgres synchronization during availability events.

3. Item 2 raises another question: who sets these values?

   ${OCF_RESKEY_CRM_meta_notify_type}
   ${OCF_RESKEY_CRM_meta_notify_operation}

Here is an excerpt from the cluster config:

d19-25-left.lab.archivas.com ~ # pcs config
Cluster Name:
Corosync Nodes:
 d19-25-right.lab.archivas.com d19-25-left.lab.archivas.com
Pacemaker Nodes:
 d19-25-left.lab.archivas.com d19-25-right.lab.archivas.com

Resources:
 Clone: postgres-ms
  Meta Attrs: promotable=true target-role=started
  Resource: postgres (class=ocf provider=heartbeat type=pgsql-rhino)
   Attributes: master_ip=172.16.1.6 node_list="d19-25-left.lab.archivas.com d19-25-right.lab.archivas.com" pgdata=/pg_data remote_wals_dir=/remote/walarchive rep_mode=sync reppassword=XXXXXX repuser=XXXXXXX restore_command="/opt/rhino/sil/bin/script_wrapper.sh wal_restore.py %f %p" tmpdir=/pg_data/tmp wals_dir=/pg_data/pg_wal xlogs_dir=/pg_data/pg_xlog
   Meta Attrs: is-managed=true
   Operations: demote interval=0 on-fail=restart timeout=120s (postgres-demote-interval-0)
               methods interval=0s timeout=5 (postgres-methods-interval-0s)
               monitor interval=10s on-fail=restart timeout=300s (postgres-monitor-interval-10s)
               monitor interval=5s on-fail=restart role=Master timeout=300s (postgres-monitor-interval-5s)
               notify interval=0 on-fail=restart timeout=90s (postgres-notify-interval-0)
               promote interval=0 on-fail=restart timeout=120s (postgres-promote-interval-0)
               start interval=0 on-fail=restart timeout=1800s (postgres-start-interval-0)
               stop interval=0 on-fail=fence timeout=120s (postgres-stop-interval-0)

Thank you very much!
_Vitaly