On 12/19/2005 7:56 PM, Marc G. Fournier wrote:
'k, setting up monitoring, and the script is reporting 1 out of 3 nodes out of sync:

./check_slony_cluster.sh dns ams ams.hub.org
ERROR - 2 of 3 nodes not in sync

no problem, figured out in the script how it is being determined, and:

  st_received |    cfmdelay
-------------+-----------------
            2 | 00:00:00.010721
            3 | 03:59:55.2181
            4 | 00:00:00.125318
(3 rows)

wow ... 3 hours and 59 minutes for Node 3, where the other two are well under a second (Node 4 is a remote server, somewhere in the US, while Node 3 is the server beside the master) ...

Now, I've checked Node 3, and it contains the same # of records as Node 1 ..

Now, I just did an update on one record in the table, and checked all 3 slaves and they see the change, yet now I'm seeing:

  st_received |    cfmdelay
-------------+-----------------
            2 | 00:00:00.009916
            3 | 03:59:55.175099
            4 | 01:46:02.69134
(3 rows)

Node 4 just shot up ...

Looking at sl_status:

# select * from "_dns".sl_status;
 st_origin | st_received | st_last_event |      st_last_event_ts      | st_last_received |    st_last_received_ts     | st_last_received_event_ts  | st_lag_num_events |   st_lag_time
-----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+-----------------
         1 |           2 |           837 | 2005-12-19 20:52:23.576685 |              837 | 2005-12-19 20:52:23.589583 | 2005-12-19 20:52:23.576685 |                 0 | 00:00:06.669823
         1 |           3 |           837 | 2005-12-19 20:52:23.576685 |              837 | 2005-12-20 00:52:18.736552 | 2005-12-19 20:52:23.576685 |                 0 | 00:00:06.669823
         1 |           4 |           837 | 2005-12-19 20:52:23.576685 |              837 | 2005-12-19 22:36:25.514229 | 2005-12-19 20:52:23.576685 |                 0 | 00:00:06.669823

So, what is st_last_received_ts, and why isn't Node 3 updating it? I've checked my slon_ams.out file on Node 3, and there are no errors being generated that I can see ... and replication appears to be working fine on all the Nodes ...

Is there somewhere else I need to be looking for this?

The timestamps st_last_event_ts (origin) and st_last_received_ts (subscriber) are taken on different servers, so any difference between those servers' clocks shows up in the reported delay. If you play around with the clocks, you will also find settings where sl_status reports them to be in the future.
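
To illustrate with the numbers from the sl_status output above, here is a minimal Python sketch. It assumes the apparent delay is simply st_last_received_ts minus st_last_event_ts; the monitoring script may compute cfmdelay differently. The function name apparent_delay is made up for this example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# st_last_event_ts for event 837, stamped by the origin's clock.
last_event_ts = datetime.strptime("2005-12-19 20:52:23.576685", FMT)

# st_last_received_ts per receiving node, each stamped by that node's own clock.
last_received_ts = {
    2: datetime.strptime("2005-12-19 20:52:23.589583", FMT),
    3: datetime.strptime("2005-12-20 00:52:18.736552", FMT),
    4: datetime.strptime("2005-12-19 22:36:25.514229", FMT),
}

def apparent_delay(node):
    """Difference between two timestamps taken on two different machines.

    Assumption: this mirrors how the delay is derived. The result mixes
    real replication delay with any offset between the two clocks.
    """
    return last_received_ts[node] - last_event_ts

for node in sorted(last_received_ts):
    print(node, apparent_delay(node))
```

All three rows show st_lag_num_events = 0, so no events are outstanding. A difference of almost exactly four hours for Node 3 would then be consistent with that server's clock being offset from the origin's (a time zone or clock setting, for instance) rather than with a replication backlog, although the output alone does not prove that.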


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general
