[Slony1-general] sl_confirm aging issue?

Richard Yen Fri, 30 Mar 2007 12:22:07 -0800

Hi all,

I've recently been seeing replication lag climb and then suddenly drop
at random times during the day. I understand that for some people a
~40-event lag isn't much, but it's quite unusual for my cluster.


I run a 4-node cluster (1 provider, 3 subscribers), and it appears
that at random times the event lag climbs to ~40 and then suddenly
drops to 0. Load on all nodes is < 1.0 during these times, so I don't
suspect hardware or configuration. That leaves me with no explanation
for what is causing these "lag spikes."
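To be concrete about what "event lag" means here: I'm assuming it is
the gap between the newest event an origin has generated and the
newest event a given receiver has confirmed. That definition is an
assumption on my part (it is roughly how Slony's sl_status view
reports lag, but I haven't verified it against whatever produced the
~40 figure). A minimal sketch in Python, using origin 1's SYNC numbers
from the test_slony_state-dbi.pl output further down:

```python
# Hypothetical illustration: event lag taken as the gap between the
# newest event an origin has generated and the newest event a receiver
# has confirmed. The definition is an assumption, not from the report.

def event_lag(origin_max_event: int, receiver_max_confirmed: int) -> int:
    """Number of origin events the receiver has not yet confirmed."""
    return max(0, origin_max_event - receiver_max_confirmed)

# Origin 1's max SYNC (3001970) comes from the "Summary of event info"
# table; each receiver's max confirmed SYNC comes from the
# "Summary of sl_confirm aging" table.
origin_1_max = 3001970
confirmed = {2: 3001969, 4: 3001969, 5: 3001962}

for receiver, conf_max in sorted(confirmed.items()):
    lag = event_lag(origin_1_max, conf_max)
    print(f"origin 1 -> receiver {receiver}: lag {lag} events")
```

With those numbers the lag works out to 1 event for receivers 2 and 4
and 8 events for receiver 5, i.e. well below the ~40 seen during a
spike, as you'd expect for a snapshot taken outside one.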

I tried running test_slony_state-dbi.pl and got the following output:

===BEGIN LOG===
Tests for node 1 - DSN = dbname=tii host=tii-db1.oaktown.iparadigms.com user=slony password=3l3phant
========================================
pg_listener info:
Pages: 9
Tuples: 1

Size Tests
================================================
        sl_log_1      1918 26082.000000
        sl_log_2         0  0.000000
       sl_seqlog        20 1543.000000

Listen Path Analysis
===================================================
No problems found with sl_listen

--------------------------------------------------------------------------------
Summary of event info
Origin  Min SYNC  Max SYNC Min SYNC Age Max SYNC Age
================================================================================
       2   2277006   2277401     00:00:00     00:19:00    0
       1   2999671   3001970     00:00:00     00:19:00    0
       5    516048    516088     00:00:00     00:20:00    0
       4    173746    174140     00:00:00     00:19:00    0


---------------------------------------------------------------------------------
Summary of sl_confirm aging
    Origin   Receiver   Min SYNC   Max SYNC  Age of latest SYNC  Age of eldest SYNC
=================================================================================
         1          2    2999672    3001969            00:00:00            00:19:00    0
         1          4    2999678    3001969            00:00:00            00:19:00    0
         1          5    2999671    3001962            00:00:00            00:19:00    0
         2          1    2277006    2277401            00:00:00            00:19:00    0
         2          4    2277006    2277401            00:00:00            00:19:00    0
         2          5    2277006    2277400            00:00:00            00:19:00    0
         4          1     173746     174140            00:00:00            00:19:00    0
         4          2    6030310    6030310     6 days 01:52:00     6 days 01:52:00    1
         4          5    6030307    6030307     6 days 01:52:00     6 days 01:52:00    1
         5          1     516048     516088            00:00:00            00:20:00    0
         5          2     516048     516088            00:00:00            00:20:00    0
         5          4     516048     516088            00:00:00            00:20:00    0


------------------------------------------------------------------------------

Listing of old open connections
        Database             PID            User    Query Age                Query
================================================================================
===END OF LOG===

As you can see, the Origin->Receiver lines for 4->2 and 4->5 show some
very old SYNCs. Nodes 2 and 5 are exactly the ones where I see the
"lag spikes"; the other subscriber, node 4, doesn't get them at all.
The report looks much the same for every node that
test_slony_state-dbi.pl checks, so I'm kind of perplexed.
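For what it's worth, the trailing 0/1 column in the sl_confirm table
seems to flag exactly those two pairs. Below is a rough Python sketch
that reproduces that flagging from the "Age of latest SYNC" values
transcribed from the report. The 1-hour cutoff (THRESHOLD) and the
helper names parse_age and stale_pairs are hypothetical; I don't know
what threshold the script itself actually uses.

```python
import re
from datetime import timedelta

# Matches ages as printed in the report, e.g. "00:19:00" or "6 days 01:52:00".
AGE_RE = re.compile(r"^(?:(\d+) days? )?(\d+):(\d{2}):(\d{2})$")

def parse_age(text: str) -> timedelta:
    """Convert a report age string into a timedelta."""
    m = AGE_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised age: {text!r}")
    days, hours, minutes, seconds = m.groups()
    return timedelta(days=int(days or 0), hours=int(hours),
                     minutes=int(minutes), seconds=int(seconds))

# (origin, receiver, age of latest confirmed SYNC), copied from the report.
ROWS = [
    (1, 2, "00:00:00"), (1, 4, "00:00:00"), (1, 5, "00:00:00"),
    (2, 1, "00:00:00"), (2, 4, "00:00:00"), (2, 5, "00:00:00"),
    (4, 1, "00:00:00"), (4, 2, "6 days 01:52:00"), (4, 5, "6 days 01:52:00"),
    (5, 1, "00:00:00"), (5, 2, "00:00:00"), (5, 4, "00:00:00"),
]

THRESHOLD = timedelta(hours=1)  # hypothetical cutoff, not taken from the script

def stale_pairs(rows, threshold):
    """Return (origin, receiver) pairs whose latest confirmation is too old."""
    return [(origin, receiver) for origin, receiver, age in rows
            if parse_age(age) > threshold]

print(stale_pairs(ROWS, THRESHOLD))
```

This prints [(4, 2), (4, 5)], matching the rows flagged with 1 in the
report.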

I'm wondering if anyone would be able to interpret this for me and
provide any help or advice.

Thanks a lot!
--Richard
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general
