Hi all,
I've recently been experiencing climbing lags, followed by a sudden
drop, at random times during the day. I understand that for some
people a ~40 event lag isn't much, but it's quite unusual for my
cluster.
I run a 4-node cluster (1 provider, 3 subscribers). At random times
the event lag climbs to ~40 events and then suddenly drops to 0.
Load on all nodes is < 1.0 during these spikes, so I don't suspect
hardware or configuration, which leaves me with no explanation for
what causes these "lag spikes."
I ran test_slony_state-dbi.pl and got the following output:
===BEGIN LOG===
Tests for node 1 - DSN = dbname=tii host=tii-db1.oaktown.iparadigms.com user=slony password=3l3phant
========================================
pg_listener info:
Pages: 9
Tuples: 1
Size Tests
================================================
sl_log_1 1918 26082.000000
sl_log_2 0 0.000000
sl_seqlog 20 1543.000000
Listen Path Analysis
===================================================
No problems found with sl_listen
--------------------------------------------------------------------------------
Summary of event info
Origin  Min SYNC  Max SYNC  Min SYNC Age  Max SYNC Age
================================================================================
     2   2277006   2277401      00:00:00      00:19:00  0
     1   2999671   3001970      00:00:00      00:19:00  0
     5    516048    516088      00:00:00      00:20:00  0
     4    173746    174140      00:00:00      00:19:00  0
--------------------------------------------------------------------------------
Summary of sl_confirm aging
Origin  Receiver  Min SYNC  Max SYNC  Age of latest SYNC  Age of eldest SYNC
================================================================================
     1         2   2999672   3001969            00:00:00            00:19:00  0
     1         4   2999678   3001969            00:00:00            00:19:00  0
     1         5   2999671   3001962            00:00:00            00:19:00  0
     2         1   2277006   2277401            00:00:00            00:19:00  0
     2         4   2277006   2277401            00:00:00            00:19:00  0
     2         5   2277006   2277400            00:00:00            00:19:00  0
     4         1    173746    174140            00:00:00            00:19:00  0
     4         2   6030310   6030310     6 days 01:52:00     6 days 01:52:00  1
     4         5   6030307   6030307     6 days 01:52:00     6 days 01:52:00  1
     5         1    516048    516088            00:00:00            00:20:00  0
     5         2    516048    516088            00:00:00            00:20:00  0
     5         4    516048    516088            00:00:00            00:20:00  0
--------------------------------------------------------------------------------
Listing of old open connections
Database PID User Query Age Query
================================================================================
===END OF LOG===
If you notice, the Origin->Receiver lines for 4->2 and 4->5 have
some old SYNCs (about 6 days old). The receivers in those lines,
nodes 2 and 5, are the ones I experience the "lag spikes" on. The
other subscriber, node 4, doesn't experience lag spikes at all. The
report is similar for every node that test_slony_state-dbi.pl
checks, so I'm kind of perplexed.
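For reference, here is a minimal Python sketch of how the stale rows can be picked out of the "Summary of sl_confirm aging" section. The sample rows are copied from the log in this message and joined onto one line each. The one-hour threshold and the helper names (to_timedelta, stale_confirmations) are assumptions made for illustration and are not part of test_slony_state-dbi.pl.

```python
import re
from datetime import timedelta

# Rows copied from the "Summary of sl_confirm aging" section, one per line:
# origin, receiver, min SYNC, max SYNC, age of latest SYNC, age of eldest SYNC, flag
ROWS = """\
1 2 2999672 3001969 00:00:00 00:19:00 0
4 1 173746 174140 00:00:00 00:19:00 0
4 2 6030310 6030310 6 days 01:52:00 6 days 01:52:00 1
4 5 6030307 6030307 6 days 01:52:00 6 days 01:52:00 1
"""

# An age is HH:MM:SS, optionally preceded by "N day(s) ".
AGE = r"(?:(\d+) days? )?(\d+):(\d+):(\d+)"
ROW = re.compile(rf"(\d+) (\d+) (\d+) (\d+) {AGE} {AGE} (\d+)$")


def to_timedelta(days, hours, minutes, seconds):
    """Convert the captured age fields to a timedelta (days may be None)."""
    return timedelta(days=int(days or 0), hours=int(hours),
                     minutes=int(minutes), seconds=int(seconds))


def stale_confirmations(text, threshold=timedelta(hours=1)):
    """Return (origin, receiver, age) for rows whose latest SYNC age exceeds threshold.

    The one-hour default is an arbitrary illustrative value.
    """
    stale = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # skip headers, separators, and blank lines
        origin, receiver = int(m.group(1)), int(m.group(2))
        latest = to_timedelta(*m.group(5, 6, 7, 8))
        if latest > threshold:
            stale.append((origin, receiver, latest))
    return stale


if __name__ == "__main__":
    for origin, receiver, age in stale_confirmations(ROWS):
        print(f"{origin}->{receiver}: latest confirmed SYNC is {age} old")
```

On the rows above this flags only the 4->2 and 4->5 confirmations.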
I'm wondering if anyone would be able to interpret this for me and
provide any help/advice.
Thanks a lot!
--Richard
_______________________________________________
Slony1-general mailing list
http://gborg.postgresql.org/mailman/listinfo/slony1-general