On 3/30/07, Richard Yen <[EMAIL PROTECTED]> wrote:
> Hi,
>
> As a follow-up to my previous post about sl_confirm getting aged, I
> *did* do a move_set from node 4 to node 1 about 6 days ago.  Any
> reason why the slon cleanup cycle didn't pick up these confirmations
> and delete them?  Perhaps it is a bug of some sort?

Or, perhaps the confirmation set wasn't complete for all nodes, and the
slons were behaving correctly?

> In any case, I deleted the rows in sl_confirm, so the

Clever.  Did it occur to you that perhaps they're there for a reason,
and that simply deleting them is not going to fix your problem but may
in fact make it worse?

You have probably broken your replication cluster, unless you kept
copies of the deleted rows.  Alternatively, you can just assume that
the syncs mentioned in sl_confirm were applied and then (optionally)
try to figure out which ones they were in sl_log and purge them from
there too.  However, this strikes me as a pretty sloppy way to treat
your data and cluster.

> test_slony_state-dbi.pl script doesn't list these anomalies anymore.

Of course not.  By treating the symptom, you've managed to further
obscure your actual problem.

> Has anyone else encountered this, or have an explanation for it?

Slightly messed up listen paths?  Slons which needed a restart?  Who
knows?  I doubt we can help you figure it out now that you've deleted
the evidence.

> --Richard
>
> On Mar 30, 2007, at 12:17 PM, Richard Yen wrote:
>
> > Hi all,
> >
> > I've recently been experiencing climbing lags, followed by a sudden
> > drop, at random times during the day.  I understand that for some
> > people a ~40 event lag isn't much, but it's quite unusual for my
> > cluster.
> >
> > I run a 4-node cluster (1 provider, 3 subscribers), and it appears
> > that at random times, the event lag climbs up to ~40, and then
> > suddenly drops to 0.  Load on all nodes is < 1.0 during these times,
> > so I don't suspect that it's hardware or configuration.  That leaves
> > me with no explanation of what's happening that causes these "lag
> > spikes."
> >
> > Tried running test_slony_state-dbi.pl, and found the following output:
> >
> > ===BEGIN LOG===
> > Tests for node 1 - DSN = dbname=tii host=tii-db1.oaktown.iparadigms.com user=slony password=3l3phant
> > ========================================
> > pg_listener info:
> > Pages: 9
> > Tuples: 1
> >
> > Size Tests
> > ================================================
> > sl_log_1    1918   26082.000000
> > sl_log_2    0      0.000000
> > sl_seqlog   20     1543.000000
> >
> > Listen Path Analysis
> > ===================================================
> > No problems found with sl_listen
> >
> > --------------------------------------------------------------------------------
> > Summary of event info
> > Origin  Min SYNC  Max SYNC  Min SYNC Age  Max SYNC Age
> > ================================================================================
> > 2       2277006   2277401   00:00:00      00:19:00      0
> > 1       2999671   3001970   00:00:00      00:19:00      0
> > 5       516048    516088    00:00:00      00:20:00      0
> > 4       173746    174140    00:00:00      00:19:00      0
> >
> > --------------------------------------------------------------------------------
> > Summary of sl_confirm aging
> > Origin  Receiver  Min SYNC  Max SYNC  Age of latest SYNC  Age of eldest SYNC
> > ================================================================================
> > 1       2         2999672   3001969   00:00:00            00:19:00            0
> > 1       4         2999678   3001969   00:00:00            00:19:00            0
> > 1       5         2999671   3001962   00:00:00            00:19:00            0
> > 2       1         2277006   2277401   00:00:00            00:19:00            0
> > 2       4         2277006   2277401   00:00:00            00:19:00            0
> > 2       5         2277006   2277400   00:00:00            00:19:00            0
> > 4       1         173746    174140    00:00:00            00:19:00            0
> > 4       2         6030310   6030310   6 days 01:52:00     6 days 01:52:00     1
> > 4       5         6030307   6030307   6 days 01:52:00     6 days 01:52:00     1
> > 5       1         516048    516088    00:00:00            00:20:00            0
> > 5       2         516048    516088    00:00:00            00:20:00            0
> > 5       4         516048    516088    00:00:00            00:20:00            0
> >
> > --------------------------------------------------------------------------------
> >
> > Listing of old open connections
> > Database  PID  User  Query Age  Query
> > ================================================================================
> > ===END OF LOG===
> >
> > If you notice, the lines for Origin->Receiver on 4->2 and 4->5 have
> > some old SYNCs.  These nodes (2 and 5) are the ones I experience the
> > "lag spikes" on.  The other subscriber, node 4, doesn't experience
> > lag spikes at all.  This report is similar for every node in the
> > test_slony_state-dbi.pl script, so I'm kind of perplexed.
> >
> > Wondering if anyone would be able to interpret this for me and
> > provide any help/advice.
> >
> > Thanks a lot!
> > --Richard

_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general
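
For anyone who wants to spot aged confirmations like the 4->2 and 4->5 rows in the quoted report without eyeballing the table, here is a minimal sketch in Python. It only reads report text and touches no database. It assumes the row layout of the "Summary of sl_confirm aging" section quoted in this thread; the function name, the regular expression and the one-hour threshold are illustrative choices of mine and are not part of test_slony_state-dbi.pl or Slony-I.

```python
import re
from datetime import timedelta

# A few rows copied from the "Summary of sl_confirm aging" table quoted above.
REPORT_ROWS = """\
1 2 2999672 3001969 00:00:00 00:19:00 0
4 1 173746 174140 00:00:00 00:19:00 0
4 2 6030310 6030310 6 days 01:52:00 6 days 01:52:00 1
4 5 6030307 6030307 6 days 01:52:00 6 days 01:52:00 1
"""

# An interval as printed in the report: optional "N day(s) " then HH:MM:SS.
# (Assumed format, inferred only from the rows shown in this thread.)
AGE = r"(?:(\d+) days? )?(\d+):(\d\d):(\d\d)"
# origin, receiver, min SYNC, max SYNC, latest age, eldest age, trailing flag
ROW = re.compile(rf"^(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+{AGE}\s+{AGE}\s+(\d+)$")


def to_timedelta(days, hours, minutes, seconds):
    """Convert the captured interval pieces to a timedelta."""
    return timedelta(days=int(days or 0), hours=int(hours),
                     minutes=int(minutes), seconds=int(seconds))


def stale_confirmations(text, threshold=timedelta(hours=1)):
    """Return (origin, receiver, age) for rows whose latest SYNC is too old.

    The threshold is a hypothetical cut-off; pick one that suits your cluster.
    """
    stale = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # skip headers, separators and anything unparseable
        origin, receiver = int(m.group(1)), int(m.group(2))
        latest_age = to_timedelta(*m.group(5, 6, 7, 8))
        if latest_age > threshold:
            stale.append((origin, receiver, latest_age))
    return stale


if __name__ == "__main__":
    for origin, receiver, age in stale_confirmations(REPORT_ROWS):
        print(f"origin {origin} -> receiver {receiver}: latest SYNC is {age} old")
```

Run against the sample rows, this flags only the 4->2 and 4->5 pairs. It tells you which confirmations look aged, not why, so it is a starting point for investigating listen paths and slon processes rather than a reason to delete anything.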
