On Apr 2, 2007, at 4:44 PM, Andrew Hammond wrote:

> On 3/30/07, Richard Yen <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> As a follow-up to my previous post about sl_confirm getting aged, I
>> *did* do a move_set from node 4 to node 1 about 6 days ago. Any
>> reason why the slon cleanup cycle didn't pick up these confirmations
>> and delete them? Perhaps it is a bug of some sort?
>
> Or, perhaps the confirmation set wasn't complete for all nodes, and
> the slons were behaving correctly?

Not sure what you mean here. How do I check for confirmation set
completeness?

>> In any case, I deleted the rows in sl_confirm, so the
>
> Clever. Did it occur to you that perhaps they're there for a reason
> and that simply deleting them is not going to fix your problem, but
> may in fact make it worse? You have probably broken your replication
> cluster, unless you kept some copies of the deleted rows around.
>
> Alternatively you can just assume that the syncs mentioned in
> sl_confirm were applied and then (optionally) try to figure out which
> ones they were in sl_log and purge them out of there too. However,
> this strikes me as a pretty sloppy way to treat your data and cluster.

Lesson learned. I'm a rookie DBA, and that's why I'm looking for help
from people like you guys. I *did*, however, check both sl_log_1 and
sl_log_2 for entries corresponding to the rows in sl_confirm. I checked
sl_event also, but found nothing (perhaps I should've checked elsewhere,
but I couldn't find any direction in the documentation). I concluded
that these rows in sl_confirm were, in effect, orphaned, and deleted
them. True, it's sloppy; I'll not do it again.

>> test_slony_state-dbi.pl script doesn't list these anomalies anymore.
>
> Of course not. By treating the symptom, you've managed to further
> obscure your actual problem.
>
>> Has anyone else encountered this, or does anyone have an explanation
>> for this?
>
> Slightly messed up listen paths? Slons which needed a restart? Who
> knows? I doubt we can help you figure it out now that you've deleted
> the evidence.
>

test_slony_state-dbi.pl said "No problems found with sl_listen" and I
*did* try a restart of all slon daemons. Neither of them did anything to
the rows in sl_confirm.

--Richard

>> --Richard
>>
>>
>> On Mar 30, 2007, at 12:17 PM, Richard Yen wrote:
>>
>> > Hi all,
>> >
>> > I've recently been experiencing climbing lags, followed by a sudden
>> > drop, at random times during the day.
>> > I understand that for some people a ~40 event lag isn't much, but
>> > it's quite unusual for my cluster.
>> >
>> > I run a 4-node cluster (1 provider, 3 subscribers), and it appears
>> > that at random times, the event lag climbs up to ~40, and then
>> > suddenly drops to 0. Load on all nodes is < 1.0 during these times,
>> > so I don't suspect that it's hardware or configuration. That leaves
>> > me with no explanation of what's happening that causes these "lag
>> > spikes."
>> >
>> > Tried running test_slony_state-dbi.pl, and found the following output:
>> >
>> > ===BEGIN LOG===
>> > Tests for node 1 - DSN = dbname=tii host=tii-db1.oaktown.iparadigms.com user=slony password=3l3phant
>> > ========================================
>> > pg_listener info:
>> > Pages: 9
>> > Tuples: 1
>> >
>> > Size Tests
>> > ================================================
>> > sl_log_1     1918  26082.000000
>> > sl_log_2        0      0.000000
>> > sl_seqlog      20   1543.000000
>> >
>> > Listen Path Analysis
>> > ===================================================
>> > No problems found with sl_listen
>> >
>> > --------------------------------------------------------------------------------
>> > Summary of event info
>> > Origin  Min SYNC  Max SYNC  Min SYNC Age  Max SYNC Age
>> > ================================================================================
>> > 2       2277006   2277401   00:00:00      00:19:00      0
>> > 1       2999671   3001970   00:00:00      00:19:00      0
>> > 5       516048    516088    00:00:00      00:20:00      0
>> > 4       173746    174140    00:00:00      00:19:00      0
>> >
>> > --------------------------------------------------------------------------------
>> > Summary of sl_confirm aging
>> > Origin  Receiver  Min SYNC  Max SYNC  Age of latest SYNC  Age of eldest SYNC
>> > ================================================================================
>> > 1       2         2999672   3001969   00:00:00            00:19:00            0
>> > 1       4         2999678   3001969   00:00:00            00:19:00            0
>> > 1       5         2999671   3001962   00:00:00            00:19:00            0
>> > 2       1         2277006   2277401   00:00:00            00:19:00            0
>> > 2       4         2277006   2277401   00:00:00            00:19:00            0
>> > 2       5         2277006   2277400   00:00:00            00:19:00            0
>> > 4       1         173746    174140    00:00:00            00:19:00            0
>> > 4       2         6030310   6030310   6 days 01:52:00     6 days 01:52:00     1
>> > 4       5         6030307   6030307   6 days 01:52:00     6 days 01:52:00     1
>> > 5       1         516048    516088    00:00:00            00:20:00            0
>> > 5       2         516048    516088    00:00:00            00:20:00            0
>> > 5       4         516048    516088    00:00:00            00:20:00            0
>> >
>> > --------------------------------------------------------------------------------
>> >
>> > Listing of old open connections
>> > Database  PID  User  Query Age  Query
>> > ================================================================================
>> > ===END OF LOG===
>> >
>> > If you notice, the lines for Origin->Receiver on 4->2 and 4->5 have
>> > some old SYNCs. These nodes (2 and 5) are the ones I experience the
>> > "lag spikes" on. The other subscriber, node 4, doesn't experience
>> > lag spikes at all. This report is similar for every node in the
>> > test_slony_state-dbi.pl script, so I'm kind of perplexed.
>> >
>> > Wondering if anyone would be able to interpret this for me and
>> > provide any help/advice.
>> >
>> > Thanks a lot!
>> > --Richard
>> > _______________________________________________
>> > Slony1-general mailing list
>> > [email protected]
>> > http://gborg.postgresql.org/mailman/listinfo/slony1-general
