On 3/30/07, Richard Yen <[EMAIL PROTECTED]> wrote:
> Hi,
>
> As a follow-up to my previous post about sl_confirm getting aged, I
> *did* do a move_set from node 4 to node 1 about 6 days ago.  Any
> reason why the slon cleanup cycle didn't pick up these confirmations
> and delete them?  Perhaps it is a bug of some sort?

Or, perhaps the confirmation set wasn't complete for all nodes, and
the slons were behaving correctly?
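
For what it's worth, as I understand the cleanup thread, it only trims
sl_confirm (and the matching sl_event/sl_log rows) up to the lowest event
sequence that *every* receiver has confirmed for a given origin, so one
straggling origin/receiver pair keeps old rows around for everybody. A
rough way to check, assuming your cluster schema is called _mycluster
(substitute your real cluster name):

  -- _mycluster is a placeholder for your actual cluster schema
  SELECT con_origin, con_received, max(con_seqno) AS last_confirmed
    FROM _mycluster.sl_confirm
   GROUP BY con_origin, con_received
   ORDER BY con_origin, con_received;

If one receiver's last_confirmed sits well behind the others for the same
origin, that's the pair pinning the old confirmations in place.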

> In any case, I deleted the rows in sl_confirm, so the

Clever. Did it occur to you that perhaps they're there for a reason
and that simply deleting them is not going to fix your problem, but
may in fact make it worse? You have probably broken your replication
cluster, unless you kept some copies of the deleted rows around.
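
If you still can, snapshot whatever is left before the next cleanup cycle
so there's something to do forensics on later. A minimal sketch, assuming
the same hypothetical _mycluster schema (sl_confirm_backup is just an
illustrative name):

  -- keep a copy of the surviving confirmations for later analysis
  CREATE TABLE sl_confirm_backup AS
  SELECT * FROM _mycluster.sl_confirm;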

Alternatively, you can just assume that the syncs mentioned in sl_confirm
were applied, then (optionally) figure out which rows they correspond to
in sl_log and purge those as well. However, this strikes me as a pretty
sloppy way to treat your data and cluster.
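
If you do go down that road, at least cross-check what those confirmations
point at before touching sl_log. Something along these lines, again
assuming a _mycluster schema; the 4->2 origin/receiver pair is just lifted
from your report:

  -- events from origin 4 that node 2 has, supposedly, confirmed
  SELECT e.ev_seqno, e.ev_type, e.ev_timestamp
    FROM _mycluster.sl_event e
   WHERE e.ev_origin = 4
     AND e.ev_seqno <= (SELECT max(con_seqno)
                          FROM _mycluster.sl_confirm
                         WHERE con_origin = 4
                           AND con_received = 2);

Hand-editing the Slony catalogs should be a last resort, not routine
maintenance.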

> test_slony_state-dbi.pl script doesn't list these anomalies anymore.

Of course not. By treating the symptom, you've managed to further
obscure your actual problem.

> Has anyone else encountered this, or does anyone have an explanation for this?

Slightly messed up listen paths? Slons which needed a restart? Who
knows? I doubt we can help you figure it out now that you've deleted
the evidence.
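
If you want to rule the listen-path theory in or out the next time this
happens, the paths are ordinary tables you can dump before restarting
anything. A quick sketch, assuming the same _mycluster schema:

  -- who listens for whose events, and through which provider
  SELECT li_origin, li_provider, li_receiver
    FROM _mycluster.sl_listen
   ORDER BY li_origin, li_receiver;

  -- the connection paths the slons use between nodes
  SELECT pa_server, pa_client, pa_conninfo
    FROM _mycluster.sl_path
   ORDER BY pa_server, pa_client;

Every node needs a listen entry for each origin it has to hear events
from; after a move_set it's worth confirming those entries still make
sense.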

> --Richard
>
>
>
>
> On Mar 30, 2007, at 12:17 PM, Richard Yen wrote:
>
> > Hi all,
> >
> > I've recently been experiencing climbing lags, followed by a sudden
> > drop, at random times during the day.  I understand that for some
> > people a ~40 event lag isn't much, but it's quite unusual for my
> > cluster.
> >
> > I run a 4-node cluster (1 provider, 3 subscribers), and it appears
> > that at random times, the event lag climbs up to ~40, and then
> > suddenly drops to 0.  Load on all nodes is < 1.0 during these times,
> > so I don't suspect that it's hardware or configuration.  That leaves
> > me with no explanation of what's happening that causes these "lag
> > spikes."
> >
> > Tried running test_slony_state-dbi.pl, and found the following output:
> >
> > ===BEGIN LOG===
> > Tests for node 1 - DSN = dbname=tii host=tii-db1.oaktown.iparadigms.com user=slony password=3l3phant
> > ========================================
> > pg_listener info:
> > Pages: 9
> > Tuples: 1
> >
> > Size Tests
> > ================================================
> >         sl_log_1      1918 26082.000000
> >         sl_log_2         0  0.000000
> >        sl_seqlog        20 1543.000000
> >
> > Listen Path Analysis
> > ===================================================
> > No problems found with sl_listen
> >
> > --------------------------------------------------------------------------------
> > Summary of event info
> > Origin  Min SYNC  Max SYNC Min SYNC Age Max SYNC Age
> > ================================================================================
> >        2   2277006   2277401     00:00:00     00:19:00    0
> >        1   2999671   3001970     00:00:00     00:19:00    0
> >        5    516048    516088     00:00:00     00:20:00    0
> >        4    173746    174140     00:00:00     00:19:00    0
> >
> >
> > --------------------------------------------------------------------------------
> > Summary of sl_confirm aging
> >     Origin   Receiver   Min SYNC   Max SYNC  Age of latest SYNC  Age of eldest SYNC
> > ================================================================================
> >          1          2    2999672    3001969            00:00:00            00:19:00    0
> >          1          4    2999678    3001969            00:00:00            00:19:00    0
> >          1          5    2999671    3001962            00:00:00            00:19:00    0
> >          2          1    2277006    2277401            00:00:00            00:19:00    0
> >          2          4    2277006    2277401            00:00:00            00:19:00    0
> >          2          5    2277006    2277400            00:00:00            00:19:00    0
> >          4          1     173746     174140            00:00:00            00:19:00    0
> >          4          2    6030310    6030310     6 days 01:52:00     6 days 01:52:00    1
> >          4          5    6030307    6030307     6 days 01:52:00     6 days 01:52:00    1
> >          5          1     516048     516088            00:00:00            00:20:00    0
> >          5          2     516048     516088            00:00:00            00:20:00    0
> >          5          4     516048     516088            00:00:00            00:20:00    0
> >
> >
> > --------------------------------------------------------------------------------
> >
> > Listing of old open connections
> >         Database             PID            User       Query Age                Query
> > ================================================================================
> > ===END OF LOG===
> >
> > If you notice, the lines for Origin->Receiver on 4->2 and 4->5 have
> > some old SYNCs.  These nodes (2 and 5) are the ones I experience the
> > "lag spikes" on.  The other subscriber, node 4, doesn't experience
> > lag spikes at all.  The report looks similar when I run the
> > test_slony_state-dbi.pl script against every node, so I'm kind of perplexed.
> >
> > Wondering if anyone would be able to interpret this for me and
> > provide any help/advice.
> >
> > Thanks a lot!
> > --Richard
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general
