On Wednesday 07 February 2007 8:24 am, Dan Falconer wrote:
> On Tuesday 06 February 2007 7:47 pm, [EMAIL PROTECTED] wrote:
> > > On Tue, Feb 06, 2007 at 05:34:34PM -0600, Dan Falconer wrote:
> > >>  I'm going to leave this overnight, and find out what happens, but I'm
> > >> not very hopeful.  Tomorrow, if it hasn't caught up significantly, I'm
> > >> going to have to do something drastic... and if 1.2.6 continues to drop
> > >> behind so badly, I may have to (attempt to) revert back to 1.1.0.  I may
> > >> try just
> > >
> > > Naw: if that's what's happening, something is wrong in a way we need
> > > to fix right away, and we'll do so in collaboration with you.  We'll need
> > > more data, though, about what's going on under the hood.
> >
> > Further, I don't think there's anything about 1.1 that would be expected
> > to be *better* than 1.2, in terms of performance.
> >
> > The one thing that would be expected to affect performance is the
> > switching between log tables (sl_log_1 and sl_log_2).  And the fact that
> > this has the ability to *empty* the tables should be an improvement.
> >
> > I can suggest one thing to take a particular look at, namely what indices
> > are on sl_log_1 and sl_log_2.
> >
> > Over time, a set of partial indexes should get created on these tables,
> > based on the node numbers that are the origins of replication sets.  That
> > should be a help (e.g., better performance than in 1.1, which didn't do
> > this).
> >
> > If there aren't good indices on these tables, which would be unexpected,
> > that could cause problems.
>
>       First off, I'd like to say that you guys have done a great job on
> Slony--for the most part ;) -- and I really appreciate your help.
>
>       After running the vacuum analyze + restart of the slon daemons, it ran
> through at least one "burst" of activity (where it suddenly cleared 1M
> records out of sl_log_2 on the master).  It then seemed to start falling
> behind again; I kept an eye on that table, and on the slave's latency, using
> a query recommended by Chris Browne to our former DBA (our replication
> cluster is "pl_replication"):
>
> select con_origin, con_received, max(con_seqno), max(con_timestamp),
>        now() - max(con_timestamp) as age
>   from _pl_replication.sl_confirm
>  group by con_origin, con_received
>  order by age;
>
>       Anyway, this morning it appears to have pounded through and finally
> caught up.  I think the problem really was resolved by the vacuum analyze +
> restart of the slons... my co-worker was talking to me about it, and
> mentioned that it may be something along the lines of what happens in Perl
> with prepared statements: it gets a good plan right away, but if the table
> grows too fast after that, the plan becomes "stale" and the query more
> expensive.  Might be something to think about, though I have little
> knowledge of the deeper inner workings of Slony (I'm okay with using the
> good ol' "it's magic" explanation).
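
To illustrate what my co-worker meant -- this is only a sketch in plain SQL,
not a claim about how the slon daemons actually prepare their queries; the
statement name below is made up, though log_origin is a real sl_log column:

  -- A prepared statement is planned once, using whatever statistics
  -- exist at PREPARE time.
  PREPARE pending_rows (int) AS
      SELECT count(*)
        FROM _pl_replication.sl_log_2
       WHERE log_origin = $1;

  EXECUTE pending_rows(1);
  -- If sl_log_2 then grows by millions of rows, that cached plan (and
  -- the stale statistics behind it) can become a poor fit until an
  -- ANALYZE and a fresh plan come along.
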
>
>       About the partial indexes: the slave appears to have a partial index on
> both sl_log_1 and sl_log_2, while the master has a partial index on
> sl_log_1 ONLY (important because sl_log_2 still has about 23,000 records in
> it).  The slave is now only ~7 sec behind... I would expect that whatever
> magic causes the master to start using sl_log_1 again will also cause that
> partial index to get created on sl_log_2.
>
>       Final thought: if my Slony setup hasn't figured itself out already, then
> I would venture to guess that this problem will recur next week after
> inventory processing.  If so, I'll put a message on the list again, and
> maybe we can figure out what causes this little beasty to rear its ugly
> not-so-little head.
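
For reference, one way to check which indexes exist on the log tables on each
node -- just a sketch against the standard pg_indexes view, using the
_pl_replication schema from the query above; partial indexes show up with a
WHERE clause in indexdef:

  -- Run this on both the master and the slave and compare the output.
  SELECT schemaname, tablename, indexname, indexdef
    FROM pg_indexes
   WHERE schemaname = '_pl_replication'
     AND tablename IN ('sl_log_1', 'sl_log_2')
   ORDER BY tablename, indexname;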

        One more quick note: overnight, we had a vacuum analyze run on the whole
database (it's cronned to run weekly).  In the output of the vacuum (it was
set to verbose), it looks like our main inventory table had 7 indexes, each
of which had 12.7M index row versions removed (along with 12.7M row versions
from the table itself).  This would probably affect Slony *somehow*... but I'm
not sure whether it would directly contribute to the problem at hand.
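
If it does come back after next week's inventory processing, a smaller hammer
than the weekly full pass might be a targeted vacuum of just the tables
involved.  A sketch only -- the sl_log names are real, but the inventory table
name here is made up for illustration:

  -- VERBOSE reports how many dead row versions get reclaimed from the
  -- table and each of its indexes.
  VACUUM ANALYZE VERBOSE _pl_replication.sl_log_1;
  VACUUM ANALYZE VERBOSE _pl_replication.sl_log_2;
  VACUUM ANALYZE VERBOSE public.inventory;  -- hypothetical name for our main inventory table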

-- 
Best Regards,


Dan Falconer
"Head Geek",
AvSupport, Inc. (http://www.partslogistics.com)