I'm pretty sure I do have a problem here, even though I don't see it in the
logs or slony1 tables.

It looks like the DB is getting around 20 transactions per minute (approx.
30,000 inserts per day), and we have a cleanup routine that runs on weekends
only.  Both master and slave servers have been rebooted at least once, but
the tables on the slave only have data up to Nov 28, 2005.  It looks like NO
replication is occurring at all, yet I still do not see any errors and the
two servers are "talking" to each other.  Also, last week my sl_log_1 was at
around 5.5 million rows; this week it is 6 million and climbing.

I've created another cluster on a different DB with the same two servers.
These are live tables being replicated but with very low volume.  So far
this new cluster is working great with transactions replicated over within
seconds of all updates/inserts.  The original cluster is still stuck on Nov
28.

I will try to set aside some time this week to look into the tools dir and
see which tests are not being run properly.  If I don't find an answer, I
think I will have to rebuild the slave.

Thanks,
Robert

-----Original Message-----
From: Christopher Browne [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 24, 2006 4:44 PM
To: Robert Littlejohn
Cc: '[email protected]'
Subject: Re: [Slony1-general] sl_log_1 filling


Robert Littlejohn <[EMAIL PROTECTED]> writes:

> Great, thanks for the info. I've been meaning to get back to this but I've
> been out of the office a bit.
>
> I had already been through the FAQ (that's why I did the vacuums), but so
> far I still can't find anything.  The query you posted returned no results.
> I'm looking at test_slony_state-dbi.pl now, but so far it only tells me
> sl_seqlog, sl_log_1, sl_seqlog all exceed 200000.  Some of the tests fail
> with perl errors, so I'll try to get those tests run.
>
> I've also looked quite a bit at the logs and have found nothing.  The
> cleanupThread is starting every 5 - 15 minutes and reports things like:
> 2006-01-06 12:25:47 AST DEBUG1 cleanupThread:    5.849 seconds for
> cleanupEvent()
> 2006-01-06 12:26:20 AST DEBUG3 cleanupThread: minxid: 199383042
> 2006-01-06 12:26:20 AST DEBUG4 cleanupThread: xid 199383042 still active -
> analyze instead
> 2006-01-06 12:39:49 AST DEBUG1 cleanupThread:    3.261 seconds for
> cleanupEvent()
> 2006-01-06 12:40:08 AST DEBUG1 cleanupThread:   18.638 seconds for delete
> logs
>
> except for the xid 199383042 still active - analyze instead nothing really
> jumps out at me.

Well, the "analyze instead" part ought to be a reasonably useful
optimization.

Essentially, if a transaction that was running the last time the cleanup
thread ran is still running now, then it's futile to try to VACUUM the
tables: no data will get cleaned out, because that fairly old transaction
is still holding onto the dead rows.

I'm not sure that you necessarily have any problem going on right now.
If you have some long-running transactions (and you certainly do), and
see hundreds/thousands of database updates per minute, it would be
pretty easy for the size of sl_log_1 and sl_seqlog to grow to ~200K.

Consider: sl_log_1 contains a row for each tuple that is updated.

If you do a transaction per second, each of which involves 10 table
updates, that would add, to this table...

  60 x 10 = 600 rows per minute

If you had *no* long running transactions, then you'd expect the
cleanup thread, after 10 minutes, to find, and leave alone, 600 x 10 =
6000 rows that are relevant to the last 10 minutes of activity.

If you have some transaction that's running for an hour, then that
leads to growth to 36000 rows.

If you're doing about 5 transactions per second, rather than 1, that
easily gets you to 200K rows in sl_log_1.
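The arithmetic above can be sketched as a quick calculation.  The rates
(transactions per second, tuples per transaction) are illustrative
assumptions from the discussion, not measurements from your system:

```python
# Back-of-the-napkin estimate of sl_log_1 growth while old rows
# cannot yet be cleaned out.  All rates here are assumed for illustration.

def sl_log_1_rows(txns_per_second, tuples_per_txn, retained_seconds):
    """Rows accumulated in sl_log_1 over the retention window."""
    return txns_per_second * tuples_per_txn * retained_seconds

# 1 txn/sec x 10 tuple updates each = 600 rows per minute
print(sl_log_1_rows(1, 10, 60))          # 600

# a 10-minute cleanup window with no long-running transactions
print(sl_log_1_rows(1, 10, 10 * 60))     # 6000

# one transaction held open for an hour retains an hour of rows
print(sl_log_1_rows(1, 10, 60 * 60))     # 36000

# at ~5 txn/sec the same hour yields ~180K rows -- close to 200K
print(sl_log_1_rows(5, 10, 60 * 60))     # 180000
```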

sl_seqlog gets, for each sync, a row for each sequence that you
replicate.  If you have 50 sequences, that can grow pretty big pretty
easily...
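Same idea for sl_seqlog, one row per replicated sequence per SYNC.  The
SYNC interval and sequence count below are assumptions, not values from
your configuration:

```python
# Rough sl_seqlog growth: one row per replicated sequence per SYNC event.
# The 10-second SYNC interval and 50 sequences are assumed for illustration.

def sl_seqlog_rows(num_sequences, sync_interval_seconds, elapsed_seconds):
    syncs = elapsed_seconds // sync_interval_seconds
    return num_sequences * syncs

# 50 sequences, a SYNC every 10 seconds, over one day
print(sl_seqlog_rows(50, 10, 24 * 60 * 60))  # 432000
```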

I'm sort of doing some "back of the napkin" estimates here just to
suggest how you might check to see if the numbers are reasonable or
not...

If you ever see replication fall behind, e.g. because a WAN connection
slows down, it would be entirely natural to see sl_log_1 grow to:

 period of time in seconds, plus 10 minutes
      times
 expected transactions per second
      times
 expected tuples updated per transaction
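That formula can be written out directly; the sample numbers plugged in
below are assumptions, just to show the shape of the estimate:

```python
# Sketch of the backlog formula above: expected sl_log_1 size when
# replication falls behind for lag_seconds.  Sample rates are assumptions.

def expected_backlog(lag_seconds, txns_per_second, tuples_per_txn,
                     cleanup_window_seconds=10 * 60):
    # (period of time + 10 minutes) x txns/sec x tuples updated per txn
    return (lag_seconds + cleanup_window_seconds) \
        * txns_per_second * tuples_per_txn

# e.g. a one-hour slowdown at 5 txn/sec, 10 tuples per transaction
print(expected_backlog(60 * 60, 5, 10))  # 210000
```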

Does that help?
-- 
"cbbrowne","@","ca.afilias.info"
<http://dba2.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general
