Problem solved.
although there's no old transaction in pg_stat_activity (no <IDLE> in transaction), I checked the pg_locks and found the transactionid = 746264948 (ev_minxid).
So, I terminated the process associated with that transactionid, and a big chunk of sl_log_1 entries got deleted.
:D
I am not too sure if this is a bug related to slony or our application.
 
J


 
On 10/16/06, Junaili Lie <[EMAIL PROTECTED]> wrote:
Hi Christopher
I didn't see anything from the cleanup thread. most are pretty fast, with the exception of vacuuming.
Number of records on sl_log_1 has been increasing since yesterday.
on the sl_event, the ev_minxid (for ev_origin=1) has been the same since yesterday also.
 
MONSOON=# select ev_origin, ev_seqno, ev_minxid from "_MONSOONCLUSTER".sl_event where (ev_origin, ev_seqno) in   (select ev_origin, min(ev_seqno)  from "_MONSOONCLUSTER".sl_event
    where ev_type = 'SYNC'    group by ev_origin);
 ev_origin | ev_seqno | ev_minxid
-----------+----------+-----------
         1 |    13952 | MONSOON=# select ev_origin, ev_seqno, ev_minxid from "_MONSOONCLUSTER".sl_event where (ev_origin, ev_seqno) in   (select ev_origin, min(ev_seqno)  from "_MONSOONCLUSTER".sl_event
    where ev_type = 'SYNC'    group by ev_origin);
 ev_origin | ev_seqno | ev_minxid
-----------+----------+-----------
         1 |    13952 | 746264948
         2 |     6881 | 82628418
(2 rows)

How do I know whether the slon processes aren't 'living' long enough to do a good clean up?
 
this is the cleanup thread from
- Master
2006-10-16 08:23:31 PDT DEBUG1 cleanupThread:    0.230 seconds for cleanupEvent()
2006-10-16 08:23:32 PDT DEBUG1 cleanupThread:    0.543 seconds for delete logs
2006-10-16 08:34:04 PDT DEBUG1 cleanupThread:    0.237 seconds for cleanupEvent()
2006-10-16 08:34:05 PDT DEBUG1 cleanupThread:    0.507 seconds for delete logs
2006-10-16 08:44:22 PDT DEBUG1 cleanupThread:    0.278 seconds for cleanupEvent()
2006-10-16 08:44:24 PDT DEBUG1 cleanupThread:    1.357 seconds for delete logs
2006-10-16 08:45:06 PDT DEBUG2 cleanupThread:   42.636 seconds for vacuuming
2006-10-16 08:55:44 PDT DEBUG1 cleanupThread:    0.127 seconds for cleanupEvent()
2006-10-16 08:55:45 PDT DEBUG1 cleanupThread:    0.056 seconds for delete logs
2006-10-16 09:06:02 PDT DEBUG1 cleanupThread:    0.370 seconds for cleanupEvent()
2006-10-16 09:06:02 PDT DEBUG1 cleanupThread:    0.473 seconds for delete logs
2006-10-16 09:16:51 PDT DEBUG1 cleanupThread:    0.692 seconds for cleanupEvent()
2006-10-16 09:16:53 PDT DEBUG1 cleanupThread:    1.158 seconds for delete logs
2006-10-16 09:17:43 PDT DEBUG2 cleanupThread:   50.542 seconds for vacuuming
2006-10-16 09:28:02 PDT DEBUG1 cleanupThread:    0.222 seconds for cleanupEvent()
2006-10-16 09:28:02 PDT DEBUG1 cleanupThread:    0.103 seconds for delete logs
2006-10-16 09:38:31 PDT DEBUG1 cleanupThread:    0.069 seconds for cleanupEvent()
2006-10-16 09:38:32 PDT DEBUG1 cleanupThread:    0.060 seconds for delete logs
2006-10-16 09:49:14 PDT DEBUG1 cleanupThread:    1.182 seconds for cleanupEvent()
2006-10-16 09:49:16 PDT DEBUG1 cleanupThread:    2.004 seconds for delete logs
2006-10-16 09:50:05 PDT DEBUG2 cleanupThread:   49.172 seconds for vacuuming
2006-10-16 10:00:27 PDT DEBUG1 cleanupThread:    0.490 seconds for cleanupEvent()
2006-10-16 10:00:27 PDT DEBUG1 cleanupThread:    0.107 seconds for delete logs
2006-10-16 10:11:03 PDT DEBUG1 cleanupThread:    0.453 seconds for cleanupEvent()
2006-10-16 10:11:04 PDT DEBUG1 cleanupThread:    0.368 seconds for delete logs
2006-10-16 10:21:33 PDT DEBUG1 cleanupThread:    0.374 seconds for cleanupEvent()
2006-10-16 10:21:33 PDT DEBUG1 cleanupThread:    0.126 seconds for delete logs
2006-10-16 10:22:29 PDT DEBUG2 cleanupThread:   55.574 seconds for vacuuming
2006-10-16 10:33:21 PDT DEBUG1 cleanupThread:    3.556 seconds for cleanupEvent()
2006-10-16 10:33:21 PDT DEBUG1 cleanupThread:    0.178 seconds for delete logs
2006-10-16 10:43:52 PDT DEBUG1 cleanupThread:    0.125 seconds for cleanupEvent()
2006-10-16 10:43:53 PDT DEBUG1 cleanupThread:    0.868 seconds for delete logs
2006-10-16 10:54:33 PDT DEBUG1 cleanupThread:    0.134 seconds for cleanupEvent()
2006-10-16 10:54:34 PDT DEBUG1 cleanupThread:    0.650 seconds for delete logs
2006-10-16 10:55:38 PDT DEBUG2 cleanupThread:   64.382 seconds for vacuuming
2006-10-16 11:05:56 PDT DEBUG1 cleanupThread:    0.179 seconds for cleanupEvent()
2006-10-16 11:05:56 PDT DEBUG1 cleanupThread:    0.124 seconds for delete logs
 
- Slave
2006-10-16 08:31:16 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 08:31:16 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 08:42:00 PDT DEBUG1 cleanupThread:    0.001 seconds for cleanupEvent()
2006-10-16 08:42:00 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 08:52:27 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 08:52:27 PDT DEBUG1 cleanupThread:    0.004 seconds for delete logs
2006-10-16 08:52:47 PDT DEBUG2 cleanupThread:   20.650 seconds for vacuuming
2006-10-16 09:03:19 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 09:03:19 PDT DEBUG1 cleanupThread:    0.032 seconds for delete logs
2006-10-16 09:13:44 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 09:13:44 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 09:24:06 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 09:24:06 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 09:24:34 PDT DEBUG2 cleanupThread:   27.288 seconds for vacuuming
2006-10-16 09:34:53 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 09:34:53 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 09:45:24 PDT DEBUG1 cleanupThread:    0.001 seconds for cleanupEvent()
2006-10-16 09:45:24 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 09:55:58 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 09:55:58 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 09:56:52 PDT DEBUG2 cleanupThread:   54.306 seconds for vacuuming
2006-10-16 10:07:20 PDT DEBUG1 cleanupThread:    0.001 seconds for cleanupEvent()
2006-10-16 10:07:20 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 10:17:46 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 10:17:46 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 10:28:13 PDT DEBUG1 cleanupThread:    0.001 seconds for cleanupEvent()
2006-10-16 10:28:13 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 10:29:10 PDT DEBUG2 cleanupThread:   57.198 seconds for vacuuming
2006-10-16 10:39:55 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 10:39:55 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 10:50:39 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 10:50:39 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 11:01:21 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 11:01:21 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
2006-10-16 11:02:18 PDT DEBUG2 cleanupThread:   57.428 seconds for vacuuming
2006-10-16 11:12:45 PDT DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2006-10-16 11:12:45 PDT DEBUG1 cleanupThread:    0.003 seconds for delete logs
 
Thanks for your help.
 
j
 
On 10/16/06, Christopher Browne < [EMAIL PROTECTED]> wrote:
Junaili Lie wrote:
> This is the result from test_slony_state-dbi.pl
>
> DSN: dbi:Pg:dbname=MONSOON;host=localhost;port=5432
>
Pulling out some bits...
>
> --------------------------------------------------------------------------------
> Summary of event info
>  Origin  Min SYNC  Max SYNC Min SYNC Age Max SYNC Age
> ================================================================================
>
>       2      6872      6879     00:00:00     00:14:00    0
>       1     13934     13947     00:00:00     00:13:00    0
>
>
> ---------------------------------------------------------------------------------
> Summary of sl_confirm aging
>    Origin   Receiver   Min SYNC   Max SYNC  Age of latest SYNC  Age of
> eldest SYNC
> =================================================================================
>
>         1          2      13934      13946      00:01:00
> 00:13:00    0
>         2          1       6872       6879      00:00:00
> 00:14:00    0
>
>
> --------------------------------------------------------------------------------
> Summary of event info
>  Origin  Min SYNC  Max SYNC Min SYNC Age Max SYNC Age
> ================================================================================
>
>       2      6872      6879     00:00:00     00:14:00    0
>       1     13934     13947     00:00:00     00:13:00    0
>
>
> ---------------------------------------------------------------------------------
> Summary of sl_confirm aging
>    Origin   Receiver   Min SYNC   Max SYNC  Age of latest SYNC  Age of
> eldest SYNC
> =================================================================================
>
>         1          2      13934      13946      00:01:00
> 00:13:00    0
>         2          1       6872       6879      00:00:00
> 00:14:00    0
>

Well, the fact that you don't have any cases of elderly information in
sl_confirm or sl_event indicates that there doesn't appear to be any
problem with the flow of events, which represents a major class of
"kinds of problems."

It doesn't look as though anything is broken with your cluster that
would prevent cleaning out sl_log_1.

The next suggestion:  Grep for cleanup thread entries in the logs for
both slons.

Here's a sample excerpt for a not-too-busy set of nodes.  Note the
longer times at 16:30/16:31; there was evidently some "blip" of a bunch
of data to clean out.

2006-10-16 16:09:39 UTC DEBUG1 cleanupThread:     0.046 seconds for
delete logs
2006-10-16 16:20:23 UTC DEBUG1 cleanupThread:    0.115 seconds for
cleanupEvent()
2006-10-16 16:20:23 UTC DEBUG1 cleanupThread:    0.221 seconds for
delete logs
2006-10-16 16:30:43 UTC DEBUG1 cleanupThread:     0.068 seconds for
cleanupEvent()
2006-10-16 16:30:46 UTC DEBUG1 cleanupThread:    3.914 seconds for
delete logs
2006-10-16 16:31:08 UTC DEBUG2 cleanupThread:   21.869 seconds for vacuuming
2006-10-16 16:41:34 UTC DEBUG1 cleanupThread:     0.081 seconds for
cleanupEvent()
2006-10-16 16:41:34 UTC DEBUG1 cleanupThread:    0.066 seconds for
delete logs
2006-10-16 16:52:03 UTC DEBUG1 cleanupThread:    0.069 seconds for
cleanupEvent()
2006-10-16 16:52:03 UTC DEBUG1 cleanupThread:     0.059 seconds for
delete logs
2006-10-16 17:02:36 UTC DEBUG1 cleanupThread:    0.070 seconds for
cleanupEvent()
2006-10-16 17:02:36 UTC DEBUG1 cleanupThread:    0.059 seconds for
delete logs
2006-10-16 17:02:37 UTC DEBUG2 cleanupThread:     0.841 seconds for vacuuming
2006-10-16 17:13:02 UTC DEBUG1 cleanupThread:    0.070 seconds for
cleanupEvent()
2006-10-16 17:13:02 UTC DEBUG1 cleanupThread:    0.123 seconds for
delete logs
2006-10-16 17:23:19 UTC DEBUG1 cleanupThread:     0.079 seconds for
cleanupEvent()
2006-10-16 17:23:19 UTC DEBUG1 cleanupThread:    0.081 seconds for
delete logs
2006-10-16 17:33:37 UTC DEBUG1 cleanupThread:    0.073 seconds for
cleanupEvent()
2006-10-16 17:33:37 UTC DEBUG1 cleanupThread:     0.411 seconds for
delete logs
2006-10-16 17:33:45 UTC DEBUG2 cleanupThread:    7.732 seconds for vacuuming
2006-10-16 17:44:29 UTC DEBUG1 cleanupThread:    0.116 seconds for
cleanupEvent()
2006-10-16 17:44:29 UTC DEBUG1 cleanupThread:     0.058 seconds for
delete logs
2006-10-16 17:55:03 UTC DEBUG1 cleanupThread:    0.071 seconds for
cleanupEvent()
2006-10-16 17:55:03 UTC DEBUG1 cleanupThread:    0.059 seconds for
delete logs
2006-10-16 18:05:40 UTC DEBUG1 cleanupThread:     0.071 seconds for
cleanupEvent()
2006-10-16 18:05:40 UTC DEBUG1 cleanupThread:    0.063 seconds for
delete logs
2006-10-16 18:05:51 UTC DEBUG2 cleanupThread:   11.087 seconds for vacuuming

A problem could arise if your slon processes aren't "living" long enough
to do a good cleanup, if you're not actually seeing these sorts of steps
complete.  (Which should happen about every 10 minutes.)

It could be that you did a whole lot of updates a little while ago, and
there's simply some waiting before the log tables get cleaned out.


_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general

Reply via email to