I've recently upgraded to Slony 1.2.6 on our master/slave system, both 
running PostgreSQL 8.0.4.  The initial replication went spectacularly fast.  
Kudos, guys. 

Unfortunately, replication is now spectacularly behind, which is the
reason I'm searching for help.  The slave is running about 10M events
behind the master, and doesn't seem to be doing a great job of catching
up.  I've encountered this before, as had our previous DBA:
http://gborg.postgresql.org/pipermail/slony1-general/2005-January/001409.html

After going to the end of that archive, and looking through my notes, I
feel at this point that it's simply a performance problem with Slony.
On Sunday (the 4th) at ~11pm, we processed about 2.4M rows of inventory
changes, which took the master until yesterday at ~3:30pm to complete.
Another 0.7M rows were modified yesterday.  With the rest of the
activity on the system, I think it all added up to this problem (the
inventory changes all happen within a single massive transaction block).
The slave appears to be trying to catch up, but it's about 10.4M logged
row changes behind (according to sl_log_2).

        Pertinent information:

Master & Slave (identical setup):
        HARDWARE: dual Opteron 846 procs, 8G RAM, RAID5 array (SAN)
                  running 6 fibre 15k drives.  Internal OS runs on a
                  mirrored 15k SCSI array (~32G), with a second mirrored
                  15k SCSI array (~32G) for the WAL directory.
        OS: SLES 8.1
        SOFTWARE: PostgreSQL 8.0.4, Slony 1.2.6

The slave shows its "delay for first row" holding steady at just under
60 seconds since the big update.  Here's info from Slony
(dispater = master, mammon = slave):


-------------------------- [snip] --------------------------
[dispater]=# select con_origin, con_received, max(con_seqno),
    max(con_timestamp), now() - max(con_timestamp) as age
    from _pl_replication.sl_confirm
    group by con_origin, con_received order by age;
 con_origin | con_received |  max   |            max             |       age
------------+--------------+--------+----------------------------+-----------------
          1 |            2 |  67489 | 2007-02-06 12:43:30.956724 | 00:00:02.51687
          2 |            1 | 426428 | 2007-02-06 12:41:57.258811 | 00:01:36.214783
(2 rows)

[mammon]=# select con_origin, con_received, max(con_seqno),
    max(con_timestamp), now() - max(con_timestamp) as age
    from _pl_replication.sl_confirm
    group by con_origin, con_received order by age;
 con_origin | con_received |  max   |            max             |          age
------------+--------------+--------+----------------------------+-----------------------
          2 |            1 | 426428 | 2007-02-06 12:41:57.258811 | 00:01:42.729157
          1 |            2 |  55229 | 2007-02-05 02:30:38.268296 | 1 day 10:13:01.719672
(2 rows)

------------------------- [/snip] --------------------------
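
The backlog could also be measured directly instead of being inferred
from sl_confirm.  What follows is only a rough sketch, under these
assumptions: the cluster schema is _pl_replication as in the queries
above, the queries are run on the master (dispater), and this Slony
version provides the sl_status view along with both the sl_log_1 and
sl_log_2 tables.  No output is shown because none was captured for
these queries.

```sql
-- Assumption: sl_status exists in the cluster schema on the origin node.
-- It reports, per subscriber, how many events and how much time the
-- subscriber is behind the origin.
select st_origin, st_received, st_last_event, st_last_received,
       st_lag_num_events, st_lag_time
  from _pl_replication.sl_status;

-- Row changes still queued in each log table.  count(*) scans the whole
-- table, so expect this to be slow with millions of rows queued.
select 'sl_log_1' as log_table, count(*) as queued_rows
  from _pl_replication.sl_log_1
union all
select 'sl_log_2', count(*)
  from _pl_replication.sl_log_2;
```

Watching st_lag_num_events and the queued row counts over a few minutes
should show whether the slave is actually gaining on the master or
falling further behind.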
-- 
Best Regards,


Dan Falconer
"Head Geek",
AvSupport, Inc. (http://www.partslogistics.com)
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general
