I've recently upgraded to Slony 1.2.6 on our master/slave system, both
running PostgreSQL 8.0.4. The initial replication went spectacularly fast.
Kudos, guys.
Unfortunately, replication is now spectacularly behind, which is the reason
I'm searching for help. The slave is running about 10M events behind the
master, and doesn't seem to be doing a great job of catching up. I've
encountered this before, as did our previous DBA:
http://gborg.postgresql.org/pipermail/slony1-general/2005-January/001409.html
After going to the end of that archive, and looking through my notes, I feel
at this point that it's simply a performance problem with Slony. On Sunday
(the 4th) at ~11pm, we processed about 2.4M rows of inventory changes, which
took the master until yesterday at ~3:30pm to complete. Another 0.7M were
modified yesterday. With the rest of the activity on the system, I think it
all added up to this problem (the inventory changes all happen within a
single massive transaction block). The slave appears to be trying to catch
up, but it's about 10.4M transactions behind (according to sl_log_2).
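A backlog figure like this can be read with a plain row count on the log
table. This is only a minimal sketch: it assumes the cluster schema is
_pl_replication (the name used in the sl_confirm queries in this post) and
that a raw count is an adequate proxy. Note that sl_log_2 holds one row per
replicated row change, so the number is row changes rather than transactions
or events.

```sql
-- Sketch only: run on the master (origin) node.
-- Assumes the Slony cluster schema is _pl_replication.
-- Counts logged row changes that have not yet been cleaned up.
select count(*) as pending_rows
from _pl_replication.sl_log_2;
```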
Pertinent information:
Master & Slave (identical setup):
HARDWARE::: dual Opteron 846 procs, 8G RAM, RAID5 array (SAN) running 6 fibre
15k drives. Internal OS runs on mirrored 15k SCSI array (~32G), with a
mirrored 15k SCSI array (~32G) for the WAL directory.
OS::: SLES 8.1
SOFTWARE::: PostgreSQL 8.0.4, Slony 1.2.6
The slave shows its "delay for first row" holding steady at just under 60
seconds since the big update. Here's info from Slony (dispater==master,
mammon==slave):::
-------------------------- [snip] --------------------------
[dispater]=# select con_origin, con_received, max(con_seqno),
max(con_timestamp), now() - max(con_timestamp) as age from
_pl_replication.sl_confirm group by con_origin, con_received order by age;
 con_origin | con_received |  max   |            max             |       age
------------+--------------+--------+----------------------------+-----------------
          1 |            2 |  67489 | 2007-02-06 12:43:30.956724 | 00:00:02.51687
          2 |            1 | 426428 | 2007-02-06 12:41:57.258811 | 00:01:36.214783
(2 rows)
[mammon]=# select con_origin, con_received, max(con_seqno),
max(con_timestamp), now() - max(con_timestamp) as age from
_pl_replication.sl_confirm group by con_origin, con_received order by age;
 con_origin | con_received |  max   |            max             |          age
------------+--------------+--------+----------------------------+-----------------------
          2 |            1 | 426428 | 2007-02-06 12:41:57.258811 | 00:01:42.729157
          1 |            2 |  55229 | 2007-02-05 02:30:38.268296 | 1 day 10:13:01.719672
(2 rows)
------------------------- [/snip] --------------------------
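The lag shown in sl_confirm can also be cross-checked from Slony's sl_status
view, which reports how far each subscriber trails in events and in time.
This is a hedged sketch rather than output from this system: it assumes the
same _pl_replication cluster schema, and it is meant to be run on the origin
node (dispater here).

```sql
-- Sketch only: run on the origin node.
-- Assumes the Slony cluster schema is _pl_replication.
-- st_lag_num_events: events generated but not yet confirmed by the receiver.
-- st_lag_time: age of the last event the receiver has confirmed.
select st_origin, st_received, st_lag_num_events, st_lag_time
from _pl_replication.sl_status;
```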
--
Best Regards,
Dan Falconer
"Head Geek",
AvSupport, Inc. (http://www.partslogistics.com)