Re: [Slony1-general] Replication falling behind...

Dan Falconer Mon, 12 Feb 2007 15:24:37 -0800

        Once again, my slave server has fallen behind, even after doing the
vacuum analyze + restart slons trick.  And once again, it was after
processing inventories... my slave has 1,079,592 records in sl_log_1, while
the master is sitting at a whopping 2,655,927.  Also, the slave is reporting
(okay, a query I ran on the slave's db is reporting) that it is now 14 hours
behind, while the master appears to be saying that it's not behind at all...
unless I'm reading something wrong.  Here's what I'm running, & the output
(FYI: node1 is the slave, node2 is the master):



[------------------------- snip -------------------------]
MASTER=# select con_origin, con_received, max(con_seqno), max(con_timestamp), 
now() - max(con_timestamp) as age from _pl_replication.sl_confirm group by 
con_origin, con_received order by age;
 con_origin | con_received |  max   |            max             |       age
------------+--------------+--------+----------------------------+-----------------
          1 |            2 | 120564 | 2007-02-12 17:11:15.42497  | 00:00:00.948413
          2 |            1 | 895115 | 2007-02-12 17:10:03.907914 | 00:01:12.465469
(2 rows)


SLAVE=# select con_origin, con_received, max(con_seqno), max(con_timestamp), 
now() - max(con_timestamp) as age from _pl_replication.sl_confirm group by 
con_origin, con_received order by age;
 con_origin | con_received |  max   |            max             |       age
------------+--------------+--------+----------------------------+-----------------
          2 |            1 | 895115 | 2007-02-12 17:10:03.907914 | 00:01:35.189915
          1 |            2 | 115554 | 2007-02-12 02:50:16.085218 | 14:21:23.012611
(2 rows)
[------------------------- /snip -------------------------]


        I find the output slightly disturbing: the master (node2) thinks the
slave (node1) is lagging just a little, but the slave thinks it's lagging A
LOT.  Am I reading something wrong?
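
        To put numbers on that disagreement, here is a rough sketch in plain
Python.  The values are copied by hand from the two query outputs above (the
con_origin = 1, con_received = 2 rows); the variable names are made up for
illustration and are not anything Slony provides.  It only does the
subtraction, it does not explain why the two nodes differ:

```python
from datetime import datetime

# Values copied by hand from the two psql outputs above
# (row with con_origin = 1, con_received = 2). Names are illustrative only.
master_view = {"seqno": 120564, "ts": datetime(2007, 2, 12, 17, 11, 15, 424970)}
slave_view = {"seqno": 115554, "ts": datetime(2007, 2, 12, 2, 50, 16, 85218)}

# Difference between the highest con_seqno each node has recorded,
# and between the newest con_timestamp each node has recorded.
seqno_gap = master_view["seqno"] - slave_view["seqno"]
timestamp_gap = master_view["ts"] - slave_view["ts"]

print(seqno_gap)       # 5010
print(timestamp_gap)   # 14:20:59.339752
```

So for that pair the slave's copy of sl_confirm is 5010 sequence numbers and
a bit over 14 hours older than the master's copy, which matches the "age"
column the slave reports.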

        Also, in an effort to try fixing the problem, I manually ran a vacuum
analyze verbose on ALL the slony tables.  Nothing of any consequence there.

        One final bit of information: when the servers were recovered a few
weeks ago from a disastrous crash of the SAN, we found that our backups were
missing a copy of the postgresql.conf file.  We've been tweaking the one
copied from our development server.  Anybody have any insight on tweaks to
that which might make a difference?  Pertinent information (copied from my
original post):

Master & Slave (identical setup): 
        HARDWARE: dual Opteron 846 procs, 8G RAM, RAID5 array (SAN) running 6
fibre 15k drives.  Internal OS runs on mirrored 15k SCSI array (~32G), with
a mirrored 15k SCSI array (~32G) for the WAL directory.
        OS: SLES 8.1
        SOFTWARE: PostgreSQL 8.0.4, Slony 1.2.6

-- 
Best Regards,


Dan Falconer
"Head Geek",
AvSupport, Inc. (http://www.partslogistics.com)
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general
