Re: [Slony1-general] Replication falling behind...

Dan Falconer Tue, 13 Feb 2007 06:59:35 -0800

On Monday 12 February 2007 5:17 pm, Dan Falconer wrote:
>       Once again, my slave server has fallen behind, even after doing the
> vacuum analyze + restart slons trick.  And once again, it was after processing
> inventories... my slave has 1,079,592 records in sl_log_1, while the master
> is sitting at a whopping 2,655,927.  Also, the slave is reporting (okay, a
> query I ran on the slave's db is reporting) that it is now 14 hours behind,
> while the master appears to be saying that it's not behind at all... unless
> I'm reading something wrong.  Here's what I'm running, & the output (FYI:
> node1 is the slave, node2 is the master):::
>
>
> [------------------------- snip -------------------------]
> MASTER=# select con_origin, con_received, max(con_seqno),
> max(con_timestamp), now() - max(con_timestamp) as age from
> _pl_replication.sl_confirm group by con_origin, con_received order by age;
>  con_origin | con_received |  max   |            max             |       age
> ------------+--------------+--------+----------------------------+-----------------
>           1 |            2 | 120564 | 2007-02-12 17:11:15.42497  | 00:00:00.948413
>           2 |            1 | 895115 | 2007-02-12 17:10:03.907914 | 00:01:12.465469
> (2 rows)
>
>
> SLAVE=# select con_origin, con_received, max(con_seqno),
> max(con_timestamp), now() - max(con_timestamp) as age from
> _pl_replication.sl_confirm group by con_origin, con_received order by age;
>  con_origin | con_received |  max   |            max             |       age
> ------------+--------------+--------+----------------------------+-----------------
>           2 |            1 | 895115 | 2007-02-12 17:10:03.907914 | 00:01:35.189915
>           1 |            2 | 115554 | 2007-02-12 02:50:16.085218 | 14:21:23.012611
> (2 rows)
> [------------------------- /snip -------------------------]
>
>
>       I find the output slightly disturbing: the master (node2) thinks the
> slave (node1) is lagging just a little, but the slave thinks it's lagging A LOT.
> Am I reading something wrong?
>
>       Also, in an effort to try fixing the problem, I manually ran a vacuum
> analyze verbose on ALL the slony tables.  Nothing of any consequence there.
>
>       One final bit of information: when the servers were recovered a few
> weeks ago from a disastrous crash of the SAN, we found that our backups were
> missing a copy of the postgresql.conf file.  We've been tweaking the one
> copied from our development server.  Anybody have any insight on tweaks to
> that which might make a difference?  Pertinent information (copied from my
> original post):::
>
> Master & Slave (identical setup):
>         HARDWARE: dual Opteron 846 procs, 8G RAM, RAID5 array (SAN)
> running 6 fibre 15k drives.  Internal OS runs on mirrored 15k SCSI array
> (~32G), with a mirrored 15k SCSI array (~32G) for the WAL directory.
>         OS: SLES 8.1
>         SOFTWARE: PostgreSQL 8.0.4, Slony 1.2.6


        Sadly, the master is now showing 4.2M rows in sl_log_1, while the slave
is still at 1.08M... I'm just not sure what to do here, or where to look.  I
suppose today I'll be going through the drop + add node procedure.  :((
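
        For what it's worth, the disagreement between the two nodes can be put
into numbers using nothing but the sl_confirm output quoted above.  Below is a
minimal Python sketch of that arithmetic; the dictionary and function names
(master_view, slave_view, view_gap) are made up for illustration and are not
part of Slony.  It only subtracts what each node reports for the same
(con_origin, con_received) pair, and it does not explain why the views differ.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (con_origin, con_received) -> (max con_seqno, max con_timestamp),
# copied by hand from the psql output quoted in this thread.
master_view = {
    (1, 2): (120564, "2007-02-12 17:11:15.42497"),
    (2, 1): (895115, "2007-02-12 17:10:03.907914"),
}
slave_view = {
    (2, 1): (895115, "2007-02-12 17:10:03.907914"),
    (1, 2): (115554, "2007-02-12 02:50:16.085218"),
}


def view_gap(pair):
    """Return (seqno difference, timestamp difference), master minus slave."""
    m_seq, m_ts = master_view[pair]
    s_seq, s_ts = slave_view[pair]
    ts_gap = datetime.strptime(m_ts, FMT) - datetime.strptime(s_ts, FMT)
    return m_seq - s_seq, ts_gap


if __name__ == "__main__":
    for pair in sorted(master_view):
        seq_gap, ts_gap = view_gap(pair)
        print(f"origin={pair[0]} received={pair[1]}: "
              f"{seq_gap} events, {ts_gap} apart")
```

Both nodes agree exactly on the (2, 1) pair.  For the (1, 2) pair the master's
newest confirmation is 5010 events and a bit over 14 hours newer than the one
the slave has on record, which matches the 14 hour "age" the slave reports.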

-- 
Best Regards,


Dan Falconer
"Head Geek",
AvSupport, Inc. (http://www.partslogistics.com)
