On Monday 12 February 2007 5:17 pm, Dan Falconer wrote:
> Once again, my slave server has fallen behind, even after doing the vacuum
> analyze + restart slons trick. And once again, it was after processing
> inventories... my slave has 1,079,592 records in sl_log_1, while the master
> is sitting at a whopping 2,655,927. Also, the slave is reporting (okay, a
> query I ran on the slave's db is reporting) that it is now 14 hours behind,
> while the master appears to be saying that it's not behind at all... unless
> I'm reading something wrong. Here's what I'm running, & the output (FYI:
> node1 is the slave, node2 is the master):::
>
>
> [------------------------- snip -------------------------]
> MASTER=# select con_origin, con_received, max(con_seqno),
>          max(con_timestamp), now() - max(con_timestamp) as age
>          from _pl_replication.sl_confirm
>          group by con_origin, con_received order by age;
>  con_origin | con_received |  max   |            max             |       age
> ------------+--------------+--------+----------------------------+-----------------
>           1 |            2 | 120564 | 2007-02-12 17:11:15.42497  | 00:00:00.948413
>           2 |            1 | 895115 | 2007-02-12 17:10:03.907914 | 00:01:12.465469
> (2 rows)
>
>
> SLAVE=# select con_origin, con_received, max(con_seqno),
>          max(con_timestamp), now() - max(con_timestamp) as age
>          from _pl_replication.sl_confirm
>          group by con_origin, con_received order by age;
>  con_origin | con_received |  max   |            max             |       age
> ------------+--------------+--------+----------------------------+-----------------
>           2 |            1 | 895115 | 2007-02-12 17:10:03.907914 | 00:01:35.189915
>           1 |            2 | 115554 | 2007-02-12 02:50:16.085218 | 14:21:23.012611
> (2 rows)
> [------------------------- /snip -------------------------]
>
>
> I find the output slightly disturbing: the master (node2) thinks the slave
> (node1) is lagging just a little, but the slave thinks it's lagging A LOT.
> Am I reading something wrong?
>
> Also, in an effort to try fixing the problem, I manually ran a vacuum
> analyze verbose on ALL the slony tables. Nothing of any consequence there.
>
> One final bit of information: when the servers were recovered a few weeks
> ago from a disastrous crash of the SAN, we found that our backups were
> missing a copy of the postgresql.conf file. We've been tweaking the one
> copied from our development server. Anybody have any insight on tweaks to
> that which might make a difference? Pertinent information (copied from my
> original post):::
>
> Master & Slave (identical setup):
> HARDWARE::: dual opteron 846 procs, 8G ram, RAID5 array (SAN)
> running 6 fibre 15k drives. Internal OS runs on mirrored 15k SCSI array
> (~32G), with a mirrored 15k SCSI array (~32G) for the WAL directory.
> OS::: SLES 8.1
> SOFTWARE::: PostgreSQL 8.0.4, Slony 1.2.6

Sadly, the master is now showing 4.2M rows in sl_log_1, while the slave is
still at 1.08M... I'm just not sure what to do here, or where to look. I
suppose today I'll be going through the drop + add node procedure. :((
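
For anyone who wants to check my reading of the two sl_confirm outputs, here
is a rough Python sketch that diffs them. The numbers are copied by hand from
the psql output quoted above; the dict layout and the compare_views name are
just mine for illustration, nothing from Slony itself. It only does arithmetic
on the pasted rows and never touches the database.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (con_origin, con_received) -> (max con_seqno, max con_timestamp),
# copied by hand from the psql output on each node.
master_view = {
    (1, 2): (120564, "2007-02-12 17:11:15.42497"),
    (2, 1): (895115, "2007-02-12 17:10:03.907914"),
}
slave_view = {
    (2, 1): (895115, "2007-02-12 17:10:03.907914"),
    (1, 2): (115554, "2007-02-12 02:50:16.085218"),
}


def compare_views(a, b):
    """Hypothetical helper: for each (origin, received) pair present in both
    views, return (seqno in a minus seqno in b, timestamp in a minus timestamp in b)."""
    gaps = {}
    for pair in sorted(a.keys() & b.keys()):
        seq_a, ts_a = a[pair]
        seq_b, ts_b = b[pair]
        time_gap = datetime.strptime(ts_a, FMT) - datetime.strptime(ts_b, FMT)
        gaps[pair] = (seq_a - seq_b, time_gap)
    return gaps


gaps = compare_views(master_view, slave_view)
for (origin, received), (seq_gap, time_gap) in gaps.items():
    print(f"origin {origin} -> received {received}: "
          f"{seq_gap} confirmations, {time_gap} apart")
```

By that arithmetic the only row that differs between the nodes is (origin 1,
received 2): the slave's copy is 5010 confirmations and a bit over 14 hours
older than the master's, which lines up with the 14-hour "age" the slave
reports. That tells me which row is stale, not why.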
--
Best Regards,
Dan Falconer
"Head Geek",
AvSupport, Inc. (http://www.partslogistics.com)