Once again, my slave server has fallen behind, even after doing the vacuum
analyze + restart-the-slons trick. And once again, it happened after
processing inventories... my slave has 1,079,592 records in sl_log_1, while
the master is sitting at a whopping 2,655,927. Also, the slave is reporting
(okay, a query I ran on the slave's database is reporting) that it is now 14
hours behind, while the master appears to be saying that it's not behind at
all, unless I'm reading something wrong. Here's what I'm running and the
output (FYI: node1 is the slave, node2 is the master):
[------------------------- snip -------------------------]
MASTER=# select con_origin, con_received, max(con_seqno), max(con_timestamp),
now() - max(con_timestamp) as age from _pl_replication.sl_confirm group by
con_origin, con_received order by age;
 con_origin | con_received |  max   |            max             |       age
------------+--------------+--------+----------------------------+-----------------
          1 |            2 | 120564 | 2007-02-12 17:11:15.42497  | 00:00:00.948413
          2 |            1 | 895115 | 2007-02-12 17:10:03.907914 | 00:01:12.465469
(2 rows)
SLAVE=# select con_origin, con_received, max(con_seqno), max(con_timestamp),
now() - max(con_timestamp) as age from _pl_replication.sl_confirm group by
con_origin, con_received order by age;
 con_origin | con_received |  max   |            max             |       age
------------+--------------+--------+----------------------------+-----------------
          2 |            1 | 895115 | 2007-02-12 17:10:03.907914 | 00:01:35.189915
          1 |            2 | 115554 | 2007-02-12 02:50:16.085218 | 14:21:23.012611
(2 rows)
[------------------------- /snip -------------------------]
I find the output slightly disturbing: the master (node2) thinks the slave
(node1) is lagging just a little, but the slave thinks it's lagging A LOT.
Am I reading something wrong?
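One way to cross-check the sl_confirm numbers might be Slony's sl_status
view, which compares the last event generated on the node where it is
queried with the last event each subscriber has confirmed, and reports the
lag both as a count of events and as a time interval. A minimal sketch,
assuming the standard sl_status view and sl_log_1 columns are present in the
_pl_replication schema under Slony 1.2.6. The first query is meant to be run
on the master (node2), since sl_status only reports lag for events that
originate on the local node; the second can be run on each node to see which
origin the pending sl_log_1 rows belong to.

```sql
-- Run on the master (node2): lag of each subscriber behind locally
-- generated events, measured in events and in elapsed time.
SELECT st_origin, st_received,
       st_last_event, st_last_received,
       st_lag_num_events, st_lag_time
  FROM _pl_replication.sl_status;

-- Run on each node: break the sl_log_1 row count down by originating node.
SELECT log_origin, count(*) AS pending_rows
  FROM _pl_replication.sl_log_1
 GROUP BY log_origin
 ORDER BY log_origin;
```

If st_lag_num_events keeps climbing while st_lag_time grows toward the
14-hour figure, that would suggest the slave really is behind on applying
SYNCs, rather than the confirm timestamps simply being stale.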
Also, in an effort to fix the problem, I manually ran a vacuum analyze
verbose on ALL the Slony tables. Nothing of any consequence there.
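For reference, a manual pass like the one described above could look roughly
like this. It is a sketch covering the main Slony 1.2 log, event and confirm
tables in the _pl_replication schema, not a complete list of every Slony
table; it uses the PostgreSQL 8.0 keyword order (VERBOSE before ANALYZE).

```sql
-- Vacuum and re-analyze the busiest Slony tables, printing per-table
-- statistics (dead row versions removed, pages, etc.).
VACUUM VERBOSE ANALYZE _pl_replication.sl_log_1;
VACUUM VERBOSE ANALYZE _pl_replication.sl_log_2;
VACUUM VERBOSE ANALYZE _pl_replication.sl_seqlog;
VACUUM VERBOSE ANALYZE _pl_replication.sl_event;
VACUUM VERBOSE ANALYZE _pl_replication.sl_confirm;
```

The VERBOSE output for sl_log_1 is probably the most useful part to compare
between master and slave, since that is where the row counts diverge.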
One final bit of information: when the servers were recovered a few weeks
ago from a disastrous crash of the SAN, we found that our backups were
missing a copy of the postgresql.conf file. We've been tweaking the one
copied from our development server. Does anybody have any insight into
tweaks to it that might make a difference? Pertinent information (copied
from my original post):
Master & Slave (identical setup):
HARDWARE: dual Opteron 846 processors, 8 GB RAM, RAID 5 array (SAN) running
six 15k Fibre Channel drives. The OS runs on an internal mirrored 15k SCSI
array (~32 GB), with a separate mirrored 15k SCSI array (~32 GB) for the WAL
directory.
OS: SLES 8.1
SOFTWARE: PostgreSQL 8.0.4, Slony 1.2.6
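Since the current postgresql.conf started life as the development server's
copy, it may help to post exactly which settings differ from the built-in
defaults. A sketch using the pg_settings system view as it exists in
PostgreSQL 8.0; the second query simply pulls out a few memory, checkpoint
and free-space-map settings that exist in 8.0, and is not a recommendation of
particular values.

```sql
-- Every setting that was changed from its compiled-in default,
-- and where the change came from (configuration file, command line, ...).
SELECT name, setting, source
  FROM pg_settings
 WHERE source <> 'default'
 ORDER BY name;

-- A few settings worth comparing between the dev box and production.
SELECT name, setting
  FROM pg_settings
 WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                'effective_cache_size', 'checkpoint_segments',
                'max_fsm_pages', 'max_fsm_relations')
 ORDER BY name;
```

Running both on the master and the slave would also confirm whether the two
"identical" servers really share the same configuration.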
--
Best Regards,
Dan Falconer
"Head Geek",
AvSupport, Inc. (http://www.partslogistics.com)
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general