On Tuesday 13 February 2007 8:44 am, Dan Falconer wrote:
> On Monday 12 February 2007 5:17 pm, Dan Falconer wrote:
> > Once again, my slave server has fallen behind, even after doing the
> > vacuum analyze + restart slons trick. And once again, it was after
> > processing inventories... my slave has 1,079,592 records in sl_log_1,
> > while the master is sitting at a whopping 2,655,927. Also, the slave is
> > reporting (okay, a query I ran on the slave's db is reporting) that it is
> > now 14 hours behind, while the master appears to be saying that it's not
> > behind at all... unless I'm reading something wrong. Here's what I'm
> > running, & the output (FYI: node1 is the slave, node2 is the master):::
> >
> >
> > [------------------------- snip -------------------------]
> > MASTER=# select con_origin, con_received, max(con_seqno), max(con_timestamp),
> >              now() - max(con_timestamp) as age
> >          from _pl_replication.sl_confirm
> >          group by con_origin, con_received
> >          order by age;
> >  con_origin | con_received |  max   |            max             |       age
> > ------------+--------------+--------+----------------------------+-----------------
> >           1 |            2 | 120564 | 2007-02-12 17:11:15.42497  | 00:00:00.948413
> >           2 |            1 | 895115 | 2007-02-12 17:10:03.907914 | 00:01:12.465469
> > (2 rows)
> >
> >
> > SLAVE=# select con_origin, con_received, max(con_seqno), max(con_timestamp),
> >             now() - max(con_timestamp) as age
> >         from _pl_replication.sl_confirm
> >         group by con_origin, con_received
> >         order by age;
> >  con_origin | con_received |  max   |            max             |       age
> > ------------+--------------+--------+----------------------------+-----------------
> >           2 |            1 | 895115 | 2007-02-12 17:10:03.907914 | 00:01:35.189915
> >           1 |            2 | 115554 | 2007-02-12 02:50:16.085218 | 14:21:23.012611
> > (2 rows)
> > [------------------------- /snip -------------------------]
> >
> >
> > I find the output slightly disturbing: the master (node2) thinks the
> > slave (node1) is lagging just a little, but the slave thinks it's lagging
> > A LOT. Am I reading something wrong?
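> >
> > (A hedged aside: if the sl_status view is available in this Slony version,
> > and I believe 1.2 creates one in the replication schema, it should report
> > the same lag in one place from the origin's point of view; the column names
> > below are from memory, so treat this as a sketch:)
> >
> > MASTER=# select st_origin, st_received, st_lag_num_events, st_lag_time
> >          from _pl_replication.sl_status;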
> >
> > Also, in an effort to fix the problem, I manually ran a vacuum analyze
> > verbose on ALL of the Slony tables. Nothing of any consequence turned up
> > there.
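> >
> > (For reference, the vacuums were along these lines; the list of tables is
> > a sketch rather than an exact transcript of what I ran:)
> >
> > VACUUM ANALYZE VERBOSE _pl_replication.sl_log_1;
> > VACUUM ANALYZE VERBOSE _pl_replication.sl_seqlog;
> > VACUUM ANALYZE VERBOSE _pl_replication.sl_event;
> > VACUUM ANALYZE VERBOSE _pl_replication.sl_confirm;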
> >
> > One final bit of information: when the servers were recovered a few
> > weeks ago from a disastrous crash of the SAN, we found that our backups
> > were missing a copy of the postgresql.conf file. We've been tweaking the
> > one copied from our development server. Does anybody have any insight into
> > tweaks that might make a difference? (A sketch of the sort of settings I
> > mean follows the hardware details below.) Pertinent information (copied
> > from my original post):::
> >
> > Master & Slave (identical setup):
> > HARDWARE::: dual opteron 846 procs, 8G ram, RAID5 array (SAN)
> > running 6 fibre 15k drives. Internal OS runs on mirrored 15k SCSI array
> > (~32G), with a mirrored 15k SCSI array (~32G) for the WAL directory.
> > OS::: SLES 8.1
> > SOFTWARE::: PostgreSQL 8.0.4, Slony 1.2.6
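> >
> > (To be concrete about what I mean by "tweaks": the settings I would expect
> > to matter on an 8G box are roughly the ones below. The values are only an
> > illustrative starting point, written in 8.0's unit-less integer form, and
> > are not necessarily what we are actually running:)
> >
> >     shared_buffers = 50000            # 8kB pages, roughly 400MB
> >     work_mem = 16384                  # kB, 16MB per sort/hash
> >     maintenance_work_mem = 262144     # kB, 256MB for vacuums/index builds
> >     checkpoint_segments = 32
> >     effective_cache_size = 786432     # 8kB pages, roughly 6GB
> >     wal_buffers = 64                  # 8kB pages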
>
> Sadly, the master is now showing 4.2M rows in sl_log_1, while the slave is
> still at 1.08M... I'm just not sure what to do here, or where to look. I
> suppose today I'll be going through the drop + add node procedure. :((
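>
> (For anyone following along, the drop + re-add I have in mind is roughly the
> slonik sequence below. The cluster name matches ours, but the conninfo
> strings and the set id are placeholders, and the syntax is from memory, so
> check it against the 1.2 slonik reference before running anything:)
>
> cluster name = pl_replication;
> node 1 admin conninfo = 'host=slave-host dbname=ourdb user=slony';
> node 2 admin conninfo = 'host=master-host dbname=ourdb user=slony';
>
> drop node (id = 1, event node = 2);
>
> (...then, once node 1 has been cleaned out, re-create it and re-subscribe:)
>
> store node (id = 1, comment = 'slave node', event node = 2);
> store path (server = 2, client = 1, conninfo = 'host=master-host dbname=ourdb user=slony');
> store path (server = 1, client = 2, conninfo = 'host=slave-host dbname=ourdb user=slony');
> subscribe set (id = 1, provider = 2, receiver = 1, forward = no);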
This may have already been posted to the list or be common knowledge, but I
found this bit of info (printed directly to the terminal instead of to a
log) rather disturbing, and quite possibly the culprit:::
NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=29721
CONTEXT: SQL statement "SELECT "_pl_replication".cleanupNodelock()"
PL/pgSQL function "cleanupevent" line 77 at perform
NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=29723
CONTEXT: SQL statement "SELECT "_pl_replication".cleanupNodelock()"
PL/pgSQL function "cleanupevent" line 77 at perform
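
(For anyone who wants to check the same thing: the stale entries show up in
sl_nodelock itself, and can be compared against pg_stat_activity to see
whether the listed backends are still alive. Column names are from memory,
so treat this as a sketch:)

SLAVE=# select nl_nodeid, nl_conncnt, nl_backendpid
        from _pl_replication.sl_nodelock;

SLAVE=# select procpid, current_query from pg_stat_activity
        where procpid in (select nl_backendpid from _pl_replication.sl_nodelock);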
--
Best Regards,
Dan Falconer
"Head Geek",
AvSupport, Inc. (http://www.partslogistics.com)
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general