Andew

You're right, I realised after, that they're not full vacuums.

There was another database (mail_lxtreme) that was unused (as far as I can tell) and which was not being vacuumed:

 SELECT datname, age(datfrozenxid) FROM pg_database;
   datname    |     age
--------------+-------------
 mail_lxtreme | -2074187459
 bp_live      |  1079895636
 template1    |  1076578064
 template0    | -2074187459
(4 rows)

In the end, I did a moveset on all 6 sets from the (damaged) master to the slave. Then I shutdown slon and postgress on the old master and deleted its data dir, and then re-initdb'd it.

I removed the replication info on the surviving node by doing an uninstall.

I created a new cluster and subscribed to get all the data back onto the rebuilt server.

Later when I'm feeling less tired, I'll switchover to reinstate the former master (as it is a much faster server).

I'm going to be a lot more careful when I add databases to ensure that they always vacuumed periodically.

I'l also going to add a new nagios script to scan serverlog for any WARN or ERROR messages for the current day - this way I should get notice of a problem before it becomes a disaster!

Thanks for your feedback.

John

Andrew Sullivan wrote:
On Sun, Oct 30, 2005 at 09:00:12AM +0000, John Sidney-Woollett wrote:

over 2 billion transactions
DETAIL:  You may have already suffered transaction-wraparound data loss.

We have cronscripts that perform FULL vacuums


Not on all your your databases.  And anyway


# vacuum template1 every sunday
35 2 * * 7 /usr/local/pgsql/bin/vacuumdb --analyze --verbose template1

# vacuum live DB every day
35 5 * * * /usr/local/bin/psql -c "vacuum verbose analyze" -d bp_live -U
postgres --output /home/postgres/cronscripts/live/vacuumfull.log


Those aren't fill vacuums.  There must be some database in there that
you're not telling us about.  Do you have anything other than
template0, template1, and bp_live?  Also, has template0 always been
frozen?


2) What can I do to recover the data?


Nothing, save for restoring from old backups.


I can failover to the slave server, but what do I need to do to rebuild
the original database?


You'll need to rebuild it from scratch.  You could do a switchover
instead, but I think that's risky in this case.


Should I failover now?!! And then start rebuilding the old master
database (using slon, I presume)?


That's what I'd do.  It's just like adding a new node.


How do I stop this EVER happening again??!!!


Well, _something_ didn't get vacuumed in time.  Better find out what
that was.  I'm also extremely surprised you didn't see the warnings
in time -- are you sure you're not overlooking something important in
your logs?

A

_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general

Reply via email to