Andew
You're right, I realised after, that they're not full vacuums.
There was another database (mail_lxtreme) that was unused (as far as I
can tell) and which was not being vacuumed:
SELECT datname, age(datfrozenxid) FROM pg_database;
datname | age
--------------+-------------
mail_lxtreme | -2074187459
bp_live | 1079895636
template1 | 1076578064
template0 | -2074187459
(4 rows)
In the end, I did a moveset on all 6 sets from the (damaged) master to
the slave. Then I shutdown slon and postgress on the old master and
deleted its data dir, and then re-initdb'd it.
I removed the replication info on the surviving node by doing an uninstall.
I created a new cluster and subscribed to get all the data back onto the
rebuilt server.
Later when I'm feeling less tired, I'll switchover to reinstate the
former master (as it is a much faster server).
I'm going to be a lot more careful when I add databases to ensure that
they always vacuumed periodically.
I'l also going to add a new nagios script to scan serverlog for any WARN
or ERROR messages for the current day - this way I should get notice of
a problem before it becomes a disaster!
Thanks for your feedback.
John
Andrew Sullivan wrote:
On Sun, Oct 30, 2005 at 09:00:12AM +0000, John Sidney-Woollett wrote:
over 2 billion transactions
DETAIL: You may have already suffered transaction-wraparound data loss.
We have cronscripts that perform FULL vacuums
Not on all your your databases. And anyway
# vacuum template1 every sunday
35 2 * * 7 /usr/local/pgsql/bin/vacuumdb --analyze --verbose template1
# vacuum live DB every day
35 5 * * * /usr/local/bin/psql -c "vacuum verbose analyze" -d bp_live -U
postgres --output /home/postgres/cronscripts/live/vacuumfull.log
Those aren't fill vacuums. There must be some database in there that
you're not telling us about. Do you have anything other than
template0, template1, and bp_live? Also, has template0 always been
frozen?
2) What can I do to recover the data?
Nothing, save for restoring from old backups.
I can failover to the slave server, but what do I need to do to rebuild
the original database?
You'll need to rebuild it from scratch. You could do a switchover
instead, but I think that's risky in this case.
Should I failover now?!! And then start rebuilding the old master
database (using slon, I presume)?
That's what I'd do. It's just like adding a new node.
How do I stop this EVER happening again??!!!
Well, _something_ didn't get vacuumed in time. Better find out what
that was. I'm also extremely surprised you didn't see the warnings
in time -- are you sure you're not overlooking something important in
your logs?
A
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general