[SAtalk] Bayes Database Corruption (possibly)

Rick Mallett Wed, 28 Jan 2004 20:38:42 -0800

I keep my Bayes database files on a tmpfs filesystem, so I back them up
every hour via cron to have something to start from after a system
crash. The backup script locks and copies the files, and I threw in a
db_verify just to be sure I had a good copy. What I discovered is that
about 50% of the time db_verify fails on bayes_toks, as in


  db_verify bayes_toks

  db_verify: Page 2289: hash page has bad prev_pgno
  db_verify: Page 2110: hash page has bad prev_pgno
  db_verify: Page 2377: hash page has bad prev_pgno
  db_verify: DB->verify: bayes_toks: DB_VERIFY_BAD: Database verification failed

The failures don't seem to make any difference to ongoing operations,
and I don't know whether I should just ignore them and remove the
db_verify from my backup script. Any comments? I'd be happy to help
diagnose this, if it really is a problem, but I'd need some advice on
where to look in the code.
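
For anyone wanting to reproduce the check, here is a minimal sketch of
the copy-and-verify step, not the actual script. It assumes db_verify
is on the PATH and relies on its documented behaviour of exiting
non-zero when verification fails. The function name backup_if_valid,
the ".new" suffix and the verify_cmd parameter are hypothetical, and
locking is left to the caller because the right lock depends on the
SpamAssassin configuration.

```python
import shutil
import subprocess
from pathlib import Path


def backup_if_valid(src, backup_dir, verify_cmd=("db_verify",)):
    """Copy src into backup_dir, keeping the copy only if it verifies.

    Locking is deliberately left out: the caller is assumed to hold
    whatever lock the running Bayes setup requires while this runs.
    Returns the path of the verified backup, or None on failure.
    """
    src = Path(src)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    candidate = backup_dir / (src.name + ".new")  # hypothetical naming
    final = backup_dir / src.name

    shutil.copy2(src, candidate)

    # Verify the copy, not the live file, so the check cannot race
    # with processes that are still updating the original.
    result = subprocess.run(
        [*verify_cmd, str(candidate)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Discard the bad copy and leave the last good backup in place.
        candidate.unlink()
        return None

    # Rename within the same directory replaces the old backup.
    candidate.replace(final)
    return final
```

With this arrangement a failed verification never overwrites the
previous good backup, so the hourly job always leaves behind the most
recent copy that passed db_verify.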

One reason it only happens some of the time, BTW, is that my script
also ran "sa-learn --force-expire", and rebuilding the database always
seemed to clean up the problem. However, I have a really busy server.
By the time the expiry finished, the journal had often grown to its
maximum size, and while the expiry was running another process would
be waiting for a lock so that it could roll the journal into the
database. When that happened, there was roughly a 50/50 chance that
corruption would occur.

I've now removed the "--force-expire" step, and bayes_toks can
sometimes go through several db_verify'ed backup cycles before I start
getting messages like those shown above. Sometimes it happens right
away.

Sorry to be so long-winded. One final question. I had included
"sa-learn --force-expire" in the script because the opportunistic
expiry sometimes takes 3 to 4 minutes on my server. If it runs at an
inopportune time, when the system load is exceptionally high, it might
even take more than 10 minutes, at which point a journaling process
might come along, delete the lock file, and really corrupt the
database. Or so I figured. I've now removed the forced expiry to
simplify the operation, but I'd be interested in knowing whether
forcing an expiry every hour, instead of waiting for the automatic
expiry, is a good idea or not.
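
To put numbers on how close the expiry gets to that window, a small
timing wrapper along these lines might help. This is only a sketch:
run_timed is a hypothetical helper, and the 600 second default is
simply the 10 minute figure guessed at above, not a value confirmed
from the SpamAssassin source.

```python
import subprocess
import time


def run_timed(cmd, limit_seconds=600.0):
    """Run cmd and report how long it took.

    Returns (returncode, elapsed_seconds, over_limit). The 600 second
    default is an assumption taken from the 10 minute estimate in the
    message, not a documented SpamAssassin constant.
    """
    start = time.monotonic()  # monotonic clock ignores wall-clock jumps
    completed = subprocess.run(cmd)
    elapsed = time.monotonic() - start
    return completed.returncode, elapsed, elapsed > limit_seconds
```

Passing the expiry command from the cron job as an argument list and
logging the returned tuple each hour would show how often a run
actually approaches the limit under load.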

System details:
  Sun Solaris 8 (SPARC), SpamAssassin 2.63, MIMEDefang 2.39, Perl 5.8.2,
  Berkeley DB 4.2.52

- rick


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
