Bayes is giving a variable score for the same kind of mail. Say I received a mail for which BAYES_90 gave a score of 2.101; three hours later, when the same kind of mail appeared again, it hit BAYES_60 with a score of 1.592, and within the next hour the same mail got a negative value. This has really started making me sweat: many spam messages get detected in one instance and go undetected in another. I suspected my sa-learn mechanism was behind this variable score, since Bayes might have picked up more HAM tokens for these mails during feedback. But the feedback mechanism I have implemented does not read from any inbox folder; it works like this: any mail sent to [EMAIL PROTECTED] is fed to sa-learn by a Perl wrapper script. When I checked my logs for possible HAM feedback during that time period, I didn't find a single entry for HAM feedback, which left me even more puzzled.
What about autolearning? Did you check for that? Recent versions of MailScanner will insert autolearn flags into the spam-hits header.
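If it helps, here is a rough sketch of one way to check saved copies of those messages for the autolearn flag. It assumes the flag shows up as `autolearn=<value>` somewhere in a header value, as it does in SpamAssassin's own X-Spam-Status header; the exact header name MailScanner uses depends on your configuration, so the code scans every header rather than guessing one. The function name and the sample message are made up for illustration.

```python
import re
from email import message_from_string

# SpamAssassin reports autolearning as "autolearn=ham", "autolearn=spam",
# "autolearn=no", and so on, inside its status/report header.
AUTOLEARN_RE = re.compile(r"autolearn=(\w+)")


def autolearn_flags(raw_message):
    """Return {header name: autolearn value} for every header that has the flag.

    Hypothetical helper: it scans all headers because the header name
    varies between setups.
    """
    msg = message_from_string(raw_message)
    flags = {}
    for name, value in msg.items():
        match = AUTOLEARN_RE.search(value)
        if match:
            flags[name] = match.group(1)
    return flags


# Made-up sample message, only to show the shape of the input.
SAMPLE = (
    "From: sender@example.com\n"
    "To: user@example.com\n"
    "Subject: test\n"
    "X-Spam-Status: Yes, score=7.1 required=5.0 tests=BAYES_90\n"
    "\tautolearn=spam version=3.x\n"
    "\n"
    "body text\n"
)

if __name__ == "__main__":
    print(autolearn_flags(SAMPLE))
```

If messages of the same kind show autolearn=ham in that window, autolearning rather than your feedback address would be the thing shifting the Bayes scores.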
My next suspect is Bayes DB expiry. I have read in a lot of documentation that we need to expire and rebuild the Bayes DB so that old tokens don't eat up disk space. But since I had plenty of hard drive space, I decided not to expire the database, and now my database size is 39 MB.
OUCH.. don't circumvent the expiry mechanism if you don't understand its full purpose. It's actually rather important because it weeds out garbage tokens.
A bayes DB that never expires is *highly* vulnerable to bayes poisoning.
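To look at the state of the database and kick off an expiry run by hand, something like the sketch below could work. It only wraps two sa-learn options, `--dump magic` (prints the token counts and expiry timestamps) and `--force-expire` (forces an expiry run); the Python wrapper and its function names are just my illustration. I'm assuming sa-learn is on the PATH and that you run this as the user that owns the Bayes DB.

```python
import shutil
import subprocess


def sa_learn_command(action):
    """Build the sa-learn command line for a given action (hypothetical helper)."""
    commands = {
        # Show the database counters: nspam, nham, ntokens, expiry times.
        "dump": ["sa-learn", "--dump", "magic"],
        # Force an expiry run so old tokens are removed.
        "expire": ["sa-learn", "--force-expire"],
    }
    return commands[action]


def run_sa_learn(action):
    """Run sa-learn if it is installed and return its stdout, otherwise None."""
    if shutil.which("sa-learn") is None:
        return None
    result = subprocess.run(
        sa_learn_command(action),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Inspect first; only expire once you are happy with what you see.
    print(run_sa_learn("dump"))
```

Run the dump before and after an expiry and compare the token count; on a DB that has never expired I would expect it to drop noticeably.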
