I've been using SpamAssassin happily for the past 6 months or so, and it's dramatically reduced the amount of spam I receive.
Recently, I upgraded to 2.60rc2 for the updated rule sets (a ton of spam with forged X-Mailer headers claiming to be Emacs Gnus was getting through).
I've trained my Bayes database on a huge corpus of over 16,000 spams and 2,500 ham mails, but now I'm seeing the following in spamassassin -D output after my database has been converted to v2:
debug: using "/home/ben/.spamassassin" for user state dir
debug: bayes: 2297 tie-ing to DB file R/O /home/ben/.spamassassin/bayes_toks
debug: bayes: 2297 tie-ing to DB file R/O /home/ben/.spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: bayes: Not available for scanning, only 64 spam(s) in Bayes DB < 200
debug: bayes: 2297 untie-ing
debug: bayes: 2297 untie-ing db_toks
debug: bayes: 2297 untie-ing db_seen
My bayes_toks and bayes_seen files are huge, the result of much training:
-rw------- 1 ben ben 1331200 Aug 27 09:44 bayes_seen
-rw------- 1 ben ben 11464704 Aug 27 10:05 bayes_toks
Has all my nice Bayes training from before 2.60rc2 been expired away? When I run sa-learn on my collection of 16,000 old spam mails, it correctly reports that it has already learned all those Message-IDs:
[EMAIL PROTECTED]:~/mail-archive/cur$ sa-learn -D --spam .
debug: [EMAIL PROTECTED]: already learnt correctly, not learning twice
(many many repeats of this with different Message-IDs)
but I still get the "only X spam(s) in Bayes DB < 200" message when I run a message through spamassassin -D.
Should I wipe out my Bayes DBs and re-train completely from scratch?
Here's the output of sa-learn --dump magic:
[EMAIL PROTECTED]:~/.spamassassin$ sa-learn --dump magic
0.000 0 2 0 non-token data: bayes db version
0.000 0 64 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 3125 0 non-token data: ntokens
0.000 0 0 0 non-token data: oldest atime
0.000 0 1062001932 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
nspam and nham are completely wrong here, as is ntokens.
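For anyone who wants to run the same sanity check on their own database, here is a minimal Python sketch that parses the non-token lines of the sa-learn --dump magic output above and compares the counts against the 200 minimum from the debug message. The function names parse_magic and bayes_usable are made up for illustration, and applying the same 200 minimum to ham as to spam is an assumption on my part; the debug output above only mentions the spam count.

```python
import re

# Sample lines copied from the `sa-learn --dump magic` output above.
DUMP = """\
0.000 0 2 0 non-token data: bayes db version
0.000 0 64 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 3125 0 non-token data: ntokens
"""

# The debug message reports "only 64 spam(s) in Bayes DB < 200".
# Assumption: the same minimum of 200 is applied to ham as well.
MIN_LEARNED = 200

# Four whitespace-separated columns, then "non-token data: <name>".
# The third column carries the value for these non-token entries.
LINE_RE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+?)\s*$"
)


def parse_magic(text):
    """Return a dict mapping each non-token entry name to its integer value."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def bayes_usable(values, minimum=MIN_LEARNED):
    """True if both the spam and ham counts reach the assumed minimum."""
    return (values.get("nspam", 0) >= minimum
            and values.get("nham", 0) >= minimum)


if __name__ == "__main__":
    magic = parse_magic(DUMP)
    print("nspam:", magic["nspam"], "nham:", magic["nham"])
    print("enough training for scanning:", bayes_usable(magic))
```

With the values from my dump (nspam 64, nham 0) the check fails, which matches the "Not available for scanning" line in the debug output.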
I can put up my bayes_seen and bayes_toks files, if it will help for debugging purposes.
Thanks,
Ben
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk