Hi, SpamAssassin folks.

I've been using SpamAssassin happily for the past 6 months or so, and it's dramatically reduced the amount of spam I receive.

Recently, I upgraded to 2.60rc2 for the updated rule sets (I was getting a ton of spam coming through with forged X-Mailer headers set to Emacs Gnus).

I've trained my Bayes database on a huge corpus of over 16,000 spams and 2,500 ham mails, but now that my database has been converted to v2, I'm seeing the following in spamassassin -D output:

debug: using "/home/ben/.spamassassin" for user state dir
debug: bayes: 2297 tie-ing to DB file R/O /home/ben/.spamassassin/bayes_toks
debug: bayes: 2297 tie-ing to DB file R/O /home/ben/.spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: bayes: Not available for scanning, only 64 spam(s) in Bayes DB < 200
debug: bayes: 2297 untie-ing
debug: bayes: 2297 untie-ing db_toks
debug: bayes: 2297 untie-ing db_seen

My bayes_toks and bayes_seen files are huge, the result of much training:

-rw-------    1 ben      ben       1331200 Aug 27 09:44 bayes_seen
-rw-------    1 ben      ben      11464704 Aug 27 10:05 bayes_toks

Has all my nice Bayes training from before 2.60rc2 been expired away? When I run sa-learn on my collection of 16,000 old spam mails, it correctly reports that it has already learned all those Message-IDs:

[EMAIL PROTECTED]:~/mail-archive/cur$ sa-learn -D --spam .

debug: [EMAIL PROTECTED]: already learnt correctly, not learning twice
(many many repeats of this with different Message-IDs)


but I still get the "only X spam(s) in Bayes DB < 200" message when I run a message through spamassassin -D.

Should I wipe out my Bayes DBs and re-train completely from scratch?

Here's the output of sa-learn --dump magic:

[EMAIL PROTECTED]:~/.spamassassin$ sa-learn --dump magic
0.000 0 2 0 non-token data: bayes db version
0.000 0 64 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 3125 0 non-token data: ntokens
0.000 0 0 0 non-token data: oldest atime
0.000 0 1062001932 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count


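nspam (64) and nham (0) are completely wrong here, given the 16,000+ spams and 2,500 hams I trained on, and ntokens (3125) looks far too low for an 11 MB bayes_toks file.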
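
For anyone who wants to run the same sanity check on their own dump, here is a minimal Python sketch. It assumes the third column of each "non-token data" line is the value (which is how the output above reads), and it takes the 200 threshold from the debug message; applying that same number to ham is my assumption. The names parse_magic and bayes_usable are made up for illustration and are not part of SpamAssassin.

```python
import re

# Sample lines copied from the dump above; in practice you would paste in
# the output of your own `sa-learn --dump magic` run.
SAMPLE = """\
0.000 0 2 0 non-token data: bayes db version
0.000 0 64 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 3125 0 non-token data: ntokens
"""

# Threshold taken from the debug line "only 64 spam(s) in Bayes DB < 200".
# Using the same number for ham is an assumption.
MIN_MESSAGES = 200

# Assumed layout: four whitespace-separated fields, then the label.
# The third field is treated as the value.
LINE_RE = re.compile(r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+?)\s*$")


def parse_magic(text):
    """Return {label: value} for every 'non-token data' line in text."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def bayes_usable(values, minimum=MIN_MESSAGES):
    """True if both the spam and ham counts reach the minimum."""
    return values.get("nspam", 0) >= minimum and values.get("nham", 0) >= minimum


magic = parse_magic(SAMPLE)
print(magic["nspam"], magic["nham"], bayes_usable(magic))
```

With the numbers from my dump, the check fails on both counts, which matches the "Not available for scanning" debug line.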

I can put up my bayes_seen and bayes_toks files, if it will help for debugging purposes.

Thanks,

Ben





_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
