When I first started using SA many months ago, I used Bayes for a while, but
stopped using it because I seemed to be having "issues" with the values it was
assigning.  At the time, I didn't have time to properly "feed" it, so I just
disabled it.

Recently, I've decided to enable it again.  Today, I gave it a go.  I am
running MailScanner w/ SA v2.63 w/ sendmail.  I have /etc/mail/spamassassin/
as my SA directory -- and local.cf in it is symlinked to
/opt/MailScanner/etc/spam.assassin.prefs.conf -- so that SA and MS w/ SA
always use the same config.  I believe that this is the proper way to do it.

Here are the relevant lines from local.cf:

---
auto_whitelist_path        /var/spool/spamassassin/auto-whitelist
auto_whitelist_file_mode   0600
bayes_path                 /var/spool/spamassassin/bayes
bayes_file_mode            0600

bayes_auto_expire 0

use_bayes 0
---

I made sure that /var/spool/spamassassin/ was empty and changed use_bayes from
0 to 1 -- and restarted MailScanner.
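
For clarity, this is a sketch of the edited block, assuming nothing else in
it was touched; only use_bayes differs from the listing above:

---
auto_whitelist_path        /var/spool/spamassassin/auto-whitelist
auto_whitelist_file_mode   0600
bayes_path                 /var/spool/spamassassin/bayes
bayes_file_mode            0600

bayes_auto_expire 0

use_bayes 1
---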

I expected that, at this point, nothing would behave differently -- since I
hadn't fed Bayes yet.

I then decided to feed it some ham.  I have an mbox format file of 2700 ham
messages.  I did the following:

sa-learn --ham --showdots --mbox my_ham_file

It chugged at it for a while and then seemed to complete successfully.  I then
went and looked in /var/spool/spamassassin/ and saw that it had properly
created bayes_journal, bayes_seen, and bayes_toks.  bayes_toks was about 700k.

A few minutes later, I noticed that incoming ham was suddenly being marked as
spam!  The headers all show BAYES_99 hits -- it was tagging ALL messages as
BAYES_99.  I looked at the bayes dir and bayes_toks had grown to 1.3MB in only
a few minutes -- almost double what it was after I had fed it the ham.  I
assume it is autolearning already?  Even though I haven't fed it spam yet?

In any case, I then turned use_bayes back off -- and wrote this email trying
to determine what is going on.

I did a "sa-learn --dump magic" and I get NO output at all.  It runs for a
few seconds and then just returns to the prompt.  "sa-learn -D --dump magic"
gives:

---
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/games', keeping.
debug: PATH included '/home/jgoggan/bin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/games', keeping.
debug: Final PATH set to:
/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/sbin:/usr/games:/home/jgoggan/bin:/sbin:/usr/sbin:/usr/games
debug: using "/usr/local/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/home/johnroot/.spamassassin/user_prefs" for user prefs file
debug: Score set 0 chosen.
debug: Initialising learner
---

That user prefs file is basically empty -- nothing in it that isn't a
comment.  Do I have to specify /etc/mail/spamassassin/local.cf?  I thought it
would use that automatically since it has the site rules dir correct.

I also tried doing the "--dump magic" while specifying the DBPATH and such
just to be sure -- no difference.

Any thoughts/suggestions/corrections?

Also -- can I turn autolearn off somewhere?  I only want Bayes to learn from
emails that I specifically FEED with sa-learn --ham and --spam.

Thanks much!

 - John...
