Hi,

I recently upgraded from SA 3.3.2, running on a different machine with Arch Linux, to SA 3.4.0-rsvnunknown (using https://launchpad.net/~spamassassin/+archive/spamassassin-old on Ubuntu 10.04 LTS). I use MySQL to store both user preferences and Bayesian data. I don't use AWL or Bayes autolearning, and both machines run sa-update as a daily cron job.

I migrated the MySQL database containing all settings, along with my /etc/spamassassin directory holding my static settings and rules, to the new machine. I then ran sa-update and sa-compile and restarted spamd. I was curious whether 3.4.0 would score a certain message differently from 3.3.2, so I ran "cat spam | spamc -u jes...@ifconfig.se -R" to see the result.

To my surprise, the Bayesian filter only scored 60-80% (BAYES_60), where it previously scored 90-95% (BAYES_95). Have there been any major changes to the Bayes engine in 3.4 (and/or to its SQL storage backend)? I copied my spam/ham corpus to the new machine and ran sa-learn on top of the current database to see if that helped. Shockingly, it then scored 1-5% (BAYES_05), so I decided to start over: I ran "sa-learn --clear" to wipe out the old database and re-ran sa-learn. Now it scores a perfect 99-100% (BAYES_99).

I also noticed that my old database only had 11k tokens, while the new one has about 60k. Both the old and the new server have hapaxes enabled, and both databases were trained on a corpus of about 600 spam and 200 ham.
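To compare the two databases side by side, the message and token counters can be read from "sa-learn --dump magic". Below is a minimal shell sketch. It assumes the usual dump layout, where each counter line ends in "non-token data: <name>" and the value sits in the third column. The function name bayes_summary is hypothetical.

```shell
# Hypothetical helper: summarise Bayes counters from `sa-learn --dump magic`
# output read on stdin.
# Assumption: each counter line looks like
#   0.000  0  <value>  0  non-token data: <name>
# so the counter value is the third whitespace-separated field.
bayes_summary() {
  awk '
    /non-token data: nspam/   { print "nspam="   $3 }
    /non-token data: nham/    { print "nham="    $3 }
    /non-token data: ntokens/ { print "ntokens=" $3 }
  '
}
```

With per-user SQL storage, pass the user name to sa-learn, e.g. "sa-learn -u USERNAME --dump magic | bayes_summary" (USERNAME is a placeholder). Running this on both servers shows whether the message counts match while the token counts differ.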

Any thoughts or ideas what might have caused this?


Regards,
Jesper Wallin
