For a while we were getting spam messages that had images embedded as
text rather than as attachments. Those are marked as spam, but couldn't
the random characters of the image data increase the entropy of the
database and cause some less-than-definitive scores?
That aside, it seems like all my ham scores below 0, so would changing
the cutoff to something like 2.0 be bad practice?
On 03/29/2011 01:06 PM, Max wrote:
On occasion we train the .Junk folder and others using sa-learn.
Also, here is an example of spam, as requested:
http://www.nomorepasting.com/getpaste.php?pasteid=36037
On 03/29/2011 01:00 PM, John Hardin wrote:
On Tue, 29 Mar 2011, Max wrote:
I'm going to change my required spam cutoff score though
Please, not until other troubleshooting steps are tried!
X-Spam-Status: No, score=2.6 required=3.6 tests=BAYES_50,
X-Spam-Status: No, score=2.6 required=3.6
tests=BAYES_50,HTML_IMAGE_RATIO_06,
X-Spam-Status: No, score=3.5 required=3.6
tests=BAYES_60,HTML_IMAGE_RATIO_02,
X-Spam-Status: No, score=1.0 required=3.6 tests=BAYES_50,HTML_MESSAGE,
X-Spam-Status: No, score=2.5 required=3.6
tests=BAYES_50,DATE_IN_FUTURE_06_12,
etc.
One thing that immediately leaps out is you need to train your bayes.
All of those are hitting 40-60, which shouldn't be happening,
especially for the bulk of spam.
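
To see how widespread this is across a larger sample, something along
these lines could tally the BAYES_* rule seen in each X-Spam-Status
header. It is only a rough sketch: the function name bayes_buckets is
made up for illustration, and the sample lines are the headers quoted
above with their wrapped continuation lines joined back together.

```python
import re
from collections import Counter

# Matches rule names such as BAYES_50 or BAYES_99 inside a tests= list.
BAYES_RULE = re.compile(r"\bBAYES_(\d{2,3})\b")


def bayes_buckets(header_lines):
    """Count how often each BAYES_NN rule appears in X-Spam-Status text.

    Hypothetical helper; each item is one unfolded header line.
    """
    counts = Counter()
    for line in header_lines:
        for bucket in BAYES_RULE.findall(line):
            counts["BAYES_" + bucket] += 1
    return counts


# Headers from this thread, unfolded onto single lines.
sample = [
    "X-Spam-Status: No, score=2.6 required=3.6 tests=BAYES_50,HTML_IMAGE_RATIO_06,",
    "X-Spam-Status: No, score=3.5 required=3.6 tests=BAYES_60,HTML_IMAGE_RATIO_02,",
    "X-Spam-Status: No, score=1.0 required=3.6 tests=BAYES_50,HTML_MESSAGE,",
    "X-Spam-Status: No, score=2.5 required=3.6 tests=BAYES_50,DATE_IN_FUTURE_06_12,",
]

counts = bayes_buckets(sample)
for rule, hits in sorted(counts.items()):
    print(rule, hits)
```

If nearly everything lands in the middle buckets, as it does for these
four, bayes is not contributing much to the score either way.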
Note that this does not mean "turn on autolearning". Do you have ham
and spam training corpora collected? If not, start collecting. You
should plan on training with new messages at the very least once a
week. Reviewing FPs and FNs daily and dropping them into mail folders
that are trained from nightly is standard practice.
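
As a rough illustration of that nightly step, a small wrapper could
build one sa-learn call per review folder. This is a sketch, not a
tested setup: the folder paths and the function names are placeholders
invented for the example. Only the --spam and --ham options of sa-learn
are used. It defaults to a dry run that just prints the commands.

```python
import subprocess


def sa_learn_commands(spam_dirs, ham_dirs):
    """Build one sa-learn command per folder; nothing is executed here."""
    commands = []
    for path in spam_dirs:
        # Missed spam (FNs) is learned as spam.
        commands.append(["sa-learn", "--spam", path])
    for path in ham_dirs:
        # Misfiled good mail (FPs) is learned as ham.
        commands.append(["sa-learn", "--ham", path])
    return commands


def train(spam_dirs, ham_dirs, dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for cmd in sa_learn_commands(spam_dirs, ham_dirs):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical review folders; substitute your own paths.
train(
    spam_dirs=["/path/to/Maildir/.missed-spam/cur"],
    ham_dirs=["/path/to/Maildir/.false-positives/cur"],
)
```

Run from cron as the user that owns the bayes database, with
dry_run=False once the printed commands look right.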