For a while we were getting spam messages that had images embedded as text rather than as attachments. Those are marked as spam, but couldn't the random characters of the image data increase the entropy of the database and lead to less definitive scores?

That aside, it seems like all my ham is below 0, so would changing the cutoff to something like 2.0 be bad practice?

On 03/29/2011 01:06 PM, Max wrote:
On occasion we train the .Junk folder and others using sa-learn.
Also, here is an example of spam, as requested: http://www.nomorepasting.com/getpaste.php?pasteid=36037

On 03/29/2011 01:00 PM, John Hardin wrote:
On Tue, 29 Mar 2011, Max wrote:

I'm going to change my required spam cutoff score though

Please, not until other troubleshooting steps are tried!

X-Spam-Status: No, score=2.6 required=3.6 tests=BAYES_50,
X-Spam-Status: No, score=2.6 required=3.6 tests=BAYES_50,HTML_IMAGE_RATIO_06,
X-Spam-Status: No, score=3.5 required=3.6 tests=BAYES_60,HTML_IMAGE_RATIO_02,
X-Spam-Status: No, score=1.0 required=3.6 tests=BAYES_50,HTML_MESSAGE,
X-Spam-Status: No, score=2.5 required=3.6 tests=BAYES_50,DATE_IN_FUTURE_06_12,

etc.

One thing that immediately leaps out is you need to train your bayes. All of those are hitting 40-60, which shouldn't be happening, especially for the bulk of spam.
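
To check whether the same pattern holds across a larger sample, a rough sketch like the following tallies which BAYES_* rule each X-Spam-Status header hit. The header lines are the ones quoted above; the function name `tally_bayes` and the regular expression are illustrative assumptions, not part of SpamAssassin.

```python
import re
from collections import Counter

# Header lines as quoted in this message (wrapped continuation lines omitted).
HEADERS = [
    "X-Spam-Status: No, score=2.6 required=3.6 tests=BAYES_50,",
    "X-Spam-Status: No, score=2.6 required=3.6 tests=BAYES_50,HTML_IMAGE_RATIO_06,",
    "X-Spam-Status: No, score=3.5 required=3.6 tests=BAYES_60,HTML_IMAGE_RATIO_02,",
    "X-Spam-Status: No, score=1.0 required=3.6 tests=BAYES_50,HTML_MESSAGE,",
    "X-Spam-Status: No, score=2.5 required=3.6 tests=BAYES_50,DATE_IN_FUTURE_06_12,",
]

# Matches rule names such as BAYES_00, BAYES_50, BAYES_99.
BAYES_RULE = re.compile(r"\bBAYES_(\d+)\b")


def tally_bayes(headers):
    """Count how many headers hit each BAYES_nn rule (hypothetical helper)."""
    counts = Counter()
    for line in headers:
        match = BAYES_RULE.search(line)
        key = f"BAYES_{match.group(1)}" if match else "no BAYES rule"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    for rule, n in sorted(tally_bayes(HEADERS).items()):
        print(f"{rule}: {n}")
```

If nearly everything lands in the middle buckets, as it does for these five headers, that is consistent with a Bayes database that has not been trained enough to tell ham from spam.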

Note that this does not mean "turn on autolearning". Do you have ham and spam training corpora collected? If not, start collecting. You should plan on training with new messages at least once a week. Standard practice is to review FPs and FNs daily and drop them into mail folders that are trained from nightly.
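
A minimal sketch of the nightly training step, assuming the reviewed messages are filed into one ham folder and one spam folder. The two paths and the `--run` switch on this script are hypothetical; `sa-learn --ham` and `sa-learn --spam` are the actual training options. By default the script only prints the commands it would run.

```python
import shlex
import subprocess
import sys

# Hypothetical folder locations; adjust to wherever reviewed mail is filed.
HAM_DIR = "/var/mail/training/ham"
SPAM_DIR = "/var/mail/training/spam"


def build_commands(ham_dir, spam_dir):
    """Return the sa-learn invocations for one training pass."""
    return [
        ["sa-learn", "--ham", ham_dir],
        ["sa-learn", "--spam", spam_dir],
    ]


def main(argv):
    # Dry run unless --run is given, so nothing is learned by accident.
    really_run = "--run" in argv
    for cmd in build_commands(HAM_DIR, SPAM_DIR):
        print(shlex.join(cmd))
        if really_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Scheduled from cron with `--run`, as the user that owns the Bayes database, this would cover the "trained from nightly" part. Reviewing the FPs and FNs and filing them correctly remains a manual job.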


