Re: Suddenly tons of spam

Max Tue, 29 Mar 2011 11:18:09 -0700

For a while we were getting spam messages that had images embedded as text and not an attachment. Those are marked as spam but couldn't the random characters of the image data increase the entropy of the database and cause some less than definitive scores?

That aside, it seems like all my ham is below 0, so would changing the cutoff to something like 2.0 be bad practice?
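
One rough way to sanity-check a cutoff before changing it is to count how many hand-checked messages would land on the wrong side. The sketch below is only an illustration: the spam scores are some of those quoted further down in this thread, the ham scores are made up to stand in for "all below 0", and the function name `misclassified` is hypothetical.

```python
def misclassified(scored, cutoff):
    """Count false positives and false negatives at a given cutoff.

    scored: iterable of (score, is_spam) pairs from hand-checked mail.
    A message is treated as spam when its score >= cutoff.
    """
    fp = sum(1 for score, is_spam in scored if score >= cutoff and not is_spam)
    fn = sum(1 for score, is_spam in scored if score < cutoff and is_spam)
    return fp, fn


# Hypothetical sample: spam scores taken from the quoted headers,
# ham scores invented purely for illustration.
sample = [(2.6, True), (3.5, True), (1.0, True), (2.5, True),
          (-1.2, False), (-0.4, False), (-2.0, False)]

for cutoff in (3.6, 2.0):
    fp, fn = misclassified(sample, cutoff)
    print(f"cutoff={cutoff}: false positives={fp}, false negatives={fn}")
```

A sample this small proves nothing on its own; the same check would need to run over a real, hand-sorted corpus before drawing conclusions.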


On 03/29/2011 01:06 PM, Max wrote:

On occasions we will train the .Junk folder and others using sa-learn.
Also here is an example of spam as requested: http://www.nomorepasting.com/getpaste.php?pasteid=36037
On 03/29/2011 01:00 PM, John Hardin wrote:
On Tue, 29 Mar 2011, Max wrote:
I'm going to change my required spam cutoff score though
Please, not until other troubleshooting steps are tried!
X-Spam-Status: No, score=2.6 required=3.6 tests=BAYES_50,
X-Spam-Status: No, score=2.6 required=3.6 tests=BAYES_50,HTML_IMAGE_RATIO_06,
X-Spam-Status: No, score=3.5 required=3.6 tests=BAYES_60,HTML_IMAGE_RATIO_02,
X-Spam-Status: No, score=1.0 required=3.6 tests=BAYES_50,HTML_MESSAGE,
X-Spam-Status: No, score=2.5 required=3.6 tests=BAYES_50,DATE_IN_FUTURE_06_12,
etc.
One thing that immediately leaps out is you need to train your bayes. All of those are hitting 40-60, which shouldn't be happening, especially for the bulk of spam.
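
To see how widespread the indecisive Bayes hits are, the BAYES_* rule names can be tallied across a batch of X-Spam-Status headers. This is a minimal sketch, not a SpamAssassin tool: the function name `bayes_tally` is hypothetical, the sample lines are the headers quoted above, and it only looks at the first physical line of each header (folded continuation lines are ignored).

```python
import re
from collections import Counter

# Matches rule names such as BAYES_50 or BAYES_99.
BAYES_RULE = re.compile(r"\bBAYES_\d+\b")


def bayes_tally(header_lines):
    """Count BAYES_* rule hits found on X-Spam-Status lines."""
    counts = Counter()
    for line in header_lines:
        if line.startswith("X-Spam-Status:"):
            counts.update(BAYES_RULE.findall(line))
    return counts


headers = [
    "X-Spam-Status: No, score=2.6 required=3.6 tests=BAYES_50,HTML_IMAGE_RATIO_06,",
    "X-Spam-Status: No, score=3.5 required=3.6 tests=BAYES_60,HTML_IMAGE_RATIO_02,",
    "X-Spam-Status: No, score=1.0 required=3.6 tests=BAYES_50,HTML_MESSAGE,",
    "X-Spam-Status: No, score=2.5 required=3.6 tests=BAYES_50,DATE_IN_FUTURE_06_12,",
]

for rule, hits in sorted(bayes_tally(headers).items()):
    print(rule, hits)
```

If most missed spam clusters in the middle buckets like this, that points at the Bayes database rather than at the cutoff.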
Note that this does not mean "turn on autolearning". Do you have ham and spam training corpora collected? If not, start collecting. You should plan on training with new messages at the very least once a week; daily review of FPs and FNs, and dropping them into mail folders that are trained from nightly, is standard practice.
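
The nightly training step could be scripted along these lines. This is a sketch under assumptions: the folder paths are hypothetical placeholders, the function names are made up for illustration, and it relies only on the `--spam` and `--ham` options of sa-learn. It defaults to a dry run that prints the commands rather than executing them.

```python
import subprocess


def sa_learn_commands(spam_dir, ham_dir):
    """Build the sa-learn invocations for one training pass."""
    return [
        ["sa-learn", "--spam", spam_dir],
        ["sa-learn", "--ham", ham_dir],
    ]


def train(spam_dir, ham_dir, dry_run=True):
    """Print the commands when dry_run is True, otherwise run them."""
    for cmd in sa_learn_commands(spam_dir, ham_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if sa-learn fails.
            subprocess.run(cmd, check=True)


# Hypothetical folders holding hand-reviewed FNs (spam) and FPs (ham).
train("/path/to/reviewed-spam", "/path/to/reviewed-ham", dry_run=True)
```

Run from cron under the same user that owns the Bayes database, with `dry_run=False` once the paths have been checked.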
