On Jun 30, 2007, at 1:20 AM, Marc Perkel wrote:

Tom Allison wrote:
For some years now there has been a lot of effective spam filtering using statistical approaches based on variations of Bayesian theory. Some of these are inverse chi-square modifications to Naive Bayes; CRM114 and other "languages" have also been developed to improve the scoring of statistical analysis of spam. For all statistical processes, the spamicity is always between 0 and 1.
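To make "inverse chi-square" concrete, here is a minimal sketch of Fisher's method of combining per-token probabilities, in the form Gary Robinson proposed for spam filtering (and SpamBayes popularized). It is not SpamAssassin's code. The function names `chi2q` and `chi_square_spamicity` are hypothetical, and the sketch assumes each token probability f(w) has already been computed and lies strictly between 0 (hammy) and 1 (spammy).

```python
import math


def chi2q(x2: float, dof: int) -> float:
    """Upper-tail probability P(X >= x2) for a chi-square variable with an
    even number of degrees of freedom, summed in log space to avoid underflow."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    if m <= 0.0:
        return 1.0
    log_term = -m                      # log of the first Poisson term, e^-m
    logs = [log_term]
    for i in range(1, dof // 2):
        log_term += math.log(m / i)    # term_i = term_{i-1} * m / i
        logs.append(log_term)
    top = max(logs)
    total = math.exp(top) * sum(math.exp(t - top) for t in logs)
    return min(total, 1.0)


def chi_square_spamicity(probs: list[float], eps: float = 1e-6) -> float:
    """Combine per-token spam probabilities f(w) into one spamicity in [0, 1]."""
    if not probs:
        return 0.5                     # no evidence either way
    ps = [min(max(p, eps), 1.0 - eps) for p in probs]  # keep the logs finite
    n = len(ps)
    # s approaches 1 when the clues are collectively spammy, h when hammy.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in ps), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in ps), 2 * n)
    return (1.0 + s - h) / 2.0
```

Calling `chi_square_spamicity([0.99, 0.98, 0.2])` returns a value in [0, 1]. A practical property of this combiner is that it tends to push messages toward 0 or 1 when the evidence agrees and toward 0.5 when the clues conflict, which is what makes an "unsure" middle band useful.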
<snip>

Many thanks to those of you who have read this far, for your patience and consideration.

Tom, I suggested something similar to that years ago and I'd still like to see it tried out. I wonder what would happen if you stripped out the body, ran Bayes on just the headers and the rules, and let Bayes figure it out. You do need some points to start with to get Bayes pointed in the right direction, but you could use blacklists and whitelists to do the Bayes training. It also needs more rules that identify ham, not just rules that identify spam.
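The idea of running Bayes on headers plus rule hits, bootstrapped from black and white lists, can be sketched in a few lines. This is a hypothetical illustration, not anything SpamAssassin ships: `header_and_rule_tokens`, `auto_label`, and `HeaderRuleBayes` are made-up names, the rule names are simply passed in as a list (how to obtain them from SA is the open question later in this thread), token probabilities use Robinson's smoothed f(w), and scoring uses plain Graham-style naive Bayes combining rather than chi-square combining.

```python
import math
from collections import Counter
from email.parser import HeaderParser


def header_and_rule_tokens(raw_message: str, rules_hit: list[str]) -> set[str]:
    """Tokens from the headers only (the body is never tokenized), plus one
    token per rule name that hit."""
    headers = HeaderParser().parsestr(raw_message)   # parses headers only
    tokens = set()
    for name, value in headers.items():
        for word in str(value).lower().split():
            tokens.add(f"{name.lower()}:{word}")
    tokens.update(f"rule:{r}" for r in rules_hit)
    return tokens


def auto_label(sender: str, blacklist: set[str], whitelist: set[str]):
    """Hypothetical bootstrap: blacklisted senders train as spam (True),
    whitelisted senders as ham (False); anything else stays untrained (None)."""
    if sender in blacklist:
        return True
    if sender in whitelist:
        return False
    return None


class HeaderRuleBayes:
    def __init__(self) -> None:
        self.spam, self.ham = Counter(), Counter()
        self.nspam = self.nham = 0

    def train(self, tokens: set[str], is_spam: bool) -> None:
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1

    def token_prob(self, tok: str, s: float = 1.0, x: float = 0.5) -> float:
        """Robinson's smoothed f(w); unseen tokens fall back to x."""
        spam_ratio = self.spam[tok] / max(self.nspam, 1)
        ham_ratio = self.ham[tok] / max(self.nham, 1)
        if spam_ratio + ham_ratio == 0:
            return x
        p = spam_ratio / (spam_ratio + ham_ratio)
        n = self.spam[tok] + self.ham[tok]
        return (s * x + n * p) / (s + n)

    def score(self, tokens: set[str]) -> float:
        """Naive Bayes combining in log space: prod f / (prod f + prod (1 - f))."""
        d = 0.0
        for t in tokens:
            f = self.token_prob(t)
            d += math.log(f) - math.log(1.0 - f)
        if d >= 0:
            return 1.0 / (1.0 + math.exp(-d))
        return math.exp(d) / (1.0 + math.exp(d))   # numerically stable sigmoid
```

In use, each incoming message whose `auto_label` is not `None` would be passed to `train`, and everything else would be scored with `score`. Because rule hits are just tokens, a rule that fires mostly on ham pulls the score down the same way a spam rule pulls it up.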

I was under the impression that there were ham-centric tests that result in negative scores.

Ham doesn't try to be evasive; it's pretty easy to identify. Without SA tagging, much of it falls well below 0.5 (<<0.5), and whitelisting would capture many of the exceptions.
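A minimal sketch of that final decision, assuming a spamicity in [0, 1] has already been computed by some statistical filter: a whitelist hit overrides the score, and otherwise the score is bucketed. The function name `classify` and the cutoffs 0.2 and 0.9 are illustrative assumptions, not SpamAssassin settings, and whitelist entries are assumed to be stored in lowercase.

```python
def classify(sender: str, spamicity: float, whitelist: set[str],
             ham_cutoff: float = 0.2, spam_cutoff: float = 0.9) -> str:
    """Whitelist overrides the score; otherwise bucket the spamicity."""
    if sender.lower() in whitelist:    # known-good sender: always ham
        return "ham"
    if spamicity <= ham_cutoff:        # most ham lands far below 0.5
        return "ham"
    if spamicity >= spam_cutoff:
        return "spam"
    return "unsure"                    # middle band left for review
```

Keeping an explicit "unsure" band means the whitelist only has to rescue the few hams that a statistical score cannot place confidently.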

As for headers-only testing: the first five lines of a stock spam are very telling...

My question about SA concerns PerMsgStatus (I think): is this the place to retrieve all the rule information? I know that today you can get a list of all the rules that hit, but is this where you would look to find all the rules that were attempted? Or is there a better place for it?
