On Jun 30, 2007, at 1:20 AM, Marc Perkel wrote:
Tom Allison wrote:
For some years now there has been a lot of effective spam
filtering using statistical approaches based on variations of Bayesian
theory. Some of these are inverse chi-square modifications to
Naive Bayes; CRM114 and other "languages" have even been
developed to improve the scoring of statistical analysis of spam.
For all of these statistical processes, the spamicity is always between 0
and 1.
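For readers unfamiliar with the inverse chi-square variant, here is a minimal Python sketch of Fisher/Robinson chi-square combining, modeled on the approach popularized by SpamBayes. It is an illustration, not SpamAssassin's own code. It takes per-token spam probabilities and combines them into a single score. The score stays in [0, 1] because S ("spamminess") and H ("hamminess") are each probabilities, so (1 + S - H)/2 cannot leave that range. The clamping constant `eps` is an assumption made to keep log() away from zero.

```python
import math


def chi2q(x2: float, dof: int) -> float:
    """Upper-tail probability P(X >= x2) for a chi-square variable with an
    even number of degrees of freedom, via the closed-form Poisson series."""
    if dof % 2:
        raise ValueError("dof must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_square_combine(probs, eps=1e-6):
    """Combine per-token spam probabilities into one score in [0, 1]
    using Fisher/Robinson inverse chi-square combining."""
    if not probs:
        return 0.5  # no evidence either way
    # Clamp so log() never sees 0; sum logs instead of multiplying to avoid underflow.
    ps = [min(max(p, eps), 1.0 - eps) for p in probs]
    n = len(ps)
    log_spam = sum(math.log(1.0 - p) for p in ps)  # very negative when p -> 1
    log_ham = sum(math.log(p) for p in ps)          # very negative when p -> 0
    s = 1.0 - chi2q(-2.0 * log_spam, 2 * n)  # "spamminess"
    h = 1.0 - chi2q(-2.0 * log_ham, 2 * n)   # "hamminess"
    return (1.0 + s - h) / 2.0


print(chi_square_combine([0.99] * 10))  # close to 1
print(chi_square_combine([0.01] * 10))  # close to 0
print(chi_square_combine([0.5] * 10))   # 0.5: evidence cancels out
```

A useful property of this combining method is that mixed or contradictory evidence drifts toward 0.5 rather than toward an extreme. That middle band is where an "unsure" verdict naturally lives.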
<snip>
Many thanks to those of you who have read this far, for your
patience and consideration.
Tom, I suggested something similar to that years ago and I'd still
like to see it tried out. I wonder what would happen if you
stripped out the body, ran Bayes on just the headers and the
rule hits, and let Bayes figure it out. You do need some points
to start with to get Bayes pointed in the right direction, but you
could use blacklists and whitelists to do the Bayes training. It also
needs more rules that identify ham, not just rules that identify spam.
I was under the impression that there were already ham-centric tests that
result in negative scores.
Ham doesn't try to be evasive; it's pretty easy to identify.
Without SA tagging, much of it already scores well below 0.5 (<<0.5), and whitelisting would
capture most of the exceptions.
As for headers-only testing: the first five lines of a typical stock spam are
very telling...
My question about SA concerns PerMsgStatus (I think): is this the place
to retrieve all the rule information? I know that today you can get a
list of all the rules that HIT, but is that where you would look to
find all the rules that were attempted, or is there a better place
for it?