From: "Chris Hastie" <[EMAIL PROTECTED]>

> The industry that I work in is currently having its concept of risk
> assessment thoroughly shaken. The sort of risks we deal with have three
> main, largely independent factors. For years we've been assigning a value
> to each of these factors, and then adding them up to come up with a figure
> representative of relative risk.
>
> Then along came some bright spark who knew a little bit about statistics.
> He showed that we can estimate the risk of each of the three factors. Then
> he pointed out that for someone to be injured, all three had to happen. And
> the probability of a AND b AND c is the *product* of the three
> probabilities, not the sum. It all makes sense. And, frighteningly, it
> gives quite different results from the method we've used for years.
>
> So I'm thinking about writing myself a policy server for Postfix. I want
> to consider different things, weight them and use a combination of factors
> to decide whether or not to reject mail. Much like SA does. Thinking about
> how to weight things, I realised that the same principles could be applied
> to spam. Perhaps.
>
> For example (fictional figures here), say 95% of mail from clients in a
> particular RBL is spam. We could say, then, that such an item of mail has
> a probability of 0.05 of being ham. 80% of mail from clients giving a
> particular form of HELO is spam - probability of 0.2 that it is ham. 60%
> of SPF fails are spam - probability of 0.4 that such a mail is ham.
>
> Thus if a piece of mail has failed all three of these tests, the
> probability of it being ham is 0.05 * 0.2 * 0.4 = 0.004, or 1/250. Or put
> another way, we can be 99.6% sure it is spam.
>
> Now I'm neither a statistician nor an expert in fighting spam, so I'm sure
> there are some flaws in this idea somewhere. One of them is probably that
> the various tests available are not statistically independent. But as a
> basic principle, is there mileage in this, or should I stick with
> addition, or find another way of weighting stuff altogether?
>
> I'm actually going to be away from my computer for the next ten days, so I
> apologise if I don't promptly respond to your responses, but rest assured
> I will read them with great interest when I get back...
>
> Thanks
> -- 
> Chris Hastie

They got there before you did, Chris. That is how the scoring system
within SpamAssassin itself works. It is also how the scores on rules are
set - nominally.
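
For what it's worth, the arithmetic in the quoted example can be sketched in
a few lines of Python. This is only an illustration using Chris's fictional
figures; the test names, the P_HAM table and both function names are made up
here and are not taken from SpamAssassin or Postfix. It treats the tests as
independent, which the original message already flags as doubtful. The second
function shows that adding per-test log scores gives the same answer as
multiplying the probabilities, which is one way an additive score and the
product rule can be reconciled.

```python
import math

# Fictional figures from the quoted example: for each (hypothetical) test,
# the fraction of mail hitting that test which is still ham.
P_HAM = {"rbl_listed": 0.05, "odd_helo": 0.2, "spf_fail": 0.4}


def combined_ham_probability(hits, p_ham=P_HAM):
    """Multiply the per-test ham probabilities for the tests that fired.

    Assumes the tests are independent, as the example does.
    """
    return math.prod(p_ham[name] for name in hits)


def combined_log_score(hits, p_ham=P_HAM):
    """The same quantity expressed as a sum of per-test log scores."""
    return sum(math.log(p_ham[name]) for name in hits)


hits = ["rbl_listed", "odd_helo", "spf_fail"]
p = combined_ham_probability(hits)
print(f"P(ham) ~ {p:.3f}, P(spam) ~ {1 - p:.3f}")

# Exponentiating the summed log scores recovers the product (up to rounding).
print(math.isclose(math.exp(combined_log_score(hits)), p))
```

With the three figures above this reproduces the 0.004 (1/250) from the
example. A policy server along these lines would compare the result, or the
summed log score, against a threshold of its own choosing.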

{^_^}

