On Jun 2, 2005, at 8:27 PM, Matt Kettler wrote:
If one's wrong, they are ALL wrong.
SA's rule scores are evolved based on a real-world test of a
hand-sorted corpus of fresh spam and ham. The whole scoreset is
evolved simultaneously to optimize the placement pattern.
Of course, one thing that can affect accuracy is spam accidentally
misplaced into the ham pile, which can cause heavy score biasing. A
little of this is unavoidable, as human mistakes happen, but a lot of
it will cause deflated scores and a lot of FNs (false negatives).
The rule scores are optimized for the spam which was sent at the time
that version of SA was released (actually, at the time the rule
scoreset was calculated). Since then, the static SA rules have become
less useful since spammers now write their messages to avoid them. The
only rules which spammers cannot easily avoid are the dynamic ones:
Bayes and network checks (RBLs, URIBLs, Razor, etc.).
On my systems, I raise the scores for the dynamic tests since they are
the only ones which hit a lot of today's spam.
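
As a rough sketch of what raising the dynamic scores could look like in
local.cf: the "score" directive and the rule names below are stock SA,
but the numbers are illustrative assumptions only, not recommended
values, and they assume the default threshold of 5.0.

```
# local.cf: hypothetical overrides for the dynamic tests.
# The values are examples only; tune them against your own mail.
score BAYES_99      4.5   # Bayes is nearly certain the message is spam
score URIBL_SBL     3.0   # URL in the body is listed in a URIBL
score RAZOR2_CHECK  2.5   # message is listed in Razor2
score RCVD_IN_XBL   3.5   # relay is listed in an RBL
```

A single-value "score" line applies the same score in all four
scoresets. Keeping each value below the threshold means no single test
can mark a message as spam by itself.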
-Kevin