On Thu, 10 Mar 2011 15:01:34 -0500 dar...@chaosreigns.com wrote:

> On 03/10, Jason Bertoch wrote:
> > Wouldn't spam already scored at 15+ be considered a little redundant
> > to the corpus? If not, I'm certain I could modify my config to keep
> > a copy for processing in the mass checks.
>
> No. If all spams scored 15+ hit similar tests, and none of those
> spams are included in the mass-checks, then those tests might not be
> scored highly enough to catch those spams in the future.
>
> It's a big "if", but "redundant" is certainly not applicable.
This argument seems a bit far-fetched when you take into account that corpora may retain spam for years, and that other sites will still be including the higher-scoring examples.

The scores that high-scoring spams end up with are determined by other mail that scores close to 5. If the scores for a particular set of rules were systematically reduced over time, the spam hitting those rules would drop below 15 before it dropped below 5, bringing fresh examples back into the corpus.

It seems to me that rejecting on blocklists, or over-reliance on spamtraps, is more of a problem than rejecting on high scores.

As far as BAYES is concerned, different people train it in different ways, so I don't see the sense in strictly mandating train-on-everything.
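The claim that systematically shrinking rule scores would push spam under a 15-point reject threshold long before it fell under the 5-point spam threshold can be illustrated with a toy model. This is a minimal sketch under loud assumptions: the rule names are hypothetical, every rescoring round is assumed to multiply every rule score by the same factor (real rescoring does not work that uniformly), and mail at or above 15 is assumed to be rejected at SMTP time and so never reaches the mass-check corpus.

```python
# Toy model only: thresholds taken from the thread, rules are hypothetical.
REJECT_AT = 15.0  # assumed site policy: rejected, never kept for the corpus
SPAM_AT = 5.0     # SpamAssassin's default required_score


def total_score(hits, scores):
    """Sum the scores of the rules a message hit."""
    return sum(scores[rule] for rule in hits)


def classify(hits, scores):
    """Return what happens to a message under the assumed site policy."""
    s = total_score(hits, scores)
    if s >= REJECT_AT:
        return "rejected"      # absent from the corpus
    if s >= SPAM_AT:
        return "kept-as-spam"  # available to mass-checks again
    return "missed"            # false negative


def decay(scores, factor):
    """Assumed uniform rescoring: scale every rule score by factor."""
    return {rule: score * factor for rule, score in scores.items()}


def rounds_until(hits, scores, step, threshold):
    """Count rescoring rounds (each scaling by step) until the total < threshold."""
    assert 0 < step < 1 and threshold > 0  # guarantees the loop terminates
    rounds, current = 0, dict(scores)
    while total_score(hits, current) >= threshold:
        current = decay(current, step)
        rounds += 1
    return rounds


if __name__ == "__main__":
    scores = {"RULE_A": 6.0, "RULE_B": 5.0, "RULE_C": 5.0}  # hypothetical
    hits = ["RULE_A", "RULE_B", "RULE_C"]                   # totals 16.0
    print(classify(hits, scores))
    print("rounds to re-enter corpus:", rounds_until(hits, scores, 0.95, REJECT_AT))
    print("rounds to be missed:", rounds_until(hits, scores, 0.95, SPAM_AT))
```

With a 5% cut per round, the 16-point message re-enters the corpus after 2 rounds but is not missed until round 23, which is the window in which fresh examples would correct the scores.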