On 03/10/2011 11:49 AM, Jason Bertoch wrote:
> On 2011/03/10 2:17 PM, Adam Katz wrote:
>> I figure spam capped at 15+ points would be fine, but you'll need
>> developer consensus on that.
>>
>
> Wouldn't spam already scored at 15+ be considered a little redundant
> to the corpus? If not, I'm certain I could modify my config to keep
> a copy for processing in the mass checks.
You read me in reverse. Spam "capped at 15+" means spam that scores no more than 15 points (since anything scoring higher was rejected or deleted). If a minority of our corpora are limited to lower-scoring spam, the genetic algorithm would be slightly more biased in favor of the borderline cases and false negatives (FNs). However, as Darxus points out, if the majority of our corpora pruned out such high-scoring messages, we would risk losing that certainty.