On 03/10/2011 11:49 AM, Jason Bertoch wrote:
> On 2011/03/10 2:17 PM, Adam Katz wrote:
>> I figure spam capped at 15+ points would be fine, but you'll need 
>> developer consensus on that.
>> 
> 
> Wouldn't spam already scored at 15+ be considered a little redundant
> to the corpus?  If not, I'm certain I could modify my config to keep
> a copy for processing in the mass checks.

You read me in reverse.  Spam "capped at 15+" means spam that scores no
more than 15 points (since anything scoring higher was rejected or
deleted).  If a minority of our corpora are limited to lower-scoring
spams, the genetic algorithm would be slightly more biased in favor of
the borderline cases and FNs.
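
To make the definition concrete, here is a minimal Python sketch of a
corpus capped at 15 points.  The function name, message IDs, and scores
are all hypothetical and chosen only for illustration; this is not part
of the mass-check tooling.

```python
from typing import Iterable, List, Tuple

# Assumed cap: mail scoring above this was rejected or deleted,
# so it never reaches the corpus.
SCORE_CAP = 15.0


def cap_corpus(
    scored: Iterable[Tuple[str, float]], cap: float = SCORE_CAP
) -> List[Tuple[str, float]]:
    """Keep only messages that score no more than `cap` points."""
    return [(msg_id, score) for msg_id, score in scored if score <= cap]


# Hypothetical spam corpus as (message id, score) pairs.
sample = [("a", 4.9), ("b", 7.2), ("c", 15.0), ("d", 31.4)]
capped = cap_corpus(sample)
```

In this made-up sample only "d" is dropped, so what remains is the
missed and borderline spam rather than the obvious high scorers.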

As Darxus points out, if the majority of our corpora pruned out such
high-scoring messages, we would risk losing that certainty.

