On Wed, 15 May 2019 16:41:00 +0300
Henrik K wrote:

> On Wed, May 15, 2019 at 02:15:19PM +0100, RW wrote:
> >
> > Why are there no QA statistics for BAYES_* rules?
>
> How do you propose to generate such statistics, when all contributors
> already are supposed to have fully sorted ham/spam corpuses? Seems
> kind of redundant as all spam would hit BAYES_99 etc.
The correct way to do this is to divide the corpus into N parts and test
each part with a database trained from the other N-1, but simply starting
with a fresh database and testing each email before training on it is
much better than nothing.

> What comes to BAYES_* scores anyway, they are hand tweaked /
> immutable and not subject to rescoring.

That's not the point. Without taking account of Bayes, the other rules
get tuned differently. Bayes has a substantial effect on the score of
almost everything scanned.

I presume this all worked correctly in the past, as the optimizer
produced scores like this:

score DRUGS_MANYKINDS 2.001 1.473 0.841 0.342

It's very common in 50_scores.cf to see much more aggressive scores in
the non-Bayes score sets.

> Network rules are only run every saturday:
> https://ruleqa.spamassassin.org/20190511-r1859108-n

Why is that necessary when network results should be reused? Most of
them are meaningless if retested after several days.
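For what it's worth, the N-fold scheme I describe above can be sketched
roughly like this (a hand-wavy illustration, not actual SpamAssassin
code; `train` and `classify` are hypothetical stand-ins for whatever
builds and queries the Bayes database):

```python
# Sketch of N-fold testing: split the corpus into N parts and score
# each part with a Bayes database trained on the other N-1 parts,
# so no message is ever scored by a database it was trained into.
# `train` and `classify` are hypothetical callables supplied by the user.

def n_fold_results(messages, n, train, classify):
    # Round-robin split of the corpus into n folds.
    folds = [messages[i::n] for i in range(n)]
    results = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the held-out one.
        training = [m for j, fold in enumerate(folds) if j != i
                    for m in fold]
        db = train(training)
        # Score the held-out messages with that database.
        results.extend(classify(db, m) for m in held_out)
    return results
```

That would give every message a BAYES_* result produced without its own
contribution to the database, which is what QA statistics would need.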