Re: Masscheck statistics

Henrik K Wed, 15 May 2019 07:59:26 -0700

On Wed, May 15, 2019 at 03:45:22PM +0100, RW wrote:
> On Wed, 15 May 2019 16:41:00 +0300
> Henrik K wrote:
> 
> > On Wed, May 15, 2019 at 02:15:19PM +0100, RW wrote:
> > > 
> > > Why are there no QA statistics for BAYES_* rules?  
> > 
> > How do you propose to generate such statistics, when all contributors
> > already are supposed to have fully sorted ham/spam corpuses?  Seems
> > kind of redundant as all spam would hit BAYES_99 etc.
> 
> The correct way to do this is to divide the corpus into N parts and
> test each part with a database trained from the other N-1, but simply
> starting with a fresh database and testing each email before training it
> is much better than nothing.


I'm sure all the contributors would be happy to run 10-fold bayes tests all
day and night. :-)

> > As for BAYES_* scores, they are hand-tweaked /
> > immutable and not subject to rescoring.
> 
> That's not the point. Without taking account of Bayes, the other rules
> get tuned differently. Bayes has a substantial effect on the score of
> almost everything scanned.

I think the concept of scoresets is pointless these days anyway.  Does
anyone actually run a legitimate mail server without Bayes and network tests?

> > Network rules are only run every Saturday:
> > https://ruleqa.spamassassin.org/20190511-r1859108-n
> 
> Why is that necessary when network results should be reused? Most of
> them are meaningless if retested after several days.

It's a known issue that has already been discussed internally; it's not a
topic for the users list.
