On Wed, 15 May 2019 16:41:00 +0300
Henrik K wrote:

> On Wed, May 15, 2019 at 02:15:19PM +0100, RW wrote:
> >
> > Why are there no QA statistics for BAYES_* rules?
>
> How do you propose to generate such statistics, when all contributors
> already are supposed to have fully sorted ham/spam corpuses? Seems
> kind of redundant as all spam would hit BAYES_99 etc.
The correct way to do this is to divide the corpus into N parts and test
each part with a database trained from the other N-1, but simply starting
with a fresh database and testing each email before training on it is
much better than nothing.

> What comes to BAYES_* scores anyway, they are hand tweaked /
> immutable and not subject to rescoring.

That's not the point. Without taking account of Bayes, the other rules
get tuned differently. Bayes has a substantial effect on the score of
almost everything scanned.

I presume this all worked correctly in the past, as the optimizer
produced scores like this:

score DRUGS_MANYKINDS 2.001 1.473 0.841 0.342

It's very common in 50_scores.cf to see much more aggressive scores in
the non-Bayes score sets.

> Network rules are only run every saturday:
> https://ruleqa.spamassassin.org/20190511-r1859108-n

Why is that necessary when network results should be reused? Most of
them are meaningless if retested after several days.
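For what it's worth, the N-fold scheme I describe above can be sketched
roughly like this (a hand-wavy illustration, not actual SpamAssassin
code; `train` and `classify` are hypothetical stand-ins for whatever
builds and queries the Bayes database):

```python
# Sketch of N-fold testing: split the corpus into N parts and score
# each part with a Bayes database trained on the other N-1 parts,
# so no message is ever scored by a database it was trained into.
# `train` and `classify` are hypothetical callables supplied by the user.

def n_fold_results(messages, n, train, classify):
    # Round-robin split of the corpus into n folds.
    folds = [messages[i::n] for i in range(n)]
    results = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the held-out one.
        training = [m for j, fold in enumerate(folds) if j != i
                    for m in fold]
        db = train(training)
        # Score the held-out messages with that database.
        results.extend(classify(db, m) for m in held_out)
    return results
```

That would give every message a BAYES_* result produced without its own
contribution to the database, which is what QA statistics would need.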