On Wed, May 15, 2019 at 03:45:22PM +0100, RW wrote:
> On Wed, 15 May 2019 16:41:00 +0300
> Henrik K wrote:
>
> > On Wed, May 15, 2019 at 02:15:19PM +0100, RW wrote:
> > >
> > > Why are there no QA statistics for BAYES_* rules?
> >
> > How do you propose to generate such statistics, when all contributors
> > already are supposed to have fully sorted ham/spam corpuses? Seems
> > kind of redundant as all spam would hit BAYES_99 etc.
>
> The correct way to do this is to divide the corpus into N parts and
> test each part with a database trained from the other N-1, but simply
> starting with a fresh database and testing each email before training it
> is much better than nothing.
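
For reference, the two schemes described in the quoted paragraph could be
sketched roughly as below. This is only a toy Python illustration, not
SpamAssassin or mass-check code: the function names and the `classify` and
`train` callables are hypothetical stand-ins for a Bayes lookup and a
learning step, and the corpus is just a sequence of messages.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # a message; any type works for this sketch
R = TypeVar("R")  # whatever the (hypothetical) classifier returns


def n_fold_splits(corpus: Sequence[T], n: int) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs so that each of the n parts is tested once
    against a database trained on the other n-1 parts."""
    if not 2 <= n <= len(corpus):
        raise ValueError("n must be between 2 and len(corpus)")
    # Deal messages round-robin into n parts.
    folds = [list(corpus[i::n]) for i in range(n)]
    for i, test in enumerate(folds):
        train = [msg for j, fold in enumerate(folds) if j != i for msg in fold]
        yield train, test


def score_before_training(
    corpus: Sequence[T],
    classify: Callable[[T], R],
    train: Callable[[T], None],
) -> List[R]:
    """Cheaper alternative: start from a fresh database and score each
    message before it is learned, so no message is tested against itself."""
    results = []
    for msg in corpus:
        results.append(classify(msg))  # test first ...
        train(msg)                     # ... then learn the same message
    return results
```

The N-fold variant needs N full training runs per corpus, while the second
needs only a single pass, at the price of the earliest messages being scored
against a nearly empty database.
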

I'm sure all the contributors would be happy to run 10-fold Bayes tests
all day and night. :-)

> > As for BAYES_* scores anyway, they are hand tweaked /
> > immutable and not subject to rescoring.
>
> That's not the point. Without taking account of Bayes, the other rules
> get tuned differently. Bayes has a substantial effect on the score of
> almost everything scanned.

I think the concept of scoresets is pointless these days anyway. Does
anyone actually run a legit mail server without Bayes and network tests?

> > Network rules are only run every Saturday:
> > https://ruleqa.spamassassin.org/20190511-r1859108-n
>
> Why is that necessary when network results should be reused? Most of
> them are meaningless if retested after several days.

It's a known issue and has already been discussed internally; it's not a
topic for the users list.