On Tue, 14 Aug 2018, micah anderson wrote:

I'm trying to understand the ruleQA results because I'm trying to track
down how spammy the rule FRNAME_IN_MSG_NO_SUBJ is.

I load the latest rules: 

That run only has three masscheck corpora. You might want to look earlier or later to a run that has more, for example:


and I see the S/O value is 1.0, which is a rule that hits only on spam

Or close enough that rounding hides the ham hits.
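
To make the rounding point concrete, here is a rough sketch. It assumes S/O is computed as spam% / (spam% + ham%), which is my understanding of how the hit-frequency figures are derived, and it uses the 1.5192% spam and 0.0005% ham quoted further down. The function name is made up for illustration.

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hits that land on spam, using percentage hit rates.

    Assumption: S/O = spam% / (spam% + ham%).
    """
    total = spam_pct + ham_pct
    if total == 0:
        # This sketch does not define a value for a rule with no hits.
        raise ValueError("rule has no hits")
    return spam_pct / total


ratio = s_over_o(1.5192, 0.0005)
print(f"{ratio:.6f}")  # slightly below 1
print(f"{ratio:.3f}")  # 1.000 once rounded to three decimals
```

So a handful of ham hits is enough to keep the true ratio below 1 while the displayed value still reads as 1.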

(a rule that only hits on ham is 0.0, a rule that doesn't hit anything is

but how can I tell how many messages are part of the corpus?

As RW said, hover over the percentages.

Also, the percentages seem very low: 1.5192% Spam, and 0.0005%
Ham... 1.5% seems low to me to justify a score of 3.5 for this rule, but
what do I know... which is why I'm asking.

It's not so much the raw amount of spam a rule hits. What matters is that it hits spam that few other rules hit, or spam that other rules do hit but without scoring it high enough.

You also want to look at the score-map section when evaluating a rule.

I don't care when a rule hits a lot of spam scoring 20+ points. I care a lot if it hits spam that scores 1-4 points.
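
As a rough illustration of why the low-scoring band matters, the sketch below checks which messages a 3.5-point rule would push over the threshold. It assumes the default required_score of 5.0 and that a message is flagged once its total reaches that value. The sample scores are made up, not taken from the masscheck data, and the function name is hypothetical.

```python
REQUIRED_SCORE = 5.0  # assumed: SpamAssassin's default required_score
RULE_SCORE = 3.5      # the score under discussion for this rule


def newly_caught(scores_without_rule, rule_score=RULE_SCORE,
                 required=REQUIRED_SCORE):
    """Return the scores of messages that only cross the threshold
    once the rule's score is added."""
    return [s for s in scores_without_rule
            if s < required <= s + rule_score]


# Made-up totals for spam the rule hits, before the rule's own score.
sample = [1.2, 2.0, 3.8, 4.4, 21.7, 35.0]
caught = newly_caught(sample)
print(caught)  # [2.0, 3.8, 4.4]
```

The messages at 21.7 and 35.0 are flagged with or without the rule, so hitting them adds nothing. The ones between 1.5 and 5 are where the rule earns its score.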

Do you happen to be seeing FPs with this rule?

