On 2023-02-28 at 13:38:35 UTC-0500 (Tue, 28 Feb 2023 13:38:35 -0500)
joe a <joea-li...@j4computers.com>
is rumored to have said:

On 2/28/2023 12:05 PM, Jeff Mincy wrote:
  > From: joe a <joea-li...@j4computers.com>
  > Date: Tue, 28 Feb 2023 11:37:34 -0500
  >
  > Curious as to why these scores, apparently "stock" are what they are.
  > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
  >
  > Noted in a header this morning:
  >
  > *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
  > *      [score: 1.0000]
  > *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
  > *      [score: 1.0000]
  >
  > Was this discussed recently? I added a local score to mollify my sense
  > of propriety.

Those two rules overlap.   A message with bayes >= 99.9% hits both
rules.   BAYES_99 ends at 1.00 not .999.
-jeff
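
To make the overlap concrete, here is a minimal sketch in Python (not SpamAssassin code). The ranges and scores are copied from the header quoted above; the `RULES` table, the function names, and treating both range ends as inclusive are assumptions made for illustration only.

```python
# Illustrative only: ranges and scores come from the quoted header.
# Treating both ends of each range as inclusive is an assumption.
RULES = {
    "BAYES_99": (0.99, 1.00, 3.5),
    "BAYES_999": (0.999, 1.00, 0.2),
}


def bayes_hits(probability):
    """Return the names of the Bayes rules whose range contains probability."""
    return [
        name
        for name, (low, high, _score) in RULES.items()
        if low <= probability <= high
    ]


def bayes_total(probability):
    """Sum the scores of every Bayes rule that the probability hits."""
    return sum(RULES[name][2] for name in bayes_hits(probability))


if __name__ == "__main__":
    for p in (0.995, 0.9995, 1.0):
        print(p, bayes_hits(p), round(bayes_total(p), 2))
```

Under these assumptions a message at 99.5% hits only BAYES_99, while anything at 99.9% or above hits both rules, so the 0.2 acts as a supplement on top of the 3.5 rather than as a standalone score.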


I get that they overlap. I guess my thinker gets in a knot wondering why there is so little weight given to the more certain determination.

It is my understanding that an automated rescoring job was run quite some time ago (before I was on the PMC) to generate the Bayes scores, which determined that to be the best supplemental score to give to the greater certainty. Bayes rules are not rescored routinely in the daily rescoring task because those hits are inherently different at every site. If you wish to determine the ideal scores for YOUR mix of ham and spam, I believe all the tools for doing so are in the SA code tree, but they may not be well-documented.

That's likely to not be a satisfying answer, but as a volunteer project we have no funding for Customer Satisfaction, so the bare unsatisfying truth will have to do.

In my narrow view, anything that is 99.9% certain is probably worth a 5 on its own. Or at least it should, when summed with BAYES_99, equal 5, as that is the default "SPAM flag" threshold.
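
The arithmetic behind that preference can be sketched as follows. This is a hypothetical Python helper, not part of SpamAssassin: the 5 and the 3.5 come from this thread, the function names are invented for illustration, and the output is only the kind of `score` line one might place in a local configuration file, not a recommended value.

```python
# Values taken from the thread: 5 is the default spam threshold mentioned
# above, 3.5 is the BAYES_99 score shown in the quoted header.
SPAM_THRESHOLD = 5.0
BAYES_99_SCORE = 3.5


def supplemental_score(threshold, base_score):
    """Score the overlapping rule needs so both rules together reach threshold."""
    return round(threshold - base_score, 3)


def local_score_line(rule, score):
    """Format a 'score' line of the kind used in a local SpamAssassin config."""
    return f"score {rule} {score}"


if __name__ == "__main__":
    needed = supplemental_score(SPAM_THRESHOLD, BAYES_99_SCORE)
    print(local_score_line("BAYES_999", needed))
```

Whether such a value suits a given site depends on that site's own mix of ham and spam.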

Appears more experienced or thoughtful persons think otherwise.

I don't know that I'd go that far. Rescoring is not done based on simple clear reason, but on numbers. I'm not sure whether any currently active SA developers are able to explain exactly how the rescoring works.

Yes, it did snow heavily overnight. Yes, I am looking for excuses not to visit that issue.

I vehemently recommend reading all of Justin's scripts and documentation (I think it's all in the 'build' sub-directory) and figuring out how to rescore based on your own mail. That's MUCH less unpleasant than dealing with the snow.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
