Hi,

On Thu, Sep 4, 2014 at 1:44 PM, Karsten Bräckelmann <guent...@rudersport.de> wrote:

> On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:
>
> > > > I looked in the quarantined message, and according to the _TOKEN_
> > > > header I've added:
> > > >
> > > > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> > > >
> > > > Isn't that sufficient for auto-learning this message as spam?
>             ^^^^
> That's clearly referring to the _TOKEN_ data in the custom header, is it
> not?
>

Yes. Burning the candle at both ends. Really overworked.

> > > That has absolutely nothing to do with auto-learning. Where did you get
> > > the impression it might?
> >
> > If the conditions for autolearning had been met, I understood that it
> > would be those new tokens that would be learned.
>
> Learning is not limited to new tokens. All tokens are learned,
> regardless their current (h|sp)ammyness.
>
> Still, the number of (new) tokens is not a condition for auto-learning.
> That header shows some more or less nice information, but in this
> context absolutely irrelevant information.
>

I understood "new" to mean the tokens that have not been seen before, and would be learned if the other conditions were met.

> Auto-learning in a nutshell: Take all tests hit. Drop some of them with
> certain tflags, like the BAYES_xx rules. For the remaining rules, look
> up their scores in the non-Bayes scoreset 0 or 1. Sum up those scores to
> a total, and compare with the auto-learn threshold values. For spam,
> also check there are at least 3 points each by header and body rules.
> Finally, if all that matches, learn.
>

Is it important to understand how those three points are achieved or calculated?

> > Okay, of course I understood the difference between points and tokens.
> > Since the points were over the specified threshold, I thought those
> > new tokens would have been added.
>
> As I have mentioned before in this thread: It is NOT the message's
> reported total score that must exceed the threshold. The auto-learning
> discriminator uses an internally calculated score using the respective
> non-Bayes scoreset.
>

Very helpful, thanks. Is there a way to see more about how it makes that decision on a particular message?

Thanks,
Alex