On Sunday, November 10 2013, Karsten Bräckelmann wrote:

> On Sun, 2013-11-10 at 01:59 -0200, Sergio Durigan Junior wrote:
>> Nice, thanks both of you for the answers.
>> 
>> I am now feeding SA with ham from my INBOX, while I also feed it with
>> false-negatives (interestingly, I am receiving now *much* more spam than
>> I was a week ago...).
>
> Given what you stated about your spam volume before, entirely possible.
> However, you're not using catch-all, are you?

No, I'm not.

>> So, I now have yet another question.  I let auto_learn active for SA,
>> and now for every false-negative SA will learn that it is not spam,
>
> No. False negative (not classified spam, although it is) is NOT what
> triggers auto-learn ham.

All right, I misunderstood things then.  I assumed that because of the
output of sa-learn --dump magic:

  ...
  0.000          0         37          0  non-token data: nham
  ...

This number increases every time I receive a message (whether it is
a false-negative or a true-negative).  Since I have too little spam to
train on, it is hard to keep up with the amount of ham received.

But I will read the docs and learn how this works.
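
For anyone following along, the counters can be pulled out of that
output with a short script.  This is only a sketch: it assumes every
"non-token data" line has the same five-column layout as the nham line
quoted above, with the value in the third column.  The name parse_magic
is made up for illustration and is not part of SA.

```python
import re

# Assumed layout, based only on the single line quoted above:
#   <col1> <col2> <value> <col4>  non-token data: <name>
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$"
)


def parse_magic(text):
    """Return {name: value} for every 'non-token data' line in text."""
    counters = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


sample = "  0.000          0         37          0  non-token data: nham"
print(parse_magic(sample))
```

Feeding it the full output of sa-learn --dump magic before and after a
training run would show how much the counters actually moved.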

>> although it is.  I'm now thinking that maybe auto_learn is not a good
>> idea, at least until I have a good enough Bayes database (strangely, SA
>> did not catch *any* spam in the last 48 hours...).  Can you confirm
>> this?
>> 
>> Thanks a lot, and sorry if I'm asking too much :-).
>
> Just leave auto-learn enabled. And, yet again, do train both ham and
> spam (all, not only mis-classified messages) for initial training.

I am already doing that, thanks for the advice.

> Auto-learning in SA Bayes is much more than a pure feedback loop, as you
> described. A message just being classified ham (< 5.0) is NOT learned as
> ham. Neither are messages scored spam (>= 5.0) learned as spam.
>
> (1) The thresholds for auto-learning are 0.1 and 12.0 by default. Not
>     the required_score threshold of 5.0 default.
> (2) Certain rules are not considered for auto-learning, to prevent self-
>     feeding.
> (3) A minimum of header and body rules are required, to prevent biasing.
>
> See M::SA::Plugin::AutoLearnThreshold docs for more details.
>
> Part of the X-Spam-Status header way down the end tells you about SA
> auto-learning or not. Hardly surprising, that's
>   autolearn=(ham|spam|no|unavailable)

Great, thanks a lot for the pointers and the explanation.
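
To check my understanding of points (1) to (3), here is a rough sketch
of the decision in Python.  It is a simplification, not SA's actual
code.  The thresholds are the defaults quoted above.  The minimum of 3
points each from header and body rules for learning as spam is my
reading of the plugin docs, so treat it as an assumption, as are the
exact comparison operators.  The function name is hypothetical.

```python
HAM_THRESHOLD = 0.1     # default ham auto-learn threshold quoted above
SPAM_THRESHOLD = 12.0   # default spam auto-learn threshold quoted above
MIN_PART_POINTS = 3.0   # assumption: minimum header and body points for spam


def autolearn_decision(autolearn_score, header_points, body_points):
    """Return 'ham', 'spam' or 'no' for a simplified auto-learn check.

    autolearn_score is assumed to be the score with the rules that are
    not considered for auto-learning already left out (point 2).
    """
    if autolearn_score < HAM_THRESHOLD:
        return "ham"
    if (autolearn_score >= SPAM_THRESHOLD
            and header_points >= MIN_PART_POINTS
            and body_points >= MIN_PART_POINTS):
        return "spam"
    # Everything in between, including most false negatives, is not learned.
    return "no"


print(autolearn_decision(3.2, 1.0, 2.2))    # below 5.0, but not learned as ham
print(autolearn_decision(15.0, 1.0, 14.0))  # high score, too few header points
```

So a false negative scoring, say, 3.2 lands in the "no" branch, which
matches what Karsten wrote: it is not learned as ham.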

> In your case, I'd say just let SA do its job. Monitor the results, and
> train both ham and spam, at the very least until BAYES_xx rules show up
> in X-Spam-Status headers.
>
> Keep training Bayes after that, to improve performance. Definitely do
> train on false positives and negatives.
>
> Wait, observe, and learn how to read X-Spam headers. :)

Nice, I will keep monitoring everything the way I'm doing.  And I will
definitely read more about the headers and SA in general.
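
As a starting point for the monitoring, something like the following
could pull the autolearn value and any BAYES_xx rule out of an
X-Spam-Status header.  It is a minimal sketch: the function names are
hypothetical, and the example header value is made up for illustration,
not copied from a real message.

```python
import re

# The four values listed in the explanation above.
AUTOLEARN = re.compile(r"\bautolearn=(ham|spam|no|unavailable)\b")
BAYES_RULE = re.compile(r"\bBAYES_\d+\b")


def autolearn_status(header_value):
    """Return the autolearn value from an X-Spam-Status header, or None."""
    match = AUTOLEARN.search(header_value)
    return match.group(1) if match else None


def bayes_rule(header_value):
    """Return the first BAYES_xx rule name in the header, or None."""
    match = BAYES_RULE.search(header_value)
    return match.group(0) if match else None


# Made-up header value, for illustration only.
example = ("No, score=0.3 required=5.0 "
           "tests=BAYES_50,HTML_MESSAGE autolearn=no")

print(autolearn_status(example))
print(bayes_rule(example))
```

A bayes_rule() result of None on every message would suggest Bayes is
not yet contributing to the score.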

Thanks a lot for the replies and the patience.  It's been very
educational :-).

-- 
Sergio
