On Sunday, November 10 2013, Karsten Bräckelmann wrote:

> On Sun, 2013-11-10 at 01:59 -0200, Sergio Durigan Junior wrote:
>> Nice, thanks both of you for the answers.
>>
>> I am now feeding SA with ham from my INBOX, while I also feed it with
>> false-negatives (interestingly, I am now receiving *much* more spam than
>> I was a week ago...).
>
> Given what you stated about your spam volume before, entirely possible.
> However, you're not using catch-all, are you?
No, I'm not.

>> So, I now have yet another question. I left auto_learn active for SA,
>> and now for every false-negative SA will learn that it is not spam,
>
> No. False negative (not classified spam, although it is) is NOT what
> triggers auto-learn ham.

All right, I misunderstood things then. I assumed that because of the
output of sa-learn --dump magic:

  ...
  0.000          0         37          0  non-token data: nham
  ...

And this number increases every time I receive a message (whether it is
a false-negative or a true-negative). Since I have too little spam to
train on, it is hard to keep up with the amount of ham received. But I
will read the docs and learn how this works.

>> although it is. I'm now thinking that maybe auto_learn is not a good
>> idea, at least until I have a good enough Bayes database (strangely, SA
>> did not catch *any* spam in the last 48 hours...). Can you confirm
>> this?
>>
>> Thanks a lot, and sorry if I'm asking too much :-).
>
> Just leave auto-learn enabled. And, yet again, do train both ham and
> spam (all, not only mis-classified messages) for initial training.

I am already doing that, thanks for the advice.

> Auto-learning in SA Bayes is much more than a pure feedback loop, as you
> described. A message just being classified ham (< 5.0) is NOT learned as
> ham. Neither are messages scored spam (>= 5.0) learned as spam.
>
> (1) The thresholds for auto-learning are 0.1 and 12.0 by default, not
>     the default required_score threshold of 5.0.
> (2) Certain rules are not considered for auto-learning, to prevent
>     self-feeding.
> (3) A minimum of header and body rules are required, to prevent biasing.
>
> See M::SA::Plugin::AutoLearnThreshold docs for more details.
>
> Part of the X-Spam-Status header way down at the end tells you about SA
> auto-learning or not. Hardly surprising, that's
>
>   autolearn=(ham|spam|no|unavailable)

Great, thanks a lot for the pointers and the explanation.

> In your case, I'd say just let SA do its job.
> Monitor the results, and train both ham and spam, at the very least
> until BAYES_xx rules show up in X-Spam-Status headers.
>
> Keep training Bayes after that, to improve performance. Definitely do
> train on false positives and negatives.
>
> Wait, observe, and learn how to read X-Spam headers. :)

Nice, I will keep monitoring everything as I have been doing. And I will
definitely read more about the headers and SA in general.

Thanks a lot for the replies and the patience. It's been very
educational :-).

-- 
Sergio
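Karsten's points (1) to (3) and the autolearn= field can be illustrated with a small Python sketch. It is a simplified model, not the code of the AutoLearnThreshold plugin: it assumes the default thresholds of 0.1 and 12.0 quoted above, while the minimum header/body point values (MIN_HEADER_POINTS, MIN_BODY_POINTS), the choice to apply them only to spam candidates, and the function names autolearn_decision and autolearn_from_header are all hypothetical placeholders to be checked against the plugin docs. The caller is assumed to pass a score that already excludes the rules ignored for auto-learning (point 2); the sketch does not recompute it.

```python
import re

# Default auto-learn thresholds quoted in the thread (not required_score).
THRESHOLD_NONSPAM = 0.1
THRESHOLD_SPAM = 12.0
# HYPOTHETICAL minimums for point (3); check the AutoLearnThreshold docs
# for the real values and for which direction(s) they apply to.
MIN_HEADER_POINTS = 3.0
MIN_BODY_POINTS = 3.0

AUTOLEARN_RE = re.compile(r"\bautolearn=(ham|spam|no|unavailable)\b")


def autolearn_decision(learn_score, header_points, body_points):
    """Simplified model of whether a scanned message would be auto-learned.

    learn_score is ASSUMED to be the score already recomputed without the
    rules excluded from auto-learning (point 2).
    """
    if learn_score < THRESHOLD_NONSPAM:
        return "ham"
    if learn_score >= THRESHOLD_SPAM:
        # Point (3): both header and body rules must contribute enough.
        if header_points >= MIN_HEADER_POINTS and body_points >= MIN_BODY_POINTS:
            return "spam"
        return "no"
    # Between the two thresholds: not learned, whatever required_score says.
    return "no"


def autolearn_from_header(x_spam_status):
    """Return the autolearn= value of an X-Spam-Status header, or None."""
    unfolded = re.sub(r"\s+", " ", x_spam_status)  # undo header folding
    match = AUTOLEARN_RE.search(unfolded)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up header for illustration only.
    header = ("No, score=3.2 required=5.0 tests=BAYES_50,HTML_MESSAGE\n"
              "\tautolearn=no version=3.3.2")
    print(autolearn_from_header(header))        # no
    print(autolearn_decision(3.2, 1.0, 2.2))    # no (ham by score, but above 0.1)
    print(autolearn_decision(0.05, 0.0, 0.0))   # ham
```

The middle branch is the point of the exercise: a message scoring 3.2 is classified ham but is not auto-learned as ham, and a false negative scoring, say, 4.0 would not be auto-learned either.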