Re: Bayes, Manual and Auto Learning Strategies

Karsten Bräckelmann Tue, 01 Jul 2014 21:16:26 -0700

On Tue, 2014-07-01 at 22:18 -0500, Steve Bergman wrote:
> On 07/01/2014 09:53 PM, Karsten Bräckelmann wrote:
> 
> > Frankly, it appears you don't understand what auto-learning is.
> 
> So please specify, explicitly, what it is. I asked some specific 
> questions about it. And I'm very interested in the answers.


If you want my opinion, please re-phrase your questions. I locally
deleted most of this previous (originally unrelated) thread.

> Is auto-learn still system-wide? I'd need it to apply to individual 
> users. Is it in-memory only? Or can I have it update the users' filedb 
> token databases?

SA itself was never system-wide, nor user-specific. It can be both, or
either. It depends on the context in which SA is called.


> If it's now per user and uses the user databases, then I am more than 
> ready to reconsider my opinion. But I've not been able to get a clear 
> answer to this. I haven't had an opportunity to test. And I'd want 
> confirmation from someone in the know anyway, before I changed strategies.

It does not depend on SA, but on how you invoke SA. We cannot give you a
clear answer. It depends on your system, your SMTP server, the glue,
whether SA is called system-wide, and possibly on per-user invocations
even after a system-wide one.

To be clear: SA is a filter. It does nothing itself, other than
classification. Whether it is called, and at which point, is outside the
scope of SA. Rejecting, deleting, delivering or any other kind of action
is outside the scope of SA. Those are actions performed by the calling
layer, based on the result of the SA evaluation.
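
As a rough illustration of that split (a sketch, not anything SA ships):
the calling layer reads the verdict SA wrote into the headers and then
decides what to do with the mail. The function name route_message and
the folder names are made up for this example. The only thing taken from
SA is the "X-Spam-Flag: YES" header it adds to mail it classifies as
spam.

```python
from email import message_from_string


def route_message(raw_message: str) -> str:
    """Hypothetical calling-layer policy: pick a folder from SA's verdict.

    SA only classified the mail; choosing "Junk" or "INBOX" (names assumed
    here) is the caller's decision, not SA's.
    """
    msg = message_from_string(raw_message)
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return "Junk" if flag == "YES" else "INBOX"
```

Rejecting at SMTP time, deleting, or tagging the Subject would be other
policies plugged into the same place.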


> >> This method shields the user from the worst of the spam, while giving
> >> them full control of what gets relearned as spam.
> >
> > Wrong. It is not "this" (your) method, that shields the user from the
> > worst of the spam. That's SA. Not your style of auto-training.
> 
> Mine is not autotraining at all. It's giving the user a way of 
> explicitly training the backend spam filter.

Quoting your previous post, you "have a line in the users' default
.forward file to train incoming mail as ham". That is auto-training.

> > (Besides, you *are* doing auto-learning, which you just claimed to be a
> > complete joke.)
> 
> No. The messages are assumed ham until the user classifies them as spam. 
> It is explicit learning. Under user control.

Being "assumed" is not the same as being "treated and automatically
reinforced". The latter is what you do. (And btw, yes, you are
auto-learning.)


> > At this point I won't get into details. It should suffice to highlight
> > that a default ham auto-learning threshold of 0.1 is part of the safety
> > concepts. (See the M::SA::Plugin::AutoLearnThreshold man-page for more.)
> 
> I really don't think you understand what it is I'm doing. Anything below 
> a score of 5.0 goes into their mailbox and is learned as ham. If it's ham, 
> that's great. If it's spam, they move it to Junk and it gets learned as 
> spam. auto-learn is as brain dead as the defunct AWL.

I perfectly understood what you are doing.

You didn't understand why that is bad. Failing to explain it might be my
bad, though I'll leave re-explaining for tomorrow, my timezone. Or for
you carefully re-reading my posts.
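
For the impatient, a simplified sketch of the difference. It assumes the
stock thresholds (0.1 for ham as mentioned above, 12.0 for spam) and
ignores the further checks the real AutoLearnThreshold plugin applies.
Both function names are made up, and the exact boundary handling is an
assumption.

```python
def autolearn_decision(score: float,
                       ham_threshold: float = 0.1,
                       spam_threshold: float = 12.0) -> str:
    """Simplified model of threshold-based auto-learning.

    Returns "ham", "spam" or "no" (do not learn). Scores in the gray area
    between the two thresholds are deliberately left alone.
    """
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:  # boundary handling assumed for this sketch
        return "spam"
    return "no"


def learn_all_delivered_as_ham(score: float, required: float = 5.0) -> str:
    """The strategy under discussion: everything delivered is learned as ham."""
    return "ham" if score < required else "no"
```

A message scoring 3.0 is left alone by the first function, but
reinforced as ham by the second. That gray area is the safety margin.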


> > I never checked the TB internal Bayes implementation and auto-learn
> > strategy, but I'd be surprised if they do train on black/white, without
> > any gray area in between.
> 
> Optimally, I would have an "incoming folder" and then the user could 
> manually move the messages from there to spam or ham. But considering 

Which is basically what you came from, using the Dovecot antispam plugin
with SA, and dedicated folders "where the user could manually move the
messages" to. Why didn't you just set that up?

(Hint: That's your set-up without auto-learning ham Inbox deliveries.)

> that this was not even remotely necessary with our old email provider, I 
> don't feel that I can put my users to that level of extra trouble that 
> they never even thought about having to deal with before, just because 
> SA is not performing as well as the spam filter they are used to. The 

Do initial manual training. Then get back to us.

> mail needs to go into the inbox directly. And for SA's bayesian to work, 
> it needs to be assumed as ham initially.

No.

It seems your previous "email provider", whatever that might be, had
some sort of spam filtering service. Now you're on your own.

Which you are, unless you decide to ask for free (as in beer) support
from the community providing the software for free (as in speech) to
help you weed out the spam. You did ask, which is just fine, but your
assumptions are kind of hostile. Such as assuming your previous "email
provider" would not use SA internally. It most likely does.


> The only thing I see which might change my view would be explicit 
> details about where autolearn stores its data and how it is used on a 
> per user basis.

So the only thing that might change your view would be reading the docs.
Go read them.

Auto-learn stores its data exactly where Bayes generally stores its
data. In fact, it is the same database. Learning is just triggered
automatically (sic), rather than manually by invoking sa-learn.

Does that change your view?

Ah, no, changing your view also depends on how per-user learning is
done. Well, again, that depends on how you call SA. Or in other terms,
how you view SA depends on how you call SA. True...
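
To make the "depends on how you call SA" point concrete, a minimal
sketch assuming a spamc/spamd set-up. The helper name spamc_argv is made
up. The -u option is spamc's way of naming the user on whose behalf
spamd should scan. Whether spamd then actually reads and updates that
user's Bayes data depends on how spamd itself is run, which is specific
to your site.

```python
from typing import List, Optional


def spamc_argv(user: Optional[str] = None) -> List[str]:
    """Build the spamc command line for one message (hypothetical helper).

    Without a user, the scan runs in whatever context the glue provides.
    With a user, spamc asks spamd to scan on behalf of that user.
    """
    argv = ["spamc"]
    if user is not None:
        argv += ["-u", user]
    return argv
```

The same SA, the same Bayes code. Only the invocation differs.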


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
