On Wed, 2010-11-10 at 18:04 +0100, Karsten Bräckelmann wrote:
> On Tue, 2010-11-09 at 22:57 -0800, Karl Meyer wrote:

> > > Using the Inbox rather than a dedicated ham folder therefore is NOT a
> > > good idea.
> > 
> > The problem is that I can't persuade about 120 users to store all their ham
> > below a defined folder. They want to sort their mail into several folders
> > they created on their own.
> 
> You should be fine with some initial training of hand-sorted ham in a
> dedicated folder. Then let auto-learn kick in.

Other than the already established issues of delete and expunge, and
re-learning the same messages over and over again, there is another
problem with the "learn from Inbox" approach.

You would, effectively, implement a rather poor auto-learning mechanism.
The auto-learning available in SA does have quite some constraints, to
prevent bad training. Automatic, non-supervised training of the Inbox
does not have *any* such precaution, and instead blindly trusts the
initial classification.
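
To illustrate the difference, here is a minimal Python sketch. It is not
SA's actual code, and the function names are made up for illustration.
The thresholds mirror the documented defaults of
bayes_auto_learn_threshold_nonspam (0.1) and
bayes_auto_learn_threshold_spam (12.0), and the 3-point header/body
minimum for learning as spam follows the AutoLearnThreshold plugin
documentation. Real SA applies further conditions (for instance, it
computes the auto-learn score without the Bayes rules), which are left
out here.

```python
# Simplified sketch, NOT SpamAssassin's implementation.
# Function names are hypothetical; thresholds mirror SA's documented defaults.
NONSPAM_THRESHOLD = 0.1   # bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0     # bayes_auto_learn_threshold_spam
MIN_PART_POINTS = 3.0     # minimum header points and body points to learn as spam


def constrained_autolearn(score, header_points, body_points):
    """Return 'ham', 'spam', or None when the message must not be learned."""
    if score < NONSPAM_THRESHOLD:
        return "ham"
    if (score >= SPAM_THRESHOLD
            and header_points >= MIN_PART_POINTS
            and body_points >= MIN_PART_POINTS):
        return "spam"
    # Everything in between is too uncertain to train on.
    return None


def inbox_training(folder):
    """Blind approach: whatever sits in the Inbox is learned as ham."""
    return "ham" if folder == "Inbox" else None
```

Assuming a required score of 5.0, a false negative scoring 4.2 lands in
the Inbox. constrained_autolearn(4.2, 2.0, 2.2) returns None, so nothing
is learned, while inbox_training("Inbox") returns "ham" regardless of
the score.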

You cannot tell your users to collect hand-classified ham? Do you really
believe you can tell your users never, *ever*, to just delete the
occasional spam in their Inbox, rather than moving it for training? So
the user is in a hurry, he's late for the meeting, the project deadline
is close, the day's been a disaster anyway and the headache... No, don't
want that junk. Delete. Off goes the spam into the great bit bucket,
after it has been learned as ham.

And then there's the colleague in the next cubicle, on vacation for
three weeks. Meanwhile, all the FNs piling up in his Inbox are being
trained as ham, despite the fact that all that would have been needed to
properly classify them is a bit of Bayes training to cross the
threshold. As spam, though, not ham -- bootstrapping as a disservice for
everyone else in the office.


No, automatically training the Inbox is not a safe approach.

I recommend reading the sections Getting Started and Effective Training
in the sa-learn man page.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
