On Wed, 2010-11-10 at 18:04 +0100, Karsten Bräckelmann wrote:
> On Tue, 2010-11-09 at 22:57 -0800, Karl Meyer wrote:
> > > Using the Inbox rather than a dedicated ham folder therefore is NOT a
> > > good idea.
> >
> > The problem is, that I can't persuade about 120 users to store all their ham
> > below a defined folder. They want to sort their mails into several folders
> > they created by their own.
>
> You should be fine with some initial training of hand-sorted ham in a
> dedicated folder. Then let auto-learn kick in.

Other than the already established issues of delete and expunge, and
re-learning the same messages over and over again, there is another
problem with the "learn from Inbox" approach.

You would, effectively, implement a rather poor auto-learning mechanism.
The auto-learning available in SA has quite a few constraints to prevent
bad training. Automatic, non-supervised training of the Inbox does not
have *any* such precaution, and instead blindly trusts the initial
classification.

You cannot tell your users to collect hand-classified ham? Do you really
believe you can tell your users not *ever* to just delete the occasional
spam in their Inbox, rather than moving it for training?

So the user is in a hurry, he's late for the meeting, the project
deadline is close, the day's been a disaster anyway and the headache...
No, don't want that junk. Delete. Off there goes the spam into the great
bin-bucket, after it has been learned as ham.

And then there's the colleague in the next cubicle, on vacation for three
weeks. Meanwhile, all the FNs piling up in his Inbox are being trained as
ham. Despite the fact that all that would have been needed to properly
classify them is a bit of Bayes training to cross the threshold. As
spam, though, not ham -- bootstrapping as a disservice for everyone else
in the office.

No, automatically training the Inbox is not a safe approach.

I recommend reading the sections Getting Started and Effective Training
in the sa-learn man page.
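
To illustrate the difference, here is a minimal Python sketch. It is not SA's actual code, and the function names are made up for this mail. It assumes the documented defaults as far as I recall them: 0.1 and 12.0 for bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam, and 5.0 for required_score. The real auto-learn applies further conditions that are left out here.

```python
from typing import Optional

# Assumed defaults (bayes_auto_learn_threshold_nonspam / _spam, required_score).
HAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0
REQUIRED_SCORE = 5.0


def guarded_autolearn(score: float) -> Optional[str]:
    """Simplified auto-learn: train only on clear-cut scores."""
    if score < HAM_THRESHOLD:
        return "ham"
    if score > SPAM_THRESHOLD:
        return "spam"
    return None  # uncertain score: do not train at all


def inbox_learn(score: float) -> Optional[str]:
    """Blind 'learn from Inbox': anything delivered to the Inbox is ham."""
    return "ham" if score < REQUIRED_SCORE else None


# Hypothetical false negative: spam scoring just below required_score.
fn_score = 4.3
print("guarded auto-learn:", guarded_autolearn(fn_score))
print("learn from Inbox:  ", inbox_learn(fn_score))
```

The guarded variant leaves the false negative alone, while the Inbox variant trains it as ham, which is exactly the bad training described above.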
-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}