> On Monday, 22 May 2006 01:12 Sergei Gerasenko wrote:
> > But I'm reading everywhere that it's not a particularly good idea.
>
> There's no single answer to which method is best. I use a
> sitewide bayes, and it works well.
>
> sitewide:
> + spam learned helps all users
> + good when trained with 100% correct spam/ham
> - dangerous when learning false spam/ham
>
> user:
> + each user can train themselves
> - users most often don't train well, or not at all, or incorrectly (YMMV)
> - performance
> - disk space

We use site-wide bayes here too. While users can report FNs and FPs (false negatives and false positives), IT staff reviews the submissions prior to actual learning. This prevents people from training the various e-mail lists they've signed up for as spam; we just send the report back and say, try unsubscribing first.

The approach has worked fairly well for us here. The number of users that actually report anything is probably around 5%, so I'd say that a per-user system would be less effective for our users. (Either that or the other 95% of users get no spam ever.)

Bret
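
For anyone wanting to picture the review-before-learning step, here is a rough sketch in Python. It is not our actual tooling: the `Report` class, its field names, and `sa_learn_command` are made up for illustration, and the only real thing assumed is that `sa-learn` accepts `--spam` or `--ham` followed by a message file. It only builds the command line; nothing is run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Report:
    """A user submission waiting for IT review (hypothetical structure)."""
    path: str       # file holding the reported message
    claimed: str    # "spam" for a reported FN, "ham" for a reported FP
    approved: bool  # set by the reviewer, never by the reporting user


def sa_learn_command(report: Report) -> Optional[List[str]]:
    """Return the sa-learn argv for an approved report, or None if rejected.

    Rejected reports (e.g. a mailing list the user signed up for) are
    never learned, so they cannot pollute the shared site-wide database.
    """
    if report.claimed not in ("spam", "ham"):
        raise ValueError("claimed must be 'spam' or 'ham'")
    if not report.approved:
        return None
    return ["sa-learn", "--" + report.claimed, report.path]


# Example: one approved FN report and one rejected "spam" report
# that was really a mailing list subscription.
queue = [
    Report("/tmp/reports/msg1.eml", "spam", approved=True),
    Report("/tmp/reports/msg2.eml", "spam", approved=False),
]
commands = [cmd for cmd in map(sa_learn_command, queue) if cmd is not None]
```

The resulting `commands` list could then be handed to something like `subprocess.run` by whoever runs the site-wide database, so learning only ever happens on reviewed messages.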