Robert Menschel <[EMAIL PROTECTED]> wrote on 17.01.2006 03:41:39:

> sad> I'm currently trying to build up a new bayes DB here, ...
> sad> ... yet it poses a problem to build up the ham part.
> sad> ... Much of the inbound smtp mail either contains private or
> sad> confidential information, so I cannot use them as I keep the
> sad> source of the bayes messages in a Notes DB serverside - I'd run
> sad> into privacy issues.
>
> If you keep the source of your bayes messages in a Notes DB, then you
> should have had enough ham to retrain your bayes with, no?

Uhm, no? If you reread my message, you will see that I used
autolearning before instead of training manually. I have just ditched
the old bayes DB and disabled autolearning, and I am now building up a
new bayes DB. I'm keeping the full corpus of both ham and spam to have
more control over the bayes DB. Keeping the sources enables me to
reproduce the DB at any time and, especially, to remove selected
messages containing tokens that prove to be problematic in the future.
Of course I could do that by relearning wrongly tagged messages as
ham, but one message learned as ham usually doesn't make much of a
difference for bayes.

> Bigger problem: bayes can only learn what it's taught. If you have
> ham that really should be trained, and because of privacy issues it
> should not be kept after training, then you really should develop a
> system which will enable you to train without retaining. Bayes works
> best when properly and fully trained, not just trained on "those
> unimportant non-private emails are ham".

Yes, I might give up storing the ham mails in a Notes DB for that,
BUT... I really doubt that management would even give permission to
feed those messages into SA. When I say "confidential", it is really
one of those few times where it means "confidential" ;)
Our customers are mostly big banks, big insurance companies and the
German government. Even the slightest risk of leaking _any_ kind of
information could get us into problems no one here even wants to
imagine...

> I can't make recommendations on how to do so in your system, but
> you'll get better results from bayes if you figure out how to manage
> it.

That's natural. I just wanted to know how badly it will hit me ;)

regards
sash

--------------------------------------------------
Sascha Runschke
Network Administration
IT-Services
ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch
Tel.: +49 (0) 2150.9153.226
Mobile: +49 (0) 173.5419665
mailto:[EMAIL PROTECTED]
http://www.abit.net
http://www.abit-epos.net
---------------------------------
Security note regarding email communication:
http://www.abit.net/sicherheitshinweis.html