On Sat, June 10, 2006 14:18, yahoo.de said:
>
>> The most efficient way to fill your database, is to let spambayes train
>> on unsures and on mistakes.
>
> what do you mean with train on mistake?
> how could i train the SB to recognize emails with advertisement images for
> some product and so on?
> let's say the email has no text, but only an image in the body!
> (i know there are image scanner software for this purpose, but what could
> be done in such cases)

Image spam is indeed a problem. On the other hand, in my personal experience it is only a problem in theory: in practice, such emails have enough other spammy characteristics.

I don't know of any image scanners made specifically for spam detection, but I think it's possible to run emails through such a scanner before they are fed to spambayes. I already send my emails through some conversion filters before they are spambayesed. For example, I use mimencode to preconvert some special MIME encodings (like those used for Asian languages) into 8-bit format.

I can imagine an OCR program that converts images to text (where possible) and attaches the text to the email, which is subsequently fed to spambayes. That way, spambayes virtually "reads" the image just like a human does.

Actually, you suggest something interesting. I'm going to try a few things, and if they work, I'll post them on the list.

Amedee

PS: please don't "reply all"; reply to the list.

_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
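
For illustration, the OCR preprocessing idea from the reply above could be sketched roughly like this in Python. This is only a sketch under assumptions, not something spambayes ships: the function name append_ocr_text, the attachment name ocr.txt, and the ocr callable are all hypothetical, and the actual OCR engine is left to whatever program you have available. Only the standard library email package is used.

```python
from email import message_from_bytes, policy
from typing import Callable


def append_ocr_text(raw: bytes, ocr: Callable[[bytes], str]) -> bytes:
    """Attach OCR text for every image part, so a text classifier can see it.

    `ocr` is a hypothetical callable: image bytes in, recognised text out.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            data = part.get_payload(decode=True)  # undo base64 / quoted-printable
            if data:
                text = ocr(data).strip()
                if text:
                    texts.append(text)
    if not texts:
        return raw  # no images or nothing recognised: pass through unchanged
    # add_attachment() converts the message to multipart/mixed if necessary.
    msg.add_attachment("\n".join(texts), filename="ocr.txt")
    return msg.as_bytes()
```

The returned bytes would then go to spambayes in place of the original message, in the same position in the chain as the mimencode conversion mentioned above. Keeping the OCR step behind a plain callable means the engine can be swapped without touching the mail-handling code.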
