User to user post... (I am not a developer.) I can see where this may be something to consider 10 or 20 years from now, when we all have supercomputers in our pockets. :-)
But until then... I would concentrate on implementing the latest SpamAssassin, 3.0.2. It takes a bit of work to get it working correctly. There are a lot of plug-ins, but you can test them with spamassassin -D --lint until you get them all running correctly. Once you get all of the supporting programs installed (DCC, Pyzor, DB, DNS, URI, etc., etc., etc.; this upgrade is not for wussies), this new version kicks major spam ass..assin. The new URIBL function is the greatest thing since RBLs, DCC and Pyzor. It detects the spammer's website from the web addresses in the email. Even if they send an image with no text at all, if the image links to or is loaded from a web address or IP that has been reported as spam, the message is likely to trip the spam points. I just took care of the same issue with the upgrade to 3.0.2.

Good luck!

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 02, 2005 1:36 PM
To: [email protected]
Subject: Suggestion: OCR

I've just made tests with gocr (a command-line OCR program for Linux), and it proves to be safe, i.e. if it fails to recognize the text, you get a nonsense collection of symbols. It can handle PNM (and some other formats) directly, but cannot handle GIFs and JPEGs directly. It assumes the text is darker than the background, so some preprocessing is needed (I don't know how to invert the colors with Linux tools, but it's a matter of googling). What I managed to detect is:

[quote] click here to get removed all other enquiries send to: [EMAIL PROTECTED],com [/quote]

The real picture is in the attachment. Two OCR errors were made. The command I used was:

giftopnm tG0rzUDQO.gif | pnmtojpeg --quality=100 | djpeg -pnm -grayscale | gocr -

That prints the quoted text on standard output. Sometimes you have to split animated GIFs into a sequence of PNGs. Then it looks like this:

gif2png tG0rzUDQO.gif    [.png, .p01, .p02, ... are generated]
pngtopnm tG0rzUDQO.png | pnmtojpeg --quality=100 | djpeg -pnm -grayscale | gocr -

For JPEGs the procedure is simpler: only the last two steps (djpeg and gocr) are needed. The processing time is negligible (at least on my machine). The question is whether it is feasible to convert all pictures into text with gocr and then run the usual SA tests on the combined email. If yes, could someone do that? Thanks a lot.

Sasha.
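A minimal POSIX sh sketch of the pipelines described in the message, assuming netpbm (giftopnm, pngtopnm, ppmtopgm, pnminvert), libjpeg's djpeg, gif2png and gocr are all installed. It swaps the pnmtojpeg | djpeg round trip for netpbm's ppmtopgm, which should do the same grayscale conversion without a lossy JPEG step, and uses netpbm's pnminvert for the light-text-on-dark case the message mentions. The helper names decode_to_pnm, ocr_image and ocr_frames are hypothetical, not part of any of those tools.

```shell
#!/bin/sh
# Sketch only: OCR images with gocr, following the pipelines in the message.
# Assumes netpbm (giftopnm, pngtopnm, ppmtopgm, pnminvert), djpeg, gif2png
# and gocr are on PATH. decode_to_pnm, ocr_image and ocr_frames are
# hypothetical helper names.

# Decode one image to a Netpbm stream on stdout, picking the decoder by extension.
decode_to_pnm() {
    case $1 in
        *.gif|*.GIF) giftopnm "$1" ;;
        *.png|*.PNG) pngtopnm "$1" ;;
        *.jpg|*.jpeg|*.JPG|*.JPEG) djpeg -pnm -grayscale "$1" ;;
        *) echo "decode_to_pnm: unsupported image: $1" >&2; return 1 ;;
    esac
}

# ocr_image FILE [invert]
# Converts FILE to grayscale with ppmtopgm and runs gocr on it. Pass "invert"
# for light text on a dark background, since gocr expects dark text.
ocr_image() {
    # Decode into a temp file first so a decoder failure is reported
    # instead of being hidden by the exit status of the gocr pipeline.
    tmp=$(mktemp) || return 1
    if ! decode_to_pnm "$1" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    if [ "${2:-}" = invert ]; then
        ppmtopgm < "$tmp" | pnminvert | gocr -
    else
        ppmtopgm < "$tmp" | gocr -
    fi
    status=$?
    rm -f "$tmp"
    return "$status"
}

# ocr_frames FILE.gif
# Splits an animated GIF with gif2png, which (per the message) writes FILE.png
# for the first frame and FILE.p01, FILE.p02, ... for the rest, then OCRs
# each frame, printing a [frame-name] header before its text.
ocr_frames() {
    base=${1%.*}
    gif2png "$1" || return 1
    for frame in "$base.png" "$base".p[0-9][0-9]; do
        # The .pNN glob stays literal when the GIF has a single frame; skip it.
        [ -f "$frame" ] || continue
        echo "[$frame]"
        pngtopnm "$frame" | ppmtopgm | gocr -
    done
}
```

For example, after sourcing the file (saved as, say, ocr.sh: `. ./ocr.sh`), `ocr_image tG0rzUDQO.gif` handles a still image and `ocr_frames tG0rzUDQO.gif` handles an animated one. The recognized text could then be handed to SpamAssassin along with the message, which is the open question in the original post.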
