Skip,

Thanks. I did look through the last three months of the forum archive. I didn't specifically see this addressed. I did notice some comments about porn images, etc.
I don't think the image size works. I just saved about 20 of my most recent spam images. About half of them are pushing a stock, and while most of those are pretty similar in size, they aren't all the same. The other 10 images were all of quite varying sizes; even 3 with exactly the same text were deliberately made quite different in size. I was amazed that they had gone to the extent of adding random bits into each of the images, but I guess they knew someone would try to compare them.

I am not an expert at looking at the raw data of the e-mail. I can only hope that there is some way they reference the images that might be different from images sent to me by friends. But I'm not optimistic that one can determine that from the attributes of the image or the rest of the message itself.

So that does lead to examining the image. The main difference is immediately obvious: it's not a picture, it's just some formatted text put into an image. Given that, one could hope that some simple image analysis process could quickly classify them as different, or even make it a learning process like the rest of the spam filtering.

The big downside is that image analysis is expensive, both in compute and in time. Not to mention all the various image formats that the tool would need to process. Perhaps that is constrained by what e-mail programs will actually render.

All in all, not a good outlook it seems.

-Alan

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 01, 2006 4:29 PM
To: Alan Arndt
Cc: [email protected]
Subject: Re: [Spambayes] Spam in Images

    Alan> I haven't thought of a decent way to filter these types of things.
    Alan> I hope someone else can and that it can get implemented into
    Alan> SpamBayes....

    Alan> Does anyone have any good suggestions?

This topic has come up several times in the past. There is, as yet, no perfect way to identify these sorts of spams.
The last time it came up (maybe a month ago), optical character recognition (OCR) came up as a possible means of getting at the text. Unfortunately, the open source tools available fall far short of the mark as far as accuracy is concerned.

Perhaps image size would be a helpful clue. I don't know if anyone has tried that before.

Skip

_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
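For anyone who wants to experiment with the image-size idea, here is a minimal sketch using only the Python standard library `email` package. It is not SpamBayes code: the function name `image_tokens` and the token strings are hypothetical, chosen for illustration. It buckets each image attachment's byte size by power of two, on the assumption that coarse buckets might absorb a few bytes of random padding. As noted above, spammers also vary sizes widely, so this may turn out to be a weak clue.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def image_tokens(raw_message: bytes):
    """Yield hypothetical tokens describing each image part of a message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        # decode=True undoes base64/quoted-printable transfer encoding.
        payload = part.get_payload(decode=True) or b""
        # floor(log2(size)): sizes from 4096 to 8191 bytes share bucket 12.
        bucket = len(payload).bit_length() - 1 if payload else 0
        yield f"image-type:{part.get_content_subtype()}"
        yield f"image-size:2**{bucket}"


def make_sample_message(image_bytes: bytes) -> bytes:
    """Build a small test message with one fake GIF attachment."""
    msg = EmailMessage()
    msg["Subject"] = "sample"
    msg.set_content("see attached")
    msg.add_attachment(image_bytes, maintype="image", subtype="gif",
                       filename="sample.gif")
    return msg.as_bytes()


if __name__ == "__main__":
    raw = make_sample_message(b"\x00" * 5000)
    print(list(image_tokens(raw)))
```

Tokens like these could be fed to a trainable classifier alongside the usual text tokens, so the filter would learn for itself whether size buckets carry any signal in a given mailbox.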

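Alan's observation that these images are formatted text rather than pictures suggests a cheap check that stops short of OCR. The sketch below is one guess at such a check, not a tested filter: it measures how much of the image is covered by its two most common colors, on the assumption that text on a flat background is dominated by a background color and a text color, while a photograph spreads across many colors. It takes already-decoded pixels as `(r, g, b)` tuples; decoding the various image formats is assumed to be done elsewhere by an imaging library. The names and the 0.9 threshold are hypothetical.

```python
from collections import Counter


def dominant_color_share(pixels, top_n=2):
    """Return the fraction of pixels covered by the top_n most common colors."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(top_n))
    return top / total


def looks_like_text_image(pixels, threshold=0.9):
    """Guess that an image is rendered text if two colors dominate it.

    The threshold is an untuned assumption; anti-aliased text or heavy
    random noise would lower the share and could defeat this check.
    """
    return dominant_color_share(pixels) >= threshold


if __name__ == "__main__":
    # Synthetic "text" image: white background, black text, a few noise specks.
    text_like = ([(255, 255, 255)] * 950 + [(0, 0, 0)] * 50
                 + [(i, 2 * i, 3 * i) for i in range(1, 21)])
    # Synthetic "photo": 256 distinct colors spread evenly over 1000 pixels.
    photo_like = [(i % 256, (i * 7) % 256, (i * 13) % 256) for i in range(1000)]
    print(looks_like_text_image(text_like))
    print(looks_like_text_image(photo_like))
```

Counting colors is a single pass over the pixels, so it is far cheaper than OCR, though decoding the image in the first place is still a cost. Instead of a hard threshold, the share could be bucketed and emitted as a token so that it becomes part of the learning process.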