R: Stock spam in images

2006-10-02 Thread Giampaolo Tomassoni
I'm a newbie to the list and have been scanning recent posts to see whether what I'm about to ask has been covered, but I haven't seen anything yet. Lately I have been getting more and more stock alert spam, but now all the good info is in an image, and typically following the image is

R: Stock spam in images

2006-10-02 Thread Giampaolo Tomassoni
[...] How about the FuzzyOCR plugin? That has been discussed quite a bit here recently. http://wiki.apache.org/spamassassin/FuzzyOcrPlugin -- Bowie And, by the way, it seems to work! Actually, the only limitation I see is the home-made FuzzyOcr.words (and, maybe, the fact that

R: Stock spam in images

2006-10-02 Thread Giampaolo Tomassoni
On Mon, Oct 02, 2006 at 03:18:58PM +0100, Randal, Phil wrote: undetected). Wouldn't it be better to inject the detected text back into SA? There should be enough variants of spam words to let SA fuzzily catch the ones from images. I think so. Some of the words would be perfectly

R: Stock spam in images

2006-10-02 Thread Giampaolo Tomassoni
The real problem is the potentially fuzzy output from the OCR engine: sure, all the copies of the very same spam would be detected the same, but what about slightly different copies? Would the use of the sa force approach be feasible? The use of String::Approx in FuzzyOcr surely has a purpose,
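
The concern about fuzzy OCR output can be illustrated with a minimal sketch of approximate word matching. This is not FuzzyOcr's code: it uses Python's standard difflib instead of Perl's String::Approx, and the word list, the function name fuzzy_hits and the 0.75 threshold are assumptions made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical word list, standing in for the contents of FuzzyOcr.words.
SPAM_WORDS = ["stock", "investor", "symbol"]


def fuzzy_hits(ocr_text, words=SPAM_WORDS, threshold=0.75):
    """Return the listed words that approximately match a token in ocr_text.

    The threshold is an assumed similarity ratio between 0 and 1; it is
    not a value taken from FuzzyOcr.
    """
    tokens = ocr_text.lower().split()
    hits = set()
    for word in words:
        for token in tokens:
            # ratio() is 1.0 for identical strings and drops as they diverge,
            # so small OCR misreadings still score close to 1.0.
            if SequenceMatcher(None, word, token).ratio() >= threshold:
                hits.add(word)
                break
    return sorted(hits)


if __name__ == "__main__":
    # "st0ck" is the kind of misreading an OCR engine might produce.
    print(fuzzy_hits("st0ck alert"))
```

Slightly different copies of the same image may OCR to slightly different strings, so an exact match against a word list would miss some of them; a similarity threshold trades a few false positives for tolerance to those variations.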

R: Stock spam in images

2006-10-02 Thread Giampaolo Tomassoni
You'd need some clever rules... As an example, the word "stock" is perfectly valid in emails, but if you found it in an attached image you'd be pretty sure it was spam. It would be perfectly valid in, say, a graph image too. SA is meant to work on the overall message content. It is not that