I'm a newbie to the list and have been scanning recent posts to see if
what I'm about to ask has been covered, but I haven't seen anything
yet.
Lately I have been getting more and more stock alert spam, but now
all the useful info is in an image, and typically following the image is
...omissis...
How about the FuzzyOCR plugin? That has been discussed quite a bit
here recently.
http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
--
Bowie
And, by the way, it seems to work!
Actually, the only limit I see is the home-made FuzzyOcr.words (and, maybe, the fact that
On Mon, Oct 02, 2006 at 03:18:58PM +0100, Randal, Phil wrote:
undetected). Wouldn't it be better to inject the detected
text back into SA? There should be enough variants of spam
words to let SA fuzzily catch the ones from images.
I think so. Some of the words would be perfectly
The real problem is the potentially fuzzy output from the OCR engine: sure, all
the copies of the very same spam would be detected the same way, but what about
slightly different copies? Would the brute-force SA approach be feasible?
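To make the "slightly different copies" concern concrete: FuzzyOcr relies on String::Approx, a Perl module for approximate string matching based on edit distance. The sketch below is a rough Python analogue of that idea, not FuzzyOcr's actual code; the function names, the sample word list, and the 20% tolerance are hypothetical choices for illustration. It shows how two different OCR readings of the same spam image can still hit the same word list.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_hits(ocr_text: str, wordlist: list[str], max_fraction: float = 0.2) -> set[str]:
    """Return the wordlist entries that approximately occur as tokens in ocr_text.

    A token matches a word when their edit distance is at most
    max_fraction * len(word), with at least one edit always allowed.
    (Hypothetical tolerance, chosen only for this example.)
    """
    tokens = ocr_text.lower().split()
    hits = set()
    for word in wordlist:
        limit = max(1, int(len(word) * max_fraction))
        if any(levenshtein(tok, word) <= limit for tok in tokens):
            hits.add(word)
    return hits


if __name__ == "__main__":
    words = ["stock", "alert", "profit"]  # hypothetical word list
    # Two imperfect OCR readings of the "same" image still hit the same words.
    print(fuzzy_hits("hot st0ck alert", words))
    print(fuzzy_hits("HOT STOCCK A1ERT", words))
```

The trade-off is the tolerance: a looser limit catches more OCR noise but also makes unrelated short tokens more likely to match.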
The use of String::Approx in FuzzyOcr surely has a purpose,
You'd need some clever rules...
As an example, the word stock is perfectly valid in emails, but if you
found it in an attached image you'd be pretty sure it was spam.
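The "clever rules" idea, where the same word counts only when it comes from an image, could look roughly like the sketch below. It is not a SpamAssassin rule or API: the `Part` type, the weights, and the `min_distinct` threshold are all hypothetical. Requiring several distinct words is one possible guard against a single innocent word, such as "stock" in a chart.

```python
from dataclasses import dataclass


@dataclass
class Part:
    content_type: str  # e.g. "text/plain" or "image/gif"
    text: str          # body text, or OCR output for image parts


# Hypothetical weights: words that are harmless in a text body
# but suspicious when they appear inside an attached image.
IMAGE_WORD_WEIGHTS = {"stock": 1.0, "alert": 0.5, "symbol": 0.5}


def image_word_score(parts: list[Part], min_distinct: int = 2) -> float:
    """Sum weights of listed words found in image parts only."""
    found = set()
    for part in parts:
        if not part.content_type.startswith("image/"):
            continue  # plain-text mentions of "stock" are not scored here
        for token in part.text.lower().split():
            if token in IMAGE_WORD_WEIGHTS:
                found.add(token)
    # Fire only when several distinct suspicious words appear together.
    if len(found) < min_distinct:
        return 0.0
    return sum(IMAGE_WORD_WEIGHTS[w] for w in found)


if __name__ == "__main__":
    body = Part("text/plain", "Our stock report is attached")
    img = Part("image/gif", "STOCK ALERT symbol XYZ")
    print(image_word_score([body]))        # body text alone scores nothing
    print(image_word_score([body, img]))   # image text triggers the rule
```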
It would be perfectly valid in, say, a graph image too. SA is meant to work on
the overall message content. It is not that