Re: Stock spam in images

Andreas Pettersson Mon, 02 Oct 2006 09:26:54 -0700

Stuart Johnston wrote:

Theo Van Dinter wrote:
On Mon, Oct 02, 2006 at 03:18:58PM +0100, Randal, Phil wrote:
undetected). Wouldn't it be better to inject the detected text back to SA? There should be enough variants of spam words to let SA fuzzily catch the ones from images.
I think so. Some of the words would be perfectly legitimate in the text
of emails but rarely found in attached legitimate images.

Quite apart from the fact that Spamassassin isn't designed for
"reinjection".
FWIW, 3.2 adds in support to have rendering of non-text parts. So a plugin
could, for instance, OCR text from an image, and then the normal body rules
and such would be able to use that information.
Would it also be possible to create a rule that matches on text rendered specifically from a non-text part and not the whole body? That way you could get the benefit of Bayes and existing body rules in the general case while still taking advantage of the fact that certain words in an image have more spammy weight than the same words in text.


Or perhaps:

tflags   RULE_NAME   ocr
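
To make the idea concrete, here is a rough sketch in Python of how rules scoped to OCR text might sit next to ordinary body rules. This is not SpamAssassin code or its API: the `Rule` class, the `ocr_only` flag (standing in for the proposed "ocr" tflag), `score_message`, the rule names and the scores are all made up for illustration, and the OCR step is assumed to have already produced `ocr_text`.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: str
    score: float
    # Hypothetical analogue of the proposed "tflags RULE_NAME ocr":
    # when True, the rule only looks at text rendered from non-text parts.
    ocr_only: bool = False


def score_message(body_text, ocr_text, rules):
    """Return (total_score, names_of_rules_that_hit)."""
    # Ordinary rules see the body plus the rendered (OCR) text,
    # so existing body rules still benefit from the OCR output.
    combined = body_text + "\n" + ocr_text
    total = 0.0
    hits = []
    for rule in rules:
        target = ocr_text if rule.ocr_only else combined
        if re.search(rule.pattern, target, re.IGNORECASE):
            total += rule.score
            hits.append(rule.name)
    return total, hits


# Illustrative rules only: the same phrase scores low anywhere in the
# message, and gets an extra score when it was found inside an image.
RULES = [
    Rule("STOCK_WORDS", r"\bstrong buy\b", 0.5),
    Rule("OCR_STOCK_WORDS", r"\bstrong buy\b", 2.5, ocr_only=True),
]
```

With these made-up rules, a plain-text mail saying "strong buy" would only hit STOCK_WORDS, while the same phrase OCRed out of an attached image would hit both.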


/Andreas
