Stuart Johnston wrote:

Theo Van Dinter wrote:

On Mon, Oct 02, 2006 at 03:18:58PM +0100, Randal, Phil wrote:

undetected). Wouldn't it be better to inject the detected text back to SA? There should be enough variants of spam words to let SA fuzzily catch the ones from images.

I think so. Some of the words would be perfectly legitimate in the text
of emails but rarely found in attached legitimate images.

Quite apart from the fact that SpamAssassin isn't designed for
"reinjection".


FWIW, 3.2 adds support for rendering non-text parts. So a plugin could, for instance, OCR text from an image, and then the normal body rules
and such would be able to use that information.


Would it also be possible to create a rule that matches on text rendered specifically from a non-text part and not the whole body? That way you could get the benefit of Bayes and existing body rules in the general case while still taking advantage of the fact that certain words in an image have more spammy weight than the same words in text.


Or perhaps:

tflags   RULE_NAME   ocr
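
To make the idea concrete, here is a rough sketch in Python of how such a per-rule flag might behave. It is not SpamAssassin code and does not reflect how 3.2 implements anything: the Rule and Message classes, the "ocr" flag handling, the rule names and the scores are all made up for illustration, and the OCR step itself is assumed to have already run.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: str
    score: float
    # Hypothetical flags; "ocr" means "match only text rendered from images".
    flags: frozenset = frozenset()


@dataclass
class Message:
    body_text: str       # text taken from the ordinary text parts
    ocr_text: str = ""   # text an assumed OCR step rendered from non-text parts


def score_message(msg, rules):
    """Return (total score, names of rules that hit) for one message."""
    total = 0.0
    hits = []
    for rule in rules:
        if "ocr" in rule.flags:
            # Flagged rules see only the text rendered from non-text parts.
            haystack = msg.ocr_text
        else:
            # Ordinary body rules see the text parts plus the rendered text.
            haystack = msg.body_text + "\n" + msg.ocr_text
        if re.search(rule.pattern, haystack, re.IGNORECASE):
            total += rule.score
            hits.append(rule.name)
    return total, hits


# Illustrative rules only: the same phrase scores higher when found in an image.
RULES = [
    Rule("BODY_STOCK", r"\bstock alert\b", 0.5),
    Rule("OCR_STOCK", r"\bstock alert\b", 2.0, frozenset({"ocr"})),
]
```

With these made-up rules, a plain-text mail containing "stock alert" only trips BODY_STOCK, while the same phrase rendered from an image trips both rules, which is the extra weight asked about above.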


/Andreas
