Stuart Johnston wrote:
Theo Van Dinter wrote:
On Mon, Oct 02, 2006 at 03:18:58PM +0100, Randal, Phil wrote:
undetected). Wouldn't it be better to inject the detected text back
to SA? There should be enough variants of spam words to let SA
fuzzily catch the ones from images.
I think so. Some of the words would be perfectly legitimate in the
text of emails but rarely found in attached legitimate images.
Quite apart from the fact that SpamAssassin isn't designed for
"reinjection".
FWIW, 3.2 adds support for rendering non-text parts. So a plugin
could, for instance, OCR text from an image, and then the normal body
rules and such would be able to use that information.
Would it also be possible to create a rule that matches on text
rendered specifically from a non-text part and not the whole body?
That way you could get the benefit of Bayes and existing body rules in
the general case while still taking advantage of the fact that certain
words in an image have more spammy-weight than the same words in text.
Or perhaps:
tflags RULE_NAME ocr
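
To make the idea concrete, here is a rough sketch in Python (not SpamAssassin code, and not how 3.2 does it) of scoring the same word differently depending on whether it came from the text body or from OCR of an image. The weight tables, the word lists and the score_message() helper are all made up for illustration; the OCR step itself is assumed to have already happened.

```python
import re

# Hypothetical per-word weights, invented for this sketch.
# They are not SpamAssassin scores.
BODY_WEIGHTS = {"viagra": 1.0, "stock": 0.2}
# Assumption from the thread: the same words are more suspicious
# when they only appear inside an image.
OCR_WEIGHTS = {"viagra": 3.0, "stock": 1.5}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score_message(body_text, ocr_text):
    """Sum the word weights, using a separate table for OCR'd text."""
    body_score = sum(BODY_WEIGHTS.get(w, 0.0) for w in tokenize(body_text))
    ocr_score = sum(OCR_WEIGHTS.get(w, 0.0) for w in tokenize(ocr_text))
    return body_score + ocr_score


if __name__ == "__main__":
    # Same word, different source, different score.
    print(score_message("quarterly stock report", ""))
    print(score_message("", "quarterly stock report"))
```

Keeping the two sources separate is what lets the ordinary body rules and Bayes still see everything, while an OCR-only rule can carry its own score.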
/Andreas