decoder wrote:
Hello there,
I have improved the original OcrPlugin (found at
http://wiki.apache.org/spamassassin/OcrPlugin) so that it now does fuzzy
matching. That way, OCR errors or intentional obfuscations in the text
no longer prevent recognition. This is done with a relative distance
calculation between the pattern (a word from a given word list) and a
line in the recognized input. The plugin also uses dynamic scoring (the
more words match, the higher the score; this can be adjusted in the
source).
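To make the idea concrete, here is a rough sketch in Python. It is not the plugin's actual code: the use of Levenshtein distance over sliding windows, the names relative_distance and fuzzy_score, the 0.25 threshold, and the base/per_word score values are all assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def relative_distance(word: str, line: str) -> float:
    """Smallest edit distance between `word` and any window of `line`
    with the same length, divided by len(word). 0.0 is an exact hit."""
    word, line = word.lower(), line.lower()
    n = len(word)
    if n == 0:
        return 1.0  # assumption: an empty pattern never matches
    if len(line) <= n:
        return levenshtein(word, line) / n
    best = min(levenshtein(word, line[i:i + n])
               for i in range(len(line) - n + 1))
    return best / n


def fuzzy_score(words, lines, threshold=0.25, base=4.0, per_word=1.0):
    """Hypothetical dynamic scoring: `base` for the first matched word,
    plus `per_word` for every further matched word."""
    matched = [w for w in words
               if any(relative_distance(w, l) <= threshold for l in lines)]
    if not matched:
        return 0.0, matched
    return base + per_word * (len(matched) - 1), matched


if __name__ == "__main__":
    ocr_lines = ["Buy V1AGRA now", "online pharmaey"]
    print(fuzzy_score(["viagra", "pharmacy", "mortgage"], ocr_lines))
```

Dividing by the word length is what makes the distance "relative": one wrong character matters more in a short word than in a long one, so a single threshold works across the whole word list.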
You can find a full description and an example in the wiki under:
http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
Ideas for improvements or criticism are always welcome :)
Hi
Could this plugin be extended to support PNG images?
I receive quite a few of them...
I guess it's probably just a line or two in addition to the jpg and gif
handling.
Also, might it be a good idea not to trust the Content-Type but instead
use file or another 'detection utility'? As mentioned on the original
OcrPlugin page, gif2pnm and jpg2pnm were abandoned because the content
types are sometimes wrong.
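A minimal sketch of that kind of detection in Python, assuming the raw attachment bytes are available. It only checks the well-known leading signature bytes of each format, much as file does for these types; the names MAGIC_NUMBERS and sniff_image_type are made up for this example and are not part of the plugin.

```python
from typing import Optional

# Leading "magic" bytes of the three formats discussed here.
MAGIC_NUMBERS = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
)


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return 'png', 'gif' or 'jpeg' based on the first bytes of the
    attachment, or None if no known signature matches."""
    for magic, kind in MAGIC_NUMBERS:
        if data.startswith(magic):
            return kind
    return None


if __name__ == "__main__":
    # A PNG that was (wrongly) declared as image/gif is still seen as PNG.
    fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
    print(sniff_image_type(fake_png))
```

The declared Content-Type is simply ignored here; the result of the sniffing would then decide which converter to run.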
Matt