On Thu, 1 Sep 2016 06:23:37 -0400
Mauricio Tavares wrote:

> On Thu, Sep 1, 2016 at 12:27 AM, Olivier
> <olivier.nic...@cs.ait.ac.th> wrote:
> > I am running it, it does not do a very good job at extracting the
> > text from the images. Then it uses its own list of keywords to
> > detect spam: to me it's the biggest problem, it should push back
> > the text to SpamAssassin and let SA rules decide what to do with it.
> >
> I do agree that the OCR program should be doing the OCR'ing and
> the text filtering should be left to a program that does that for a
> living.

It's a long time since I've used it, but IIRC the point of FuzzyOCR is
that it does fuzzy matching on a dictionary of "bad" words - similar to
the way that spelling checkers find the most likely suggestions. This
gives it a very limited ability to deal with imperfectly read words.
Putting garbled OCR text through SA body rules may be more trouble than
it's worth.
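
To illustrate the general idea (this is not FuzzyOCR's actual code, just a minimal sketch of spell-checker-style matching), here is roughly what fuzzy matching of OCR tokens against a word list looks like using Python's standard difflib. The word list, the `fuzzy_hits` name and the 0.75 cutoff are all made up for the example.

```python
import difflib

# Hypothetical word list for illustration only; not FuzzyOCR's dictionary.
BAD_WORDS = ["viagra", "mortgage", "pharmacy", "rolex"]


def fuzzy_hits(ocr_text, words=BAD_WORDS, cutoff=0.75):
    """Return (token, matched_word) pairs for tokens that resemble a listed word.

    The cutoff is an assumed similarity threshold between 0 and 1;
    higher values demand a closer match.
    """
    hits = []
    for token in ocr_text.lower().split():
        # get_close_matches ranks candidates by similarity ratio, much as a
        # spelling checker ranks its suggestions; keep only the best one.
        match = difflib.get_close_matches(token, words, n=1, cutoff=cutoff)
        if match:
            hits.append((token, match[0]))
    return hits


if __name__ == "__main__":
    # "v1agra" and "r0lex" stand in for imperfectly read words.
    print(fuzzy_hits("cheap v1agra and r0lex today"))
```

An exact-match rule looking for the correctly spelled word would miss both misread tokens here, which is the reason for doing the fuzzy comparison next to the OCR step.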