On 09.06.16 10:43, Olivier wrote:
For years I have been running the FuzzyOcr plugin, but it helps little,
because it has its own list of words to keep updated.

I am wondering whether, instead of using its own list of words, the OCR
result could be injected back into the body of the main message.

I raised this issue some years ago. The result was that pushing OCR-ed data
back to SA for evaluation by BAYES and other rules could cause trouble,
because freely available OCR software was not very precise.

Most of the time, what gets injected back is plain garbage:
w_T___l_e?_

But other times the result is interesting, like a proper English sentence
full of spam.
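
One way to keep the garbage out while letting the readable sentences
through would be a filter between the OCR step and the reinjection. The
sketch below is only an illustration, not part of FuzzyOcr or SA: the
function name looks_like_text and both thresholds are made up for this
example, and the "word-like token" test is a crude assumption that would
need tuning on real OCR output.

```python
import re

# Hypothetical helper, not part of FuzzyOcr or SpamAssassin.
# A token counts as "word-like" if it is two or more ASCII letters,
# optionally followed by one punctuation mark.
WORD_LIKE = re.compile(r"[A-Za-z]{2,}[.,!?;:]?")


def looks_like_text(ocr_output, min_words=3, min_ratio=0.6):
    """Return True if the OCR output looks like real words, not noise.

    min_words and min_ratio are assumed thresholds chosen for the example.
    """
    tokens = ocr_output.split()
    if not tokens:
        return False
    word_like = [t for t in tokens if WORD_LIKE.fullmatch(t)]
    enough_words = len(word_like) >= min_words
    enough_ratio = len(word_like) / len(tokens) >= min_ratio
    return enough_words and enough_ratio
```

With these thresholds, a string like the garbage sample above would be
dropped, while a short readable English sentence would be passed on for
reinjection.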

What exactly do you use for OCR? 10 years ago I made a comparison between
gocr, ocrad and tesseract, where gocr gave the best results.

Now that Google sponsors tesseract development, the scanning looks much
better, and I have started thinking about trying that again.

So how will SA react if I reinject the garbage? Will it just ignore it?

It would be nice to see the results.
I'm mostly worried about the FUZZY_* rules...

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Save the whales. Collect the whole set.
