Performance on 18th century text

fabrizio . gotti Sat, 10 Aug 2013 04:28:53 -0700

Hi,

I am trying to recognize an 18th-century text for academic purposes. I 
followed the (very helpful) tutorial and encountered no technical 
problems. However, the recognition rate is disappointing. I think the 
source material may just be too difficult for Tesseract 3 (see the sample 
image <http://i.imgur.com/d5RnxI4.png> and the recognized text below). The 
difficulties are multiple: 3 fonts, 2 languages (bilingual text), obsolete 
spellings, variable stroke width... I retrained Tesseract on 10 samples of 
each character, without much improvement.


Could someone tell me whether this is feasible? Or maybe the state of the 
art in OCR has not yet reached this level of performance...

Thanks for the insight!

Fabrizio

--

Image: http://i.imgur.com/d5RnxI4.png

*Recognized text for image*

ACCOLADE,  [embraﬀement] A bug, clîppl’ng and
colling. Je hazardaî quèlques accolades qui ne îûrent pâs
trop mal reçûes, I ventured ſome bugs, wbicb were not very
îll receîved. * Nous nous mimes ä domler des accolades â
notre boutèille, PVc./ëll ta bugging our bottle. ☞ Il l’a fait
Chevalîér en lui donnant l’accolade, He bar dubbcd hl’ln a
K.wigbt. ☞ Sèrvîr unc accolade de lapereaûx (une couple)
To jZ-rve o couple oj’yortng rabbîts în one dﬄa.
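
To put a number on "disappointing", one option is a character error rate (CER): the edit distance between a hand-typed reference transcription and the recognized text, divided by the length of the reference. Below is a minimal sketch in plain Python. The function names and the one-word sample are illustrative assumptions, not part of Tesseract, and the reference word "hugs" is only a guess at what the page prints where the output above reads "bugs".

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from the first i chars of a to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, recognized: str) -> float:
    """Edit distance normalized by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, recognized) / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: "hugs" is an assumed reference, "bugs" is from the output above.
    print(character_error_rate("hugs", "bugs"))
```

Computed over a whole page before and after retraining, the same figure would show whether the extra training samples help at all.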


-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

