Hi, I am trying to recognize an 18th-century text for academic purposes. I followed the (very helpful) tutorial and encountered no technical problems. However, the recognition rate is disappointing. I think the source material may simply be too difficult for Tesseract 3 (see the sample image <http://i.imgur.com/d5RnxI4.png> and the recognized text below). There are several difficulties: 3 fonts, 2 languages (bilingual text), obsolete spellings, variable stroke width... I retrained Tesseract on 10 samples of each character, without much improvement.
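To put a number on "disappointing", the recognition rate could be expressed as a character error rate against a hand-typed transcription of the same page. The following is only a minimal sketch under that assumption: the function names `levenshtein` and `character_error_rate` are hypothetical helpers written for this illustration and are not part of Tesseract.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, recognized: str) -> float:
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, recognized) / len(reference)
```

Comparing this figure before and after retraining would show whether the extra training samples make any measurable difference.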
Could someone tell me whether this is feasible? Or perhaps the state of the art in OCR has not yet reached this level of performance... Thanks for the insight!

Fabrizio

--
Image: http://i.imgur.com/d5RnxI4.png

*Recognized text for image*

ACCOLADE, [embraffement] A bug, clîppl’ng and colling.
Je hazardaî quèlques accolades qui ne îûrent pâs trop mal reçûes, I ventured ſome bugs, wbicb were not very îll receîved.
* Nous nous mimes ä domler des accolades â notre boutèille, PVc./ëll ta bugging our bottle.
☞ Il l’a fait Chevalîér en lui donnant l’accolade, He bar dubbcd hl’ln a K.wigbt.
☞ Sèrvîr unc accolade de lapereaûx (une couple) To jZ-rve o couple oj’yortng rabbîts în one dffla.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

