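What command are you using? Since the text is bilingual, you should pass both languages, e.g. -l fra+eng or something similar. The resolution may also be too high: 600 dpi is typically the upper limit, although what really matters is the height of the letters in pixels. You could also restrict the character set to the characters that actually occur in the text.

Sven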
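To make the suggestions above concrete, here is a minimal sketch in Python that only assembles a tesseract command line and computes a resize factor; it does not run anything. The helper names (build_tesseract_command, downscale_factor), the file names, and the 600 dpi default are illustrative assumptions, not part of Tesseract. The -l option with a "+"-joined language list and the tessedit_char_whitelist variable exist in Tesseract, but whether the -c flag is accepted depends on the installed version; older releases may need a config file instead.

```python
def build_tesseract_command(image_path, output_base,
                            languages=("fra", "eng"), whitelist=None):
    """Return the argument list for a tesseract run (nothing is executed).

    Hypothetical helper: languages are joined with '+' for the -l option,
    and an optional whitelist is passed via the tessedit_char_whitelist
    variable (assumes a Tesseract version that supports the -c flag).
    """
    cmd = ["tesseract", image_path, output_base, "-l", "+".join(languages)]
    if whitelist:
        cmd += ["-c", "tessedit_char_whitelist=" + whitelist]
    return cmd


def downscale_factor(scan_dpi, max_dpi=600):
    """Return the factor to resize a scan by so it does not exceed max_dpi.

    The 600 dpi ceiling is the rule of thumb from the message above; the
    factor is never larger than 1.0, so scans are only ever shrunk.
    """
    if scan_dpi <= 0:
        raise ValueError("scan_dpi must be positive")
    return min(1.0, max_dpi / scan_dpi)
```

For example, build_tesseract_command("page.png", "page") yields the arguments for "tesseract page.png page -l fra+eng", and the resulting list can be handed to subprocess.run once the options have been checked against the installed version.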
On Tuesday, August 6, 2013, wrote:

> Hi,
>
> I am trying to recognize an 18th century text for academic purposes. I
> followed the (very helpful) tutorial, and encountered no technical
> problems. However, the recognition rate is disappointing. I think the
> source material may just be too difficult for tesseract 3 (see sample
> image <http://i.imgur.com/d5RnxI4.png> and recognized text below). The
> difficulties are multiple: 3 fonts, 2 languages (bilingual text), obsolete
> spellings, variable stroke width... I retrained tesseract on 10 samples of
> each character, without much improvement.
>
> Could someone tell me if this is feasible? Or maybe the state of the art
> in OCR has not reached yet this kind of performance...
>
> Thanks for the insight!
>
> Fabrizio
>
> --
>
> Image: http://i.imgur.com/d5RnxI4.png
>
> *Recognized text for image*
>
> ACCOLADE, [embraffement] A bug, clîppl’ng and
> colling. Je hazardaî quèlques accolades qui ne îûrent pâs
> trop mal reçûes, I ventured ſome bugs, wbicb were not very
> îll receîved. * Nous nous mimes ä domler des accolades â
> notre boutèille, PVc./ëll ta bugging our bottle. ☞ Il l’a fait
> Chevalîér en lui donnant l’accolade, He bar dubbcd hl’ln a
> K.wigbt. ☞ Sèrvîr unc accolade de lapereaûx (une couple)
> To jZ-rve o couple oj’yortng rabbîts în one dffla.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
--
"All that is gold does not glitter, not all those who wander are lost;
the old that is strong does not wither, deep roots are not reached by the frost.
From the ashes a fire shall be woken, a light from the shadows shall spring;
renewed shall be blade that was broken, the crownless again shall be king."

