a) You can use the -e option for the combine_tessdata tool to extract
individual components of the .traineddata file, like this:
combine_tessdata -e tessdata/eng.traineddata /home/$USER/temp/eng.config /home/$USER/temp/eng.unicharset
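If you extract components often, a small wrapper can build the same invocation for you. The following is a minimal sketch, not part of Tesseract. It assumes that `combine_tessdata` is on your `PATH` and that, as the example above suggests, the extension of each output path (`.config`, `.unicharset`) tells the tool which component to extract. The helper names `build_extract_command` and `extract_components` are hypothetical.

```python
import os
import shlex
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_extract_command(traineddata: str, out_dir: str, lang: str,
                          components: Sequence[str]) -> List[str]:
    """Build the argv for `combine_tessdata -e`, one output path per component.

    Assumption: the file extension of each output path (e.g. "config",
    "unicharset") selects which component combine_tessdata extracts.
    """
    outputs = [str(Path(out_dir) / f"{lang}.{suffix}") for suffix in components]
    return ["combine_tessdata", "-e", traineddata, *outputs]


def extract_components(traineddata: str, out_dir: str, lang: str,
                       components: Sequence[str]) -> List[str]:
    """Hypothetical helper: create out_dir, run the tool, return the output paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_extract_command(traineddata, out_dir, lang, components)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd[3:]


if __name__ == "__main__":
    # Only prints the command; it does not call combine_tessdata.
    temp_dir = os.path.join(os.path.expanduser("~"), "temp")
    cmd = build_extract_command("tessdata/eng.traineddata", temp_dir,
                                "eng", ["config", "unicharset"])
    print(shlex.join(cmd))
```

Building the argument list separately from running it lets you check the exact command (for example with `shlex.join`) before anything is written to disk.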
For more details see this:
Hi!
I train Tess using a separate image for every text line. Recognition is
also run on single-text-line images. Recognition performs pretty
well; however, there are many errors that, I believe, are related to
misdetected baselines, either during training or during recognition; I
don't know which. These include:
*** On behalf of Andy Syme who could not post in this group probably
due to spam removal artefacts ***
...my problem is that I have some documents, written in 1890-1920, that
I scanned and want to OCR. They are in English, and using the standard
English language file I was getting 40-50% recognition. I...
Dear Andrew,
I have a couple of observations about your problem.
- The standard English language file was created from training images
of well-known computer fonts such as Arial, Times, Verdana and some
Ghostscript fonts, together with their italic and bold variants.
Your book document's characters...