[tesseract-ocr] Re: Training CMC7 Font

Roger Thu, 03 Mar 2016 05:06:07 -0800

Yes. I've seen some people who have accomplished that, but they didn't share 
their .traineddata file.


I have been able to get tesseract to recognize some fonts by reducing the 
image size and increasing its contrast, so that the characters are more 
condensed.
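For anyone trying the same trick, here is a minimal sketch of those two preprocessing steps (shrink, then stretch contrast). It is only an illustration under assumptions: the image is a plain list of rows of 0-255 grayscale ints so that no third-party library is needed, and the function names `downscale` and `stretch_contrast` are hypothetical, not part of tesseract.

```python
# Hypothetical preprocessing sketch: shrink a grayscale image and stretch
# its contrast before passing it to tesseract. Images are lists of rows of
# 0-255 ints, so the example has no third-party dependencies.

def downscale(img, factor):
    """Shrink by an integer factor, replacing each block with its mean."""
    h = len(img) // factor
    w = len(img[0]) // factor
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [img[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def stretch_contrast(img):
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


img = [[100, 100, 200, 200],
       [100, 100, 200, 200]]
print(stretch_contrast(downscale(img, 2)))  # [[0, 255]]
```

With Pillow the same idea would be `Image.resize` followed by `ImageOps.autocontrast`; the pure-Python version just makes each step explicit.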

Do you have any other ideas on how I can make tesseract recognize it better?

On Wednesday, March 2, 2016 at 1:48:25 PM UTC-3, Tom Morris wrote:
>
> On Wednesday, March 2, 2016 at 2:23:44 AM UTC-5, Roger wrote:
>>
>> I am training tesseract to recognize the CMC7 font, following this 
>> <http://michaeljaylissner.com/posts/2012/02/11/adding-new-fonts-to-tesseract-3-ocr-engine/>
>>  and this 
>> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract>
>>  tutorial.
>>
>
> I see two immediate issues:
>
> - Tesseract assumes that non-noisy character images are connected shapes 
> (except for diacritics, etc.), while the CMC7 characters are made up of 
> disconnected vertical bars.
> - According to this Wikipedia page, https://fr.wikipedia.org/wiki/CMC7, the 
> significant part of the CMC7 encoding is the interbar spacing, *not* the 
> overall shape.
>
> Are you sure you're using the right tool for the job?
>
> Tom
>
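Tom's second point suggests an alternative to training tesseract at all: measure the gaps between bars directly. The sketch below is a hypothetical illustration, not an existing tool. It assumes a clean, binarized horizontal scanline through one character (1 = ink, 0 = paper), assumes seven bars per character as described on the Wikipedia page Tom cites, and assumes every gap is either clearly narrow or clearly wide. The names `bar_runs` and `gap_pattern` are made up, and the table mapping gap patterns to characters is deliberately not included; it would have to come from the CMC7 specification.

```python
# Hypothetical sketch: read the narrow/wide gap pattern of one CMC7
# character from a binarized scanline (1 = ink, 0 = paper).
# Assumes seven bars per character; the pattern-to-character table is
# NOT included here and must come from the CMC7 specification.

def bar_runs(scanline):
    """Return (start, end) column ranges of consecutive ink pixels."""
    runs = []
    start = None
    for x, px in enumerate(scanline):
        if px and start is None:
            start = x
        elif not px and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(scanline)))
    return runs


def gap_pattern(scanline, bars_per_char=7):
    """Classify each inter-bar gap as 'N' (narrow) or 'W' (wide)."""
    runs = bar_runs(scanline)
    if len(runs) != bars_per_char:
        raise ValueError(f"expected {bars_per_char} bars, found {len(runs)}")
    gaps = [runs[i + 1][0] - runs[i][1] for i in range(len(runs) - 1)]
    # Split narrow from wide at the midpoint of the observed gap widths.
    threshold = (min(gaps) + max(gaps)) / 2
    return "".join("W" if g > threshold else "N" for g in gaps)


line = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1]
print(gap_pattern(line))  # NWNNWN
```

Using the midpoint between the smallest and largest gap keeps the classifier independent of scan resolution, but a real scan would also need noise handling and several scanlines per character.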

