Re: Training tesseract 3.01 with new font, for reading non dictionary strings - ideal training text?

Andres Fri, 19 Oct 2012 11:12:01 -0700

I thought that "abcdefghijklmn..." was discouraged because of the
segmentation problem (e.g. r followed by n being read as m, rn -> m).
Since I do the character segmentation myself in my project, I have always
used "abcdefghijklmn..." for training. It would be very interesting to
know the real reason for this recommendation.
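
For reference, the random-string generation described in the quoted message
below could be sketched roughly like this in Python. This is only a guess at
the approach: the alphabet, the number of strings per line, and the fixed seed
are assumptions, and the function names are made up for illustration. The
text would still have to be rendered to an image in the target font before it
is any use for training.

```python
import random
import string

# Assumed alphabet: the quoted message does not say which characters were used.
ALPHABET = string.ascii_letters + string.digits


def random_token(rng, min_len=1, max_len=12, alphabet=ALPHABET):
    """Return one random string of min_len..max_len characters."""
    length = rng.randint(min_len, max_len)  # randint is inclusive on both ends
    return "".join(rng.choice(alphabet) for _ in range(length))


def training_lines(n_lines, tokens_per_line=6, seed=0):
    """Build n_lines of space-separated random tokens (hypothetical layout)."""
    rng = random.Random(seed)  # fixed seed so the same page can be regenerated
    return [
        " ".join(random_token(rng) for _ in range(tokens_per_line))
        for _ in range(n_lines)
    ]


if __name__ == "__main__":
    for line in training_lines(5):
        print(line)
```

Because the strings are random, every character pair (including awkward ones
such as "rn") has some chance of showing up, which a plain alphabet run does
not give you.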


Cheers,

Andres



2012/10/19 Adam Chapam <[email protected]>

> Just a quick follow up.
>
> I have spent the day running tests. I tried the data linked above,
> pages from books, and the simple (not recommended) ABCDEFG etc., but found I
> get the best results by randomly generating strings with a simple algorithm
> that outputs strings of 1 to 12 characters, resulting in
> images like the one attached:
>
>
> <https://lh4.googleusercontent.com/-ml77V-YmLic/UIGNzz83GZI/AAAAAAAAABo/h1LNNSxQfEg/s1600/eng.dos.exp0.gif>
> If anyone knows why this might be a bad idea, please post, but so far it
> seems the most successful (and simplest) method.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

