I thought that "abcdefghijklmn..." was not a good idea because of the segmentation problem (e.g. an r followed by an n being interpreted as an m: rn -> m). Since I do the character segmentation myself in my project, I have always used "abcdefghijklmn..." for training. It would be very interesting to know the real reason for this recommendation.
Cheers,
Andres

2012/10/19 Adam Chapam <[email protected]>
> Just a quick follow up.
>
> I have spent the day running tests. I tried using the above linked data,
> pages from books, and simple (not recommended) ADBDEFG etc, but found I get
> the best results randomly generating strings with a simple algorithm that
> outputs characters in strings ranging from 1 to 12 chars, resulting in
> images like the one attached:
>
> <https://lh4.googleusercontent.com/-ml77V-YmLic/UIGNzz83GZI/AAAAAAAAABo/h1LNNSxQfEg/s1600/eng.dos.exp0.gif>
>
> If anyone knows why this might be a bad idea, please post, but so far it
> seems the most successful (and simplest) method.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
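For anyone who wants to try the random-string approach described in the quoted message, here is a minimal sketch in Python. It is not Adam's actual script: the function name `random_training_lines`, the alphabet (ASCII letters and digits), and the number of strings per line are all assumptions made for illustration. Only the 1 to 12 character range comes from the message. The output is plain text that would still have to be rendered to an image for training.

```python
import random
import string


def random_training_lines(n_lines, words_per_line=8, min_len=1, max_len=12,
                          alphabet=string.ascii_letters + string.digits,
                          seed=None):
    """Return n_lines lines of space-separated random strings.

    Hypothetical helper: each string has between min_len and max_len
    characters (1 to 12 by default, as in the quoted message), drawn
    uniformly from `alphabet` (an assumed character set).
    """
    rng = random.Random(seed)  # a seeded generator keeps runs reproducible
    lines = []
    for _ in range(n_lines):
        words = []
        for _ in range(words_per_line):
            length = rng.randint(min_len, max_len)  # inclusive on both ends
            words.append("".join(rng.choice(alphabet) for _ in range(length)))
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    # Print a small sample of training text.
    for line in random_training_lines(5, seed=0):
        print(line)
```

Because the characters are drawn uniformly, every character ends up with a wide variety of neighbours, including pairs such as "rn". That may be part of why this works better than a plain "abcdefg..." line, but that is a guess on my part.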

