I'm attempting to parse some data from screenshots of a mobile game:
<https://lh3.googleusercontent.com/-rPdE3K_LwhU/WBoSmI07pjI/AAAAAAAAC4g/4tnS8WLvqCE3voWNEt-g-6SsXjUtfq0RQCLcB/s1600/hero.jpg>

Since all the text is in predetermined areas, I can easily crop out the individual numbers to feed to Tesseract:

<https://lh3.googleusercontent.com/-nScayQdhchU/WBoS09Nts1I/AAAAAAAAC4k/dD2njUqfS5cEJFE6Dwhei7YmPkQWrL9LgCLcB/s1600/DEF-1478103746919.png>
<https://lh3.googleusercontent.com/-RRRVFXz4Wes/WBoS7GnMG4I/AAAAAAAAC4o/v-nxeyUcSXMG0PPYbUZ3ZlUajkRAR8I-ACLcB/s1600/DEFTRON-1478103746920.png>

The top image is recognized fine. For the bottom image, I get the following output with -psm 6 or 7:

1 EM)

Without a -psm option I just get:

EM)

As you can see, I've already done some preprocessing: greyscale conversion and inversion. I've also tried scaling the image, which sometimes helps and sometimes makes things worse.

At this point I'm at a loss; I've done as much as I can think of to improve the results. The only explanation I have left: is it due to the game using a weird font that Tesseract doesn't fully understand? If so, are there any good resources for training Tesseract on a new font? I'm pretty much a newbie when it comes to Tesseract.

Does anyone have any other suggestions?

-Brian
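
For reference, the invocation described above (a single cropped image run with -psm 6 or 7) could be sketched roughly as follows. This is only a minimal sketch with several assumptions: the `tesseract` binary is on the PATH, the Tesseract 3.x flag spelling `-psm` is used (newer releases spell it `--psm`), and the function names and the file name `crop.png` are hypothetical. The optional digit whitelist via `tessedit_char_whitelist` is not something mentioned in the post; it is included purely as an assumption, on the basis that the crops contain only numbers.

```python
import subprocess


def build_tesseract_command(image_path, psm=7, whitelist=None):
    """Build an argument list for the Tesseract 3.x command-line tool.

    psm=7 treats the image as a single text line, psm=6 as a single block.
    """
    # Using "stdout" as the output base makes tesseract print the
    # recognized text instead of writing it to a .txt file.
    cmd = ["tesseract", image_path, "stdout", "-psm", str(psm)]
    if whitelist:
        # Assumption, not from the post: limit recognition to these characters.
        cmd += ["-c", "tessedit_char_whitelist=" + whitelist]
    return cmd


def recognize(image_path, psm=7, whitelist=None):
    """Run tesseract on one cropped image and return the stripped text."""
    result = subprocess.run(
        build_tesseract_command(image_path, psm, whitelist),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `recognize("crop.png", psm=7, whitelist="0123456789")` would then run one crop in single-line mode restricted to digits. Whether that helps with this particular font is untested here.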

