I'm attempting to parse some data from screenshots of a mobile game:


<https://lh3.googleusercontent.com/-rPdE3K_LwhU/WBoSmI07pjI/AAAAAAAAC4g/4tnS8WLvqCE3voWNEt-g-6SsXjUtfq0RQCLcB/s1600/hero.jpg>


Since all the text is in predetermined areas, I can easily crop out the 
individual numbers to feed to Tesseract:



<https://lh3.googleusercontent.com/-nScayQdhchU/WBoS09Nts1I/AAAAAAAAC4k/dD2njUqfS5cEJFE6Dwhei7YmPkQWrL9LgCLcB/s1600/DEF-1478103746919.png>


<https://lh3.googleusercontent.com/-RRRVFXz4Wes/WBoS7GnMG4I/AAAAAAAAC4o/v-nxeyUcSXMG0PPYbUZ3ZlUajkRAR8I-ACLcB/s1600/DEFTRON-1478103746920.png>


The top image is recognized fine, but for the bottom image I get the 
following output:

1 EM)

with -psm 6 or 7; without that option I just get EM)
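
For reference, here is a minimal sketch of how a single-line, digits-only 
invocation could be assembled. It is not the exact command used above: the 
file name DEFTRON.png and the helper name build_tesseract_command are 
hypothetical, and the digit whitelist is an assumption that the cropped 
value contains only digits. Tesseract 3.x spells the option -psm, while 4.x 
and later use --psm, so the flag is selectable.

```python
import shlex


def build_tesseract_command(image_path, psm=7, whitelist="0123456789",
                            legacy_flag=True):
    """Return the argument list for a single-image Tesseract run.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    # Tesseract 3.x uses "-psm"; 4.x and later use "--psm".
    psm_flag = "-psm" if legacy_flag else "--psm"
    return [
        "tesseract",
        image_path,
        "stdout",             # print the result instead of writing a file
        psm_flag, str(psm),   # 7 = treat the image as a single text line
        # Assumption: the cropped region holds digits only.
        "-c", "tessedit_char_whitelist=" + whitelist,
    ]


cmd = build_tesseract_command("DEFTRON.png")
print(" ".join(shlex.quote(part) for part in cmd))
# To execute it: subprocess.run(cmd, capture_output=True, text=True)
```

Restricting the character set will not fix a badly segmented glyph, but it 
may at least keep letters and punctuation such as "EM)" out of a field 
that should be numeric.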

As you can see, I've done some preprocessing: greyscale conversion and 
inversion.  I've also tried scaling the image; it helps sometimes and 
breaks recognition at other times.
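
The preprocessing steps mentioned above can be sketched roughly as follows. 
This is a pure-Python illustration on a greyscale image stored as a list of 
pixel rows, not the actual script; all function names are made up. The Otsu 
thresholding step is an addition that was not part of what I tried, 
included only as one more thing that might be worth testing. In practice an 
imaging library would do this work.

```python
def invert(gray):
    """Invert an 8-bit greyscale image given as a list of pixel rows."""
    return [[255 - p for p in row] for row in gray]


def upscale(gray, factor):
    """Nearest-neighbour upscale by an integer factor."""
    out = []
    for row in gray:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def otsu_threshold(gray):
    """Return the level that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no pixels at or below this level yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already in the background
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray):
    """Map pixels at or below the Otsu level to black, the rest to white."""
    t = otsu_threshold(gray)
    return [[0 if p <= t else 255 for p in row] for row in gray]


def preprocess(gray, factor=2):
    """Invert, upscale, then binarize: dark text on a white background."""
    return binarize(upscale(invert(gray), factor))


# Tiny made-up sample: light glyph pixels (230) on a dark background (20).
sample = [[20, 20, 230],
          [20, 230, 20]]
result = preprocess(sample, factor=2)
```

Keeping each step as a separate function makes it easy to try the scale 
factor and the thresholding independently, since scaling alone has given 
inconsistent results.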

At this point I'm at a loss; I've done as much as I can think of to 
improve the results.  The only explanation I have left is that the game 
uses an unusual font that Tesseract doesn't fully recognize.  If so, are 
there any good resources for training the tool on a new font?  I'm pretty 
much a newbie when it comes to Tesseract.

Does anyone have any other suggestions?

-Brian
