
[tesseract-ocr] Poor results from simple images

Brian Craig Wed, 02 Nov 2016 11:24:07 -0700


I'm attempting to parse some data from screenshots of a mobile game:

<https://lh3.googleusercontent.com/-rPdE3K_LwhU/WBoSmI07pjI/AAAAAAAAC4g/4tnS8WLvqCE3voWNEt-g-6SsXjUtfq0RQCLcB/s1600/hero.jpg>

Since all the text is in predetermined areas I can easily grab the
individual numbers to feed to tesseract:

<https://lh3.googleusercontent.com/-nScayQdhchU/WBoS09Nts1I/AAAAAAAAC4k/dD2njUqfS5cEJFE6Dwhei7YmPkQWrL9LgCLcB/s1600/DEF-1478103746919.png>

<https://lh3.googleusercontent.com/-RRRVFXz4Wes/WBoS7GnMG4I/AAAAAAAAC4o/v-nxeyUcSXMG0PPYbUZ3ZlUajkRAR8I-ACLcB/s1600/DEFTRON-1478103746920.png>

The top image is recognized fine, but for the bottom image I get the
following output:

1 EM)

That is with -psm 6 or 7; without the option I just get EM)
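
For reference, the invocation described above could be scripted roughly as follows. This is only a sketch: the function names and the file name are hypothetical, it assumes the tesseract binary is on the PATH, and it uses the `-psm` spelling from this post (newer Tesseract releases spell the option `--psm`). Page segmentation mode 6 assumes a single uniform block of text and mode 7 treats the image as a single text line.

```python
import subprocess


def build_tesseract_command(image_path, psm=None, psm_flag="-psm"):
    """Return the argv list for a tesseract run that prints text to stdout.

    psm_flag is "-psm" as used in this post; pass "--psm" for newer releases.
    """
    cmd = ["tesseract", image_path, "stdout"]
    if psm is not None:
        # Only add the page segmentation option when one was requested.
        cmd += [psm_flag, str(psm)]
    return cmd


def run_tesseract(image_path, psm=None):
    """Run tesseract (assumed to be installed) and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `run_tesseract("DEF.png", psm=7)` with a hypothetical file name would then return whatever tesseract prints for that crop, which makes it easy to compare the modes side by side.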

As you can see, I've already done some preprocessing (greyscale and
inversion). I've also tried scaling the image; sometimes it helps and
sometimes it makes things worse.
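
To make those preprocessing steps concrete, here is a dependency-free sketch of greyscale conversion, inversion, and scaling. It is an illustration, not the code actually used: the function names are hypothetical, the image is assumed to be a list of rows of (r, g, b) tuples, the greyscale weights are the common BT.601 luma weights, and the scaling is plain nearest-neighbour by an integer factor. In practice an imaging library would do the same job.

```python
def to_grey(pixel):
    """Convert one (r, g, b) pixel to a 0-255 grey value (BT.601 luma weights)."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def preprocess(rows, scale=1):
    """Greyscale, invert, and upscale an image given as rows of (r, g, b) tuples.

    Returns rows of grey values. Each source pixel becomes a scale x scale
    block (nearest-neighbour scaling by an integer factor).
    """
    out = []
    for row in rows:
        grey_row = []
        for pixel in row:
            value = 255 - to_grey(pixel)  # invert after converting to grey
            grey_row.extend([value] * scale)  # repeat horizontally
        # Repeat the finished row vertically, copying so rows stay independent.
        out.extend(list(grey_row) for _ in range(scale))
    return out
```

Keeping the scale factor as a parameter makes it easy to try several sizes on the same crop, since scaling helps on some images and hurts on others.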

At this point I'm at a loss; I've done as much as I can think of to
improve the results. The only explanation I have left is that the game
uses an unusual font that Tesseract doesn't fully understand. If so, are
there any good resources for training the tool on a new font? I'm pretty
much a newbie when it comes to Tesseract.

Anyone have any other suggestions?

-Brian
