Hello!
I am also very much a newbie, but one thing I think could help a lot is to 
make the image pure black and white (binarisation) rather than greyscale.
You could do this with Leptonica, which seems to be commonly used with 
Tesseract.

(See attachment for example)
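
To make the idea concrete, here is a rough sketch of binarisation in plain 
Python. It is not Leptonica's API; it only illustrates the technique using 
Otsu's method, which is one common way of choosing the threshold. The 
function names (otsu_threshold, binarise) and the tiny sample image are made 
up for illustration, and it assumes the greyscale image is already available 
as rows of integers from 0 to 255.

```python
def otsu_threshold(pixels):
    """Pick the grey level (0-255) that best splits dark from light pixels."""
    # Histogram of grey levels across the whole image.
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]          # pixels at or below this level
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg     # pixels above this level
        if weight_fg == 0:
            break
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance: larger means a cleaner split.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarise(pixels):
    """Map every pixel to 0 (black) or 255 (white) using Otsu's threshold."""
    threshold = otsu_threshold(pixels)
    return [[255 if value > threshold else 0 for value in row] for row in pixels]


# Hypothetical 2x3 greyscale crop: three dark pixels, three light ones.
sample = [[10, 12, 200],
          [11, 205, 210]]
result = binarise(sample)
```

With a real screenshot you would load the pixels through an imaging library 
first and save the result before passing it to Tesseract; the thresholding 
step itself stays the same.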


Hope that helps!
/K

On Wednesday, November 2, 2016 at 7:23:53 PM UTC+1, Brian Craig wrote:
>
> I'm attempting to parse some data from screenshots of a mobile game:
>
>
>
> <https://lh3.googleusercontent.com/-rPdE3K_LwhU/WBoSmI07pjI/AAAAAAAAC4g/4tnS8WLvqCE3voWNEt-g-6SsXjUtfq0RQCLcB/s1600/hero.jpg>
>
>
> Since all the text is in predetermined areas I can easily grab the 
> individual numbers to feed to tesseract:
>
>
>
>
> <https://lh3.googleusercontent.com/-nScayQdhchU/WBoS09Nts1I/AAAAAAAAC4k/dD2njUqfS5cEJFE6Dwhei7YmPkQWrL9LgCLcB/s1600/DEF-1478103746919.png>
>
>
>
> <https://lh3.googleusercontent.com/-RRRVFXz4Wes/WBoS7GnMG4I/AAAAAAAAC4o/v-nxeyUcSXMG0PPYbUZ3ZlUajkRAR8I-ACLcB/s1600/DEFTRON-1478103746920.png>
>
>
> The top image is recognized fine; for the bottom image I receive the 
> following output:
>
> 1 EM)
>
> with -psm 6 or 7; without it I just get EM)
>
> As you can see, I've done some modification: greyscale and inversion.  I've 
> tried scaling the image; it helps sometimes and breaks others.
>
> At this point I'm at a loss; I've done as much as I can think of to 
> improve the results.  The only thing left: is it due to the game using a 
> weird font that Tesseract doesn't fully understand?  If so, are there any 
> good resources for training the tool for a new font?  I'm pretty much a 
> newbie when it comes to tesseract.
>
> Anyone have any other suggestions?
>
> -Brian
>
>
