[tesseract-ocr] Digit recognition errors / training

Suppressed Thu, 02 Apr 2020 11:21:41 -0700

I'm working on a project in which I need to read digit values from an image, 
then perform tasks based on the extracted values.  
Because of this, mistakes aren't really acceptable. I attached a picture 
as an example of what the images look like. 
The digits barely change: their position and angle stay the same, and only a 
few pixels differ from one capture to the next.


23999
29999
30999
40000
40000
40000
40000
1
43000
44000

44000

44500

This is what tesseract extracts from the image. As you can see, it's mostly 
fine, but instead of 4111 it extracts 1. The result can vary if I change the 
languages or some thresholding values, but a setting that works for this 
case won't work for the other ones.
I guess training is the only way to fix these errors, but I couldn't 
really get it to work. The position and angle of the data don't change; it's 
just the font I would need to train, but I don't know how to generate a lot 
of training data.
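
To make the failure concrete, here is a minimal sketch of a sanity check on the extracted text. It only detects a misread such as 1 in place of 4111; it does not fix it. The helper name `parse_ocr_digits` and the lower bound `min_value=1000` are assumptions made for illustration (they are not part of pytesseract), and the bound would have to be adjusted to whatever range the real values can take.

```python
def parse_ocr_digits(text, min_value=1000):
    """Split OCR output into (values, suspects).

    min_value is an assumed lower bound for a plausible reading.
    suspects holds (line_number, raw_token) pairs that need a second look.
    """
    values, suspects = [], []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        token = raw.strip()
        if not token:
            continue  # skip the blank lines that appear between some rows
        if not token.isdecimal() or int(token) < min_value:
            suspects.append((line_no, token))
        else:
            values.append(int(token))
    return values, suspects


# The output shown above, including the blank lines and the misread "1".
sample = (
    "23999\n29999\n30999\n40000\n40000\n40000\n40000\n1\n"
    "43000\n44000\n\n44000\n\n44500\n"
)
values, suspects = parse_ocr_digits(sample)
print(len(values), suspects)
```

A check like this could gate the follow-up tasks so that nothing runs on a reading that is flagged as suspect.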

code:
import cv2
import pytesseract

img = cv2.imread('xy.png', cv2.IMREAD_GRAYSCALE)
ret, thresh1 = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY_INV)
ROI1 = thresh1[130:1050, 1280:1420]  # rows 130-1049, columns 1280-1419
text = pytesseract.image_to_string(ROI1, config="digits")

I grab the screen with ImageGrab and select the ROI.
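
Since the rows sit at fixed positions, one option that may be worth trying is to crop each row and pass it to tesseract on its own, for example with `--psm 7`, which treats the image as a single text line. The sketch below only computes the row boundaries. It assumes the rows are evenly spaced inside the ROI and that there are 12 of them; both are guesses that would need to be measured on the real image, and `row_slices` is a hypothetical helper name.

```python
def row_slices(top, bottom, n_rows):
    """Return (start, stop) pixel rows for n_rows equal bands in [top, bottom)."""
    if n_rows <= 0 or bottom <= top:
        raise ValueError("need n_rows > 0 and bottom > top")
    height = bottom - top
    return [
        (top + i * height // n_rows, top + (i + 1) * height // n_rows)
        for i in range(n_rows)
    ]


# ROI rows 130..1050 as in the code above; 12 rows is an assumption.
bands = row_slices(130, 1050, 12)
for start, stop in bands:
    # Each band could then be cropped as thresh1[start:stop, 1280:1420]
    # and recognised separately.
    print(start, stop)
```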

Any suggestions? Maybe there's some training data with digits in it 
that I could adapt to my font?
