As per my question on StackOverflow:  PyTesseract not recognizing decimals 
<https://stackoverflow.com/questions/64203559/pytesseract-not-recognizing-decimals>

I'm using PyTesseract to recognise text in table cells. When it comes to 
recognising drug doses with decimal points, the OCR fails to recognise the 
period character (.), though it is accurate for everything else. I'm 
using tesseract v5.0.0-alpha.20200328 on Windows 10.

My pre-processing consists of upscaling by 400% using cubic interpolation, 
conversion to black and white, dilation and erosion, morphological closing, 
and blurring. I've tried various combinations of these (as well as each on 
its own), and none of them has resulted in the period being recognised.

I've tried various --psm values as well as a character whitelist. I 
believe the font is Segoe UI.
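
For illustration, here is a minimal sketch of how a page segmentation mode 
and a character whitelist can be combined into one config string for 
PyTesseract. The helper names build_config and ocr_cell and the whitelist 
characters are hypothetical and chosen only for this example; 
tessedit_char_whitelist is the Tesseract variable that restricts the output 
characters.

```python
def build_config(psm, whitelist=None):
    """Build a Tesseract config string from a PSM value and an optional whitelist."""
    parts = [f"--psm {psm}"]
    if whitelist:
        # -c sets a Tesseract variable; the whitelist must not contain spaces,
        # because the config string is split on whitespace.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def ocr_cell(image, psm=7, whitelist="0123456789.mg"):
    """Run OCR on one table cell (hypothetical helper, whitelist is an assumption)."""
    import pytesseract  # imported lazily so build_config works without it installed

    config = build_config(psm, whitelist)
    return pytesseract.image_to_string(image, lang="eng", config=config).strip()
```

Calling ocr_cell(final_image, psm=7) treats the cell as a single text line, 
while psm=10 (as in the attached script) treats it as a single character, so 
the choice of mode is worth varying alongside the whitelist.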

Before processing:  [image: S87rd.png] <https://i.stack.imgur.com/S87rd.png>

After processing:  [image: OFjoL.png] <https://i.stack.imgur.com/OFjoL.png>

PyTesseract output: 25mg »p

Processing code attached

import cv2
import numpy as np
import pytesseract

# Load the cell image and upscale it 4x with cubic interpolation
image = cv2.imread('01.png')
upscaled_image = cv2.resize(image, None, fx=4, fy=4,
                            interpolation=cv2.INTER_CUBIC)
gray_image = cv2.cvtColor(upscaled_image, cv2.COLOR_BGR2GRAY)

# Dilate, then erode, with a small 2x2 kernel
kernel = np.ones((2, 2), np.uint8)
dilated_image = cv2.dilate(gray_image, kernel, iterations=1)
eroded_image = cv2.erode(dilated_image, kernel, iterations=1)

# Binarise with a fixed threshold, then apply a morphological close
thresh = cv2.threshold(eroded_image, 205, 255, cv2.THRESH_BINARY)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
morph_image = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

# Smooth with a bilateral filter and re-binarise using Otsu's threshold
filtered_image = cv2.bilateralFilter(morph_image, 5, 75, 75)
blur_image = cv2.threshold(filtered_image, 0, 255,
                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Run OCR; --psm 10 treats the image as a single character
final_image = blur_image
text = pytesseract.image_to_string(final_image, lang='eng', config='--psm 10')
