Hi all,

I'm learning Tesseract in Python. I'm trying to detect all the text 
regions with pytesseract.image_to_data() and then extract the text from 
each region one by one. My code below attempts to do this:

import cv2
import pytesseract
from pytesseract import Output

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

img = cv2.imread('cocacola_19.png')
d = pytesseract.image_to_data(img, output_type=Output.DICT)

n_boxes = len(d['text'])
img_rect = img.copy()
c = 0
for i in range(n_boxes):
    if int(d['conf'][i]) > 50:
        (x1, y1, w1, h1) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])

        img_rect = cv2.rectangle(img_rect, (x1, y1), (x1 + w1, y1 + h1), (0, 255, 0), 2)
        # cv2.imwrite('rect.jpg', img[y1:y1+h1, x1:x1+w1])
        out = pytesseract.image_to_string(img[y1:y1+h1, x1:x1+w1])
        if len(out) == 0:
            out = pytesseract.image_to_string(img[y1:y1+h1, x1:x1+w1], config='--psm 3')
        c += 1

However, it does not work. For example, the first text region contains the 
word 'THE', but pytesseract.image_to_string() returns an empty string for 
that crop. Why does this happen? My input image is attached.
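
One direction that may be worth trying, offered as an assumption and not verified on the attached image: the crops above are tight around a single word, while the default page segmentation mode (--psm 3) expects a full page layout, and the '--psm 3' fallback just repeats that default. The sketch below pads each box by a few pixels and asks for single-line segmentation (--psm 7). The helper names padded_box, confident_words and ocr_word and the 10-pixel padding are hypothetical choices for illustration. Note also that image_to_data() already returns the recognized word for each box in d['text'][i], so a second OCR pass may not be needed at all.

```python
def padded_box(left, top, width, height, img_w, img_h, pad=10):
    """Expand a box by `pad` pixels per side, clamped to the image bounds."""
    x0 = max(left - pad, 0)
    y0 = max(top - pad, 0)
    x1 = min(left + width + pad, img_w)
    y1 = min(top + height + pad, img_h)
    return x0, y0, x1, y1


def confident_words(d, min_conf=50):
    """Return (index, text) pairs from an image_to_data dict above min_conf.

    conf is converted with float() so that int, float and string values
    are all handled; entries with empty text are skipped.
    """
    words = []
    for i, text in enumerate(d['text']):
        if float(d['conf'][i]) > min_conf and text.strip():
            words.append((i, text))
    return words


def ocr_word(img, d, i, pad=10):
    """Re-run OCR on one padded crop, treated as a single text line."""
    import pytesseract  # imported lazily so the helpers above work without it

    img_h, img_w = img.shape[:2]
    x0, y0, x1, y1 = padded_box(d['left'][i], d['top'][i],
                                d['width'][i], d['height'][i],
                                img_w, img_h, pad)
    crop = img[y0:y1, x0:x1]
    return pytesseract.image_to_string(crop, config='--psm 7').strip()
```

With img and d as in the code above, confident_words(d) lists the words Tesseract already found, and ocr_word(img, d, i) can be compared against d['text'][i] to see whether padding and the segmentation mode make a difference.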

Thanks in advance
