Re: [tesseract-ocr] Re: problems with upper-case character

'Sandra M.' via tesseract-ocr Thu, 19 Sep 2019 08:29:20 -0700

You were both right: updating to version 5 more or less fixed the problem! 
In one case there is still a problem with lower- and upper-case letters, 
but in the other cases it's working now!
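
A minimal sketch of how the version difference could be handled from Python, assuming pytesseract is installed. The helper names (major_version, build_config, ocr_line), the choice of page segmentation mode 7 (single text line) and the tessdata path are illustrative assumptions, not something taken from this thread. Tesseract 3.02 spells the option -psm, while 4.x and later spell it --psm; the sketch only adds --tessdata-dir for 4.x and later.

```python
import re


def major_version(version_text):
    """Return the leading integer of a version string such as '5.0.0-alpha'."""
    match = re.match(r"\s*v?(\d+)", version_text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_text)
    return int(match.group(1))


def build_config(major, psm, tessdata_dir=None):
    """Build a pytesseract `config` string (hypothetical helper)."""
    # 3.x spells the page segmentation option -psm; 4.x and later use --psm.
    psm_flag = "--psm" if major >= 4 else "-psm"
    parts = ["%s %d" % (psm_flag, psm)]
    if tessdata_dir is not None and major >= 4:
        # Point Tesseract at a directory holding e.g. the tessdata_best models.
        parts.insert(0, '--tessdata-dir "%s"' % tessdata_dir)
    return " ".join(parts)


def ocr_line(image, tessdata_dir=None, psm=7):
    """OCR `image` with a config that matches the installed Tesseract."""
    import pytesseract  # imported lazily so the helpers above work without it

    major = major_version(str(pytesseract.get_tesseract_version()))
    config = build_config(major, psm, tessdata_dir)
    return pytesseract.image_to_string(image, config=config)
```

Calling ocr_line(gray, tessdata_dir="/path/to/tessdata_best") would then use the downloaded models on 4.x or 5.x; the path is a placeholder.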


On Thursday, 19 September 2019 at 12:49:43 UTC+2, zdenop wrote:
>
> Your tesseract version is old. The current version is 4.1 (the dev version is 
> 5.0).
> For 4.x and above you can use different tessdata: best, fast, or the one with 
> the 3.x model.
>
> Zdenko
>
>
> On Thu, 19 Sep 2019 at 11:55, 'Sandra M.' via tesseract-ocr <
> tesser...@googlegroups.com> wrote:
>
>> I use Tesseract 3.02 with leptonica-1.68. What do you mean by tessdata_best? 
>> I'm new to this field and only know how to call tesseract with the given 
>> line of code. How can the resolution be 0 dpi?
>>
>> I'm using this Python code:
>>
>> import argparse
>> import os
>>
>> import cv2
>> import pytesseract
>>
>> # construct the argument parser and parse the arguments
>> ap = argparse.ArgumentParser()
>> ap.add_argument("-i", "--image", required=True,
>>     help="path to input image to be OCR'd")
>> args = vars(ap.parse_args())
>> # load the example image and convert it to grayscale
>> image = cv2.imread(args["image"])
>> gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
>> # write the grayscale image to disk as a temporary file
>> filename = "{}.png".format(os.getpid())
>> cv2.imwrite(filename, gray)
>> # apply OCR to the in-memory grayscale array (the temporary file above
>> # is written but never read back or deleted)
>> text = pytesseract.image_to_string(gray)
>> print("Output: " + text)
>>
>>
>> On Thursday, 19 September 2019 at 11:23:50 UTC+2, zdenop wrote:
>>>
>>> Please provide more information (version info, how you do OCR; it seems 
>>> like you are using some code).
>>> I just tried the tesseract command line (tesseract 5.0.0-alpha-416-g408d6) 
>>> with tessdata_best and it works for me:
>>> tesseract unnamed.png -
>>> Warning: Invalid resolution 0 dpi. Using 70 instead.
>>> Estimating resolution as 497
>>> Calibrations
>>>
>>> Zdenko
>>>
>>>
>>> On Thu, 19 Sep 2019 at 10:43, 'Sandra M.' via tesseract-ocr <
>>> tesser...@googlegroups.com> wrote:
>>>
>>>> [image: currentImage.png]
>>>> @Lorenzo Blz: This is an example image. The output of my code is 
>>>> "calibrations". The letters do not all have the same height. Of course a 
>>>> lone "c" cannot be classified, but given the other letters as context, 
>>>> tesseract should be able to tell whether it is a lower-case or capital 
>>>> letter, I think. The image has no noise or anything else, so I don't 
>>>> understand the problem. Nevertheless, your suggestion to change the size 
>>>> helped! If I resize the image to 150% or 75%, for example, it works. I just 
>>>> don't know how to handle this later on, when I have no reference value. How 
>>>> do I decide which spelling is right, the one from the 100% image or the one 
>>>> from the 150% image? Or can one say that the result is always more reliable 
>>>> if I resize the image during preprocessing?
>>>>
>>>> On Wednesday, 18 September 2019 at 17:19:22 UTC+2, Sandra M. wrote:
>>>>>
>>>>> I'm using Tesseract with Python. I have an image with 1-6 words in it 
>>>>> and need to read the text. Sometimes the character "C", which looks the 
>>>>> same in upper and lower case, is detected as lower-case c instead of 
>>>>> upper-case C. I see why this is hard, but given the letters that follow 
>>>>> it should be possible to detect the right case. Is there any 
>>>>> configuration option or something else to improve this?
>>>>>
>>>>> I had a look at the configuration option config='-psm x' with 
>>>>> different values for x, but none of them fixes my problem.
>>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesser...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/e4ed704a-cee0-4bb2-80ae-9fc9b82ab55d%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/e4ed704a-cee0-4bb2-80ae-9fc9b82ab55d%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>
>
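
The resizing question quoted above (which reading to trust when 100% and 150% disagree) could be approached with a simple vote across several scales. This is only a heuristic sketch under stated assumptions, not a guarantee of a more reliable result: the function names, the set of scales and the tie-breaking rule are all hypothetical choices, and the cv2/pytesseract wrappers assume both packages are installed.

```python
from collections import Counter


def ocr_at_scales(image, ocr, resize, scales=(0.75, 1.0, 1.5)):
    """Run `ocr` on resized copies of `image` and return {scale: text}."""
    return {scale: ocr(resize(image, scale)).strip() for scale in scales}


def majority_text(results):
    """Return the most frequent reading; ties go to the first scale listed."""
    if not results:
        raise ValueError("no OCR results to vote on")
    counts = Counter(results.values())
    best = max(counts.values())
    for text in results.values():
        if counts[text] == best:
            return text


def cv2_resize(image, scale):
    """Resize with OpenCV; cubic interpolation is an arbitrary choice here."""
    import cv2  # imported lazily so the voting logic works without OpenCV

    return cv2.resize(image, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_CUBIC)


def tesseract_ocr(image):
    """OCR a NumPy image array with pytesseract's default settings."""
    import pytesseract

    return pytesseract.image_to_string(image)
```

With the grayscale array from the script quoted above, majority_text(ocr_at_scales(gray, tesseract_ocr, cv2_resize)) would return whichever spelling most scales agree on. Keeping the per-scale dictionary around also makes it easy to flag images where the scales disagree, so those can be checked by hand.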

