You were both right: updating to version 5 more or less fixed the problem! There is still a problem with lower and upper case letters in one case, but for all the other cases it's working now!
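For the one case that still fails, a possible post-processing check is to compare letter heights, because letters such as c/C have nearly the same shape and mainly differ in size. This is a minimal sketch under stated assumptions, not a Tesseract feature: it parses the output of `pytesseract.image_to_boxes` (one `char left bottom right top page` line per character), estimates the x-height and the cap height from two reference letter sets, and re-cases ambiguous letters by whichever height they are closer to. The letter sets, the nearest-height rule, and the helper names `parse_boxes`/`fix_case` are hypothetical choices made for illustration.

```python
import string
from statistics import median

# Hypothetical choice: letters whose upper- and lower-case shapes are nearly
# identical and mainly differ in size.
AMBIGUOUS = set("cosvwxz")
# Lower-case letters without ascenders or descenders; their box height
# approximates the x-height of the font.
X_HEIGHT_LETTERS = set("aemnru")
# Ascender letters plus unambiguous capitals; their box height approximates
# the cap height (assumption: ascenders and capitals are roughly equally tall).
CAP_HEIGHT_LETTERS = set("bdhkl") | (set(string.ascii_uppercase) - set("COSVWXZ"))


def parse_boxes(box_text):
    """Turn pytesseract.image_to_boxes output into (char, height) pairs.

    Each line has the form 'char left bottom right top page'.
    """
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 6:
            continue  # skip empty or malformed lines
        char, bottom, top = parts[0], int(parts[2]), int(parts[4])
        boxes.append((char, top - bottom))
    return boxes


def fix_case(boxes):
    """Re-case ambiguous letters by comparing their height to reference heights."""
    x_heights = [h for c, h in boxes if c in X_HEIGHT_LETTERS]
    cap_heights = [h for c, h in boxes if c in CAP_HEIGHT_LETTERS]
    if not x_heights or not cap_heights:
        # Not enough reference letters: keep Tesseract's reading unchanged.
        return "".join(c for c, _ in boxes)
    x_height, cap_height = median(x_heights), median(cap_heights)
    result = []
    for char, height in boxes:
        if char.lower() in AMBIGUOUS:
            closer_to_cap = abs(height - cap_height) < abs(height - x_height)
            char = char.upper() if closer_to_cap else char.lower()
        result.append(char)
    return "".join(result)
```

With pytesseract installed, it could be called as `fix_case(parse_boxes(pytesseract.image_to_boxes(Image.open("unnamed.png"))))`. The box output contains no spaces, so the result has no word breaks and is best applied word by word; it also needs at least one x-height letter and one tall letter in the same image.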
On Thursday, September 19, 2019 at 12:49:43 UTC+2, zdenop wrote:
>
> Your Tesseract version is old. The current version is 4.1 (the development
> version is 5.0).
> For 4.x and above you can use different tessdata: best, fast, or the one
> that also contains the 3.x (legacy) models.
>
> Zdenko
>
>
> On Thu, 19 Sep 2019 at 11:55, 'Sandra M.' via tesseract-ocr <
> tesser...@googlegroups.com> wrote:
>
>> I use Tesseract 3.02 with leptonica-1.68. What do you mean by tessdata_best?
>> I'm new to this field and only know how to call Tesseract with the given
>> line of code... How can the resolution be 0 dpi?
>>
>> I'm using this Python code:
>>
>> import pytesseract
>> import argparse
>> import cv2
>> import os
>> from PIL import Image
>>
>> # construct the argument parser and parse the arguments
>> ap = argparse.ArgumentParser()
>> ap.add_argument("-i", "--image", required=True,
>>                 help="path to input image to be OCR'd")
>> args = vars(ap.parse_args())
>>
>> # load the example image and convert it to grayscale
>> image = cv2.imread(args["image"])
>> gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
>>
>> # write the grayscale image to disk as a temporary file so we can
>> # apply OCR to it
>> filename = "{}.png".format(os.getpid())
>> cv2.imwrite(filename, gray)
>>
>> # load the image as a PIL/Pillow image, apply OCR, and then delete
>> # the temporary file (the file is closed before it is removed)
>> with Image.open(filename) as pil_image:
>>     text = pytesseract.image_to_string(pil_image)
>> os.remove(filename)
>> print("Output: " + text)
>>
>>
>> On Thursday, September 19, 2019 at 11:23:50 UTC+2, zdenop wrote:
>>>
>>> Please provide more information (version info, how you do the OCR; it
>>> seems like you are using some code).
>>> I just tried tesseract (tesseract 5.0.0-alpha-416-g408d6) on the command
>>> line with tessdata_best and it works for me:
>>> tesseract unnamed.png -
>>> Warning: Invalid resolution 0 dpi. Using 70 instead.
>>> Estimating resolution as 497
>>> Calibrations
>>>
>>> Zdenko
>>>
>>>
>>> On Thu, 19 Sep 2019 at 10:43, 'Sandra M.' via tesseract-ocr <
>>> tesser...@googlegroups.com> wrote:
>>>
>>>> [image: currentImage.png]
>>>> @Lorenzo Blz: This is an example image.
>>>> The output of my code is "calibrations". The height of the letters is
>>>> not the same. Of course it cannot be recognized if there is only a "c",
>>>> but in the context of the other letters Tesseract should be able to
>>>> detect whether it is a small or a capital letter, I think. This image has
>>>> no noise or anything else, so I don't understand the problem. But
>>>> nevertheless, your suggestion to change the size helped! If I resize it
>>>> to 150% or 75%, for example, it works. I just don't know how to handle it
>>>> later on, when I don't have a reference value: how do I decide which
>>>> spelling is right, the one at 100% image size or the one at 150%? Or is
>>>> it safe to say that the result is always more reliable if I resize the
>>>> image during preprocessing?
>>>>
>>>> On Wednesday, September 18, 2019 at 17:19:22 UTC+2, Sandra M. wrote:
>>>>>
>>>>> I'm using Tesseract with Python. I have an image with 1-6 words in it
>>>>> and need to read the text. Sometimes the character "C", which looks the
>>>>> same in upper and lower case, is detected as a lower-case c instead of
>>>>> an upper-case C. I see the problem, but in the context of the following
>>>>> letters it should be possible to detect the right case. Is there any
>>>>> configuration option or something similar to improve this?
>>>>>
>>>>> I had a look at the configuration option config='-psm x' with
>>>>> different values for x, but nothing fits my problem.
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesser...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/e4ed704a-cee0-4bb2-80ae-9fc9b82ab55d%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/e4ed704a-cee0-4bb2-80ae-9fc9b82ab55d%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
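On the question of how to choose between the 100% and the 150% reading without a reference value: one possible approach, sketched here under the assumption that Tesseract's per-word confidence is a usable proxy for correctness (a higher confidence does not guarantee the correct case), is to OCR the image at several scales and keep the reading with the highest mean word confidence. `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` returns parallel `text` and `conf` lists, with -1 for rows that are not words. The helper names and the default scale list are hypothetical choices.

```python
def mean_confidence(data):
    """Mean confidence of the recognised words in an image_to_data dict.

    Rows with empty text or a negative confidence (-1) are layout rows,
    not words, and are ignored. Returns -1.0 if no word was found.
    """
    confs = [float(conf) for conf, text in zip(data["conf"], data["text"])
             if str(text).strip() and float(conf) >= 0]
    return sum(confs) / len(confs) if confs else -1.0


def joined_text(data):
    """Join the recognised words of an image_to_data dict with spaces."""
    return " ".join(str(t).strip() for t in data["text"] if str(t).strip())


def pick_best(results):
    """Return (scale, text, confidence) for the scale with the best mean confidence.

    `results` is a list of (scale, image_to_data dict) pairs; on a tie the
    first scale in the list wins.
    """
    scored = [(mean_confidence(data), scale, joined_text(data))
              for scale, data in results]
    confidence, scale, text = max(scored, key=lambda item: item[0])
    return scale, text, confidence


def ocr_at_scales(gray, scales=(0.75, 1.0, 1.5, 2.0), config=""):
    """OCR a grayscale numpy image at several scales and keep the best reading."""
    # Imported here so the pure helpers above work without OpenCV/Tesseract.
    import cv2
    import pytesseract

    results = []
    for scale in scales:
        # INTER_CUBIC when enlarging, INTER_AREA when shrinking.
        interpolation = cv2.INTER_CUBIC if scale > 1 else cv2.INTER_AREA
        resized = cv2.resize(gray, None, fx=scale, fy=scale,
                             interpolation=interpolation)
        data = pytesseract.image_to_data(resized, config=config,
                                         output_type=pytesseract.Output.DICT)
        results.append((scale, data))
    return pick_best(results)
```

For example, `print(ocr_at_scales(gray))` could replace the `image_to_string` call in the script quoted above. With Tesseract 4.x or newer, extra options go through `config`, e.g. `config="--psm 7"` to treat the image as a single text line, or `--tessdata-dir` pointing at a tessdata_best download; 4.x spells the page segmentation option `--psm`, whereas 3.x used `-psm`.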