Try to use the latest version of tesseract.
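
A minimal sketch of how one might check the installed version and rerun the
comparison after upgrading. It assumes a POSIX shell and that good.png and
bad.png are in the current directory. The helper names parse_version and
is_prerelease are hypothetical, introduced only for this illustration.

```shell
#!/bin/sh
# Sketch only: report the installed Tesseract version, then rerun both
# images across the page segmentation modes mentioned in the question.

# Hypothetical helper: extract the version from the first line of
# `tesseract --version` output, e.g. "tesseract 4.0.0-beta.1".
parse_version() {
    printf '%s\n' "$1" | sed -n '1s/^tesseract[[:space:]]*//p'
}

# Hypothetical helper: succeed if the version string names a
# pre-release build (alpha, beta, or release candidate).
is_prerelease() {
    case "$1" in
        *alpha*|*beta*|*rc*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v tesseract >/dev/null 2>&1; then
    # Some builds print the version on stderr, so merge both streams.
    version=$(parse_version "$(tesseract --version 2>&1)")
    echo "installed: $version"
    if is_prerelease "$version"; then
        echo "pre-release build; consider upgrading"
    fi
    for psm in 3 7 8 13; do
        for img in good.png bad.png; do
            [ -f "$img" ] || continue   # skip images that are not present
            printf '%s psm=%s: ' "$img" "$psm"
            # "stdout" as the output base prints the recognized text directly.
            tesseract "$img" stdout -l eng --psm "$psm" 2>/dev/null
        done
    done
else
    echo "tesseract not found on PATH"
fi
```

Running the same loop before and after an upgrade makes it easy to see
whether the ELHADIJ/ELHADJ difference depends on the version.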

Zdenko


On Tue, 14 Jul 2020 at 16:04, MysteriousGuy <gyt...@gmail.com> wrote:

> I am using Tesseract to extract text from the attached images. For some
> reason, even though the images are nearly identical, Tesseract makes a
> mistake in one of them: for 'bad.png' the output is ELHADIJ, whereas for
> 'good.png' it is ELHADJ.
>
> Here is what I have and what I have tried:
>
>    - tesseract version: 4.0.0-beta.1
>    - leptonica version: 1.75.3
>    - I use English .traineddata file from here:
>    https://github.com/tesseract-ocr/tessdata_best/blob/master/eng.traineddata
>    - I tried these page segmentation modes: 3, 7, 8, 13 - the mistake is
>    always there.
>
> So the commands I ran were
>
> tesseract good.png output1 -l eng --psm 8
> tesseract bad.png output2 -l eng --psm 8
>
> and similarly for the other PSMs.
>
>
> My question is: how do I make tesseract more robust? Why does it make a
> mistake in one case but not in the other?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/81a83479-b266-4686-a2d8-fae2d5916831o%40googlegroups.com
>
