This seems like an ad-hoc approach. I am already converting the images to
grayscale. If I apply blurring, binarisation, etc., I may fix this case,
but I will likely cause another case to fail as a result. Something in
Tesseract fails to generalize across clearly near-identical images, and I
am interested in what it is.
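
For reference, here is a minimal sketch of how the comparison across page
segmentation modes could be scripted instead of being run by hand. It only
wraps the same CLI calls quoted below, using "stdout" as the output base so
the text is printed rather than written to a file. The helper names
build_command and run_all are hypothetical, and the sketch assumes that
tesseract is on PATH and that good.png and bad.png are in the current
directory.

```python
import subprocess

# Page segmentation modes tried in the original post.
PSMS = (3, 7, 8, 13)


def build_command(image, psm, lang="eng"):
    """Return the tesseract CLI invocation for one image and one PSM.

    Passing "stdout" as the output base makes tesseract print the
    recognised text instead of writing an output file.
    """
    return ["tesseract", image, "stdout", "-l", lang, "--psm", str(psm)]


def run_all(images=("good.png", "bad.png"), psms=PSMS):
    """Run every image under every PSM and collect the recognised text.

    Assumes tesseract is installed and on PATH (hypothetical helper).
    """
    results = {}
    for image in images:
        for psm in psms:
            proc = subprocess.run(
                build_command(image, psm),
                capture_output=True,
                text=True,
                check=False,  # keep going even if one run fails
            )
            results[(image, psm)] = proc.stdout.strip()
    return results
```

Calling run_all() returns a dictionary keyed by (image, psm), which makes it
easy to see in which modes the two images disagree.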

On Wednesday, July 15, 2020 at 12:08:33 UTC+3, Tuan Ardouin wrote:
>
> You need to apply some pre-processing to your image.
>
> On Wednesday, July 15, 2020 at 9:01:14 AM UTC+2, MysteriousGuy wrote:
>>
>> Hi. The latest stable version (4.1.1) produces the same error.
>>
>> On Tuesday, July 14, 2020 at 17:13:40 UTC+3, zdenop wrote:
>>>
>>> Try to use the latest version of tesseract.
>>>
>>> Zdenko
>>>
>>>
>>> On Tue, Jul 14, 2020 at 16:04, MysteriousGuy <gyt...@gmail.com> wrote:
>>>
>>>> I am using Tesseract to extract text from the attached images. For some
>>>> reason, even though the images are nearly identical, Tesseract makes a
>>>> mistake in one of them: for 'bad.png' the output is ELHADIJ, whereas for
>>>> 'good.png' it is ELHADJ.
>>>>
>>>> Here is what I have and what I have tried:
>>>>
>>>>    - tesseract version: 4.0.0-beta.1
>>>>    - leptonica version: 1.75.3
>>>>    - I use the English .traineddata file from here:
>>>>    https://github.com/tesseract-ocr/tessdata_best/blob/master/eng.traineddata
>>>>    - I tried these page segmentation modes: 3, 7, 8, 13 - the mistake 
>>>>    is always there.
>>>>
>>>> So the commands I ran were
>>>>
>>>> tesseract good.png output1 -l eng --psm 8
>>>> tesseract bad.png output2 -l eng --psm 8
>>>>
>>>> and similarly for the other PSMs.
>>>>
>>>>
>>>> My question is: how do I make tesseract more robust? Why does it make a 
>>>> mistake in one case but not in the other?
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesser...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/81a83479-b266-4686-a2d8-fae2d5916831o%40googlegroups.com
>>>>
>>>

