[tesseract-ocr] Re: Trying to understand why Tesseract-ocr fails on some images

nor s Wed, 26 Jul 2023 12:09:58 -0700

OK, I think I found the sweet spot. Setting the location for the crop 
rectangle to +933+1013 from the top left corner of the image gives me an 
amazing result: a 98.8% average over 670 images. I think that's pretty 
good!  
I still don't know why moving the box around a few pixels makes such a 
difference.
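
One way to look into this sensitivity is to sweep the crop offset over a 
small neighbourhood and score each position, rather than adjusting it by 
hand. The following is only a minimal sketch, not the method used in this 
thread. It assumes the offset follows the ImageMagick-style +X+Y convention 
(horizontal first, then vertical), and the names parse_offset, 
candidate_offsets, best_offset and the score callable are all hypothetical. 
The score callable is a placeholder for whatever crops the images at a given 
offset, runs the OCR, and returns the fraction of correct readings.

```python
import re
from itertools import product


def parse_offset(geometry):
    """Parse an offset such as '+933+1013' into an (x, y) tuple.

    Assumes the +X+Y convention measured from the top-left corner.
    """
    match = re.fullmatch(r"\+(\d+)\+(\d+)", geometry)
    if match is None:
        raise ValueError(f"unsupported offset: {geometry!r}")
    return int(match.group(1)), int(match.group(2))


def candidate_offsets(base, radius=3, step=1):
    """List every (x, y) within `radius` pixels of the base offset."""
    x0, y0 = parse_offset(base)
    deltas = range(-radius, radius + 1, step)
    return [(x0 + dx, y0 + dy) for dx, dy in product(deltas, deltas)]


def best_offset(offsets, score):
    """Return the offset with the highest score.

    `score` is a hypothetical callable: it takes an (x, y) tuple and
    returns a number, for example the fraction of images read correctly
    when cropped at that offset.
    """
    return max(offsets, key=score)
```

Calling candidate_offsets("+933+1013", radius=3) yields a 7 x 7 grid of 
positions around the current sweet spot. Scoring every position on the same 
set of images would show whether the good results cluster around particular 
rows or columns, which may help explain why a few pixels matter.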


I think I'm where I want to be. If anyone has any ideas or suggestions about 
what's happening, I'd love to hear from you.

Cheers
 Nor

On Wednesday, July 26, 2023 at 12:24:26 PM UTC-4 nor s wrote:

> Just to add a bit more information. I have found that changing the 
> vertical position of the crop box by a few pixels seems to make a 
> difference.
> One image that had a crop location of +930+1015 was not reading the 
> date/time. However, changing the vertical position to +1000 resulted in 
> 105 out of 133 correct readings.  Again, not being familiar with the 
> internal workings of OCR, I am having difficulty understanding why OCR is 
> behaving this way.
>
> Still digging! :)
>
> Cheers
>  Nor
>
> On Wednesday, July 26, 2023 at 9:21:56 AM UTC-4 nor s wrote:
>
>> To show an example of an OCR that properly extracted the date/time, here 
>> are the files I used.
>> ShowPix is the full image, Outpx.2.jpg is the cropped image and 
>> outpx2.txt is the result of the OCR.
>>
>> As you can see, the image that failed and the one that worked are very 
>> similar.
>>
>> Cheers
>>  Nor
>> On Wednesday, July 26, 2023 at 9:05:04 AM UTC-4 nor s wrote:
>>
>>> Hi All, 
>>>     As I mentioned in an earlier message, I've got tesseract to 
>>> properly identify dates and times at a rate of about 84%. However, what 
>>> puzzles me is why the program reads the time stamp from one image 
>>> properly and fails on another. All the images are similar, and 
>>> for all of them I crop out the date/time area to isolate it. I have 
>>> attached an example. 
>>>
>>> The tempimage.jpg is the full image. outpx.jpx is the cropped image and 
>>> outpx.txt is the OCR result produced from the cropped image. 
>>>
>>> If anyone has any idea why OCR fails on this I would love to hear from 
>>> you. 
>>>
>>> Thanks for your help. 
>>>
>>> Cheers 
>>>  Nor
>>
>>
