[tesseract-ocr] Re: Inconsistencies in detection and extraction of text using tesseract

Jun Repasa Fri, 31 May 2024 00:37:16 -0700

It's hard to give an opinion without seeing how you set up Tesseract. Which PSM 
did you specify, and with what other options?
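
For reference, here is a minimal sketch of how the page segmentation mode (PSM) is usually passed, assuming Tesseract is called from Python through pytesseract (the original post does not say which wrapper or version is in use). The helper names `build_tesseract_config` and `ocr_image`, and the default values, are made up for illustration. The flags `--psm`, `--oem` and `-c tessedit_char_whitelist=...` are standard Tesseract command-line options, though whether a whitelist helps with the missing minus signs here is untested.

```python
import shlex


def build_tesseract_config(psm=6, oem=3, whitelist=None):
    """Build the option string handed to Tesseract (hypothetical helper).

    psm: page segmentation mode, 0-13 (6 = single uniform block of text,
         11 = sparse text, 4 = single column of variable-sized text).
    oem: OCR engine mode, 0-3 (3 = default, based on what is available).
    whitelist: optional string of characters Tesseract may output.
    """
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    parts = ["--oem", str(oem), "--psm", str(psm)]
    if whitelist:
        parts += ["-c", "tessedit_char_whitelist=" + whitelist]
    # pytesseract splits the config string with shlex, so quote each part.
    return " ".join(shlex.quote(p) for p in parts)


def ocr_image(path, psm=6, whitelist=None):
    """Run OCR on one image file; needs pytesseract, Pillow and Tesseract."""
    # Imported here so the config helper works without these packages.
    from PIL import Image
    import pytesseract

    config = build_tesseract_config(psm=psm, whitelist=whitelist)
    with Image.open(path) as img:
        return pytesseract.image_to_string(img, config=config)
```

Calling `ocr_image("input.jpeg", psm=6)` and then again with `psm=11` or `psm=4` is a quick way to see how much the segmentation mode alone changes the output for a table-like image.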


On Friday 31 May 2024 at 02:34:36 UTC+12 [email protected] wrote:

> I have provided the image from which I am trying to extract text using 
> Tesseract OCR (input.jpeg), along with the text extracted from it. As can 
> be seen from the images, the extracted text is not very accurate: negative 
> signs have been omitted, and some undesired characters appear in the 
> extracted text. (I have marked some of the incorrect results with blue 
> boxes.)
>
> I have tried to improve the results by preprocessing the images and by 
> changing the model's parameters. I have tried:
>
> 1. Binarizing the images
>
> 2. HDR processing of the images
>
> Even then, such inconsistencies remain.
>
> How can I improve the detection and extraction of text in Tesseract? I have 
> also tried PaddleOCR for the same task. Even then, symbols such as the euro 
> sign and some negative signs are not detected.
>
