Try enlarging the image before running OCR. Use cv2.INTER_CUBIC interpolation when scaling up (cv2.INTER_AREA is mainly suited to shrinking). Tesseract generally performs better when the text in the image is larger. PSM 6 is the right setting.
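
A rough sketch of what I mean, assuming OpenCV (cv2) and pytesseract are installed. The function names and the default factor of 2 are only illustrative, not tuned values, so adjust them for your images:

```python
def scaled_size(width, height, factor=2.0):
    """Return the (width, height) of an image enlarged by `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return max(1, round(width * factor)), max(1, round(height * factor))


def ocr_upscaled(path, factor=2.0):
    """Enlarge the image at `path`, then run Tesseract on it with --psm 6."""
    # Imported here so scaled_size() can be used without OpenCV/pytesseract.
    import cv2
    import pytesseract

    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise FileNotFoundError(path)

    height, width = image.shape[:2]
    # cv2.resize expects the target size as (width, height).
    enlarged = cv2.resize(
        image,
        scaled_size(width, height, factor),
        interpolation=cv2.INTER_CUBIC,
    )
    # Same language and config as in the original call, applied to the larger image.
    return pytesseract.image_to_string(
        enlarged, lang="eng", config="--psm 6 --oem 3"
    )
```

You would call it as ocr_upscaled("input.jpeg", factor=2.0) and compare the output for a few different factors, since the best scale depends on how small the digits and minus signs are in the source image.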

On Saturday 1 June 2024 at 00:19:32 UTC+12 [email protected] wrote:

> In order to improve the results, I have implemented Canny edge detection
> and the Hough Line Transform on the images. Then I fed the binarized image
> to the tesseract model.
>
> text = pytesseract.image_to_string(cropped_frame, lang='eng', config='--psm 6 --oem 3')
>
> The results have improved a bit, but are still far from perfect. The
> negative signs are being omitted, and some of them are being misread as
> ~. Similarly, some decimal points are also being omitted: 22.5 was
> extracted as 225.
>
> On Friday, May 31, 2024 at 1:07:01 PM UTC+5:30 [email protected] wrote:
>
>> It's hard to give an opinion without seeing how you set up tesseract,
>> what PSM you specified, etc.
>>
>> On Friday 31 May 2024 at 02:34:36 UTC+12 [email protected] wrote:
>>
>>> I have provided the image from which I am trying to extract text
>>> using tesseract ocr (input.jpeg). Along with that, I have also provided
>>> the extracted text from the image. As can be observed from the images,
>>> the extracted text is not very accurate. Negative signs have been
>>> omitted, and some undesired characters also appear in the extracted
>>> text. (I have marked some of the incorrect results with blue boxes.)
>>>
>>> I have tried to improve the results by preprocessing and by changing
>>> the parameters of the model. I have tried:
>>>
>>> 1. Binarizing the images
>>>
>>> 2. HDR processing of the images
>>>
>>> Even then, such inconsistencies remain.
>>>
>>> How can I improve the detection and extraction of text in tesseract? I
>>> have also tried paddleocr for the same task. Even then, symbols such as
>>> the euro sign and some negative signs are not being detected.

