[tesseract-ocr] Re: Tesseract 5.x for Math recognition

Đỗ Đức Phượng Wed, 19 Jul 2023 23:18:10 -0700


The code you provided runs Tesseract through pytesseract with a custom
configuration (-l eng+equ) to recognize English text and mathematical
equations (equ) in the image. One point to correct first:
pytesseract.image_to_string() accepts a PIL image, a NumPy array, or a file
path, so the OpenCV array itself is not rejected. What does differ is the
channel order: cv2.imread() returns pixels as BGR, while pytesseract assumes
RGB.

To rule that out, convert the image from BGR to RGB with cv2.cvtColor()
before passing it to Tesseract. Wrapping the result with
PIL.Image.fromarray() is optional. This may not by itself recover the
mathematical part, but it removes one variable.

Here's the updated code:

    import cv2
    import pytesseract
    from PIL import Image

    custom_config = r'-l eng+equ'

    img = cv2.imread("tessa.png")
    if img is None:
        raise FileNotFoundError("tessa.png could not be read")

    # OpenCV loads pixels as BGR; convert to RGB before handing the image on
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Convert the NumPy array to a PIL image (optional; the array also works)
    pil_image = Image.fromarray(rgb)

    # Perform OCR using Tesseract and extract text from the image
    text = pytesseract.image_to_string(pil_image, config=custom_config)
    print(text)

Make sure to replace "tessa.png" with the actual path to your image file.
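
On channel order: cv2.imread() returns pixels as BGR, while PIL and
pytesseract assume RGB. The sketch below shows what the swap does on a
synthetic NumPy array, so it runs without an image file or OpenCV. The helper
name bgr_to_rgb is illustrative only and is not part of OpenCV or pytesseract;
with OpenCV installed, cv2.cvtColor(img, cv2.COLOR_BGR2RGB) does the same job.

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Return a copy of an H x W x 3 image with the channel order reversed."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 colour image")
    # Reversing the last axis maps (B, G, R) to (R, G, B).
    return np.ascontiguousarray(image[:, :, ::-1])


# Synthetic 1 x 2 image in OpenCV's BGR layout: a pure-blue and a pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)

print(rgb[0, 0].tolist())  # blue pixel, now in RGB order: [0, 0, 255]
print(rgb[0, 1].tolist())  # red pixel, now in RGB order: [255, 0, 0]
```

A grayscale (2-D) array has no channel order to fix, which is why the helper
rejects it instead of silently returning it.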

With this code, Tesseract will attempt to recognize both the English text
and the mathematical content in the image. The -l eng+equ value in
custom_config tells Tesseract to load the eng and equ language data, so both
traineddata files must be installed in the tessdata directory it uses.
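
Whether both language packs are actually installed can be checked before
running OCR. The following is a minimal sketch, assuming a pytesseract version
that provides get_languages(). The helper names missing_languages and
installed_languages are made up for this example.

```python
from typing import List


def missing_languages(lang_spec: str, available: List[str]) -> List[str]:
    """Return the languages named in a '-l' style spec that are not available."""
    requested = [part for part in lang_spec.split("+") if part]
    return [lang for lang in requested if lang not in available]


def installed_languages() -> List[str]:
    """Ask Tesseract which language packs it can see (needs pytesseract)."""
    import pytesseract  # imported lazily so the helper above has no dependencies

    return pytesseract.get_languages(config="")


# Example with a hand-written list, as if only eng and osd were installed.
print(missing_languages("eng+equ", ["eng", "osd"]))
```

On a machine with Tesseract installed, missing_languages("eng+equ",
installed_languages()) should return an empty list when both packs are
present.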

Please note that while Tesseract is a powerful OCR engine, recognizing 
complex mathematical expressions accurately might be challenging. If you 
encounter issues with accuracy, consider using specialized OCR libraries or 
APIs that are designed specifically for math recognition.
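
One way to see where accuracy breaks down is to look at per-word confidence
values. pytesseract.image_to_data() with output_type=pytesseract.Output.DICT
returns parallel "text" and "conf" lists. The sketch below filters such a
dictionary. The helper name low_confidence_words and the threshold of 60 are
assumptions chosen for illustration, and the sample dictionary is
hand-written, not real Tesseract output.

```python
from typing import Dict, List, Tuple


def low_confidence_words(data: Dict[str, list],
                         threshold: float = 60.0) -> List[Tuple[str, float]]:
    """Return (word, confidence) pairs whose confidence is below the threshold."""
    flagged = []
    for word, conf in zip(data["text"], data["conf"]):
        score = float(conf)
        # Skip empty entries and the -1 confidence used for non-word rows.
        if word.strip() and 0 <= score < threshold:
            flagged.append((word, score))
    return flagged


# Hand-written sample in the same shape as the DICT output.
sample = {"text": ["", "Solve", "x?", " "], "conf": [-1, 96.5, 41.0, 95.0]}
print(low_confidence_words(sample))
```

If most of the flagged words fall inside the formula region, that suggests the
limitation is in the math recognition and not in the image loading.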

Source: ChatGPT

On Wednesday, 19 July 2023 at 04:55:05 UTC+7, [email protected] wrote:

> Hi everyone, 
>
> I'm trying to use Tesseract to detect both the English part and the
> mathematical part of the image below, and it doesn't seem to work.
>
> [image: tessa.png]
>
> The code I'm using is:
>
> *import pytesseract*
> *import cv2*
>
> *custom_config = r'-l eng+equ'*
>
> *img = cv2.imread("tessa.png")*
> *text = pytesseract.image_to_string(img, config=custom_config,)*
> *print(text)*
>
> The output produced (see below) is missing the mathematical part, even
> though I've used eng+equ:
> [image: Screenshot 2023-07-18 at 5.54.13 PM.png]
>
> Has anyone found a workaround for this, or must I retrain Tesseract?
>
> Regards,
> Nash
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/1262bffb-68e6-4bc9-a188-7bb806463a19n%40googlegroups.com.
