[tesseract-ocr] pytesseract speed improvement?

Jean-Marc Spaggiari Wed, 21 May 2025 09:29:04 -0700

Hi,

I'm using tesseract to convert a small picture containing a title into a 
string. Each call takes about one second.
Here is the call I'm using:

    pytesseract.image_to_string(
        cropped_image,
        nice=-10,
        config='--psm 7 --oem 1 '
               '-l eng+fra+spa+deu+ita+por+jpn+kor+rus+chi_sim+chi_tra',
    )


I have millions of those small pictures to process. I'm wondering if there 
is a way to make that faster. Can I keep tesseract in memory and "stream" 
the pictures to it?  I'm receiving the pictures one by one on a server, so 
I can't batch them.
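
To make the "keep it in memory" idea concrete, here is a rough sketch of what I have in mind. It assumes the tesserocr bindings rather than pytesseract, since pytesseract starts a new tesseract process for every call. The helper names make_engine and recognize are hypothetical, and I have not measured whether this is actually faster for my images.

```python
# Sketch only: assumes the tesserocr package and the listed traineddata files
# are installed. make_engine and recognize are hypothetical helper names.
LANGS = "eng+fra+spa+deu+ita+por+jpn+kor+rus+chi_sim+chi_tra"


def make_engine(langs=LANGS):
    """Create one long-lived engine; the language data is loaded here, once."""
    # Imported lazily so this module can be imported without tesserocr present.
    from tesserocr import OEM, PSM, PyTessBaseAPI

    # PSM.SINGLE_LINE corresponds to --psm 7, OEM.LSTM_ONLY to --oem 1.
    return PyTessBaseAPI(lang=langs, psm=PSM.SINGLE_LINE, oem=OEM.LSTM_ONLY)


def recognize(api, pil_image):
    """Run OCR on one PIL image with an already initialised engine."""
    api.SetImage(pil_image)
    return api.GetUTF8Text().strip()
```

The server would call make_engine() once at startup, call recognize(api, cropped_image) for each incoming picture, and call api.End() on shutdown.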

I tried removing the -l parameter and it's much faster (98 ms), but then 
the title is totally wrong. I'm wondering whether the time is spent loading 
those dictionaries, in which case I could pre-load them and keep them in 
memory, or whether it is mostly processing time.
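
One way I could try to separate the two costs is to time engine creation and per-image recognition independently. This is a minimal sketch, not a proper benchmark: split_timings is a hypothetical helper, and it takes the engine factory and the recognition function as arguments so it does not depend on any particular binding.

```python
import time


def split_timings(make_engine, recognize, image, repeats=5):
    """Return (init_seconds, per_image_seconds) for a reusable OCR engine.

    make_engine: callable returning an initialised engine (loads language data).
    recognize:   callable taking (engine, image) and returning the text.
    """
    start = time.perf_counter()
    engine = make_engine()
    init_seconds = time.perf_counter() - start

    per_image_seconds = []
    for _ in range(repeats):
        start = time.perf_counter()
        recognize(engine, image)
        per_image_seconds.append(time.perf_counter() - start)
    return init_seconds, per_image_seconds
```

If init_seconds dominates, keeping the engine loaded should help. If the per-image times are still close to one second, the cost is mostly recognition with that many languages. If only the first recognition is slow, some loading may be deferred until first use.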

Thanks,

JMS

