[tesseract-ocr] Re: pytesseract speed improvement?

Tom Morris Thu, 22 May 2025 12:48:30 -0700

On Wednesday, May 21, 2025 at 12:28:52 PM UTC-4 [email protected] wrote:


I'm using tesseract to convert a small picture containing a title into a 
string. It runs in about one second.
Here is the command line I'm using:
pytesseract.image_to_string(cropped_image, nice=-10, config='--psm 7 --oem 1 -l eng+fra+spa+deu+ita+por+jpn+kor+rus+chi_sim+chi_tra')


A small semantic distinction - tesseract and pytesseract are two different 
things, maintained by different teams.
 

I tried removing the -l parameter and it's way faster (98ms), but then 
the title is totally wrong. I'm wondering whether the time is spent loading 
those dictionaries, in which case I could pre-load them and keep them in 
memory, or whether it's mostly processing time.


Every language model that you add is going to increase processing time, so 
you only want to load the ones that you really need. I don't think 
pytesseract gives you fine enough control to save significantly on 
initialization time, though. It appears to just run command-line tesseract 
in a subprocess, so the models are loaded again on every call.
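
To find out which languages are worth keeping, one option is to time the 
same image against a few candidate language sets. The following is only a 
minimal sketch: time_language_sets is a hypothetical helper, not part of 
pytesseract, and it takes the OCR call as a parameter so the timing loop 
itself has no dependencies.

```python
import time


def time_language_sets(ocr, language_sets, repeats=3):
    """Time ocr(lang_string) for each candidate language set.

    ocr           -- callable taking a Tesseract language string such as "eng+fra"
    language_sets -- iterable of language-code lists, e.g. [["eng"], ["eng", "fra"]]
    repeats       -- runs per set; the fastest run is kept to reduce noise

    Returns a dict mapping each language string to its best time in seconds.
    """
    timings = {}
    for langs in language_sets:
        lang_string = "+".join(langs)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            ocr(lang_string)
            best = min(best, time.perf_counter() - start)
        timings[lang_string] = best
    return timings


# Example with pytesseract (assumes pytesseract and the listed traineddata
# files are installed, and that cropped_image is the image from the question):
#
#   import pytesseract
#   timings = time_language_sets(
#       lambda langs: pytesseract.image_to_string(
#           cropped_image, lang=langs, config="--psm 7 --oem 1"),
#       [["eng"], ["eng", "fra"], ["eng", "fra", "jpn"]],
#   )
```

Because each pytesseract call starts a new tesseract process, these timings 
include startup and model loading as well as recognition.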

One thing that may cut down on overhead is collecting a batch of images, 
saving them in a multi-image file format (multi-page TIFF, for example), and 
then having Tesseract process that in a single run.
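
A rough sketch of the batching idea, using a closely related variant rather 
than a multi-image file: the tesseract command line also accepts a text file 
that lists one image path per line, and processes all of them in one run. 
The helper names here (build_command, split_pages, ocr_batch) are 
hypothetical. The sketch assumes tesseract is on the PATH and that pages in 
the text output are separated by a form feed, which is the default 
page_separator as far as I know.

```python
import subprocess
from pathlib import Path

# Assumption: Tesseract's default page_separator is a form feed character.
PAGE_SEPARATOR = "\f"


def build_command(list_file, languages, psm=7, oem=1):
    """Build a tesseract command that reads image paths from list_file
    and writes the recognized text to stdout."""
    return [
        "tesseract", str(list_file), "stdout",
        "-l", "+".join(languages),
        "--psm", str(psm),
        "--oem", str(oem),
    ]


def split_pages(text):
    """Split combined output into one stripped string per input image."""
    pages = text.split(PAGE_SEPARATOR)
    # A trailing separator leaves one empty chunk at the end; drop it.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return [page.strip() for page in pages]


def ocr_batch(image_paths, languages, workdir="."):
    """Run tesseract once over all images, so models are loaded once."""
    list_file = Path(workdir) / "batch.txt"
    list_file.write_text(
        "\n".join(str(p) for p in image_paths) + "\n", encoding="utf-8")
    result = subprocess.run(
        build_command(list_file, languages),
        capture_output=True, text=True, encoding="utf-8", check=True,
    )
    return split_pages(result.stdout)
```

Calling ocr_batch(["title1.png", "title2.png"], ["eng", "fra"]) would then 
return one string per image, in input order, with startup and model loading 
paid once per batch rather than once per image.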

Tom

