On Wednesday, May 21, 2025 at 12:28:52 PM UTC-4 [email protected] wrote:
> I'm using tesseract to convert a small picture containing a title into a string. It runs in about one second. Here is the command line I'm using:
>
> pytesseract.image_to_string(cropped_image, nice=-10, config='--psm 7 --oem 1 -l eng+fra+spa+deu+ita+por+jpn+kor+rus+chi_sim+chi_tra')

A small semantic distinction: tesseract and pytesseract are two different things, maintained by different teams.

> I tried to remove the -l parameter and it's way faster (98 ms), but then the title is totally wrong. I'm wondering whether the time is spent loading those dictionaries, in which case I could pre-load them and keep them in memory, or whether it is mostly processing time.

Certainly every language model that you add is going to increase processing time, so you only want to load the ones that you really need, but I don't think pytesseract gives you the granularity of control to save significantly on initialization time. It appears to just run command-line tesseract in a subprocess.

One thing that may cut down on overhead is collecting a batch of images, saving them in a multi-image file format, and then having Tesseract process that.

Tom
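To make the batching idea concrete, here is a rough sketch. It swaps the multi-image file for a closely related mechanism: the tesseract command line also accepts a plain text file listing one image path per line, so the language models are loaded once for the whole batch. The function names (build_batch_command, ocr_batch), the default language list, and the assumption that the cropped titles have already been saved to disk are all mine, not something from pytesseract or from the original post. It also assumes Tesseract's default page separator (a form feed) between results, which is worth verifying on your version.

```python
import subprocess
import tempfile
from pathlib import Path


def build_batch_command(image_paths, list_file, langs="eng+fra", psm=7, oem=1):
    """Write one image path per line to list_file and return the tesseract argv.

    Hypothetical helper: tesseract treats a text-file input as a list of images.
    """
    list_file = Path(list_file)
    list_file.write_text(
        "\n".join(str(p) for p in image_paths) + "\n", encoding="utf-8"
    )
    # "stdout" as the output base makes tesseract print the text instead of
    # writing an output file.
    return [
        "tesseract", str(list_file), "stdout",
        "-l", langs, "--psm", str(psm), "--oem", str(oem),
    ]


def ocr_batch(image_paths, langs="eng+fra", psm=7, oem=1):
    """Run tesseract once over all images and return one string per image."""
    with tempfile.TemporaryDirectory() as tmp:
        cmd = build_batch_command(
            image_paths, Path(tmp) / "images.txt", langs, psm, oem
        )
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # Assumption: results are separated by a form feed, Tesseract's default
    # page separator. If an image fails, the count may not match the input.
    pages = result.stdout.split("\f")
    return [page.strip() for page in pages[: len(image_paths)]]
```

Calling ocr_batch(["title1.png", "title2.png"], langs="eng+fra+deu") would then start tesseract a single time for both images. Whether that actually recovers most of the one second depends on how much of it is model loading rather than recognition, so it is worth timing both ways.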

