Use tesserocr https://github.com/sirfz/tesserocr
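
A rough sketch of how that could address both questions (keeping Tesseract loaded between pictures, and getting a confidence value per result), assuming tesserocr is installed and the traineddata files are on the default tessdata path. The names `TitleReader`, `pick_best` and `LANGS` are made up for illustration and are not part of tesserocr. Mean confidence from different language models is not guaranteed to be directly comparable, so treat the ranking as a heuristic.

```python
LANGS = ["eng", "fra", "spa", "deu"]  # illustrative subset, adjust as needed


def pick_best(candidates):
    """Pick the (lang, text, confidence) tuple with the highest confidence.

    Candidates with empty text are ignored; returns None if nothing is usable.
    """
    usable = [c for c in candidates if c[1].strip()]
    return max(usable, key=lambda c: c[2], default=None)


class TitleReader:
    """Hypothetical helper: one long-lived Tesseract instance per language."""

    def __init__(self, langs=LANGS):
        # Imported here so the module loads even where tesserocr is missing.
        from tesserocr import PyTessBaseAPI, PSM, OEM

        # Models are loaded once, here, and not again for every picture.
        self.apis = {
            lang: PyTessBaseAPI(lang=lang, psm=PSM.SINGLE_LINE, oem=OEM.LSTM_ONLY)
            for lang in langs
        }

    def read(self, image_path):
        """Run every language on one image and return the best candidate."""
        candidates = []
        for lang, api in self.apis.items():
            api.SetImageFile(image_path)
            text = api.GetUTF8Text().strip()
            # MeanTextConf() is the average word confidence, 0 to 100.
            candidates.append((lang, text, api.MeanTextConf()))
        return pick_best(candidates)

    def close(self):
        """Release the Tesseract instances."""
        for api in self.apis.values():
            api.End()
```

Usage would be along the lines of `reader = TitleReader()` once at startup, then `reader.read("title.jpg")` for each incoming picture, and `reader.close()` on shutdown.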

Zdenko


On Thu, May 22, 2025 at 22:35, Jean-Marc Spaggiari <[email protected]>
wrote:

> Hi Tom,
>
> Thanks for having a look at this. The challenge is that I don't know which
> of those languages the title is using.
>
> Let me remove pytesseract from the picture.
>
> If I run tesseract title.jpg stdout --psm 7 --oem 1 -l
> eng+fra+spa+deu+ita+por+jpn+kor+rus+chi_sim+chi_tra it takes 0.9 seconds and
> returns the right title ("Advance Scout")
>
> The title is in English.
>
> If I run *tesseract title.jpg stdout --psm 7 --oem 1 -l eng+fra+spa+deu*
> it's faster (0.3s) and the title is still correct.
> If I run *tesseract title.jpg stdout --psm 7 --oem 1 -l eng+fra+spa+deu*
> it's even faster (0.25s) but the title is wrong ("AVEO Segue")
> If I run *tesseract title.jpg stdout --psm 7 --oem 1 -l eng* it's crazy
> fast! (0.09s) but the title is wrong again ("clyzinee Segue")
> If I use just "deu" it's super fast and correct.
>
> I can't batch the pictures as the client is waiting for the reply before
> sending the next one.
>
> So I was thinking about running each of them in parallel. I'm able to get
> a reply in 300ms! That's 3 times faster, and it gives me this:
> clyzinee Segue
> ANVanee Scout
> AVEO EU
> Advance Scout:
> YAVanicc Sco
> Advance So ui
> eV2pe22)らの016
> 여00200606 20600ㄷ
> Ао\алее Эсодиь
> 二司多5
> 和NOU2COCOUUE
>
> But then I don't know which one I should take from those. I see the one
> from DEU is the good one. But I don't have a way to confirm that in the
> script.
>
> So multiple questions here.
> - Can tesseract work like a shell? I send a picture, I get the text. I send
> a picture, I get the text. Without ever closing tesseract?
> - Can I get the "confidence" level for each of those predictions? It might
> help to figure out which one is the most probable?
>
> Thanks,
>
> JMS
>
>
>
>
>
> On Thu, May 22, 2025 at 15:48, Tom Morris <[email protected]> wrote:
>
>> On Wednesday, May 21, 2025 at 12:28:52 PM UTC-4 [email protected]
>> wrote:
>>
>> I'm using tesseract to convert a small picture containing a title into a
>> string. It runs in about one second.
>> Here is the command line I'm using:
>> pytesseract.image_to_string(cropped_image, nice=-10, config='--psm 7
>> --oem 1 -l eng+fra+spa+deu+ita+por+jpn+kor+rus+chi_sim+chi_tra')
>>
>>
>> A small semantic distinction - tesseract and pytesseract are two
>> different things, maintained by different teams.
>>
>>
>> I tried to remove the -l parameter and it's way faster (98ms), but
>> then the title is totally wrong. I'm wondering if the time is spent loading
>> those dictionaries, in which case I could pre-load them and keep them in
>> memory, or if it's mostly processing time.
>>
>>
>> Certainly every language model that you add is going to increase
>> processing time, so you only want to load the ones that you really need,
>> but I don't think you have the granularity of control with pytesseract to
>> save significantly on initialization time. It appears to just use command
>> line tesseract running in a subprocess.
>>
>> One thing which may cut down on overhead is collecting a batch of images,
>> saving them in a multi-image file format, and then having Tesseract process
>> that.
>>
>> Tom
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion visit
>> https://groups.google.com/d/msgid/tesseract-ocr/77af7499-6271-4135-982b-4b2fd1ee27d9n%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/77af7499-6271-4135-982b-4b2fd1ee27d9n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
