Oh nothing, `XD` was just an expression of laughter in place of an emoji!
By "the hard way" I mean that I did a lot of fiddling with Docker 
containers: I was using the `ocrmypdf` tool, so I originally thought the 
problem was in that tool itself, until I found the same behaviour in 
Tesseract.
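
For anyone debugging the same thing, here is a rough sketch of how one could check which language files a given `tessdata` folder contains and compare that with what the `tesseract` binary itself reports via `--list-langs`. The function names are made up for illustration, and the assumption that the first line of the `--list-langs` output is a header may not hold for every version, so treat it as a starting point and not a definitive check.

```python
import shutil
import subprocess
from pathlib import Path


def languages_in_tessdata(tessdata_dir):
    """Return sorted language codes that have a .traineddata file in tessdata_dir.

    Hypothetical helper: it only looks at file names, it does not verify
    that Tesseract can actually load the files.
    """
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.traineddata"))


def languages_reported_by_tesseract(binary="tesseract"):
    """Return the languages listed by `tesseract --list-langs`.

    Returns None when the binary is not found on PATH.
    """
    exe = shutil.which(binary)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Assumption: the first line is a header ("List of available languages ..."),
    # and every following non-empty line is one language code.
    lines = (result.stdout or result.stderr).splitlines()
    return sorted(line.strip() for line in lines[1:] if line.strip())
```

Calling `languages_in_tessdata("/usr/share/tesseract-ocr/5/tessdata")` and comparing the result with `languages_reported_by_tesseract()` should show whether manually copied files such as `equ.traineddata` or `osd.traineddata` are sitting in the folder that Tesseract is actually reading, both on the host and inside a container.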
On Sunday, 6 July 2025 at 16:21:06 UTC+2 zdenop wrote:

> What is `Tesseract 4 XD`? What do you mean by `I then found out the hard 
> way that ...`?
>
> Zdenko
>
>
> On Sun, 6 Jul 2025 at 16:18, Alessandro Griseta <[email protected]> wrote:
>
>> I tried manually adding files I needed from 
>> https://github.com/tesseract-ocr/tessdata_best (`equ.traineddata`, 
>> `osd.traineddata`, `ita.traineddata`) inside 
>> `/usr/share/tesseract-ocr/5/tessdata`: unfortunately I then found out the 
>> hard way that these only work on Tesseract 4 XD. 
>>
>> 1. It seems funny though: does that really mean I'll get better results 
>> by downgrading so that I can actually use these files?
>>
>> I understand the performance loss, but I'm particularly interested in 
>> getting the most out of `equ.traineddata`, which to my understanding handles 
>> math characters. These are often a challenge for OCR engines, so I was trying 
>> to get the absolute best scan possible for them.
>>
>> 2. Also, I wasn't able to specify `-l equ` as the error told me Tesseract 
>> is supposed to deal with that on its own: if that's the case, is `equ` 
>> installed by default with `sudo apt-get install tesseract-ocr` (couldn't 
>> find it in `tessdata` folder, and don't know where else to look for it)?
>>
>> 3. I also tested the Docker image: if I put `equ.traineddata` and 
>> `osd.traineddata` inside the `tessdata` folder will they (which I have 
>> chosen manually) actually be used?
>>
>> Hope this all makes sense, don't be afraid to ask :)
>> Alessandro
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/789d7514-bded-49e4-95ed-44cfb0049ad1n%40googlegroups.com
>>
>

