Oh nothing, `XD` was just an exclamation of laughter in place of the emoji!

I found it out the hard way as I did a lot of fiddling with Docker containers (I was using the `ocrmypdf` tool, and so originally thought it was a problem with that tool itself, until I found the same behaviour in Tesseract).

On Sunday, 6 July 2025 at 16:21:06 UTC+2 zdenop wrote:
> What is `Tesseract 4 XD`? What does `I then found out the hard way that ...` mean?
>
> Zdenko
>
> On Sun, 6 Jul 2025 at 16:18, Alessandro Griseta <[email protected]> wrote:
>
>> I tried manually adding the files I needed from
>> https://github.com/tesseract-ocr/tessdata_best (`equ.traineddata`,
>> `osd.traineddata`, `ita.traineddata`) inside
>> `/usr/share/tesseract-ocr/5/tessdata`: unfortunately I then found out the
>> hard way that these only work on Tesseract 4 XD.
>>
>> 1. It seems funny though: does that really mean I'll get better results
>> by downgrading so that I can actually use these files?
>>
>> I understand the performance loss, but I'm particularly interested in
>> getting the best out of `equ.traineddata`, which to my understanding
>> interprets math characters, which are often a challenge for OCR engines,
>> so I was trying to get the absolute best scan possible for that.
>>
>> 2. Also, I wasn't able to specify `-l equ`, as the error told me Tesseract
>> is supposed to deal with that on its own: if that's the case, is `equ`
>> installed by default with `sudo apt-get install tesseract-ocr` (I couldn't
>> find it in the `tessdata` folder, and don't know where else to look for it)?
>>
>> 3. I also tested the Docker image: if I put `equ.traineddata` and
>> `osd.traineddata` inside the `tessdata` folder, will they (which I have
>> chosen manually) actually be used?
>>
>> Hope this all makes sense, don't be afraid to ask :)
>> Alessandro
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion visit
>> https://groups.google.com/d/msgid/tesseract-ocr/789d7514-bded-49e4-95ed-44cfb0049ad1n%40googlegroups.com
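On question 3 in the quoted message (whether manually copied files are actually picked up), one way to check might be to compare what is on disk with what Tesseract itself reports. The following is only a minimal sketch: the helper names `local_traineddata`, `list_langs_command` and `visible_langs` are made up for illustration, and the folder is the one mentioned in the thread. It relies on the `--tessdata-dir` and `--list-langs` command-line options of `tesseract`.

```python
from pathlib import Path
import shutil
import subprocess


def local_traineddata(tessdata_dir):
    """Return the sorted language codes of *.traineddata files in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.traineddata"))


def list_langs_command(tessdata_dir):
    """Build the command asking tesseract which languages it sees in tessdata_dir."""
    return ["tesseract", "--tessdata-dir", str(tessdata_dir), "--list-langs"]


def visible_langs(tessdata_dir):
    """Run the command above; return its raw output, or None if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        list_langs_command(tessdata_dir),
        capture_output=True,
        text=True,
        check=False,
    )
    # Some versions print the list to stderr, so both streams are returned.
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Folder taken from the thread; adjust for the Docker image if it differs.
    folder = "/usr/share/tesseract-ocr/5/tessdata"
    print("Files on disk:", local_traineddata(folder))
    print("Reported by tesseract:", visible_langs(folder))
```

If a file such as `equ.traineddata` shows up on disk but not in Tesseract's own list, that would suggest the folder being checked is not the one Tesseract reads from. This does not say anything about whether a listed file is compatible with the installed version.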

