Hi, I'm trying to train Tesseract 5 for a new font, this time in German. I followed a tutorial I found on YouTube (https://www.youtube.com/watch?v=KE4xEzFGSU8), and training worked when I did it for English. After switching to German, however, I get an error that I can't resolve.
The error is that the file data/deu/Apex.lstm-unicharset is missing. In langdata, I've confirmed that deu.unicharset exists and looks correct: all German characters are present as expected. However, data/Apex/my.unicharset is incomplete: it does not contain all of the characters that occur in all-gt. I've reviewed the process and believe I followed every step accurately, but the error persists.
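To narrow this down, the characters missing from my.unicharset could be listed with a small script along these lines. This is only a rough sketch and not part of the tutorial. It assumes that the first line of a unicharset file is the entry count and that every later line starts with the character followed by a space, and that the ground truth text sits at data/Apex/all-gt (that path is my guess). The function names are made up for illustration.

```python
from pathlib import Path


def unicharset_chars(unicharset_text):
    """Return the set of entries listed in unicharset file content."""
    lines = unicharset_text.splitlines()
    # Assumption: line 1 is the entry count; every later line begins with
    # the unichar, then a space, then its properties.
    return {line.split(" ", 1)[0] for line in lines[1:] if line.strip()}


def missing_chars(gt_text, unicharset_text):
    """List non-whitespace ground-truth characters absent from the unicharset."""
    gt_chars = {ch for ch in gt_text if not ch.isspace()}
    return sorted(gt_chars - unicharset_chars(unicharset_text))


if __name__ == "__main__":
    # Assumed locations; adjust to the actual training layout.
    gt_path = Path("data/Apex/all-gt")
    ucs_path = Path("data/Apex/my.unicharset")
    if gt_path.exists() and ucs_path.exists():
        gt_text = gt_path.read_text(encoding="utf-8")
        ucs_text = ucs_path.read_text(encoding="utf-8")
        for ch in missing_chars(gt_text, ucs_text):
            print(f"U+{ord(ch):04X} {ch!r}")
```

The comparison is per code point, so a character that only appears inside a multi-character unicharset entry, or that differs only in Unicode normalization, would still be reported as missing.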

