Hello,

I notice there may be some gaps in your understanding of Tesseract and its training requirements. Training Tesseract effectively requires careful adherence to its documentation and established processes. Proceeding without this foundation risks wasting both your time and ours.

Anyway, I have put some notes below (inline, in blue).

Kind regards,
Zdenko

On Fri, 21 Mar 2025 at 18:56, Mitya <[email protected]> wrote:

> <https://stackoverflow.com/posts/79526256/timeline>
>
> I’ve been following this tutorial from YouTube: Guide to Tesseract
> Training https://www.youtube.com/watch?v=KE4xEzFGSU8&t=13s and its
> corresponding GitHub repository: astutejoe/tesseract_tutorial.
> https://github.com/astutejoe/tesseract_tutorial
>
> The tutorial walks through the process of training a custom Tesseract
> model, but I've run into an issue when trying to continue training the model.

If the tutorial doesn't produce working results, you should contact its author.

> *What we tried*: Setup: I followed the steps in the tutorial to set up
> the environment, downloaded the necessary files, and began the training
> process using the base eng.traineddata model.
>
> *Training Command*: After preparing the training data and ground truth, I
> ran the following command to initiate the training:
> make training MODEL_NAME=Apex START_MODEL=eng TESSDATA=../tesseract/tessdata MAX_ITERATIONS=100
>
> *Model Generation*: This command successfully generated the Apex.lstm
> model file. However, I encountered an issue when trying to use the
> Apex.lstm file for further training.

What does the statement "*Model Generation*: This command successfully ..." mean? Which command did you run? What is the Apex.lstm model file? Tesseract uses traineddata files for models, correct?

> *Error:* When attempting to continue training the model,

Could you describe how you attempted to continue training the model? Also, can you specify which part of the Tesseract documentation (https://tesseract-ocr.github.io/tessdoc/tess5/TrainingTesseract-5.html) or which tesstrain step (https://github.com/tesseract-ocr/tesstrain) you were referring to?

> I received the following error: Error, data/eng/Apex.lstm is an integer
> (fast) model, cannot continue training
>
> **What we faced:** I have verified that the eng.traineddata file is located
> correctly in /usr/share/tesseract-ocr/5/tessdata/ (path may differ
> depending on installation). Despite following the tutorial and using the
> correct paths for the eng.traineddata,

I am not sure what you are trying to communicate here, since you use `../tesseract/tessdata` for training, which seems to be a different location from `/usr/share/tesseract-ocr/5/tessdata/`.

> I’m getting an error related to the model being an "integer model" and
> unable to continue training. I tried downloading the latest eng.traineddata
> from GitHub, but the error persists.

Try searching, e.g. https://github.com/search?q=org%3Atesseract-ocr%20integer%20model&type=code

> *Questions*: What does the "integer (fast) model" error mean, and how can
> I resolve it? Is there something I missed in the training process that
> would allow me to continue training Apex.lstm? Any advice or insights would
> be greatly appreciated.
>
> *Environment*:
> Tesseract version: 5.3.0
> OS: Ubuntu 20.04 (MacBook Pro)
> Tesseract Data Path: /usr/share/tesseract-ocr/5/tessdata/
> Base Model: eng.traineddata
> Makefile: https://github.com/tesseract-ocr/tesstrain/blob/43ff10012af31914bb5b72304d9c21c8fdf4f464/Makefile
>
> Thank you in advance for your help!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion visit
> https://groups.google.com/d/msgid/tesseract-ocr/d09b45da-1e8a-4194-ad28-505857f0ad54n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/d09b45da-1e8a-4194-ad28-505857f0ad54n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .

