I’m fine-tuning a Tesseract OCR model (Tesseract 5.5.3) and have run into a problem with symbol and digit recognition.

With the original Russian model (rus), the colon (:) was missing from the recognized text. To address this, I fine-tuned the model on 500 ROIs. After fine-tuning, the colon is recognized; however, some digits are now misread. For example, ‘5’ is sometimes recognized as ‘6’.
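
One way to pin these errors down across all 500 ROIs is to align each ground-truth string with the OCR output and count which characters are dropped or swapped. The sketch below uses Python’s standard `difflib` for the alignment. `char_errors` is a hypothetical helper (not part of Tesseract or tesstrain), and it assumes the transcriptions and OCR results are already loaded as strings.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_errors(truth, ocr):
    """Count character errors between a ground-truth string and OCR output.

    Keys are (expected, got) pairs; an empty string marks a character that
    was dropped (got == "") or inserted (expected == "").
    """
    errors = Counter()
    # autojunk=False stops difflib from treating frequent characters as junk
    # on long strings (it only kicks in for sequences of 200+ items anyway).
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        expected, got = truth[i1:i2], ocr[j1:j2]
        if tag == "replace" and len(expected) == len(got):
            # Equal-length replacement: pair characters one to one ('5' -> '6').
            for a, b in zip(expected, got):
                errors[(a, b)] += 1
        else:
            # Deletions, insertions and uneven replacements: count each side.
            # difflib does not guarantee a minimal edit script, so an occasional
            # substitution may show up as a deletion plus an insertion.
            for a in expected:
                errors[(a, "")] += 1
            for b in got:
                errors[("", b)] += 1
    return errors
```

Summing these counters over every ROI shows how often `(":", "")` (a dropped colon) and `("5", "6")` occur, which makes it easier to compare the original, fine-tuned and combined models on the same images.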

When I combined the original Russian model with the fine-tuned one, the digits were recognized correctly, but the colon was missing again.
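
Tesseract combines several models when their names are joined with `+` in the `-l` option (e.g. `-l rus+rusfinetune`); that is assumed here to be how the two models were combined. A minimal Python sketch of that invocation follows, assuming `rusfinetune.traineddata` has been copied into the tessdata directory and each ROI is passed as its own image. `tesseract_cmd` and `run_tesseract` are hypothetical helpers.

```python
import subprocess


def tesseract_cmd(image, models, psm, tessdata_dir=None):
    """Build a tesseract command line that prints the recognized text to stdout.

    Several models are combined by joining their names with '+',
    e.g. ["rus", "rusfinetune"] -> "-l rus+rusfinetune".
    """
    cmd = ["tesseract", image, "stdout", "-l", "+".join(models), "--psm", str(psm)]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_tesseract(cmd):
    """Run the command and return the recognized text (needs tesseract on PATH)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Usage (the file name is hypothetical, and PSM 7, a single text line, is only an assumption about the ROIs): `run_tesseract(tesseract_cmd("roi_001.png", ["rus", "rusfinetune"], 7, "/usr/local/share/tessdata"))`.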

*Approaches Tried (but didn’t yield the desired results):*

   - Converted the images to binary
   - Performed noise removal
   - Applied CLAHE
   - Tried all PSM modes
   - Enabled early stopping to avoid overfitting

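The “Tried all PSM modes” step can be made systematic by running every model combination against every page segmentation mode and comparing the outputs side by side. The sketch below only builds the command lines (each one can be run with `subprocess.run` and scored against the ground truth); `sweep_commands` is a hypothetical helper, and the model names are the ones used in this post. PSM 0 (orientation and script detection only) and PSM 2 (not implemented) are skipped because they return no text.

```python
from itertools import product

# Page segmentation modes that return recognized text. PSM 0 only runs
# orientation and script detection, and PSM 2 is not implemented, so both
# are left out of the sweep.
TEXT_PSMS = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]


def sweep_commands(image, model_sets, tessdata_dir):
    """Yield (models, psm, argv) for every model combination and PSM."""
    for models, psm in product(model_sets, TEXT_PSMS):
        argv = ["tesseract", image, "stdout", "-l", "+".join(models),
                "--psm", str(psm), "--tessdata-dir", tessdata_dir]
        yield models, psm, argv
```

With `[["rus"], ["rusfinetune"], ["rus", "rusfinetune"]]` as the model sets, this yields 36 commands per image.
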
*Training Command Used:*

make training MODEL_NAME=rusfinetune START_MODEL=rus MAX_ITERATIONS=4000 STOP_TRAINING_CONVERGED=true TESSDATA=/usr/local/share/tessdata
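
Before retraining, it may help to sanity-check the ground truth itself. By default tesstrain reads line images with matching `.gt.txt` transcriptions from `data/<MODEL_NAME>-ground-truth` (here `data/rusfinetune-ground-truth`), and a quick count shows how many colon and digit examples the 500 ROIs actually contain. The sketch below is hypothetical; the `.png`/`.tif` suffixes and the set of watched characters are assumptions.

```python
from collections import Counter
from pathlib import Path

# Assumption: line images are .png or .tif, each paired with <stem>.gt.txt.
IMAGE_SUFFIXES = (".png", ".tif")
WATCHED = set(":0123456789")  # characters whose example counts we inspect


def unpaired_images(filenames):
    """Return image names that have no matching .gt.txt transcription."""
    names = set(filenames)
    missing = []
    for name in sorted(names):
        stem, dot, ext = name.rpartition(".")
        if dot and "." + ext.lower() in IMAGE_SUFFIXES and stem + ".gt.txt" not in names:
            missing.append(name)
    return missing


def watched_char_counts(transcriptions):
    """Count colons and digits across all ground-truth lines."""
    counts = Counter()
    for text in transcriptions:
        counts.update(ch for ch in text if ch in WATCHED)
    return counts


def check_ground_truth(gt_dir):
    """Apply both checks to a ground-truth directory."""
    gt_dir = Path(gt_dir)
    names = [p.name for p in gt_dir.iterdir() if p.is_file()]
    texts = [p.read_text(encoding="utf-8") for p in gt_dir.glob("*.gt.txt")]
    return unpaired_images(names), watched_char_counts(texts)
```

For example, `check_ground_truth("data/rusfinetune-ground-truth")` returns the list of unpaired images and a counter such as `counts[":"]` and `counts["5"]`.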

What could be the root cause of this issue, and do you have any suggestions for resolving it?

For reference, the sample images are available at the link below:

sample_Images
<https://acgworld-my.sharepoint.com/:f:/g/personal/sandeep_reddy_acg-world_com/EpMejgMNpZ1BmeETDsPCsHkBzP3c6dsr4ZlYfFKMc6PUyQ?e=vn0FGv>

Thank you for your time and support.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion visit 
https://groups.google.com/d/msgid/tesseract-ocr/40fad86b-8050-43f6-b613-dd096cfa5532n%40googlegroups.com.
