I followed the steps for fine-tuning Tesseract for handwriting recognition. 
I have the character images and the corresponding box files. Then I 
generated the .lstmf files, followed by the lstm_train.txt and 
lstm_test.txt files.

However, when I launch the training using these list files, it doesn't 
work. When I put only a single path in each of the train and test text 
files, though, it works perfectly: the training starts correctly.

The .lstmf files themselves appear to be valid: I wrote a script that 
trains on each file one by one, continuing from the last checkpoint each 
time, and this worked for every .lstmf file.

I'm not sure whether the issue is with how lstm_train.txt is generated, or 
whether lstmtraining only accepts a single .lstmf file as input.

Here is the code for generating the lstm_train.txt and lstm_test.txt files:

import os
import random

input_dir = "test"
train_file = "lstm_train.txt"
test_file = "lstm_test.txt"

# List all .lstmf files
all_files = [f for f in os.listdir(input_dir) if f.endswith(".lstmf")]
random.shuffle(all_files)  # Random shuffle

# Proportion used for training (80%)
train_split = 0.8
train_count = int(len(all_files) * train_split)

train_files = all_files[:train_count]
test_files = all_files[train_count:]

# Write the train and test list files with relative paths
with open(train_file, "w", encoding="utf-8") as f_train, \
     open(test_file, "w", encoding="utf-8") as f_test:

    for f in train_files:
        relative_path = os.path.join(input_dir, f)
        f_train.write(relative_path + "\n")

    for f in test_files:
        relative_path = os.path.join(input_dir, f)
        f_test.write(relative_path + "\n")

print(f"[OK] Files '{train_file}' and '{test_file}' created with relative paths.")
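
A quick way to rule out problems in the list files themselves is to check 
that every listed path resolves from the directory where lstmtraining is 
launched, and to see which line endings were written. The sketch below is 
only a diagnostic: check_list_file is a hypothetical helper, not part of 
Tesseract, and whether CRLF line endings or path separators actually matter 
to lstmtraining is an assumption that has not been confirmed here.

```python
import os


def check_list_file(list_path):
    """Return a list of human-readable problems found in a list file.

    Hypothetical helper for diagnosis only; it is not part of Tesseract.
    Paths are resolved relative to the current working directory.
    """
    problems = []

    # Read raw bytes so the line endings are seen exactly as written.
    with open(list_path, "rb") as fh:
        raw = fh.read()

    if b"\r\n" in raw:
        problems.append("file uses CRLF line endings")

    for number, line in enumerate(raw.decode("utf-8").splitlines(), start=1):
        entry = line.strip()
        if not entry:
            problems.append(f"line {number}: empty line")
        elif not os.path.isfile(entry):
            problems.append(f"line {number}: not found: {entry}")

    return problems


if __name__ == "__main__":
    for name in ("lstm_train.txt", "lstm_test.txt"):
        if os.path.isfile(name):
            for problem in check_list_file(name):
                print(f"{name}: {problem}")
```

Running it from the same directory as the training command prints nothing 
when every entry is found and the file uses plain LF line endings.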


Here is an excerpt from the lstm_train.txt file:

[image: Capture d'écran 2025-06-11 095440.png]

