Yes, please take a look at Tesstrain 
<https://github.com/tesseract-ocr/tesstrain>, and particularly its 
Makefile, so that you know what the training process involves. I 
would read the official Tesstrain documentation and run "make help" 
to see the inputs it needs. One of the many items you have not 
specified is the CNN-LSTM network spec, which you can ask GPT/Claude to 
explain to you.

Furthermore, you can use GPT or Claude to digest the Makefile for you so 
that you know which binaries are invoked at each step of the training 
process. Once you have found the binaries involved, you can run something 
like "lstmtraining --help" for each one to get its complete list of 
options, some of which are not exposed in the Tesstrain Makefile.
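
As a rough illustration of that last step, here is a minimal Python sketch that collects the "--help" output of each binary found on your PATH. The list of binaries is my assumption of what a typical Tesstrain run calls, so check it against the Makefile of the version you have; help_text is just a hypothetical helper name.

```python
import shutil
import subprocess

# Assumed list of binaries a Tesstrain run may invoke; verify against your Makefile.
BINARIES = [
    "tesseract",
    "combine_tessdata",
    "unicharset_extractor",
    "combine_lang_model",
    "lstmtraining",
    "lstmeval",
]


def help_text(binary):
    """Return the --help output of a binary, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--help"],
        capture_output=True,
        text=True,
        check=False,  # some tools exit non-zero after printing usage
        timeout=30,
    )
    # Usage text may go to stdout or stderr depending on the tool.
    return result.stdout + result.stderr


if __name__ == "__main__":
    for name in BINARIES:
        text = help_text(name)
        if text is None:
            print(f"{name}: not found on PATH")
        else:
            print(f"===== {name} --help =====")
            print(text)
```

Redirect the output to a file and you can compare the options listed there with the ones the Makefile actually passes.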

Once you have digested the Tesstrain Makefile, it will become clear to you 
that, messy as it may be, it is just an ugly wrapper that runs various 
Tesseract binaries in sequence, which you can implement yourself. Then, you 
can (use GPT/Claude to) tailor the Makefile to your needs and even turn it 
into an equivalent Python script that is easier to modify. Tailoring it 
this way is almost certainly necessary if your training set is very large.
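
To give an idea of what such a Python script could look like, here is a minimal sketch that only builds and prints the command lines for three of the steps (making .lstmf files, training, and converting a checkpoint into a .traineddata file). It is not a translation of the actual Makefile: the function names and file layout are hypothetical, it skips the box-file and unicharset steps, and it assumes a matching .box file already sits next to each line image. The net spec is left as a parameter that you have to supply.

```python
import subprocess
from pathlib import Path


def lstmf_command(image, psm=13):
    """Build the tesseract call that writes <image stem>.lstmf.

    Assumes a matching .box file already exists next to the image.
    """
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix("")),
            "--psm", str(psm), "lstm.train"]


def train_command(start_traineddata, list_file, checkpoint_prefix,
                  net_spec, max_iterations):
    """Build the lstmtraining call for training from scratch with a given net spec."""
    return ["lstmtraining",
            "--traineddata", str(start_traineddata),
            "--net_spec", net_spec,
            "--model_output", str(checkpoint_prefix),
            "--train_listfile", str(list_file),
            "--max_iterations", str(max_iterations)]


def finalize_command(checkpoint, start_traineddata, output):
    """Build the lstmtraining call that converts a checkpoint to .traineddata."""
    return ["lstmtraining", "--stop_training",
            "--continue_from", str(checkpoint),
            "--traineddata", str(start_traineddata),
            "--model_output", str(output)]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

With dry_run left at True, run_all only prints the commands, so you can compare them with what "make -n" shows for the real Makefile before running anything. For a very large training set, the per-image step is the part you would most likely want to parallelize or batch yourself.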

On Thursday, April 18, 2024 at 1:07:46 AM UTC-4 [email protected] wrote:

> Hello, I'm testing tesseract and I'm not able to process texts that use 
> cursive fonts. How do I proceed in this situation, should I train a model 
> myself? If so, do you have a tip for me to do this? I'm new to using 
> tesseract, please help me.
>
