Did you fine-tune an existing model or train a new model from scratch?

Fine-tuning without sufficient training material will degrade the performance of the base model. Also, you have to think carefully about how you want to distinguish among, say, a circle, a zero, and the letter O. Sufficient context in the training set may help. For example, the letter o nearly always appears within a word, while a circle usually stands alone. An LSTM can learn this kind of context, but you need a large, high-quality training set, which can be generated procedurally if you design the rules well.
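To make the context idea concrete, here is a minimal Python sketch of rule-based transcription generation: the letter o/O only ever appears inside words, the digit 0 only inside numbers, and the circle is always a standalone token. The word and number lists and the choice of U+25CB WHITE CIRCLE for the circle are assumptions for illustration. Each generated line would still have to be rendered to an image (for example with Tesseract's text2image tool and a font that contains the circle glyph) before it can be used for training.

```python
import random
from typing import List

# Assumed mapping for illustration: the circle gets its own code point,
# distinct from the letter "O" and the digit "0".
CIRCLE = "\u25cb"  # WHITE CIRCLE

# Hypothetical vocabularies; a real generator would draw from a large corpus.
WORDS_WITH_O = ["dog", "open", "Oslo", "motor", "book"]
NUMBERS_WITH_ZERO = ["10", "2024", "305", "0.5", "100"]


def make_line(rng: random.Random, n_tokens: int = 6) -> str:
    """Build one ground-truth line that follows simple context rules:
    o/O only inside words, 0 only inside numbers, and the circle always
    as a standalone token separated by spaces."""
    tokens = []
    for _ in range(n_tokens):
        kind = rng.choice(["word", "number", "circle"])
        if kind == "word":
            tokens.append(rng.choice(WORDS_WITH_O))
        elif kind == "number":
            tokens.append(rng.choice(NUMBERS_WITH_ZERO))
        else:
            tokens.append(CIRCLE)
    return " ".join(tokens)


def make_corpus(n_lines: int, seed: int = 0) -> List[str]:
    """Generate a reproducible list of synthetic transcription lines."""
    rng = random.Random(seed)
    return [make_line(rng) for _ in range(n_lines)]


if __name__ == "__main__":
    for line in make_corpus(5):
        print(line)
```

Keeping the rules explicit in code makes it easy to add more of them later (for instance, circles used as bullets at the start of a line) and to regenerate the whole set reproducibly from a seed.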

If you train a new model dedicated to shapes from scratch, you can use it alongside the models for ordinary languages at the same time. However, you might not have control over how Tesseract OCR assigns priority between them when it sees a circle among letter Os and zeros.
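On the command line, Tesseract loads several models at once when their names are joined with "+" in the -l option. The sketch below only builds and runs such a command from Python; "shapes" is a hypothetical name for the dedicated model's traineddata file, and nothing here influences how Tesseract arbitrates between the models.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_cmd(image_path: str, models: List[str],
                        tessdata_dir: Optional[str] = None) -> List[str]:
    """Build a tesseract CLI call that loads several models at once.

    Models are joined with '+', e.g. '-l eng+shapes'. Passing 'stdout' as the
    output base prints the recognized text instead of writing a file.
    """
    cmd = ["tesseract", image_path, "stdout", "-l", "+".join(models)]
    if tessdata_dir is not None:
        # Directory holding eng.traineddata and the (hypothetical) shapes.traineddata
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_ocr(image_path: str, models: List[str],
            tessdata_dir: Optional[str] = None) -> str:
    """Run tesseract and return its text output (requires tesseract on PATH)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(build_tesseract_cmd(image_path, models, tessdata_dir),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_ocr("page.png", ["eng", "shapes"]) would run both models over the same page.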


On May 26, 2024, at 14:49, Kassim Papa <[email protected]> wrote:

I tried to do it. It led to multiple bugs.

For example, it started recognizing the images correctly but no longer recognized the usual letters.

On Tuesday, May 21, 2024 at 20:20:48 UTC+2, [email protected] wrote:
Absolutely.
1. I would first design a mapping between the shapes and a set of Unicode characters, so that each shape maps to a single character.
2. I would procedurally generate at least a few thousand images for each shape, with variations, and label them with the corresponding Unicode characters.
3. Please take a look at Tesstrain, and particularly its Makefile, so that you know what the training process involves. I would go over the official Tesstrain documentation and run "make help" to see which inputs are needed.
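As a rough illustration of steps 1 and 2, the Python sketch below uses only the standard library to draw outlined circles, squares, and triangles with random size, position, and stroke width, and writes each image next to a .gt.txt transcription holding its mapped character. The shape-to-character mapping, image size, and variation ranges are my own assumptions. The pairing of a line image with a same-named .gt.txt file inside data/<MODEL_NAME>-ground-truth follows Tesstrain's documented ground-truth layout, but please double-check it against the README of the version you use.

```python
import os
import random
import struct
import zlib

# Step 1 (assumed mapping): one Unicode character per shape, chosen so that
# none of them collides with letters or digits such as "O" and "0".
SHAPE_TO_CHAR = {
    "circle": "\u25cb",    # WHITE CIRCLE
    "square": "\u25a1",    # WHITE SQUARE
    "triangle": "\u25b3",  # WHITE UP-POINTING TRIANGLE
}


def write_gray_png(path, pixels):
    """Write rows of 0-255 ints as an 8-bit grayscale PNG (stdlib only)."""
    height, width = len(pixels), len(pixels[0])
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)  # filter type 0 per row

    def chunk(tag, data):
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)

    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)  # 8-bit, grayscale
    with open(path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")
        f.write(chunk(b"IHDR", ihdr))
        f.write(chunk(b"IDAT", zlib.compress(raw)))
        f.write(chunk(b"IEND", b""))


def inside_triangle(dx, dy, s):
    """True if (dx, dy) lies in an upward triangle of half-height s centred at 0."""
    return -s <= dy <= s and abs(dx) <= (dy + s) / 2


def draw_shape(shape, rng, width=64, height=48):
    """Step 2: render one outlined shape with random size, position and stroke."""
    img = [[255] * width for _ in range(height)]  # white background
    r = rng.uniform(12, 18)               # size variation
    cx = width / 2 + rng.uniform(-6, 6)   # horizontal jitter
    cy = height / 2 + rng.uniform(-3, 3)  # vertical jitter
    t = rng.uniform(1.5, 3.0)             # stroke thickness
    for y in range(height):
        for x in range(width):
            dx, dy = x - cx, y - cy
            if shape == "circle":
                on = abs((dx * dx + dy * dy) ** 0.5 - r) <= t / 2
            elif shape == "square":
                on = abs(max(abs(dx), abs(dy)) - r) <= t / 2
            elif shape == "triangle":
                # Outline = outer triangle minus a smaller inner triangle.
                on = inside_triangle(dx, dy, r) and not inside_triangle(dx, dy, r - 2 * t)
            else:
                raise ValueError(f"unknown shape: {shape}")
            if on:
                img[y][x] = 0  # black ink
    return img


def generate(out_dir, per_shape, seed=0):
    """Write <shape>_NNNNN.png plus a matching .gt.txt label for every sample."""
    rng = random.Random(seed)
    os.makedirs(out_dir, exist_ok=True)
    stems = []
    for shape, char in SHAPE_TO_CHAR.items():
        for i in range(per_shape):
            stem = os.path.join(out_dir, f"{shape}_{i:05d}")
            write_gray_png(stem + ".png", draw_shape(shape, rng))
            with open(stem + ".gt.txt", "w", encoding="utf-8") as f:
                f.write(char + "\n")  # single-line transcription
            stems.append(stem)
    return stems
```

Calling generate("data/shapes-ground-truth", per_shape=3000) would then produce a few thousand labelled samples per shape. Step 3 is still driven by Tesstrain's Makefile (for instance with MODEL_NAME=shapes); "make help" remains the authoritative list of the variables your version expects.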
On Wednesday, April 17, 2024 at 12:47:52 AM UTC-4 [email protected] wrote:

Hello everyone,

I have a question: is it possible to train Tesseract to recognize images or shapes? If so, could someone guide me on how to proceed?

--
You received this message because you are subscribed to a topic in the Google Groups "tesseract-ocr" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/tesseract-ocr/jTKhMTP6x3U/unsubscribe.
To unsubscribe from this group and all its topics, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/d3e09b62-de6f-4573-a136-663b9b36de20n%40googlegroups.com.
