Hi,

I've come here after quite a few attempts and tests with Tesseract as part 
of a university project in France. The aim of this project is to 
analyse exam papers written by students in order to make marking easier.
Our teacher wanted an open-source OCR tool, so we turned to Tesseract.

Although we've made a few attempts, I'd like your opinion on using and 
training Tesseract for handwritten text, more precisely on images of 
single digits. So far, we have tested the following:

- Tested the fra and eng traineddata on MNIST: *< 65 % precision* (with 
PSM 10 and PSM 13)
- Trained Tesseract on the MNIST dataset (fine-tuning only, because 
training from scratch did not work): *< 30 % precision*
- Tested the fra and eng traineddata on our custom images (made from 
students' exam papers): *< 50 % precision*

We are not aiming for very high precision: 85 to 90 % will be enough, 
because we compare the results against a database of student IDs.
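A minimal sketch of the kind of matching we have in mind, to show why a few misread digits are tolerable. It is a hypothetical illustration, not our actual code: it assumes IDs are plain digit strings, uses a standard Levenshtein edit distance, and the helper `match_student_id` and its threshold of 2 are our own illustrative choices.

```python
# Hypothetical helper: map a (possibly wrong) OCR result to the closest known student ID.
# Assumption: IDs are digit strings; max_distance=2 is an illustrative threshold.

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def match_student_id(ocr_text, known_ids, max_distance=2):
    """Return the unique closest known ID, or None if nothing is close enough or there is a tie."""
    digits = "".join(ch for ch in ocr_text if ch.isdigit())  # keep digits only
    scored = sorted((levenshtein(digits, sid), sid) for sid in known_ids)
    if not scored or scored[0][0] > max_distance:
        return None  # too far from every ID: flag for manual review
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # two IDs equally close: ambiguous, flag for manual review
    return scored[0][1]


known = ["21034567", "21098765", "20987654"]
print(match_student_id("21034867", known))  # one misread digit (5 read as 8)
```

Because each ID differs from the others in several positions, one or two wrong digits per ID can still be resolved, which is why per-digit precision in the 85 to 90 % range may be enough for us.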
Here are our questions:

- Is it possible to reach a higher precision rate on handwritten text, and 
if so, how?
- Are there any existing models trained for handwriting recognition?
- Are there any existing models trained only for digit recognition? 
Otherwise, is it possible to make Tesseract recognize only digits (and 
thus get only digits from the *getBestLSTMSymbolChoices()* function)?
- What does the confidence value returned by Tesseract correspond to?
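On the digits-only question: as far as we understand, Tesseract has a `tessedit_char_whitelist` variable that can be set with `-c` on the command line (or with `TessBaseAPI::SetVariable` in the C++ API). Below is a minimal sketch of the call we have in mind, assuming the `tesseract` binary is on the PATH; `build_digit_command` and `ocr_digit` are hypothetical helper names. We have read that the LSTM engine honours the whitelist only from version 4.1 onwards, and we are not sure whether it also restricts what `getBestLSTMSymbolChoices()` returns.

```python
import shutil
import subprocess

DIGITS = "0123456789"


def build_digit_command(image_path, psm=10, lang="eng"):
    """Build a tesseract CLI call restricted to digits.

    --psm 10 treats the image as a single character (13 = raw line);
    -c sets a Tesseract config variable, here the character whitelist.
    """
    return [
        "tesseract", image_path, "stdout",  # write the recognised text to stdout
        "-l", lang,
        "--psm", str(psm),
        "-c", f"tessedit_char_whitelist={DIGITS}",
    ]


def ocr_digit(image_path, psm=10):
    """Run tesseract on one image and return the stripped text (needs the binary installed)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_digit_command(image_path, psm),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Keeping the command construction separate from the subprocess call makes it easy to compare PSM 10 and PSM 13 on the same images.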


Thanks in advance for your help; I hope my English is understandable, at 
least!

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/f3113dbc-84d2-4255-bc52-e70e67be54bfn%40googlegroups.com.
