Hi, I'm writing after quite a few attempts and tests with Tesseract as part of a university study project in France. The aim of the project is to analyse exam papers written by students in order to make marking easier. Our teacher wanted an open-source OCR tool, so we turned to Tesseract.
Although we've made a few attempts, I'd like your opinion on using and training Tesseract for handwritten text, more precisely for images of single digits. So far, we have tested the following:
- the fra and eng traineddata on MNIST: *< 65 % precision* (with PSM 10 and PSM 13)
- training Tesseract on the MNIST dataset (fine-tuning only, because training from scratch did not work): *< 30 % precision*
- the fra and eng traineddata on our custom images (made from students' exam papers): *< 50 %*

We are not looking for especially high precision; 85-90 % will be enough, because we compare the results against a database of student IDs.

Here are our questions:
- Is it possible to reach a higher precision on handwritten text, and how?
- Are there existing models trained for handwriting recognition?
- Are there existing models trained only for digit recognition? Otherwise, is it possible to make Tesseract recognize only digits (and thus get only digits from the *getBestLSTMSymbolChoices()* function)?
- What does the confidence value returned by Tesseract correspond to?

Thanks in advance for your help. I hope my English is understandable at least!
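Two of the ideas mentioned above, keeping only digit alternatives from the per-character symbol choices and matching the result against the database of student IDs, can be sketched in plain Python. This is a minimal, hypothetical sketch: it assumes the symbol choices have already been extracted into a list of (symbol, score) alternatives per character position (loosely mirroring what getBestLSTMSymbolChoices() exposes through the C++ API), that a higher score means a more likely symbol, and that student IDs are fixed-length digit strings. The names best_digits and closest_id are illustrative only and are not part of Tesseract.

```python
from typing import List, Optional, Sequence, Tuple

DIGITS = set("0123456789")

# Hypothetical data shape (an assumption, not Tesseract's exact type):
# one entry per character position, each a list of (symbol, score) alternatives.
SymbolChoices = List[List[Tuple[str, float]]]


def best_digits(choices: SymbolChoices) -> str:
    """For each position, keep the highest-scoring alternative that is a digit.

    Assumes a higher score means a more likely symbol.
    Positions with no digit alternative at all are skipped.
    """
    result = []
    for alternatives in choices:
        digit_alts = [(sym, score) for sym, score in alternatives if sym in DIGITS]
        if digit_alts:
            sym, _ = max(digit_alts, key=lambda pair: pair[1])
            result.append(sym)
    return "".join(result)


def hamming(a: str, b: str) -> int:
    """Count differing positions between two strings of equal length."""
    return sum(x != y for x, y in zip(a, b))


def closest_id(read: str, known_ids: Sequence[str], max_errors: int = 1) -> Optional[str]:
    """Return the known ID closest to `read`, or None if too far or ambiguous."""
    # Only IDs of the same length can be compared with a Hamming distance.
    candidates = [i for i in known_ids if len(i) == len(read)]
    if not candidates:
        return None
    scored = sorted((hamming(read, i), i) for i in candidates)
    best_dist, best_id = scored[0]
    if best_dist > max_errors:
        return None  # too many misread digits to trust any match
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # two IDs equally close: refuse to guess
    return best_id


if __name__ == "__main__":
    choices = [
        [("2", 0.9), ("Z", 0.8)],
        [("l", 0.7), ("1", 0.6)],
        [("0", 0.95)],
        [("B", 0.6), ("8", 0.5)],
    ]
    read = best_digits(choices)
    print(read)  # 2108
    print(closest_id(read, ["2118", "5431", "7720"]))  # 2118
```

Refusing to answer when two IDs are equally close, or when the read ID is more than `max_errors` digits away, lets a human check the doubtful cases instead of silently attributing a paper to the wrong student. Recent Tesseract versions also accept a `tessedit_char_whitelist` variable (for example `-c tessedit_char_whitelist=0123456789`), which may be worth trying before any post-filtering of this kind.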