Read the full thread at this link: https://groups.google.com/g/tesseract-ocr/c/-G7TZEnVHgE

I think it can help you if you are looking into fine-tuning or training from scratch, but I don't know how well it applies to handwritten text.

On Friday, 17 November, 2023 at 11:58:50 pm UTC+6 tfmo...@gmail.com wrote:
> Hi and welcome to the group.
>
> On Thursday, November 16, 2023 at 10:25:40 AM UTC-5 israel...@gmail.com wrote:
>
> > I want to create an entirely new language from handwritten texts.
>
> I think the "handwritten" aspect is probably at least as important as the
> "new language" part. Tesseract was designed to do optical character
> recognition of mechanically printed texts. Handwriting is very different.
> There have been some attempts to do this in the past, but only with block
> printed characters and, even then recognition rates were under 90% which
> isn't adequate for most uses. If you search the archives here or google
> "tesseract handwriting" (without the quotes), you'll find lots of reading
> material.
>
> > The language in question is Innu-aimun. The alphabet is quite simple,
> > consisting of some of the Latin alphabets with the addition of a
> > superscript u character that always appears after a consonant.
>
> There is a Latin script model which has been trained in a language
> independent fashion, so you could give that a try to see how well it does
> (modulo your superscript u).
>
> For training with natural images (standard training uses synthesized
> images), look at some of the examples in the tesstrain wiki
> <https://github.com/tesseract-ocr/tesstrain/wiki>, particularly the
> GT4HistOCR page
> <https://github.com/tesseract-ocr/tesstrain/wiki/GT4HistOCR>.
> For any training you'll need ground truth text matched with your segmented
> line images to train on.
>
> Good luck! It sounds like an interesting (but non-trivial) project.
>
> Tom
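
On the point about needing ground truth text matched with segmented line images: here is a rough sketch of a pre-training sanity check, not an official tool. It assumes the tesstrain file layout, where each line image (.png, .bin.png, .nrm.png or .tif) sits next to a transcription with the same base name and a .gt.txt extension. The folder name data/innu-ground-truth and the function names are made up for illustration.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Line-image suffixes assumed from the tesstrain layout. Longer suffixes come
# first so "line.bin.png" is not reduced to the stem "line.bin".
IMAGE_SUFFIXES = (".bin.png", ".nrm.png", ".png", ".tif")
TEXT_SUFFIX = ".gt.txt"


def image_stem(name: str) -> str | None:
    """Return the file name minus its image suffix, or None if not an image."""
    for suffix in IMAGE_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return None


def check_ground_truth(folder: Path) -> tuple[list[str], list[str]]:
    """Return (images without a transcription, transcriptions without an image)."""
    names = {p.name for p in folder.iterdir() if p.is_file()}
    image_stems = {s for s in map(image_stem, names) if s is not None}
    text_stems = {n[: -len(TEXT_SUFFIX)] for n in names if n.endswith(TEXT_SUFFIX)}
    return sorted(image_stems - text_stems), sorted(text_stems - image_stems)


if __name__ == "__main__":
    # Hypothetical default folder; pass your own path as the first argument.
    target = Path(sys.argv[1] if len(sys.argv) > 1 else "data/innu-ground-truth")
    no_text, no_image = check_ground_truth(target)
    for stem in no_text:
        print(f"missing transcription for line image: {stem}")
    for stem in no_image:
        print(f"missing line image for transcription: {stem}")
```

It only checks that the files pair up by name. It does not check that a transcription actually matches what is written in its image, which still has to be verified by eye.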