Read the full thread at this link:
https://groups.google.com/g/tesseract-ocr/c/-G7TZEnVHgE
I think it can help you whether you fine-tune an existing model or train
from scratch, but I don't know about handwritten texts.
On Friday, 17 November, 2023 at 11:58:50 pm UTC+6 tfmo...@gmail.com wrote:

> Hi and welcome to the group.
>
> On Thursday, November 16, 2023 at 10:25:40 AM UTC-5 israel...@gmail.com 
> wrote:
>
>  I want to create an entirely new language from handwritten texts. 
>
>
> I think the "handwritten" aspect is probably at least as important as the 
> "new language" part. Tesseract was designed to do optical character 
> recognition of mechanically printed texts. Handwriting is very different. 
> There have been some attempts to do this in the past, but only with block 
> printed characters and, even then, recognition rates were under 90%, which 
> isn't adequate for most uses. If you search the archives here or google 
> "tesseract handwriting" (without the quotes), you'll find lots of reading 
> material.
>  
>
> The language in question is Innu-aimun. The alphabet is quite simple, 
> consisting of some of the Latin alphabets with the addition of a 
> superscript u character that always appears after a consonant.
>
>
> There is a Latin script model which has been trained in a language 
> independent fashion, so you could give that a try to see how well it does 
> (modulo your superscript u). 
>
> For training with natural images (standard training uses synthesized 
> images), look at some of the examples in the tesstrain wiki 
> <https://github.com/tesseract-ocr/tesstrain/wiki>, particularly the 
> GT4HistOCR page 
> <https://github.com/tesseract-ocr/tesstrain/wiki/GT4HistOCR>.
> For any training you'll need ground truth text matched with your segmented 
> line images to train on.
>
> Good luck! It sounds like an interesting (but non-trivial) project.
>
> Tom
>
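
To try the Latin script model Tom mentions before doing any training, something like the sketch below may help. It only builds and runs a Tesseract command line. It assumes the tesseract executable is on your PATH and that the script model is installed as script/Latin.traineddata inside your tessdata folder (if you placed it directly in tessdata, the name would be just "Latin"). The function names and the file name line_0001.png are made up for illustration. Page segmentation mode 7 treats the image as a single text line, which fits pre-segmented line images.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="script/Latin", psm=7):
    """Return the argv list for one Tesseract run that prints text to stdout.

    lang="script/Latin" is an assumption about where the model file lives
    inside the tessdata folder; adjust it to match your installation.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def recognize_line(image_path, lang="script/Latin", psm=7):
    """Run Tesseract on one line image and return the recognized text.

    Returns None when the tesseract executable cannot be found on PATH.
    """
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract reports failure
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Only show the command here; "line_0001.png" is a hypothetical file name.
    print(" ".join(build_tesseract_command("line_0001.png")))
```

Calling recognize_line() on a handful of your own line images and comparing the output with a manual transcription should give a rough idea of how far the stock model is from usable, including what it does with the superscript u.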

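On the ground truth Tom describes: as far as I understand the tesstrain README, each line image (.tif, .png, .bin.png or .nrm.png) needs a single-line transcription with the same base name and the extension .gt.txt, all in one folder. Below is a small sketch to check a folder for unpaired or malformed files before starting a training run. The function names are my own, and this is only a sanity check, not part of tesstrain itself.

```python
from pathlib import Path

# Image extensions accepted for line images; longer suffixes come first so
# that "x.bin.png" is stripped to "x" rather than "x.bin".
IMAGE_SUFFIXES = (".bin.png", ".nrm.png", ".png", ".tif")


def base_name(image_path):
    """Return the file name without its image suffix, or None if not an image."""
    name = image_path.name
    for suffix in IMAGE_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return None


def check_ground_truth(folder):
    """Return (images_without_transcription, transcriptions_not_single_line)."""
    folder = Path(folder)
    missing, bad = [], []
    for image in sorted(folder.iterdir()):
        base = base_name(image)
        if base is None:
            continue  # not a line image (for example, a .gt.txt file)
        gt = folder / f"{base}.gt.txt"
        if not gt.is_file():
            missing.append(image.name)
            continue
        text = gt.read_text(encoding="utf-8").strip()
        # A transcription must be exactly one non-empty line of text.
        if not text or "\n" in text:
            bad.append(gt.name)
    return missing, bad
```

Running check_ground_truth() on your ground truth folder returns two lists: images that have no matching .gt.txt file, and .gt.txt files that are empty or span more than one line.
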
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/e0d7ca77-5981-472d-9056-599b496413e8n%40googlegroups.com.
