Thank you very much for the finetuned traineddata for Modi. It gives good results (with some deviation) for images generated through Aksharamukha. But as expected, the results for scanned copies of handwritten text are still poor. I will try using images for training. As I understand it, Tesseract doesn't support image-based training by default, and some extra steps need to be followed.
If possible, please share links with details about training Tesseract with images. Thanks once again.

On Friday, January 31, 2020 at 12:39:31 AM UTC-5, shree wrote:
> Please see https://github.com/Shreeshrii/tesstrain-modi for finetune training for Modi from Marathi using synthetic training data in 2 Unicode fonts. However, since Modi documents are mostly handwritten in cursive style, the training should preferably be done using images.
>
> On Sunday, January 26, 2020 at 9:22:43 PM UTC+5:30, Nilambari Joshi wrote:
>> Hi... I want to create Modi script (Marathi language) traineddata in Tesseract for OCR. Can somebody guide me on what steps I should follow? I referred to https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 but got stuck at the stage of creating box files.