[tesseract-ocr] Re: Adding Modi Script to Tesseract

'Nilambari Joshi' via tesseract-ocr Fri, 31 Jan 2020 10:30:36 -0800

Thank you very much for the fine-tuned traineddata for Modi. It gives
good results (with some deviations) for images generated through
Aksharamukha. But as you guessed, for scanned copies of handwritten text
the results are still poor. I will try using images for training.
As I understand it, Tesseract doesn't support image-based training by
default, and some extra steps need to be followed.

If possible, please share links with details about training Tesseract
with images. Thanks once again.

On Friday, January 31, 2020 at 12:39:31 AM UTC-5, shree wrote:
>
> Please see https://github.com/Shreeshrii/tesstrain-modi for fine-tune
> training for Modi from Marathi, using synthetic training data in two
> Unicode fonts. However, since Modi documents are mostly handwritten in
> cursive style, the training should preferably be done using images.
>
> On Sunday, January 26, 2020 at 9:22:43 PM UTC+5:30, Nilambari Joshi wrote:
>>
>> Hi, I want to create Modi script (Marathi language) traineddata for
>> Tesseract OCR. Can somebody guide me on what steps I should follow?
>> I referred to
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>> but got stuck at the stage of creating box files.
>>
>
