Re: [tesseract-ocr] Re: cmc7.traineddata

'Mamadou' via tesseract-ocr Sat, 04 Apr 2020 02:09:34 -0700

As already said, you have to use OCR-D (https://github.com/OCR-D/ocrd-train) 
for the training. It's a very easy tool to use. You should visit their 
website and read the documentation. Try doing something by yourself, and if 
you run into issues, then ask for help. 
That said, the samples you're attaching won't help. You need thousands of 
samples for training. In our case, we have 17k samples to train our 
TensorFlow model. Try web scraping to collect real-life samples instead of 
using synthetic data.
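
Before launching a training run with thousands of samples, it can help to check that every line image has a transcription. The sketch below is only an illustration: it assumes the layout described in the ocrd-train README (line images next to same-named `.gt.txt` files in a ground-truth folder), which should be verified against the version you use. The function name `check_ground_truth` and the folder name in the usage lines are hypothetical.

```python
from pathlib import Path

# Assumed layout (verify against the README of your ocrd-train version):
#   <ground-truth folder>/<line>.tif  +  <ground-truth folder>/<line>.gt.txt
IMAGE_SUFFIXES = (".tif", ".png")


def check_ground_truth(gt_dir):
    """Return (paired, missing): image names with / without a non-empty .gt.txt."""
    paired, missing = [], []
    for image in sorted(Path(gt_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skips the .gt.txt files themselves and anything else
        transcript = image.with_suffix(".gt.txt")
        if transcript.is_file() and transcript.read_text(encoding="utf-8").strip():
            paired.append(image.name)
        else:
            missing.append(image.name)
    return paired, missing


if __name__ == "__main__":
    # Hypothetical folder name; adjust to your own model name and layout.
    gt_folder = Path("data/cmc7-ground-truth")
    if gt_folder.is_dir():
        ok, bad = check_ground_truth(gt_folder)
        print(f"{len(ok)} usable samples, {len(bad)} images without transcription")
```

If the count of usable samples is in the dozens rather than the thousands, collecting more data is likely a better use of time than tuning the training.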


On Friday, April 3, 2020 at 7:11:01 PM UTC+2, Ghada Aruri wrote:
>
>   Hi Mamadou,
>
> Thank you for your reply. I attach MICR samples in a file named "cmc7.txt". 
> I only need to train Tesseract with the CMC-7 font to prepare a 
> cmc7.traineddata with high accuracy, so what are the tools and steps to do it?
>
> On Fri, 3 Apr 2020 at 18:43, 'Mamadou' via tesseract-ocr <
> [email protected]> wrote:
>
>> The easiest way to train the MICR CMC-7 font for Tesseract would be to use 
>> OCR-D (https://github.com/OCR-D/ocrd-train). This is what we've used in 
>> our R&D project (https://github.com/DoubangoTelecom/tesseractMICR).
>>  
>> We open-sourced the MICR E-13B traineddata but not the CMC-7 one. We're not 
>> using these models in our products, but the result is more accurate than 
>> any commercial product you can find online (LEADTOOLS 
>> <https://demo.leadtools.com/JavaScript/BankCheckReader/>, Accusoft 
>> <http://download.accusoft.com/micrxpress/MICRXpressDemonstration.exe>, 
>> Recogniform <http://www.recogniform.net/eng/micr-e13b-sdk.html> and ABBYY 
>> <https://www.abbyy.com/ocr_sdk/>). You'll also need heavy pre-processing 
>> to fill the interspaces. If you're familiar with TensorFlow, then I'd 
>> recommend using it instead of Tesseract.
>>
>> On Thursday, April 2, 2020 at 8:22:44 PM UTC+2, Ghada Aruri wrote:
>>>
>>> Hi team, 
>>>
>>>  For CMC-7, I want to train Tesseract using jTessBoxEditor to get 
>>> cmc7.traineddata. What are the steps to get cmc7.traineddata?
>>>  And if anybody has already done it, would you be willing to share it? 
>>>
>>> Best Regards.
>>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/c2c7b529-f5ea-47a7-89f5-3b6b88668370%40googlegroups.com
>>
>
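
The "pre-processing to fill the interspaces" mentioned in the quoted message is not detailed in the thread. As one possible reading, CMC-7 glyphs are drawn as vertical bars separated by gaps, so a recognizer may see loose bars rather than characters unless the gaps are closed. The sketch below is a simple run-length gap filler on a binarized image (1 = ink, 0 = background); it is an assumption, not the method used in the project above. The names `fill_interspaces` and `fill_image` and the `max_gap` parameter are hypothetical, and `max_gap` would need tuning so that the space between characters stays open.

```python
def fill_interspaces(row, max_gap):
    """Fill background runs of at most max_gap pixels that lie between ink pixels."""
    filled = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, pixel in enumerate(row):
        if pixel:
            gap = i - last_ink - 1 if last_ink is not None else 0
            if 0 < gap <= max_gap:
                for j in range(last_ink + 1, i):
                    filled[j] = 1
            last_ink = i
    return filled


def fill_image(rows, max_gap):
    """Apply the horizontal gap filling to every row of a binary image."""
    return [fill_interspaces(row, max_gap) for row in rows]


if __name__ == "__main__":
    # Two narrow gaps (1 and 2 pixels) get filled; the 4-pixel gap stays open.
    sample = [0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    print(fill_interspaces(sample, max_gap=2))
```

On real scans, a morphological closing with a wide, short structuring element from an image library would do a similar job; the pure-Python version is only meant to show the idea.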

