Re: [tesseract-ocr] Can I mix tiff/box files generated by ocrd-train with original training data used to train specific language in tesseract4 (from langdata directory)

2018-09-05 Thread Raniem
Thanks Shree, appreciate your support.
Regards

On Tuesday, September 4, 2018 at 7:25:33 PM UTC+1, shree wrote:
> My earlier suggestion of mixing the two kinds of images - scanned pages
> and text2image created synthetic ones - was from before ocrd-train was
> available.
>
> ocrd-train works

Re: [tesseract-ocr] Making custom traineddata

2018-09-05 Thread Shree Devi Kumar
See https://github.com/Shreeshrii/tessdata_ocrb for the files and traineddata.

On Wed, Sep 5, 2018 at 8:51 PM, Shree Devi Kumar wrote:
> I think fine-tuning will be a better option than training from scratch.
>
> Using a small training/test text - 40 lines, I get

Re: [tesseract-ocr] Making custom traineddata

2018-09-05 Thread Shree Devi Kumar
I think fine-tuning will be a better option than training from scratch.

Using a small training/test text (40 lines), I get:

+ lstmeval --verbosity 0 --model /home/ubuntu/tessdata_best/script/Latin.traineddata --eval_listfile

[tesseract-ocr] Making custom traineddata

2018-09-05 Thread kaminski . robert . it
Hi, (I might butcher English grammar; you have been warned!) For some time I've been trying to teach Tesseract to read MRZ codes. Unfortunately, it's not going very well. I'm using the latest version of Tesseract (4.0), so I'm trying to train it with the LSTM method. I've managed to pull it off and got