Re: [tesseract-ocr] How to use tesseract 4 engineMode 2 ( Legacy + LSTM engines)?

2018-07-12 Thread Shree Devi Kumar
The traineddata files can hold both types of models. The OCR Engine mode chooses which ones get used. https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#format-of-traineddata-files On Fri, Jul 13, 2018 at 9:31 AM 于洋 wrote: > Tesseract 4 introduced a new LSTM engine. The LSTM engine needs
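As a rough illustration of the reply above, the sketch below builds a tesseract command line that selects the engine mode with `--oem`. The mode numbers and descriptions follow the list Tesseract 4 prints for `--help-oem`. The helper name `build_command` and the file names `page.png` and `page` are hypothetical, and the sketch assumes the `eng.traineddata` in use actually contains both legacy and LSTM models, which mode 2 requires.

```python
# OCR engine modes as listed by `tesseract --help-oem` in Tesseract 4.
OEM_MODES = {
    0: "Legacy engine only",
    1: "Neural nets LSTM engine only",
    2: "Legacy + LSTM engines",
    3: "Default, based on what is available",
}


def build_command(image, output_base, lang="eng", oem=2):
    """Return the argument list for a tesseract CLI call (hypothetical helper).

    The command is only built here, not executed, so the sketch runs
    without tesseract installed.
    """
    if oem not in OEM_MODES:
        raise ValueError(f"unknown OCR engine mode: {oem}")
    # CLI shape: tesseract imagename outputbase [options...]
    return ["tesseract", image, output_base, "--oem", str(oem), "-l", lang]


# Hypothetical input image and output base name.
cmd = build_command("page.png", "page", lang="eng", oem=2)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where tesseract is installed. If the traineddata file holds only LSTM models, mode 2 has no legacy model to combine with, so the choice of data file matters as much as the flag.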

[tesseract-ocr] How to use tesseract 4 engineMode 2 ( Legacy + LSTM engines)?

2018-07-12 Thread 于洋
Tesseract 4 introduced a new LSTM engine. The LSTM engine needs LSTM trained data, and the legacy engine needs old trained data. The two types of trained data are incompatible with each other. When I set the OCR Engine mode to 2, it will use both the Legacy and LSTM engines. But how can I provide two types (LSTM and

Re: [tesseract-ocr] recognising roman with sanskrit diacritics

2018-07-12 Thread Shree Devi Kumar
Thank you for your feedback on eng+. I will try training for this and get back to you. On Thu, Jul 12, 2018 at 2:18 PM yajva wrote: > eng+iast-plus-3600 => no diacritics at all > Latin+iast-plus-3600 => only macrons none other > > > > On Thursday, July 12, 2018 at 1:12:25 AM UTC+5:30, shree wrote: >>

Re: [tesseract-ocr] why tesseract gives junk value for japanese language?

2018-07-12 Thread Shree Devi Kumar
Try traineddata from tessdata_best and tessdata_fast. On Thu 12 Jul, 2018, 6:45 PM mahendrag gajera wrote: > Hello all > > I am trying to OCR Japanese images with the code below, but it gives junk characters. > My tesseract version is 4.0 > > Please let me know what is missing here. > > void Test(char*

[tesseract-ocr] Unable to train Tesseract on Windows

2018-07-12 Thread pradipdwellar72
I tried training Tesseract on Windows using jTessBoxEditor. It created eng.traineddata, but gave several errors while creating it, like: So when I tried using that trained file, I did not get improved results. I also tried serak-tesseract-trainer-master in a similar way. I tried box

[tesseract-ocr] why tesseract gives junk value for japanese language?

2018-07-12 Thread mahendrag gajera
Hello all, I am trying to OCR Japanese images with the code below, but it gives junk characters. My tesseract version is 4.0. Please let me know what is missing here. void Test(char* imagePath) { char *outText; tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI(); // Initialize tesseract-ocr with

[tesseract-ocr] Re: Letters split in multiple parts

2018-07-12 Thread Lorenzo Bolzani
Any ideas about this? I'm encountering this problem quite often, even with custom training. I tried some data augmentation during training, varying the number of pixels on the left, but it did not help. Should I report it as an issue on GitHub and discuss it there? Thanks, bye Lorenzo

Re: [tesseract-ocr] recognising roman with sanskrit diacritics

2018-07-12 Thread yajva
eng+iast-plus-3600 => no diacritics at all Latin+iast-plus-3600 => only macrons none other On Thursday, July 12, 2018 at 1:12:25 AM UTC+5:30, shree wrote: > > What about ocr with > > eng+iast > > > > On Wed 11 Jul, 2018, 7:44 PM yajva, > > wrote: > >> shree >> namaste >> >> I am trying to OCR