Re: [tesseract-ocr] recognising roman with sanskrit diacritics

2018-07-02 Thread yajva
Many thanks. Downloaded and using it. Will wait for the next version. On Sunday, July 1, 2018 at 12:21:19 AM UTC+5:30, shree wrote: > > I have uploaded a new version of traineddata file at > > https://github.com/Shreeshrii/tessdata_shreetest/blob/master/iast-layer-18003.traineddata > > Attached is the OCRe

Re: [tesseract-ocr] Fine tuning existing model

2018-07-02 Thread Lorenzo Bolzani
Hi Shree, I replaced the line: `merge_unicharsets $(TESSDATA)/$(CONTINUE_FROM).lstm-unicharset $(TRAIN)/my.unicharset "$@"` with: `cp "$(TRAIN)/my.unicharset" "data/unicharset"` (I write this in case someone else is following this thread). And now I have a fine-tuned, brand-new model with only t

Re: [tesseract-ocr] Train 2 language together

2018-07-02 Thread Zohreh Khosrobeygi
Thanks, you're right. On Sunday, July 1, 2018 at 10:02:55 PM UTC+4:30, shree wrote: > > The font being used does not support English. > > On Sun, Jul 1, 2018 at 10:06 PM Zohreh Khosrobeygi > wrote: > >> Hi, >> I have been training the text: >> >> 272-135031- BECAUSE YOU WERE SLEEPING INSTEAD OW

[tesseract-ocr] Re: Tesseract v3.05.02 Training Error During Processing

2018-07-02 Thread James Lipham
I have also updated the image to have everything as the same font/size/etc, but still, tesseract just says "Error during processing." with seemingly zero information as to why. Has anyone ever experienced this? If I can't find anything else out, I guess I'll just have to step through the page p

Re: [tesseract-ocr] Encoding of string failed when finetune fot adding new fonts is fas language

2018-07-02 Thread Shree Devi Kumar
You can use find_fonts with your training_text to locate the fonts to use. Modify the following command to match your directory setup and try echo "## FIND FONTS ##" # Find fonts which can render your training_text. Run `fc-cache -vf` to refresh cache. # You can change the minimum coverag

[tesseract-ocr] Re: Where can i get Other language Cube language files.

2018-07-02 Thread cohengil333
Great question. I'm stuck on this too, only with Hebrew OCR. Any suggestions? On Tuesday, March 13, 2018 at 7:13:50 PM UTC+2, Harshit Dohare wrote: > > Hi, > > As far as I have looked into Tesseract, cube files are only available for > Hindi and Arabic language. > Check here - https://github.c

[tesseract-ocr] A friendly suggestion for the "tesseract-ocr" group members (Concern to all members)

2018-07-02 Thread cohengil333
It seems that, across all languages and revisions, people (including me) tend to search a lot for answers here in the group. So I have a suggestion: can the group administrator pin a message with a spreadsheet that lists the state of each revision for each language? This way it would

Re: [tesseract-ocr] Encoding of string failed when finetune fot adding new fonts is fas language

2018-07-02 Thread Shree Devi Kumar
also see https://github.com/tesseract-ocr/tesseract/issues/549 On Mon, Jul 2, 2018 at 7:45 PM Shree Devi Kumar wrote: > You can use find_fonts with your training_text to locate the fonts to use. > > Modify the following command to match your directory setup and try > > echo "## FIND FONTS

[tesseract-ocr] Re: Tesseract v3.05.02 Training Error During Processing

2018-07-02 Thread Quan Nguyen
Wrong filename format. The box should be named `eng.dmd.exp0.box`. On Monday, July 2, 2018 at 7:40:26 AM UTC-5, James Lipham wrote: > > I have also updated the image to have everything as the same > font/size/etc, but still, tesseract just says "Error during processing." > with seemingly zero in
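The name suggested in the reply above follows the Tesseract 3.x training convention `[lang].[fontname].exp[num].box`, with the image sharing the same base name. As a rough sketch, a name can be checked against that pattern before training starts. `parse_training_name` is a hypothetical helper, not part of Tesseract, and limiting the extensions to `.box` and `.tif` is an assumption made for illustration.

```python
import re

# Pattern for the 3.x training convention [lang].[fontname].exp[num].<ext>.
# Assumption: only .box and .tif are accepted in this sketch.
TRAINING_NAME = re.compile(
    r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.(?P<ext>box|tif)$"
)


def parse_training_name(filename):
    """Return the name's parts as a dict, or None if it does not match."""
    match = TRAINING_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in ("eng.dmd.exp0.box", "eng.dmd.exp0.tif", "dmd.box"):
        print(name, "->", parse_training_name(name))
```

A box file and its image should yield the same `lang`, `font`, and `num` parts; a mismatch between the two is worth ruling out early.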

[tesseract-ocr] How to generate multiple teesedit_write_images output

2018-07-02 Thread Junye Li
Hi there, I want to see the actual input images processed by tesseract, so I used the -c option with tessedit_write_images=TRUE. However, when I pass a multi-layer (multiple pages) .tiff image to tesseract, the output tessinput.tif image only contains one layer, which is the last page

[tesseract-ocr] Check validity of box and image files

2018-07-02 Thread chandra churh chatterjee
We are trying to train tesseract 4 on handwritten images and have generated the following types of images and their respective box files. We can't tell whether our box files are correct or not. Can anyone please confirm?
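For a first sanity check on box files like these, each line can be tested against the plain box format `<symbol> <left> <bottom> <right> <top> <page>`, where coordinates are measured from the bottom-left corner of the image. The sketch below is only a format check under that assumption. `check_box_line` is a hypothetical helper, not a Tesseract tool, and it does not verify that a box actually surrounds the right glyph in the image.

```python
def check_box_line(line):
    """Return None if the line looks like a valid box entry, else a reason."""
    # Split from the right so the symbol itself may be more than one character.
    parts = line.rstrip("\r\n").rsplit(" ", 5)
    if len(parts) != 6:
        return "expected 6 space-separated fields"
    symbol, *numbers = parts
    if symbol == "":
        return "empty symbol"
    try:
        left, bottom, right, top, page = (int(n) for n in numbers)
    except ValueError:
        return "coordinates and page must be integers"
    # Assumption for this sketch: every box has a positive width and height.
    if left >= right or bottom >= top:
        return "box has zero or negative size"
    if min(left, bottom) < 0 or page < 0:
        return "negative coordinate or page"
    return None


if __name__ == "__main__":
    for sample in ("T 36 92 58 139 0", "T 58 92 36 139 0", "T 36 92 58 139"):
        print(repr(sample), "->", check_box_line(sample))
```

Running this over every line of a box file and printing the line numbers that return a reason narrows down where a file deviates from the expected layout.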

Re: [tesseract-ocr] Encoding of string failed when finetune fot adding new fonts is fas language

2018-07-02 Thread ran go
The problem is still there. I looked at those links, but the error persists. On Tue, Jul 3, 2018 at 12:54 AM, Shree Devi Kumar wrote: > also see https://github.com/tesseract-ocr/tesseract/issues/549 > > > > On Mon, Jul 2, 2018 at 7:45 PM Shree Devi Kumar > wrote: > >> You can use find_fonts with you