[tesseract-ocr] Capturing the largest characters from an image

2017-06-01 Thread Brent Roberts
All - I am curious whether anyone has a solution or an idea for how to accomplish the following: exclude all but the largest characters. For example, in the following image, how could I get only "10:23"? Thank you in advance for any hints or insight!
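
One possible approach, sketched here rather than taken from the thread, is to OCR the whole image and keep only the words whose bounding boxes are close to the tallest one. The sketch assumes the result is a dict of parallel lists with `text` and `height` keys, which is the shape `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` returns. The function name, the `ratio` threshold, and the sample values are hypothetical, and the filter works per word, not per character.

```python
def keep_largest_words(data, ratio=0.8):
    """Return the words whose box height is at least `ratio` times the tallest word box."""
    # Pair each recognized word with its box height, skipping empty entries
    # (page, block, and line rows carry no text).
    words = [
        (text.strip(), int(height))
        for text, height in zip(data["text"], data["height"])
        if text.strip()
    ]
    if not words:
        return []
    tallest = max(height for _, height in words)
    return [text for text, height in words if height >= ratio * tallest]


# Hypothetical sample: a large clock reading surrounded by small text.
sample = {
    "text": ["", "Thursday", "10:23", "June", "1"],
    "height": [200, 18, 96, 17, 16],
}
print(keep_largest_words(sample))  # ['10:23']
```

The threshold is a judgment call: a lower `ratio` keeps more text, a higher one keeps only boxes nearly as tall as the largest.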

[tesseract-ocr] What can you expect from --user-words option?

2017-06-01 Thread Youcef
Hi, I have searched the internet a lot for more information about the --user-words option, but without success. I am handling a simple case with Spanish trained data, running: api/tesseract -l spa --psm 6 test.png output tessdata/configs/unlv; I expect the following output from my image: numer

Re: [tesseract-ocr] Same Font with Multible Styles

2017-06-01 Thread ShreeDevi Kumar
text2image --list_available_fonts --fonts_dir /mnt/c/Windows/Fonts
Replace the fonts directory with your fonts location, e.g.:
633: Times New Roman,
634: Times New Roman, Bold
635: Times New Roman, Bold Italic
636: Times New Roman, Italic
637: Trajan Pro
638: Trajan Pro Bold
639: Trebuchet MS
640:

Re: [tesseract-ocr] Re: Store rotated pages

2017-06-01 Thread gmail
This didn't go through for some reason: " The free program ImageMagick has many tools for manipulating image files. I don't use it very often, but the last time I did, something like this worked (on the command line): imconvert -rotate 91 *.jpg jrot-%04d.jpg I renamed the convert.exe binary to i

[tesseract-ocr] Same Font with Multible Styles

2017-06-01 Thread Ibr
Hi, Suppose we have a set of font files, all for the same font but each for a different style. For example, for a font "test" there will be one file for test regular, one for test bold, and one for test italic, but all of these files or

[tesseract-ocr] Re: How to improve accuracy for OCR?

2017-06-01 Thread Jasnan Tp
Wilko, thank you very much for sharing your traineddata. It works almost perfectly for me: most of the time the results are 100% correct, and otherwise at least 98% right. May I know how you trained Tesseract, and which tools you used to build this trained data? Thanking You Jasnan On Wednesday

Re: [tesseract-ocr] How to add space between strings of document. Punjabi (Gurmukhi) language have a space issue, after ocr the image it is showing no space b/w the text.

2017-06-01 Thread ShreeDevi Kumar
Read https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 Follow the tutorials. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr

[tesseract-ocr] What is the "Confidence"value returned by Tesseract and how it is calculated?

2017-06-01 Thread Thilina Jayathilaka
Hello, 1. I need to know what the confidence value (returned by the tesseract API) is and how it is calculated. 2. Is there any possibility that I can change the accuracy levels of tesseract? 3. Can I detect the confidence value for *each letter* separately when I pass an image which c
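
As a partial illustration of question 3, the sketch below reads per-word confidence from Tesseract's TSV output (for example `tesseract image.png out tsv`, assuming a build that ships the `tsv` config). It does not explain how the value is computed and does not give per-letter values. The function name and the sample rows are hypothetical. Rows that are not words carry a `conf` of -1 and are skipped.

```python
import csv
import io


def word_confidences(tsv_text):
    """Return (word, confidence) pairs from Tesseract TSV output."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    pairs = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Page, block, paragraph, and line rows have empty text and conf -1.
        if text and conf >= 0:
            pairs.append((text, conf))
    return pairs


# Hypothetical sample in the TSV column layout.
sample_tsv = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t110\t92\t70\t24\t81\tworld\n"
)
print(word_confidences(sample_tsv))
```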

Re: [tesseract-ocr] How to add space between strings of document. Punjabi (Gurmukhi) language have a space issue, after ocr the image it is showing no space b/w the text.

2017-06-01 Thread Mandeep Singh
I am now using Tesseract version 4.0, as per your guidance, and I want to train data for version 4.0. Yes, I am putting spaces between the words in the text, but the output does not show them. Please tell me how to train the data again for the new version. On Thursday, 1 June 2017 14:54:50 UTC+5:30, shr

[tesseract-ocr] How to use tesseractr multi languages ocr effectively

2017-06-01 Thread Matan
Hey, I have multiple images in different languages: English, Hebrew, Chinese, Arabic, Latin, etc. I would like to run tesseract on all of them in parallel (in the same command). I couldn't find a way to run it without getting bad results. Letters from Latin were translated to English, Chin
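
For reference, Tesseract accepts several languages in one run by joining their codes with `+` in the `-l` option. The sketch below only builds such a command line. The function name and file names are hypothetical, and the language codes assume the matching traineddata files are installed. Combining languages this way may not resolve the mixed-up results described above.

```python
def build_tesseract_command(image_path, output_base, languages):
    """Build a tesseract argv list that enables several languages at once."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Multiple languages are passed as a single '+'-joined value to -l.
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


cmd = build_tesseract_command("page.png", "page", ["eng", "heb", "chi_sim", "ara"])
print(" ".join(cmd))  # tesseract page.png page -l eng+heb+chi_sim+ara
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` without shell quoting.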

Re: [tesseract-ocr] How to add space between strings of document. Punjabi (Gurmukhi) language have a space issue, after ocr the image it is showing no space b/w the text.

2017-06-01 Thread ShreeDevi Kumar
Are you training for 3.0 or 4.0? Do you have spaces between the letters in your training text? Read https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com O

Re: [tesseract-ocr] How to add space between strings of document. Punjabi (Gurmukhi) language have a space issue, after ocr the image it is showing no space b/w the text.

2017-06-01 Thread Mandeep Singh
Ohhh, thank you very much, it is working. Many, many thanks to you. But I have more questions. 1. If I train new data, there is still a space problem. 2. How do I add more data to pan.traineddata, or can I edit the existing traineddata? On Thursday, 1 June 2017 14:34:14 UTC+5:30, shree wrote: > > http

Re: [tesseract-ocr] How to add space between strings of document. Punjabi (Gurmukhi) language have a space issue, after ocr the image it is showing no space b/w the text.

2017-06-01 Thread ShreeDevi Kumar
https://github.com/tesseract-ocr/tessdata has the traineddata for 4.0.

Re: [tesseract-ocr] How to add space between strings of document. Punjabi (Gurmukhi) language have a space issue, after ocr the image it is showing no space b/w the text.

2017-06-01 Thread ShreeDevi Kumar
Please read the wiki links I sent. If you have installed tesseract 4.0, please test first with the provided traineddata for Punjabi before trying to train. Most times, existing traineddata provides the best result. ShreeDevi भजन - की

Re: [tesseract-ocr] How to add space between strings of document. Punjabi (Gurmukhi) language have a space issue, after ocr the image it is showing no space b/w the text.

2017-06-01 Thread Mandeep Singh
I installed tesseract.exe 4.0 on my system, and I am using jTessBoxEditor 2.0 to train data for the Punjabi language. That's it. I don't know what is meant by LSTM. Please guide me. On Thursday, 1 June 2017 14:04:34 UTC+5:30, shree wrote: > > Are you using the 4.0 version of tesseract with --oem

Re: [tesseract-ocr] How to add space between strings of document. Punjabi (Gurmukhi) language have a space issue, after ocr the image it is showing no space b/w the text.

2017-06-01 Thread ShreeDevi Kumar
Are you using the 4.0 version of tesseract with --oem 1 (LSTM engine only)? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Jun 1, 2017 at 1:13 PM, Mandeep Singh wrote: > kindly view this issue or please guide me

Re: [tesseract-ocr] How to add space between strings of document. Punjabi (Gurmukhi) language have a space issue, after ocr the image it is showing no space b/w the text.

2017-06-01 Thread Mandeep Singh
Kindly look at this issue, or please guide me on how to add a config file for the Punjabi language. On Thursday, 1 June 2017 11:40:22 UTC+5:30, Mandeep Singh wrote: > > > There is still a space issue. Kindly review this attachment. > > > Please help me out. > > > On Wednesday, 31 May 2017 18:11:10 UTC+5:30,