more aggressive chopping in makebox or printout box info in recognition

2012-03-10 Thread Falke
Dear group: Is there a way to make tesseract print out box information upon/during recognition? I am trying to recognize low-rez images (mentioned in other threads), and tesseract does excellent, correct chopping of the text (the errors are mostly misrecognized individual glyphs but NOT lumping t
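For readers who end up with box output from Tesseract (for example through the training-style invocation `tesseract image.tif output batch.nochop makebox`), here is a minimal sketch of reading it back in Python. It assumes the Tesseract 3.x box-file layout of one glyph per line, `symbol left bottom right top page`, with the origin at the bottom-left of the image. The `Box` type and `parse_box_lines` helper are hypothetical names used only for illustration, and the sample coordinates in the usage note are made up.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One glyph entry from a box file (hypothetical helper type)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_lines(text: str) -> List[Box]:
    """Parse box-file text, assuming 'symbol left bottom right top [page]' per line."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            # Skip blank or malformed lines instead of failing.
            continue
        symbol = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        # The page field is assumed optional here; default to page 0.
        page = int(parts[5]) if len(parts) > 5 else 0
        boxes.append(Box(symbol, left, bottom, right, top, page))
    return boxes
```

Usage, with invented sample data: `parse_box_lines("T 10 20 30 50 0\nh 32 20 45 50 0")` returns two `Box` entries whose widths (`right - left`) can be inspected to see where the text was chopped.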

Re: Accuracy of Ocr is very low

2012-03-10 Thread Sandeep Parmar
Ideally, for good recognition the text should have high-contrast colors, i.e. either a black font on a white background or vice versa, and a font size above 10 is good for recognition. On Fri, Mar 9, 2012 at 11:34 AM, swati sharma wrote: > Yes, we know it will not work for colored imag
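One way to move a colored image toward the black-on-white contrast described above is to convert it to grayscale and apply a fixed threshold before running OCR. The following is a minimal sketch in plain Python, not anything taken from the thread: `to_gray` and `binarize` are hypothetical helpers, the luma weights are the common ITU-R BT.601 ones, and the threshold of 128 is an arbitrary assumption that would need tuning per image.

```python
from typing import List, Tuple


def to_gray(rgb: Tuple[int, int, int]) -> int:
    """Convert one RGB pixel to a 0-255 gray level (BT.601 luma weights)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(pixels: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map each gray level to pure black (0) or pure white (255).

    The default threshold is an assumption; a real image may need a
    different value or an adaptive method.
    """
    return [[0 if value < threshold else 255 for value in row] for row in pixels]
```

Usage: build a grid of gray levels with `[[to_gray(p) for p in row] for row in rgb_rows]`, pass it to `binarize`, and save the result with whatever image library is in use before handing it to the OCR engine.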

Re: extract word-list failed

2012-03-10 Thread Sriranga(78yrsold)
TP, I am extremely thankful to you for the valuable guidance. I updated up to r-703 and successfully generated LIB_debug and LIB_release. I tested both versions; each successfully generated word.wordlist, and each file is only *390* KB and contains 18720 words. With warmest regards, -sriranga(79yrs) On

Re: extract word-list failed

2012-03-10 Thread TP
Hmmm, my last post had URLs split in unfortunate places. Here's the list of screen capture URLs again but starting at the beginning of each line. 00 dawg2wordlist Debugging Pane Settings http://www.screencast.com/t/4eTQi8lZEa 01 Start Debugging dawg2wordlist http://www.screencast.com/t/wNw7ziQoQ5

Re: extract word-list failed

2012-03-10 Thread TP
Sriranga(78yrs), Here are some instructions and pictures showing how to use Visual Studio 2008 to see where dawg2wordlist is crashing on Windows. Assuming that I have the following folder hierarchy: BuildFolder\ tesseract-3.02\ tessdata\ kan.traineddata testing\