You can simply use this in your program just after init to set whitelist / blacklist:
    api.Init(argv[0], lang, &(argv[arg]), argc-arg, false);
    api.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ.0123456789 ");

--
Regards,
Saurabh Gandhi

On Fri, Feb 18, 2011 at 3:21 PM, Sriranga(78yrsold) <withblessi...@gmail.com> wrote:
> *Customise the tesseract engine to recognize only the characters from
> A-Z,0-9,.(dot), (space) by setting the character white-list*
>
> Kindly furnish the name of the folder in which the whitelist and blacklist
> exist. I want to utilise the same for Kannada scripts.
> -sriranga(78yrs)
>
> On Fri, Feb 18, 2011 at 11:57 AM, Ray Smith <theraysm...@gmail.com> wrote:
>
>> From all this, I have identified the following ways of improving the
>> results:
>>
>> 1. Customise the tesseract engine to recognize only the characters
>> from A-Z,0-9,.(dot), (space) by setting the character white-list. My
>> understanding is that the white-list is the list of characters that are
>> going to be sensed. I was curious to know: what is the blacklist meant
>> to do?
>> Just the opposite of whitelist. You can disable specific characters
>> from the usual set.
>> 2. A lot of times I have seen fairly good number plate images being
>> OCRed inaccurately. This could possibly be due to the word recognition
>> stage. Has anyone found a way to disable the dictionary / word
>> recognition?
>> Play with segment_penalty_dict_*
>> 3. Then there are some page segmentation modes
>> (PSM_AUTO, PSM_SINGLE_BLOCK, PSM_CHAR etc). Does PSM_CHAR imply that it
>> will consider the input image as a single character and run the
>> algorithm accordingly without attempting word recognition?
>> Yes.
>> 4. Another important configuration macro that I have seen within the
>> code was AVS_FASTEST = 0, AVS_MOST_ACCURATE = 100. However, I could not
>> find it being used anywhere in the code. Does this have any impact on
>> the character recognition accuracy?
>> This control is dead in 3.01.
>> Replaced by ocr_engine_mode. It just controls the combination of
>> tesseract vs cube. Cube increases the accuracy slightly, but adds a lot
>> of compute time.
>> 5. Finally, I also plan to use the confidence level data. Are there
>> any indicators of confidence for characters as well? There is word
>> confidence data which can be found in TessBaseAPI::AllWordConfidences().
>> Yes, and they are exposed in the new ResultIterator in 3.01; otherwise
>> you have to go down into the guts of the data structures.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> To unsubscribe from this group, send email to
>> tesseract-ocr+unsubscr...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
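The whitelist/blacklist behaviour Ray Smith describes (a whitelist limits recognition to the listed characters; a blacklist disables specific characters from the usual set) can be modelled with a small, self-contained C++ sketch. This is a hypothetical illustration, not Tesseract's own code: `allowed_chars` and `is_permitted` are made-up names, and the rules that an empty whitelist means "everything enabled" and that the blacklist is applied after the whitelist are assumptions of this sketch.

```cpp
#include <set>
#include <string>

// Hypothetical illustration only, NOT Tesseract's implementation.
// Assumed semantics: an empty whitelist leaves every character in `base`
// enabled; a non-empty whitelist enables only the listed characters that
// are also in `base`; the blacklist is applied last and disables characters.
std::set<char> allowed_chars(const std::string& base,
                             const std::string& whitelist,
                             const std::string& blacklist) {
  std::set<char> allowed;
  for (char c : base) {
    if (whitelist.empty() || whitelist.find(c) != std::string::npos) {
      allowed.insert(c);
    }
  }
  // Blacklisted characters are removed even if they were whitelisted.
  for (char c : blacklist) {
    allowed.erase(c);
  }
  return allowed;
}

// Returns true if every character of `text` is in the allowed set.
bool is_permitted(const std::string& text, const std::set<char>& allowed) {
  for (char c : text) {
    if (allowed.count(c) == 0) {
      return false;
    }
  }
  return true;
}
```

For the number-plate case, passing the whitelist from the snippet above ("ABCDEFGHIJKLMNOPQRSTUVWXYZ.0123456789 ") keeps only uppercase letters, digits, dot and space. Note that the sketch works byte by byte, so it would not handle multi-byte UTF-8 scripts such as Kannada without being rewritten to operate on code points.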