On Sat, Oct 13, 2012 at 10:47 PM, JVIyer <jawant...@gmail.com> wrote:
> *A lot of times I have seen fairly good number plate images being OCRed
> inaccurately. This could possibly be due to the word recognition stage. Has
> anyone found a way to disable the dictionary / word recognition.*
>
> Saurabh, have you been able to accomplish this? Could you kindly share
> your insights? I have a similar need.
> Thanks a lot in advance.
>

First of all - make sure that "fairly good" also holds for the binarized
version of your image.
Next - dictionaries can be disabled only at init time [1], [2]. So create a
config file in which you specify which dictionaries [3] should not be loaded
(e.g. load_system_dawg F).

[1] https://code.google.com/p/tesseract-ocr/issues/detail?id=737
[2] http://code.google.com/p/tesseract-ocr/wiki/ControlParams#Details
[3] https://code.google.com/p/tesseract-ocr/source/browse/trunk/dict/dict.cpp#43

--
Zdenko

> On Wednesday, February 16, 2011 10:48:56 PM UTC-6, Saurabh Gandhi wrote:
>>
>> Hello everyone,
>>
>> I am currently using tesseract 3.x for license plate recognition.
>> I have an algorithm which does a good job in pre-processing the input
>> image to localize the plate.
>> However, when I use the Tesseract OCR engine to classify the plate
>> number, the recognition is not that accurate. I have gone through the
>> tesseract whitepapers as well as some of the threads discussing LPR
>> using tesseract.
>>
>> From all this, I have identified the following ways of improving the
>> results:
>>
>> 1. Customise the tesseract engine to recognize only the characters
>> from A-Z,0-9,.(dot), (space) by setting the character white-list. My
>> understanding is that the white-list is the list of characters that are
>> going to be sensed. I was curious to know what the blacklist is meant
>> to do?
>> 2. A lot of times I have seen fairly good number plate images being
>> OCRed inaccurately. This could possibly be due to the word recognition
>> stage. Has anyone found a way to disable the dictionary / word
>> recognition.
>> 3. Then there are some page segmentation modes
>> (PSM_AUTO, PSM_SINGLE_BLOCK, PSM_CHAR etc). Does PSM_CHAR imply that it
>> will consider the input image as a single character and run the
>> algorithm accordingly without attempting word recognition?
>> 4. Another important configuration macro that I have seen within the
>> code was AVS_FASTEST = 0, AVS_MOST_ACCURATE = 100. However, I could not
>> find the same being used anywhere in the code. Does this have any impact
>> on the *character recognition* accuracy?
>> 5. Finally, I also plan to use the confidence level data. Are there
>> any indicators of confidence for characters as well? There is word
>> confidence data which can be found in
>> TessBaseAPI::AllWordConfidences().
>>
>> Awaiting your valuable insights.
>> Thank you.
>>
>> Regards,
>> Saurabh Gandhi
>>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
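
P.S. To make the config-file suggestion above a bit more concrete, here is a minimal sketch in Python. It only builds the text of a config file and the matching command line; it does not run Tesseract itself. Some of it is assumption rather than anything stated in this thread: load_freq_dawg is added as a second dictionary switch next to load_system_dawg, the whitelist leaves out the space from the original list, the 3.x-style "-psm 7" (single text line) option is one plausible choice for a plate, and the file names and helper names are made up for illustration. It also assumes your build accepts a config file path as the trailing command-line argument.

```python
# Characters the original poster wanted to allow: A-Z, 0-9 and dot.
# The space from the original list is left out in this sketch.
PLATE_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789."


def plate_config_text(whitelist=PLATE_CHARS):
    """Return the body of a Tesseract config file (one 'name value' pair per line)."""
    lines = [
        "load_system_dawg F",  # from the reply: do not load the system dictionary
        "load_freq_dawg F",    # assumption: also skip the frequent-words dictionary
        "tessedit_char_whitelist " + whitelist,
    ]
    return "\n".join(lines) + "\n"


def tesseract_command(image, output_base, config_path, psm=7):
    """Build a 3.x-style command line; the config file goes last.

    psm=7 (treat the image as a single text line) is an assumption for plates.
    """
    return ["tesseract", str(image), str(output_base),
            "-psm", str(psm), str(config_path)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(plate_config_text(), end="")
    print(" ".join(tesseract_command("plate.png", "plate_out", "plate_config.txt")))
```

Save the printed config text to a file such as plate_config.txt and pass the resulting list to subprocess.run() to try it. Because the dictionary switches are read only at init time, they need to be in the config file given at startup rather than set afterwards.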