Re: Customising Tesseract for character recognition

zdenko podobny Sun, 14 Oct 2012 04:46:43 -0700

On Sat, Oct 13, 2012 at 10:47 PM, JVIyer <jawant...@gmail.com> wrote:


> *A lot of times I have seen fairly good number plate images being OCRed
> inaccurately. This could possibly be due to the word recognition stage. Has
> anyone found a way to disable the dictionary / word recognition.*
> Saurabh, have you been able to accomplish this? Could you kindly share
> your insights? I have a similar need.
> Thanks a lot in advance.
>

First of all, make sure that "fairly good" also holds for the binarized
version of your image, since that is what the recognizer actually sees.
Next, dictionaries can be disabled only at init time [1], [2]. So create a
config file in which you specify which dictionaries [3] should not be
loaded (e.g. load_system_dawg F).
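
To make the config file step concrete, here is a minimal sketch in Python
that writes such a file and builds the matching command line. The helper
names (write_no_dict_config, build_command) and the file name "nodict" are
only illustrations, and load_freq_dawg is my addition on the assumption
that you also want the frequent-words dictionary off; check [3] for the
full list of load_*_dawg parameters in your version.

```python
from pathlib import Path


def write_no_dict_config(path):
    """Write a Tesseract config file that asks for dictionaries not to be loaded.

    Each line is "<parameter> <value>"; F means false.
    """
    lines = [
        "load_system_dawg F",  # parameter named in the reply above
        "load_freq_dawg F",    # assumption: also skip the frequent-words dawg
    ]
    Path(path).write_text("\n".join(lines) + "\n")
    return str(path)


def build_command(image, output_base, config_path):
    """Build the argument list: tesseract <image> <outputbase> <configfile>.

    Config files are read at init time, which is why this takes effect
    while a later SetVariable call would not.
    """
    return ["tesseract", str(image), str(output_base), str(config_path)]
```

The returned list can be handed to subprocess.run(). Depending on the
version, tesseract may look for the config name under tessdata/configs
before treating it as a plain path, so copying the file there is the safer
choice if the path is not picked up.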

[1] https://code.google.com/p/tesseract-ocr/issues/detail?id=737
[2] http://code.google.com/p/tesseract-ocr/wiki/ControlParams#Details
[3] https://code.google.com/p/tesseract-ocr/source/browse/trunk/dict/dict.cpp#43

-- 
Zdenko


> On Wednesday, February 16, 2011 10:48:56 PM UTC-6, Saurabh Gandhi wrote:
>>
>> Hello everyone,
>>
>> I am currently using tesseract 3.x for license plate recognition.
>> I have an algorithm which does a good job in pre-processing the input
>> image to localize the plate.
>> However, when I use the Tesseract OCR engine to classify the plate
>> number, the recognition is not that accurate. I have gone through the
>> tesseract whitepapers as well as some of the threads discussing the LPR
>> using tesseract.
>>
>> From all this, I have identified the following ways of improving the
>> results:
>>
>>    1. Customise the tesseract engine to recognize only the characters
>>    from A-Z,0-9,.(dot), (space) by setting the character white-list. My
>>    understanding is that the white-list is the list of characters that are
>>    going to be sensed. I was inquisitive to know what the blacklist is meant
>>    to do?
>>    2. A lot of times I have seen fairly good number plate images being
>>    OCRed inaccurately. This could possibly be due to the word recognition
>>    stage. Has anyone found a way to disable the dictionary / word
>>    recognition.
>>    3. Then there are some page segmentation modes
>>    (PSM_AUTO, PSM_SINGLE_BLOCK, PSM_CHAR etc). Does PSM_CHAR imply that it
>>    will consider the input image as a single character and run the
>>    algorithm accordingly without attempting word recognition?
>>    4. Another important configuration macro that I have seen within the
>>    code was AVS_FASTEST = 0,  AVS_MOST_ACCURATE = 100. However, I could not
>>    find the same being used anywhere in the code. Does this have any impact
>>    on the *character recognition* accuracy?
>>    5. Finally, I also plan to use the confidence level data. Are there
>>    any indicators of confidence for characters as well. There is word
>>    confidence data which can be found in
>>    TessBaseAPI::AllWordConfidences().
>>
>> Awaiting your valuable insights.
>> Thank you.
>>
>> Regards,
>> Saurabh Gandhi
>>
>  --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

