Re: Customising Tesseract for character recognition

JVIyer Sat, 13 Oct 2012 18:58:40 -0700

*A lot of times I have seen fairly good number plate images being OCRed 
inaccurately. This could possibly be due to the word recognition stage. Has 
anyone found a way to disable the dictionary / word recognition?*
Saurabh, have you been able to accomplish this? Could you kindly share 
your insights? I have a similar need.
Thanks a lot in advance.
rgds,
JV Iyer
On Wednesday, February 16, 2011 10:48:56 PM UTC-6, Saurabh Gandhi wrote:
>
> Hello everyone,
>
> I am currently using tesseract 3.x for license plate recognition.
> I have an algorithm which does a good job in pre-processing the input 
> image to localize the plate.
> However, when I use the Tesseract OCR engine to classify the plate number, 
> the recognition is not that accurate. I have gone through the tesseract 
> whitepapers as well as some of the threads discussing the LPR using 
> tesseract.
>
> From all this, I have identified the following ways of improving the 
> results:
>
>    1. Customise the tesseract engine to recognize only the characters 
>    A-Z, 0-9, . (dot) and space by setting the character white-list. My 
>    understanding is that the white-list is the list of characters that are 
>    going to be recognized. What is the blacklist meant to do?
>    2. A lot of times I have seen fairly good number plate images being 
>    OCRed inaccurately. This could possibly be due to the word recognition 
>    stage. Has anyone found a way to disable the dictionary / word recognition?
>    3. Then there are some page segmentation modes 
>    (PSM_AUTO, PSM_SINGLE_BLOCK, PSM_CHAR etc). Does PSM_CHAR imply that it 
>    will consider the input image as a single character and run the 
>    algorithm accordingly without attempting word recognition?
>    4. Another important configuration macro that I have seen within the 
>    code was AVS_FASTEST = 0, AVS_MOST_ACCURATE = 100. However, I could not 
>    find it being used anywhere in the code. Does this have any impact on 
>    the *character recognition* accuracy?
>    5. Finally, I also plan to use the confidence level data. Are there 
>    any indicators of confidence for characters as well? There is word 
>    confidence data which can be found in TessBaseAPI::AllWordConfidences().
>
> Awaiting your valuable insights.
> Thank you.
>
> Regards,
> Saurabh Gandhi
>
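
For anyone wanting to experiment with points 1 to 3 above, here is a minimal sketch that writes out a Tesseract config file restricting the character set and switching off the dictionary word lists. It is a sketch under assumptions, not a confirmed fix for plate recognition: the helper name build_plate_config is made up for illustration, the variables tessedit_char_whitelist, load_system_dawg, load_freq_dawg and tessedit_pageseg_mode are assumed to be honoured by your Tesseract build, and the numeric page segmentation value (7 is assumed to mean "single line") differs between releases, so please check it against your version. The space character is left out of the whitelist here because config file values are whitespace-separated.

```python
import string


def build_plate_config(whitelist=string.ascii_uppercase + string.digits + ".",
                       use_dictionary=False,
                       pageseg_mode=None):
    """Return the text of a Tesseract config file as a string.

    build_plate_config is a hypothetical helper, not part of Tesseract.
    Each output line has the form "<variable> <value>".
    """
    # Tesseract config files accept T/F for boolean variables.
    flag = "T" if use_dictionary else "F"
    lines = [
        # Only these characters may appear in the recognised output.
        f"tessedit_char_whitelist {whitelist}",
        # Assumption: disabling both word lists stops dictionary-based
        # correction from rewriting plate strings.
        f"load_system_dawg {flag}",
        f"load_freq_dawg {flag}",
    ]
    if pageseg_mode is not None:
        # Assumption: the numeric value matches the PageSegMode enum of
        # the installed Tesseract version.
        lines.append(f"tessedit_pageseg_mode {int(pageseg_mode)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the config so it can be redirected into a file.
    print(build_plate_config(pageseg_mode=7), end="")
```

The output can be saved to a file and passed to Tesseract as a config file, which should apply the dictionary settings at initialisation time. Whether this actually improves plate accuracy would need to be measured on your own images.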

