Re: tips for improving Tesseract accuracy and speed...

cong nguyenba Wed, 30 Mar 2011 09:02:10 -0700

Here is another approach: try binarizing with an adaptive threshold. For
speed, dig into the engine by following the adaptive classification code
in the source. I think that should be enough to meet your expectations.
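A minimal sketch of adaptive (local mean) thresholding in plain Python with no imaging library. It assumes the plate crop is a grayscale image stored as a list of rows with values 0-255; the function name and the `block` and `c` parameters are illustrative choices, not part of Tesseract. Libraries such as OpenCV offer an equivalent built-in (`cv2.adaptiveThreshold`).

```python
def adaptive_threshold(gray, block=15, c=10):
    """Binarize a grayscale image (list of rows, values 0-255) with a local mean.

    A pixel becomes ink (0) when it is darker than the mean of its
    block x block neighbourhood minus c; otherwise it becomes background (255).
    """
    h, w = len(gray), len(gray[0])
    # Integral image with a zero border: ii[y][x] = sum of gray[0..y-1][0..x-1].
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    r = block // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            area = (y1 - y0) * (x1 - x0)
            # Sum of the window, clipped at the image border, in O(1).
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            row.append(0 if gray[y][x] < total / area - c else 255)
        out.append(row)
    return out
```

Because each pixel is compared with its own neighbourhood, a lighting gradient across the plate does not push whole regions to one side of a single global cut-off, and the integral image keeps the per-pixel cost independent of `block`.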


On Wednesday, March 30, 2011, Dmitri Silaev <[email protected]> wrote:
> P.S.: If you're still sure that reasonable downscaling of your images
> sacrifices accuracy, please share one or two of your *unprocessed*
> images so we can investigate further.
>
> I'd also suggest keeping up with the latest revisions of Tesseract. The
> API changes significantly, but Tess is definitely improving in terms of
> stability, new capabilities and code efficiency, which may directly
> lead to the improved performance you are looking for.
>
> Warm regards,
> Dmitri Silaev
>
> On Tue, Mar 29, 2011 at 8:17 AM, Andres <[email protected]> wrote:
>> ...required.
>>
>> Hello people,
>>
>> I've been developing a licence plate recognition system for a long time,
>> and I still need to improve how I use Tesseract to make it usable.
>>
>> My first concern is about speed:
>> After extracting the licence plate image, I get an image like this:
>>
>> https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5LWE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP
>>
>> As you can see, there are only 6 characters (Tess recognizes more
>> because of some blemishes in the image, but I get rid of them by
>> postprocessing the layout of the recognized chars).
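The post does not say how the layout postprocessing works. One hypothetical way to drop blemishes is sketched below: it assumes each recognized character comes with a bounding box `(x, y, w, h)` and that genuine plate characters share roughly the same height. `filter_plate_boxes`, `expected` and `tol` are illustrative names, not Tesseract API.

```python
from statistics import median


def filter_plate_boxes(boxes, expected=6, tol=0.25):
    """Drop boxes whose height is far from the median; return the rest left to right.

    boxes: list of (x, y, w, h) tuples. A box is kept when its height is within
    tol * median of the median height; if more than `expected` boxes survive,
    the tallest ones are kept (a hypothetical tie-break).
    """
    if not boxes:
        return []
    med = median(h for _, _, _, h in boxes)
    kept = [b for b in boxes if abs(b[3] - med) <= tol * med]
    kept = sorted(kept, key=lambda b: b[3], reverse=True)[:expected]
    # Reading order for a single-line plate is left to right.
    return sorted(kept, key=lambda b: b[0])
```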
>>
>> On an Intel Core i7 720 (a powerful CPU, but I'm using a single thread) the
>> Tesseract step takes about 230 ms. This is too slow for what I need.
>>
>> The image is 500 x 117 pixels. I noticed that when I reduce the size of
>> this image, the detection time drops in proportion to the image area,
>> which makes sense. But OCR accuracy is poor when the character height is
>> below 90 pixels.
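The observation that time scales with image area can be turned into a quick estimate. Scaling both sides by a factor s multiplies the pixel count, and therefore (under that proportionality assumption) the time, by s². The 90 px floor caps how far the image can shrink. The character height in the 500 x 117 image is not given in the post, so it is an input here, and the helper names are hypothetical.

```python
BASE_MS = 230.0  # measured time for the 500 x 117 plate image, from the post


def time_at_scale(scale, base_ms=BASE_MS):
    """Estimated OCR time when both image dimensions are multiplied by `scale`.

    Assumes time is proportional to pixel count, so it grows with scale ** 2.
    """
    return base_ms * scale ** 2


def best_case_ms(char_height_px, min_char_px=90, base_ms=BASE_MS):
    """Smallest estimated time that still keeps characters >= min_char_px tall."""
    scale = min(1.0, min_char_px / char_height_px)  # never upscale
    return time_at_scale(scale, base_ms)
```

For example, halving both sides gives roughly 230 x 0.25 = 57.5 ms, but if the characters were, say, 100 px tall (a made-up figure), staying above 90 px allows a scale of only 0.9, for an estimate of about 186 ms.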
>>
>> So I suspect there is a problem with the way I trained Tesseract.
>>
>> Because the characters on the plates are mixed (3 alphanumeric, 3
>> numeric), I trained it with just a single image containing all the letters
>> of the alphabet. I saw that you recommend large training sets, but I
>> imagine that doesn't apply here, where the characters are not organized
>> into words. Am I correct?
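Since plate characters do not form dictionary words, a positional check after recognition can do part of the job a word list would do. The sketch below assumes the three alphanumeric characters come first and the three digits last (the post does not state the order); the look-alike letter mapping is an illustrative, hypothetical choice.

```python
import re

# Hypothetical plate layout: 3 alphanumeric characters followed by 3 digits.
PLATE_RE = re.compile(r"[A-Z0-9]{3}[0-9]{3}")

# Letters OCR often confuses with digits (illustrative, not exhaustive).
TO_DIGIT = str.maketrans({"O": "0", "I": "1", "S": "5", "B": "8", "Z": "2"})


def normalize_plate(text):
    """Return the plate string if it fits the layout, else None.

    The last three positions must be digits, so look-alike letters there
    are mapped to digits before validating.
    """
    text = text.strip().upper()
    if len(text) != 6:
        return None
    candidate = text[:3] + text[3:].translate(TO_DIGIT)
    return candidate if PLATE_RE.fullmatch(candidate) else None
```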
>>
>> For reference, this is the image I trained Tesseract with:
>>
>> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3LTljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL
>>
>> In this image the characters are about 55 pixels high.
>>
>> Then, for frequent_word_list and words_list I included a single entry for
>> each character, that is, a list starting like this:
>>
>> A
>> B
>> C
>> D
>> ...
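A small sketch that produces the one-entry-per-line list described above. It assumes the list should also contain the digits 0-9, since the plates include numbers; writing the result to the `frequent_word_list` and `words_list` files and converting them for Tesseract are separate training steps not shown here.

```python
import string


def single_char_wordlist():
    """Return one entry per line: every uppercase letter, then every digit."""
    # Assumption: digits are included because plates also contain numbers.
    return "\n".join(string.ascii_uppercase + string.digits) + "\n"


def write_wordlist(path):
    """Write the single-character list to `path` (e.g. "words_list")."""
    with open(path, "w") as f:
        f.write(single_char_wordlist())
```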
>>
>> Do you see anything I could improve in what I did? Should I perhaps use a
>> training image with more letters and more combinations? Would that help
>> at all?
>>
>> Should I include in the same image a copy of the same character set at a
>> smaller size? That way, would I be able to pass smaller images to
>> Tesseract and gain speed without sacrificing detection quality?
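If smaller training glyphs are wanted, a shrunken copy of the character set can be generated rather than redrawn. This sketch assumes a grayscale image stored as a list of rows and an integer shrink factor; area averaging (the mean of each factor x factor block) is one common choice, and the result would normally be re-binarized before training.

```python
def downscale_box(gray, factor):
    """Shrink a grayscale image by an integer factor using block (area) averaging.

    Each output pixel is the rounded mean of a factor x factor block; trailing
    rows/columns that do not fill a whole block are dropped.
    """
    h, w = len(gray) // factor, len(gray[0]) // factor
    out = []
    for by in range(h):
        row = []
        for bx in range(w):
            total = 0
            for y in range(by * factor, (by + 1) * factor):
                for x in range(bx * factor, (bx + 1) * factor):
                    total += gray[y][x]
            row.append(round(total / (factor * factor)))
        out.append(row)
    return out
```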
>>
>>
>> On the other hand, I found some strange behavior in Tesseract that I
>> would like to understand better:
>> In my preprocessing I tried Otsu thresholding
>> (http://en.wikipedia.org/wiki/Otsu%27s_method) and visually got much
>> better results, but surprisingly Tesseract did worse. Otsu thinned the
>> strokes of the chars, while the chars I used to train Tesseract were
>> bolder. So does Tesseract match the "boldness" of the characters? Should
>> I train Tesseract with different levels of boldness?
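For reference, Otsu's method picks the single global threshold that maximizes the between-class variance of the gray-level histogram. The sketch below implements it in plain Python, plus a hypothetical `thicken` helper (a 3x3 minimum filter, i.e. dilation of dark strokes) that could be used either to bring Otsu's thinner output closer to the bold training glyphs or to synthesize bolder training variants. Whether either helps Tesseract's accuracy would need to be checked on real plates.

```python
def otsu_threshold(gray):
    """Return Otsu's threshold t for a grayscale image (list of rows, values 0-255).

    Pixels <= t form the dark class; t maximizes the between-class variance
    w0 * w1 * (mu0 - mu1)**2 with unnormalized class weights. Returns 0 for a
    single-valued image.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]          # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0        # pixels with value > t
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    """Map pixels <= t to 0 (ink) and the rest to 255 (background)."""
    return [[0 if v <= t else 255 for v in row] for row in gray]


def thicken(binary, passes=1):
    """Grow dark strokes by one pixel per pass with a 3x3 minimum filter."""
    h, w = len(binary), len(binary[0])
    img = binary
    for _ in range(passes):
        img = [
            [
                min(
                    img[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))
                )
                for x in range(w)
            ]
            for y in range(h)
        ]
    return img
```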
>>
>> I'm using Tesseract 2.04 for this. Do you think some of these issues
>> would improve with Tess 3.0?
>>
>>
>> Thanks,
>>
>> Andres
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>