RE: tips for improving Tesseract accuracy and speed...

Cong Nguyen Wed, 30 Mar 2011 19:09:38 -0700

Please refer to "OPTIMIZING SPEED FOR ADAPTIVE LOCAL THRESHOLDING ALGORITHM
USING DYNAMIC PROGRAMMING".
Its complexity is O(n), where n is the number of pixels.
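
For illustration only, here is a minimal sketch of one common way to get an O(n) adaptive threshold: build a summed-area table (integral image) with a simple dynamic-programming pass, then read each local mean from four table lookups. It is not taken from the paper cited above, and the function name, the window size and the offset are assumptions chosen for the example.

```python
import numpy as np

def adaptive_threshold(gray, window=15, offset=0.15):
    """Binarize a 2-D grayscale array with a local-mean threshold.

    A pixel becomes foreground (0) when it is more than `offset` (a fraction)
    darker than the mean of the window centred on it; otherwise it becomes 255.
    `window` and `offset` are illustrative defaults, not tuned values.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    r = window // 2

    # Summed-area table with a leading zero row and column.
    # Each entry is the sum of all pixels above and to the left of it.
    sat = np.zeros((h + 1, w + 1), dtype=np.float64)
    sat[1:, 1:] = gray.cumsum(axis=0).cumsum(axis=1)

    # Window bounds for every row and column, clipped at the image border.
    y0 = np.clip(np.arange(h) - r, 0, h)[:, None]
    y1 = np.clip(np.arange(h) + r + 1, 0, h)[:, None]
    x0 = np.clip(np.arange(w) - r, 0, w)[None, :]
    x1 = np.clip(np.arange(w) + r + 1, 0, w)[None, :]

    # Window sum from four lookups per pixel, independent of window size.
    total = sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]
    count = (y1 - y0) * (x1 - x0)
    mean = total / count

    return np.where(gray < mean * (1.0 - offset), 0, 255).astype(np.uint8)
```

The cost per pixel does not depend on the window size, which is where the linear complexity comes from.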


-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of Max Cantor
Sent: Thursday, March 31, 2011 7:28 AM
To: [email protected]
Cc: [email protected]
Subject: Re: tips for improving Tesseract accuracy and speed...

Yes. I've had great experience with Sauvola binarization from Leptonica.
Gamer works too but is much, much slower.
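
As a rough sketch of what Sauvola binarization computes (this is not Leptonica's code; in Leptonica the entry point is pixSauvolaBinarize), the threshold at each pixel is T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation in a local window. The function names and the default values of window, k and R below are assumptions for the example.

```python
import numpy as np

def _window_sums(values, r):
    """Return per-pixel sums over a (2r+1)-square window and the pixel counts.

    Windows are clipped at the image border. Uses a summed-area table so the
    cost per pixel does not depend on the window size.
    """
    h, w = values.shape
    sat = np.zeros((h + 1, w + 1), dtype=np.float64)
    sat[1:, 1:] = values.cumsum(axis=0).cumsum(axis=1)
    y0 = np.clip(np.arange(h) - r, 0, h)[:, None]
    y1 = np.clip(np.arange(h) + r + 1, 0, h)[:, None]
    x0 = np.clip(np.arange(w) - r, 0, w)[None, :]
    x1 = np.clip(np.arange(w) + r + 1, 0, w)[None, :]
    total = sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]
    return total, (y1 - y0) * (x1 - x0)

def sauvola_binarize(gray, window=15, k=0.34, R=128.0):
    """Binarize with the Sauvola threshold T = m * (1 + k * (s / R - 1)).

    Pixels at or below T become 0 (foreground), the rest 255.
    The defaults are illustrative, not recommended settings.
    """
    gray = np.asarray(gray, dtype=np.float64)
    r = window // 2
    s1, n = _window_sums(gray, r)          # local sum of intensities
    s2, _ = _window_sums(gray * gray, r)   # local sum of squared intensities
    mean = s1 / n
    # Guard against tiny negative values caused by floating-point rounding.
    std = np.sqrt(np.maximum(s2 / n - mean * mean, 0.0))
    thresh = mean * (1.0 + k * (std / R - 1.0))
    return np.where(gray <= thresh, 0, 255).astype(np.uint8)
```

In flat regions s is close to zero, so the threshold drops well below the local mean and background noise stays white; near strokes s grows and the threshold rises toward the mean.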

On Mar 31, 2011, at 0:02, cong nguyenba <[email protected]> wrote:

> I have another approach for you here: try binarizing with an adaptive
> threshold. For speed, dig into the engine and follow the adaptive
> classification code in the source. I think that should be enough for
> what you need.
> 
> On Wednesday, March 30, 2011, Dmitri Silaev <[email protected]> wrote:
>> P.S.: If you're still sure that reasonable downscaling of your images
>> sacrifices the accuracy, please share one or two of your *unprocessed*
>> images to investigate further.
>> 
>> And I'd suggest keeping up with the latest revisions of Tesseract. The
>> API changes significantly, but Tess is definitely being improved in
>> terms of stability, new capabilities and code efficiency, which may
>> well lead to the improved performance you are looking for.
>> 
>> Warm regards,
>> Dmitri Silaev
>> 
>> 
>> 
>> 
>> 
>> On Tue, Mar 29, 2011 at 8:17 AM, Andres <[email protected]> wrote:
>>> ...required.
>>> 
>>> Hello people,
>>> 
>>> I have been developing a licence plate recognition system for a long
>>> time, and I still have to improve my use of Tesseract to make it usable.
>>> 
>>> My first concern is about speed:
>>> After extracting the licence plate image, I get an image like this:
>>> 
>>> https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5LWE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP
>>> 
>>> As you may see, there are only 6 characters (Tess is recognizing more
>>> because there are some blemishes there, but I get rid of them by
>>> postprocessing the layout of the recognized chars).
>>> 
>>> On an Intel i7 720 (good power, but using a single thread) the Tesseract
>>> part takes something like 230 ms. This is too much time for what I need.
>>> 
>>> The image is 500 x 117 pixels. I noted that when I reduce the size of
>>> this image, the detection time is reduced in proportion to the image
>>> area, which makes good sense. But the accuracy of the OCR is poor when
>>> the character height is below 90 pixels.
>>> 
>>> So, I assume that there is a problem with the way I trained tesseract.
>>> 
>>> Because the characters in the plates are assorted (3 alphanumeric, 3
>>> numeric) I trained it with just a single image with all the letters in
>>> the alphabet. I saw that you suggest extensive training, but I imagine
>>> that doesn't apply here, where the characters are not organized in
>>> words. Am I correct about this?
>>> 
>>> So that you can see it, this is the image I trained Tesseract with:
>>> 
>>> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3LTljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL
>>> 
>>> In this image the characters are about 55 pixels high.
>>> 
>>> Then, for frequent_word_list and words_list I included a single entry
>>> for each character, that is, something starting like this:
>>> 
>>> A
>>> B
>>> C
>>> D
>>> ...
>>> 
>>> Do you see anything I could improve in what I did? Should I perhaps use
>>> a training image with more letters, in more combinations? Will that
>>> help somehow?
>>> 
>>> Should I include in the same image a copy of the same character set at
>>> a smaller size? That way, will I be able to pass Tesseract smaller
>>> images and get more speed without sacrificing detection quality?
>>> 
>>> 
>>> On the other hand, I found some strange behavior in Tesseract that I
>>> would like to know a little more about:
>>> In my preprocessing I tried Otsu thresholding
>>> (http://en.wikipedia.org/wiki/Otsu%27s_method) and visually I got much
>>> better results, but surprisingly for Tesseract it was worse. It
>>> decreased the stroke thickness of the chars, and the chars I used to
>>> train Tesseract were bolder. So, does Tesseract match the "boldness" of
>>> the characters? Should I train Tesseract with different levels of
>>> boldness?
>>> 
>>> I'm using Tesseract 2.04 for this. Do you think some of these issues
>>> will improve with Tess 3.0?
>>> 
>>> 
>>> Thanks,
>>> 
>>> Andres
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>> 


