Please refer to "OPTIMIZING SPEED FOR ADAPTIVE LOCAL THRESHOLDING ALGORITHM USING DYNAMIC PROGRAMMING". Its complexity is O(n), where n is the number of pixels.
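To give a rough idea of how local thresholding can run in O(n), here is a minimal sketch. It is not the method from the reference above; it assumes the "dynamic programming" part is a summed-area table (integral image), which is one common way to get the mean of any window in constant time. The function names and the `radius` and `k` parameters are made up for illustration.

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and column (one pass)."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above and to the left, inclusive.
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def adaptive_threshold(img, radius=7, k=0.15):
    """Return 1 where a pixel is darker than (1 - k) * local mean, else 0.

    img is a list of rows of grey values; radius and k are illustrative
    defaults, not values taken from any paper or library.
    """
    h, w = len(img), len(img[0])
    sat = integral_image(img)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            # Window sum from four table lookups, independent of window size.
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            # Compare pixel * area with total * (1 - k) to avoid a division.
            out[y][x] = 1 if img[y][x] * area < total * (1 - k) else 0
    return out
```

Each pixel costs a fixed number of operations whatever the window size, so the total work grows linearly with the number of pixels. Sauvola-style thresholding can be done the same way by keeping a second table of squared values for the local variance.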

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Max Cantor
Sent: Thursday, March 31, 2011 7:28 AM
To: [email protected]
Cc: [email protected]
Subject: Re: tips for improving Tesseract accuracy and speed...

Yes. I've had great experience with sauvola binarize from leptonica. Gamer works too but is much much slower

On Mar 31, 2011, at 0:02, cong nguyenba <[email protected]> wrote:

> I have another approach for you here: try to apply binarization using
> adaptive threshold! Delving into engine by following adaptive
> classification in source code for speedup! I think it is enough for
> your expectation!
>
> On Wednesday, March 30, 2011, Dmitri Silaev <[email protected]> wrote:
>> P.S.: If you're still sure that reasonable downscaling of your images
>> sacrifices the accuracy, please share one or two of your *unprocessed*
>> images to investigate further.
>>
>> And I'd suggest to keep up with the latest revisions of Tesseract. The
>> API changes significantly, but Tess is definitely being improved in
>> the sense of stability, new capabilities and also code efficiency,
>> which explicitly may lead to improved performance which you are
>> looking for.
>>
>> Warm regards,
>> Dmitri Silaev
>>
>> On Tue, Mar 29, 2011 at 8:17 AM, Andres <[email protected]> wrote:
>>> ...required.
>>>
>>> Hello people,
>>>
>>> I'm developing a licence plate recognition system from long ago and I still
>>> have to improve the use of Tesseract to make it usable.
>>>
>>> My first concern is about speed:
>>> After extracting the licence plate image, I get an image like this:
>>>
>>> https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5LWE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP
>>>
>>> As you may see, there are only 6 characters (tess is recognizing more
>>> because there are some blemishes over there, but I get rid of them with some
>>> postprocessing of the layout of the recognized chars)
>>>
>>> In an Intel I7 720 (good power, but using a single thread) the tesseract
>>> part is taking something like 230 ms. This is too much time for what I need.
>>>
>>> The image is 500 x 117 pixels. I noted that when I reduce the size of this
>>> image the detection time is reduced in proportion with the image area, which
>>> makes good sense. But the accuracy of the OCR is poor when the characters
>>> height is below 90 pixels.
>>>
>>> So, I assume that there is a problem with the way I trained tesseract.
>>>
>>> Because the characters in the plates are assorted (3 alphanumeric, 3
>>> numeric) I trained it with just a single image with all the letters in the
>>> alphabet. I saw that you suggest large training but I imagine that that
>>> doesn't apply here where the characters are not organized in words. Am I
>>> correct with this ?
>>>
>>> So, for you to see, this is the image with what I trained Tesseract:
>>>
>>> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3LTljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL
>>>
>>> In this image the characters are about 55 pixels height.
>>>
>>> Then, for frequent_word_list and words_list I included a single entry for
>>> each character, I mean, something starting with this:
>>>
>>> A
>>> B
>>> C
>>> D
>>> ...
>>>
>>> Do you see something to be improved on what I did ? Should I perhaps use a
>>> training image with more letters, with more combinations ? Will that help
>>> somehow ?
>>>
>>> Should I include in the same image a copy of the same character set but with
>>> smaller size ? In that way, will I be able to pass Tesseract smaller images
>>> and get more speed without sacrificing detection quality ?
>>>
>>> On the other hand, I found some strange behavior of Tesseract about which I
>>> would like to know a little more:
>>> In my preprocessing I tried Otsu thresholding
>>> (http://en.wikipedia.org/wiki/Otsu%27s_method) and I visually got much
>>> better results, but surprisingly for Tesseract it was worse. It decreased
>>> the thickness of the draw of the chars, and the chars I used to train
>>> Tesseract were bolder. So, Tesseract matches the "boldness" of the
>>> characters ? Should I train Tesseract with different levels of boldness ?
>>>
>>> I'm using Tesseract 2.04 for this. Do you think that some of these issues
>>> will go better by using Tess 3.0 ?
>>>
>>> Thanks,
>>>
>>> Andres
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "tesseract-ocr" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>>

