I have another approach for you: try binarizing the image with an adaptive threshold. For more speed, dig into the engine and follow the adaptive classification code in the source. I think that should be enough to meet your expectations!
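A minimal sketch of adaptive (local-mean) thresholding in plain Python, assuming the plate image is already a grayscale 2-D list of 0..255 values. Each pixel is compared with the mean of its `block_size` x `block_size` neighbourhood (clipped at the border) minus an offset `c`, and an integral image makes each local mean a constant-time lookup. The function names and the `block_size`/`c` values are hypothetical illustrations, not part of Tesseract; if OpenCV is available, `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` does the same kind of job.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0..255


def integral_image(img: Image) -> List[List[int]]:
    """Summed-area table with an extra zero row/column: s[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s


def adaptive_mean_threshold(img: Image, block_size: int = 15, c: int = 10) -> Image:
    """Hypothetical helper: a pixel becomes 255 (background) if it is brighter than
    (local mean - c), otherwise 0 (ink). The window is clipped at the image border."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    h, w = len(img), len(img[0])
    s = integral_image(img)
    r = block_size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            # Box sum over rows y0..y1-1 and columns x0..x1-1 from four table lookups.
            total = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            out[y][x] = 255 if img[y][x] > mean - c else 0
    return out


if __name__ == "__main__":
    # Synthetic "plate": bright left half, shadowed right half, one dark ink pixel in each.
    img = [[200 if x < 15 else 60 for x in range(30)] for _ in range(30)]
    img[10][5], img[10][22] = 120, 10
    out = adaptive_mean_threshold(img, block_size=7, c=10)
    print(out[10][5], out[10][22], out[0][0], out[29][29])  # 0 0 255 255
```

A single global cut at 100 would turn the whole shadowed half black, while the local threshold keeps both backgrounds white and both ink pixels black. `block_size` should be comfortably larger than the stroke width; the values above suit only this toy image. Pixels right at the step between the two halves can still flip to black, a known side effect of mean-based local thresholding.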
On Wednesday, March 30, 2011, Dmitri Silaev <[email protected]> wrote:
> P.S.: If you're still sure that reasonable downscaling of your images
> sacrifices accuracy, please share one or two of your *unprocessed*
> images so we can investigate further.
>
> I'd also suggest keeping up with the latest revisions of Tesseract. The
> API changes significantly, but Tess is definitely improving in terms of
> stability, new capabilities and code efficiency, which may directly lead
> to the improved performance you are looking for.
>
> Warm regards,
> Dmitri Silaev
>
> On Tue, Mar 29, 2011 at 8:17 AM, Andres <[email protected]> wrote:
>> ...required.
>>
>> Hello people,
>>
>> I've been developing a licence plate recognition system for a long time,
>> and I still need to improve my use of Tesseract to make it usable.
>>
>> My first concern is speed.
>> After extracting the licence plate, I get an image like this:
>>
>> https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5LWE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP
>>
>> As you can see, there are only 6 characters (Tess recognizes more
>> because of some blemishes, but I get rid of those by post-processing
>> the layout of the recognized chars).
>>
>> On an Intel i7 720 (plenty of power, but using a single thread) the
>> Tesseract part takes about 230 ms. This is too slow for what I need.
>>
>> The image is 500 x 117 pixels. I noticed that when I reduce the size of
>> this image, the detection time drops in proportion to the image area,
>> which makes sense. But OCR accuracy is poor when the character height
>> is below 90 pixels.
>>
>> So I assume there is a problem with the way I trained Tesseract.
>>
>> Because the characters on the plates are mixed (3 alphanumeric, 3
>> numeric), I trained it with just a single image containing all the
>> letters of the alphabet.
>> I saw that you suggest large training sets, but I imagine that doesn't
>> apply here, where the characters are not organized into words. Am I
>> correct about this?
>>
>> For reference, this is the image I trained Tesseract with:
>>
>> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3LTljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL
>>
>> In this image the characters are about 55 pixels high.
>>
>> Then, for frequent_word_list and words_list I included a single entry
>> for each character, i.e. something starting like this:
>>
>> A
>> B
>> C
>> D
>> ...
>>
>> Do you see anything to improve in what I did? Should I perhaps use a
>> training image with more letters and more combinations? Would that help
>> somehow?
>>
>> Should I include in the same image a copy of the same character set at
>> a smaller size? That way, could I pass Tesseract smaller images and get
>> more speed without sacrificing detection quality?
>>
>> On the other hand, I found some strange behavior in Tesseract that I
>> would like to understand better:
>> In my preprocessing I tried Otsu thresholding
>> (http://en.wikipedia.org/wiki/Otsu%27s_method) and visually got much
>> better results, but surprisingly Tesseract did worse with it. It made
>> the character strokes thinner, and the chars I used to train Tesseract
>> were bolder. So, does Tesseract match the "boldness" of the characters?
>> Should I train Tesseract with different levels of boldness?
>>
>> I'm using Tesseract 2.04 for this. Do you think some of these issues
>> will improve with Tess 3.0?
>>
>> Thanks,
>>
>> Andres
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
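Andres's observation that Otsu thresholding made the strokes thinner can be illustrated with a small sketch of Otsu's method in plain Python. Otsu picks one global threshold from the histogram; on a blurred stroke, wherever that threshold lands on the edge ramp decides how many edge pixels count as ink, and so how bold the binarized character looks. This is only an illustration of the mechanism under that assumption, not a confirmed explanation of his results. The intensity profile and the helper `ink_width` are hypothetical.

```python
from typing import Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Otsu's method: return the t in 0..255 that maximizes the between-class
    variance of the classes {v <= t} (ink) and {v > t} (background)."""
    if not pixels:
        raise ValueError("need at least one pixel")
    hist = [0] * 256
    for v in pixels:
        if not 0 <= v <= 255:
            raise ValueError("pixel values must be in 0..255")
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))
    w0 = sum0 = 0  # pixel count and intensity sum of the dark class
    best_t, best_between = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: not a valid split
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # Unnormalized between-class variance; dividing by total**2 would not change the argmax.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def ink_width(row: Sequence[int], t: int) -> int:
    """Hypothetical helper: pixels in a 1-D profile classified as ink (v <= t)."""
    return sum(1 for v in row if v <= t)


if __name__ == "__main__":
    # Hypothetical cross-section through one blurred dark stroke on a light background.
    profile = [230, 200, 120, 40, 30, 40, 120, 200, 230]
    t = otsu_threshold(profile)
    print(t, ink_width(profile, t), ink_width(profile, 60))  # 120 5 3
```

On this toy profile the stroke is 5 pixels wide at Otsu's threshold but only 3 at a lower cut of 60, so the same character can binarize noticeably thinner or bolder depending on the threshold alone.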

