P.S.: If you're still sure that reasonable downscaling of your images sacrifices accuracy, please share one or two of your *unprocessed* images so we can investigate further.
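To put a rough number on what downscaling could buy, here is a minimal sketch. It assumes, as you observed, that recognition time scales with image area. The 500 x 117 size, the 230 ms timing and the 55-pixel training glyph height come from your message quoted below; the 90-pixel current character height is my assumption (you only say accuracy drops below that), and the function name estimate_downscale is hypothetical. It calls nothing from Tesseract; it is only arithmetic.

```python
def estimate_downscale(width, height, char_height, target_char_height, baseline_ms):
    """Estimate the image size and OCR time after shrinking the plate image
    so that characters become target_char_height pixels tall.

    Assumes recognition time is proportional to image area (an observation
    from the quoted message, not a documented Tesseract guarantee).
    """
    # Integer arithmetic keeps the resulting pixel sizes unambiguous.
    new_width = width * target_char_height // char_height
    new_height = height * target_char_height // char_height
    area_ratio = (new_width * new_height) / (width * height)
    return new_width, new_height, baseline_ms * area_ratio


# 90 px is an assumed current character height; 55 px matches the training image.
new_w, new_h, est_ms = estimate_downscale(500, 117, 90, 55, 230.0)
print(f"{new_w} x {new_h} px, estimated {est_ms:.0f} ms")
```

Under those assumptions the plate shrinks to 305 x 71 pixels and the estimate comes out at about 85 ms instead of 230 ms. Whether accuracy survives at that size is exactly what the unprocessed images would help to check.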
I'd also suggest keeping up with the latest revisions of Tesseract. The API changes significantly between versions, but Tess is definitely improving in stability, new capabilities and code efficiency, which in turn may lead to the improved performance you are looking for.

Warm regards,
Dmitri Silaev

On Tue, Mar 29, 2011 at 8:17 AM, Andres <[email protected]> wrote:
> ...required.
>
> Hello people,
>
> I have been developing a licence plate recognition system for a long time
> and I still have to improve my use of Tesseract to make it usable.
>
> My first concern is about speed:
> After extracting the licence plate image, I get an image like this:
>
> https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5LWE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP
>
> As you may see, there are only 6 characters (Tess is recognizing more
> because there are some blemishes over there, but I get rid of them with some
> postprocessing of the layout of the recognized chars).
>
> On an Intel i7 720 (good power, but using a single thread) the Tesseract
> part takes something like 230 ms. This is too much time for what I need.
>
> The image is 500 x 117 pixels. I noticed that when I reduce the size of this
> image the detection time is reduced in proportion to the image area, which
> makes good sense. But the accuracy of the OCR is poor when the character
> height is below 90 pixels.
>
> So, I assume that there is a problem with the way I trained Tesseract.
>
> Because the characters in the plates are assorted (3 alphanumeric, 3
> numeric) I trained it with just a single image with all the letters in the
> alphabet. I saw that you suggest large training sets, but I imagine that
> doesn't apply here, where the characters are not organized in words. Am I
> correct about this?
>
> So, for you to see, this is the image with which I trained Tesseract:
>
> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3LTljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL
>
> In this image the characters are about 55 pixels high.
>
> Then, for frequent_word_list and words_list I included a single entry for
> each character, I mean, something starting with this:
>
> A
> B
> C
> D
> ...
>
> Do you see anything to improve in what I did? Should I perhaps use a
> training image with more letters, with more combinations? Will that help
> somehow?
>
> Should I include in the same image a copy of the same character set at a
> smaller size? That way, will I be able to pass Tesseract smaller images
> and get more speed without sacrificing detection quality?
>
>
> On the other hand, I found some strange behavior in Tesseract about which I
> would like to know a little more:
> In my preprocessing I tried Otsu thresholding
> (http://en.wikipedia.org/wiki/Otsu%27s_method) and visually I got much
> better results, but surprisingly for Tesseract it was worse. It decreased
> the stroke thickness of the chars, and the chars I used to train
> Tesseract were bolder. So, does Tesseract match the "boldness" of the
> characters? Should I train Tesseract with different levels of boldness?
>
> I'm using Tesseract 2.04 for this. Do you think that some of these issues
> will improve by using Tess 3.0?
>
>
> Thanks,
>
> Andres
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.

