We have experienced precisely the same behavior with business cards, and found that for business cards the optimal image size is around 1,024 x 768 pixels. Try the same size with your documents and see what happens. Just remember to scale it to the size of your documents: if you have twice as many lines of text as a business card, you may want to use 1,024 x 1,536.
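A minimal sketch of that scaling rule, assuming the 1,024 px width stays fixed and the height grows linearly with the number of text lines relative to a business card. `target_size` and the line counts are hypothetical illustrations, not anything Tesseract provides.

```python
# Reference size reported above as working well for business cards.
CARD_SIZE = (1024, 768)  # (width, height) in pixels


def target_size(doc_lines: int, card_lines: int) -> tuple[int, int]:
    """Hypothetical helper: scale the business-card height by the line ratio.

    Keeps the width fixed and grows the height in proportion to how many
    more (or fewer) lines of text the document has than a business card.
    """
    if doc_lines <= 0 or card_lines <= 0:
        raise ValueError("line counts must be positive")
    width, height = CARD_SIZE
    return width, round(height * doc_lines / card_lines)


# Twice as many lines as the card -> twice the height.
print(target_size(doc_lines=16, card_lines=8))  # (1024, 1536)
```

The linear rule is only a starting point for experimentation; the right size for a given font and layout may still need some trial and error.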
By the way, to be precise: a higher DPI (i.e. a scanner capable of high DPI) is always better; what appears to degrade accuracy is an image that is too large. In other words, you can capture images at the highest possible DPI, but then make sure to reduce the image size (by resampling, of course) before running OCR.

Patrick

On Oct 27, 8:32 pm, WalterA <arsen...@gmail.com> wrote:
> Hi everyone,
>
> I have been testing tesseract for a while and I have noticed some
> unusual behaviour.
>
> Different cases (font, print size) seem to result in different optimal
> resolutions. Some fonts and sizes seem to give many errors at
> moderately higher DPI (e.g. 300), whereas once I nail an optimal DPI
> (e.g. 200) the OCR accuracy is very high. I would naively expect that
> higher DPI scans would at best provide no improvement in accuracy, but
> not a decrease.
>
> In other cases, higher DPI scans result in better accuracy.
>
> Any explanation? Workaround that doesn't require trial and error for
> each document scanned?
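To illustrate the "capture at high DPI, then resample down before OCR" advice in Patrick's reply, here is a minimal pure-Python sketch of one simple resampling method: box (area-averaging) downsampling by an integer factor, on a grayscale image stored as a list of pixel rows. `box_downsample` and `factor_for_width` are hypothetical names for illustration; in practice an imaging library's resize with a proper filter (e.g. Pillow's `Image.resize`) would normally be used instead.

```python
def factor_for_width(src_width: int, target_width: int) -> int:
    """Largest integer factor that does not shrink the width below target."""
    return max(1, src_width // target_width)


def box_downsample(pixels: list[list[int]], factor: int) -> list[list[int]]:
    """Shrink a grayscale image by an integer factor using area averaging.

    Each output pixel is the mean of a factor x factor block of input
    pixels. Trailing rows/columns that do not fill a whole block are
    dropped for simplicity.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out_h = len(pixels) // factor
    out_w = len(pixels[0]) // factor if pixels else 0
    result = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            block_sum = 0
            for y in range(oy * factor, (oy + 1) * factor):
                for x in range(ox * factor, (ox + 1) * factor):
                    block_sum += pixels[y][x]
            # Mean of the block, rounded to the nearest integer.
            row.append(round(block_sum / (factor * factor)))
        result.append(row)
    return result


# A tiny 4x4 "scan" reduced to 2 px wide.
scan = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
]
print(box_downsample(scan, factor_for_width(4, 2)))  # [[0, 255], [255, 0]]
```

Averaging whole blocks keeps information from every captured pixel, which is the point of scanning at high DPI first rather than scanning at a low DPI directly.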