We have seen exactly the same behavior with business cards and found
that the optimal image size for them is around 1,024 x 768 pixels. Try
that size with your documents and see what happens. Just remember to
scale it to your documents: if a page has twice as many lines of text
as a business card, you may want to use 1,024 x 1,536.
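
As a rough illustration of that scaling rule, here is a minimal Python
sketch. The 1,024 x 768 baseline comes from our business card tests;
the function name and the card_lines parameter are made up for this
example, since how many lines count as "one business card" is something
you would have to decide for your own material.

```python
# Baseline from the message: business cards OCR well at about 1024 x 768.
BASE_WIDTH = 1024
BASE_HEIGHT = 768


def target_size(text_lines, card_lines):
    """Scale the baseline height by the ratio of text lines, keeping the width.

    card_lines is a hypothetical reference value: the number of text lines
    on the business card used as the baseline.
    """
    if text_lines <= 0 or card_lines <= 0:
        raise ValueError("line counts must be positive")
    scale = text_lines / card_lines
    return BASE_WIDTH, round(BASE_HEIGHT * scale)


# Twice as many lines as the reference card doubles the height.
print(target_size(20, 10))  # (1024, 1536)
```

Treat the result as a starting point for experiments, not as a
guaranteed optimum.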

By the way, to be precise: a higher DPI (i.e. a scanner capable of high
DPI) is always better. What appears to degrade accuracy is an image
that is too large. In other words, you can capture images at the highest
possible DPI, but make sure to reduce the image size (by resampling, of
course) before running OCR.
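
The resampling step could look something like the sketch below. It
assumes the Pillow library, which is my choice for the example and not
something tesseract requires; the function names, the file paths, and
the LANCZOS filter are likewise assumptions, and any decent resampling
filter should do.

```python
def fit_within(size, max_size):
    """Return size scaled down to fit inside max_size, keeping aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    width, height = size
    max_w, max_h = max_size
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def downsample_for_ocr(src_path, dst_path, max_size=(1024, 768)):
    """Resample a high-DPI scan to at most max_size and save it for OCR."""
    # Pillow is a third-party package; it is imported here so that
    # fit_within() stays usable without it.
    from PIL import Image

    with Image.open(src_path) as img:
        new_size = fit_within(img.size, max_size)
        img.resize(new_size, Image.LANCZOS).save(dst_path)
    return new_size
```

For example, downsample_for_ocr("scan_600dpi.png", "scan_small.png")
would write a reduced copy that you then pass to tesseract. Both file
names are placeholders.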

Patrick

On Oct 27, 8:32 pm, WalterA <arsen...@gmail.com> wrote:
> Hi everyone,
>
> I have been testing tesseract for a while and I have noticed some
> unusual behaviour.
>
> Different cases (font, print size) seem to result in different optimal
> resolutions.  Some fonts and sizes seem to give many errors at
> moderately higher DPI (e.g. 300), whereas once I nail an optimal DPI
> (e.g. 200) the OCR accuracy is very high.  I would naively expect that
> higher DPI scans would at worst provide no improvement in accuracy,
> not a decrease.
>
> In other cases, higher DPI scans result in better accuracy.
>
> Any explanation?  Workaround that doesn't require trial and error for
> each document scanned?
