Hey all, I use Tesseract to automatically OCR batches of TIFF files, but the accuracy is pretty much hit or miss. I've been using ImageMagick to convert from PDF to TIFF, and something like "convert -density 380" will produce great OCR results for one file, whereas the same setting gives poor results on another scan. How do I work out a good "-density" value for a given file? Is there some data from the identify command that I can use to calculate the ideal value, i.e. anything from: http://pastie.org/2726564 ?
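To make the "-density" part of the question concrete, here is the arithmetic as I understand it, as a rough Python sketch. A PDF page size is given in points (1/72 inch), so rasterizing at a given density should produce roughly (size in inches) x (density) pixels per side. The function names and the 612 x 792 pt (US Letter) page are my own illustrative assumptions, not something taken from the identify output, and this says nothing yet about which pixel size Tesseract actually prefers.

```python
# Sketch only: relates a PDF page size in points (1 pt = 1/72 inch) to the
# raster size produced at a given density. The function names and the
# example page size are hypothetical, chosen for illustration.

POINTS_PER_INCH = 72.0


def pixels_for_density(width_pt, height_pt, density):
    """Approximate pixel dimensions of a page rasterized at `density` DPI."""
    width_px = round(width_pt / POINTS_PER_INCH * density)
    height_px = round(height_pt / POINTS_PER_INCH * density)
    return width_px, height_px


def density_for_width(width_pt, target_width_px):
    """Density (DPI) needed so the page comes out `target_width_px` wide."""
    return target_width_px / (width_pt / POINTS_PER_INCH)


if __name__ == "__main__":
    # Assumed US Letter page, 612 x 792 points, at the density from my command.
    print(pixels_for_density(612, 792, 380))
    # Inverse: which density gives a 3230-pixel-wide image for that page?
    print(density_for_width(612, 3230))
```

If the pages in a batch have different physical sizes, a fixed density would give very different pixel dimensions per file, which might be part of why one value doesn't suit every scan.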
And what else should I try doing to the image to improve the results from Tesseract? At the moment I'm just using ImageMagick and was thinking of playing with parameters that increase brightness and contrast, turning the alpha layer off, etc. I'm open to any other tools and ideas if they're gonna help... Thanks