You can upload your input image here so that others can help you better. The extra preprocessing software cannot guarantee success; it is just one more thing to try.
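Skew is one of the things preprocessing software such as Scan Tailor checks for. To give a feel for what such a tool measures, here is a minimal sketch of projection-profile skew estimation, a common textbook technique. It is not necessarily what Scan Tailor or Tesseract does internally. It assumes the scan has already been thresholded to a black-and-white image stored as rows of 0/1 values, and the function names (`estimate_skew`, `profile_score`, `ink_pixels`) are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed input: a binary image as a list of rows, 1 = ink (black), 0 = background.
Image = Sequence[Sequence[int]]


def ink_pixels(img: Image) -> List[Tuple[int, int]]:
    """Return the (x, y) coordinates of every ink pixel (y counts down from the top)."""
    return [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]


def profile_score(pixels: List[Tuple[int, int]], angle_deg: float) -> float:
    """Score one candidate skew angle (hypothetical helper).

    A text line tilted by angle_deg follows y ~ y0 + x * tan(angle).
    Subtracting x * tan(angle) flattens it, so its pixels pile up in a few
    row bins and the sum of squared bin counts becomes large.
    """
    t = math.tan(math.radians(angle_deg))
    bins: Dict[int, int] = {}
    for x, y in pixels:
        b = round(y - x * t)
        bins[b] = bins.get(b, 0) + 1
    return float(sum(c * c for c in bins.values()))


def estimate_skew(img: Image, max_angle: float = 10.0, step: float = 0.5) -> float:
    """Try angles in [-max_angle, max_angle] and return the best-scoring one.

    Positive means the text lines slope downward to the right.
    """
    pixels = ink_pixels(img)
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: profile_score(pixels, a))
```

The function only estimates the angle; you would still rotate the image (for example by setting the skew manually in Scan Tailor) before running Tesseract. A coarse grid of candidate angles keeps the sketch simple; a real tool would likely refine the search and work on a downscaled image for speed.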
I also have a problem image that still failed after I ran it through Scan Tailor, but at least I could inspect which stage went wrong. I am also new to Tesseract, so please don't take what follows as authoritative.

There are at least two possible reasons for a failure:

1) The box for a single character is wrong, e.g., one box contains two or more letters.
2) The box for a single character is correct, but the character inside it is misrecognized. This can be caused by serious damage to the image, such as broken strokes in a letter.

Severe skew can also make recognition fail. You can use Scan Tailor to check how skewed your image is. Note that these tools do not always get it right; luckily, tools like Scan Tailor let you set the skew manually, so you can correct it before running Tesseract.

If you get an empty result, Tesseract has likely chosen one or more big boxes covering several lines of text. Since each box is supposed to be the image region of a single letter, such a box is completely wrong, and the recognition is very unlikely to be correct. You can make a box file and use jTessBoxEditor (see the third-party tools listed for Tesseract) to check the boxes visually.

You can also look at one of my previous posts, where I uploaded the original image and the intermediate results to ask for help. Although I have not gotten an answer yet, I believe I provided as much detail as I could.

Richard

On Wednesday, February 19, 2014 10:38:09 PM UTC+8, MOU Mukherjee wrote:
>
> Thanks Richard. I did read the FAQs but am confused which software is the
> best to use to clean/pre-process the image before running on Tesseract. I
> need to convert FAXES into text files, but currently they are all coming up
> garbled. Please help! thank you.
>
> On Wednesday, February 19, 2014 12:45:28 AM UTC-5, Richard Wang wrote:
>>
>> I suggest you to read the FAQ of tesseract, your question is
>>
>> "Why is the output is empty or of poor quality?"
>>
>> and you will find the answer.
>> there are several pieces of freeware you
>> can use.
>>
>> On Wednesday, February 19, 2014 4:25:05 AM UTC+8, MOU Mukherjee wrote:
>>>
>>> hi, I have recently started using the Tesseract OCR tool (on Windows),
>>> so need help from the experts!!! I tested the tool using a simple "Tiff"
>>> file which worked. However, the more complex image docs are not running
>>> properly at all, the output is totally garbled. Can you please advise
>>> which software/tool I need to use to Pre-Process/Clean the Input file so
>>> that I may get the desired output? I am confused if I need ImageMagick,
>>> GIMP or other tool, and which version to run? PLEASE HELP/suggest how to
>>> clean the image prior to running tesseract? THANKS so much for your time!!
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en