Re: New to Tesseract OCR

Richard Wang Wed, 19 Feb 2014 17:28:25 -0800

You can upload your input image here so that others can be more helpful.
The extra software cannot promise success; it is just something to try.

I also have a bad-case image. It failed even after I used Scan Tailor, but I
can inspect at which phase it goes wrong.

I am also new to Tesseract, so what I say below is not authoritative.
There are at least two possible reasons for a failure.

1). The box for a single character is wrong, e.g., the box includes two or
more letters together.
2). The box for a single character is correct, but the recognition of that
single character is wrong. The reason might be serious corruption of the
image, e.g., a broken stroke in a letter.

Serious skew can also make your recognition fail. You can use Scan Tailor
to check the skew of your image. It is worth mentioning that these tools
do not always do a good job. Luckily, tools like Scan Tailor allow you to
manually set the skew, so you can correct the skew of an image first.
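
To give a rough idea of what checking the skew means, here is a minimal
sketch of a projection-profile estimate. It is only an illustration, not how
Scan Tailor or Tesseract actually do it. The function name estimate_skew, the
input format (a list of (x, y) coordinates of dark pixels), and the angle
range and step are all my own assumptions.

```python
import math


def estimate_skew(points, max_angle=5.0, step=0.5):
    """Return the candidate angle (degrees) that best lines dark pixels up in rows.

    points: iterable of (x, y) coordinates of dark pixels (hypothetical input).
    A positive result means y grows as x grows along a text line.
    """
    points = list(points)
    steps = int(round(max_angle / step))
    best_angle, best_score = 0.0, -1
    for i in range(-steps, steps + 1):
        angle = i * step
        a = math.radians(angle)
        sin_a, cos_a = math.sin(a), math.cos(a)
        rows = {}
        for x, y in points:
            # Row index of the pixel after rotating the page by -angle.
            row = round(y * cos_a - x * sin_a)
            rows[row] = rows.get(row, 0) + 1
        # Sharp, well separated rows give a large sum of squared counts.
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The score rewards angles at which the dark pixels collapse into a few dense
rows, which is what straight text lines look like. On a real scan you would
first have to binarize the image and collect the dark pixel coordinates.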

If you get an empty result, it is likely that Tesseract has selected
one or more big boxes covering multiple lines of text. Since such a box
is totally wrong (a box is supposed to be the image region of a single
letter), the recognition is unlikely to be correct.

You can make a box file and use jTessBoxEditor (see the 3rd-party tools for
Tesseract) to check the boxes visually.
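
Besides looking at the boxes, a small script can flag the suspicious ones.
The sketch below assumes the usual box file layout of one box per line,
"<glyph> <left> <bottom> <right> <top> <page>". The function names and the
threshold factor of 3 are my own guesses, not anything from Tesseract.

```python
from statistics import median


def parse_boxes(text):
    """Parse box file text into (glyph, left, bottom, right, top) tuples."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append((parts[0], left, bottom, right, top))
    return boxes


def flag_suspicious(boxes, factor=3.0):
    """Return boxes much wider or taller than the median box.

    factor is an arbitrary threshold (an assumption); tune it for your font.
    """
    if not boxes:
        return []
    med_w = median(right - left for _, left, _, right, _ in boxes)
    med_h = median(top - bottom for _, _, bottom, _, top in boxes)
    return [
        box for box in boxes
        if (box[3] - box[1]) > factor * med_w or (box[4] - box[2]) > factor * med_h
    ]
```

A box that is several times wider than the others probably contains more
than one letter, and one that is several times taller probably spans more
than one line. Calling flag_suspicious(parse_boxes(open(path).read())) on
your own box file gives a short list of boxes to inspect first.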

You can check one of my previous posts, where I uploaded the original image
and intermediate results to ask for help. Although I have not got an answer
yet, I believe I provided as many details as I could.

Richard

On Wednesday, February 19, 2014 10:38:09 PM UTC+8, MOU Mukherjee wrote:
>
> Thanks Richard. I did read the FAQs but am confused which software is the 
> best to use to clean/pre-process the image before running on Tesseract. I 
> need to convert FAXES into text files, but currently they are all coming up 
> garbled. Please help! thank you.
>
> On Wednesday, February 19, 2014 12:45:28 AM UTC-5, Richard Wang wrote:
>>
>> I suggest you read the Tesseract FAQ. Your question is 
>>
>> "Why is the output is empty or of poor quality?"
>>
>> and you will find the answer. There are several pieces of freeware you 
>> can use.
>>
>> On Wednesday, February 19, 2014 4:25:05 AM UTC+8, MOU Mukherjee wrote:
>>>
>>> hi, I have recently started using the Tesseract OCR tool (on Windows), 
>>> so need help from the experts!!!  I tested the tool using a simple "Tiff" 
>>> file which worked. However, the more complex image docs are not running 
>>> properly at all, the output is totally garbled.  Can you please advise 
>>> which software/tool I need to use to Pre-Process/Clean the Input file so 
>>> that I may get the desired output? I am confused if I need ImageMagick, 
>>> GIMP or other tool, and which version to run? PLEASE HELP/suggest how to 
>>> clean the image prior to running tesseract? THANKS so much for your time!!
>>>
>>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to
tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en