Bad/pathological OCR output from a "good" input; similar inputs give good results

Kazuo Fri, 16 Apr 2010 19:32:03 -0700

Hi.

I don't have much experience with OCR, but I ran into something strange today.


I'm converting a VobSub subtitle to SRT, using Tesseract to do the
OCR.

Everything looks good, but two lines of text come out plainly wrong. I
can't see anything wrong with them; visually they look the same as all
the other samples that give good results. All images are generated in
the same way.

I'm using Tesseract 2.04 on Linux (official Arch Linux packages) from
the command line.
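For readers unfamiliar with this kind of pipeline, here is a minimal Python sketch of it. It assumes (hypothetically) that the two numbers in each file name are the subtitle's start and end times in milliseconds, and that the `tesseract` binary is on the PATH; `tesseract <image> <outbase>` writes the recognized text to `<outbase>.txt`. The helper names are illustrative only, not part of Tesseract or any subtitle tool.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical naming assumption: subtitle-<start_ms>-<end_ms>.tif
NAME_RE = re.compile(r"subtitle-(\d+)-(\d+)\.tif$")


def srt_time(ms: int) -> str:
    """Format a millisecond count as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def parse_times(filename: str) -> tuple[int, int]:
    """Extract (start_ms, end_ms) from a file name, under the assumption above."""
    m = NAME_RE.search(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(m.group(1)), int(m.group(2))


def ocr_image(tif: Path) -> str:
    """Run the tesseract CLI on one image and return the recognized text."""
    outbase = tif.with_suffix("")  # tesseract appends ".txt" itself
    subprocess.run(["tesseract", str(tif), str(outbase)], check=True)
    return Path(str(outbase) + ".txt").read_text(encoding="utf-8").strip()


def srt_entry(index: int, filename: str, text: str) -> str:
    """Build one SRT cue: index line, time range line, text line."""
    start, end = parse_times(filename)
    return f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"


def build_srt(directory: Path) -> str:
    """OCR every subtitle image in name order and join the cues into SRT text."""
    tifs = sorted(directory.glob("subtitle-*.tif"))
    entries = [srt_entry(i, t.name, ocr_image(t)) for i, t in enumerate(tifs, start=1)]
    return "\n".join(entries)
```

With the "good" sample from this post, `srt_entry(1, "subtitle-0001320819-0001323686.tif", "Death visited me this morning.")` yields a cue timed `00:22:00,819 --> 00:22:03,686`, if the millisecond assumption holds. The OCR step is kept separate from the formatting so that a bad line like the two below can be corrected by hand before the cue is written.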

I posted the TIFF files to the group:

Bad:
http://groups.google.com/group/tesseract-ocr/web/subtitle-0001090989-0001094322.tif
gives this OCR output: 'SCFQSITI in 8g0I’Iy.'

http://groups.google.com/group/tesseract-ocr/web/subtitle-0001317783-0001319717.tif
gives this OCR output: 'YOU STG UI’I98Sy.'

Good:
http://groups.google.com/group/tesseract-ocr/web/subtitle-0001320819-0001323686.tif
gives this OCR output: 'Death visited me this morning.'

Does anyone know what is happening here?


