Hi. I don't have much experience with OCR, but I ran into something strange today.
I'm converting a VobSub subtitle to SRT, using tesseract to do the OCR. Everything looks good, except that two lines of text come out plain wrong. I can't see anything wrong with them: visually they look the same as all the other samples that give good results. All images are generated in the same manner.

I'm using tesseract 2.04 on Linux (Arch Linux official packages) from the command line.

I posted the TIF files on the group.

Bad:

http://groups.google.com/group/tesseract-ocr/web/subtitle-0001090989-0001094322.tif
gives me the OCR: 'SCFQSITI in 8g0I’Iy.'

http://groups.google.com/group/tesseract-ocr/web/subtitle-0001317783-0001319717.tif
gives me the OCR: 'YOU STG UI’I98Sy.'

Good:

http://groups.google.com/group/tesseract-ocr/web/subtitle-0001320819-0001323686.tif
gives me the OCR: 'Death visited me this morning.'

Does anyone know what is happening here?
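In case it helps anyone hitting the same thing, here is a rough sketch of how the bad lines could be spotted automatically instead of by eye. It is only an illustration: `looks_suspicious` and `ocr_tif` are hypothetical helper names, the "word mixes letters and digits" rule is a guess based on the two bad results above, and `ocr_tif` assumes the `tesseract` binary is on the PATH and is called in its usual `tesseract <image> <outputbase>` form, which writes `<outputbase>.txt`.

```python
import re
import subprocess
from pathlib import Path

# A whitespace-delimited token that contains at least one ASCII letter
# and at least one digit, e.g. "8g0I’Iy." (assumed sign of a misread word).
_MIXED_TOKEN = re.compile(r"(?=\S*[A-Za-z])(?=\S*\d)\S+")


def looks_suspicious(text: str) -> bool:
    """Hypothetical heuristic: True if any word mixes letters and digits."""
    return bool(_MIXED_TOKEN.search(text))


def ocr_tif(tif_path: str) -> str:
    """Run tesseract on one image and return the recognized text.

    Assumes `tesseract` is installed; it writes its result to
    <outputbase>.txt next to the image.
    """
    base = Path(tif_path).with_suffix("")  # drop the ".tif" extension
    subprocess.run(["tesseract", tif_path, str(base)], check=True)
    out_file = Path(f"{base}.txt")
    return out_file.read_text(encoding="utf-8", errors="replace").strip()
```

Running `looks_suspicious` over the text of each subtitle would flag the two bad samples and leave the good one alone. It will also flag legitimate words such as "2nd", so it is only a way to build a list of lines to check by hand, not a fix for the recognition itself.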

