-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Loren Wilton wrote:
>> Here's the pic in question as original gif (I joined the parts to
>> make it easier for gocr):
>> http://www.matthias-keller.ch/ocrmail.gif
>> and converted to pnm:
>> http://www.matthias-keller.ch/ocrmail.pnm
>>
>> And here's what gocr -i ocrmail.pnm spits out in my case:
>> http://www.matthias-keller.ch/ocrmail.gocr
>
> The only thing your scan got decently was the sans-serif font. All
> of the serif font stuff and the italic sans-serif font stuff
> turned to garbage.
>
> I'm not quite sure why this should be. That looks like pretty
> clean text that should be pretty recognizable. The contrast could
> be a problem, but the 100% accuracy on the one line indicates that
> it probably isn't. There should be an option to one of the
> programs to do a b/w transform on this. That may help.
>
> I'd look to see if the ocr program has any options on the kinds of
> fonts it recognizes.
>
> Ok. A little playing around in Photoshop. Those are all
> anti-aliased fonts. It looks really good in the gif. If you convert
> it to jpg, or I suspect any other lossy compression at standard
> compression rates, the results are unusable; there just aren't
> enough pixels.
>
> If you keep all of the pixels (doing this on Windows I went
> gif->bmp to import it to Photoshop) you have better luck.
>
> However, if you attempt to threshold to b/w at the default 50%
> threshold level the results are unusable. If you threshold at
> around 170-190 (out of 255), or around 70-75%, then you get much
> better results.
>
> If you can't control the threshold level, you can try taking the
> contrast up. I set the contrast to 100% and then thresholded. The
> results weren't quite as good, but those are settings that don't
> require experimentation. A contrast around 90% might have been
> better, but I didn't try that yet.
>
> Loren
>

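For anyone who wants to try the thresholding Loren describes without
Photoshop, here is a rough sketch in plain Python. It is only an
illustration: the function names and the sample pixel values are made
up, and it assumes dark text on a light background with 8-bit gray
values (0 = black, 255 = white). It does not read or write image
files; the pixels are just nested lists.

```python
def percent_to_level(percent):
    """Convert a threshold given in percent to a 0-255 gray level.

    Integer arithmetic, rounding down (an arbitrary choice for this sketch).
    """
    return percent * 255 // 100


def threshold_gray(pixels, level=180):
    """Turn grayscale rows into pure black and white.

    A pixel at or above `level` becomes white (255); anything darker
    becomes black (0). The default of 180 is just a value inside the
    170-190 range mentioned above, not a recommended setting.
    """
    return [[255 if p >= level else 0 for p in row] for row in pixels]


if __name__ == "__main__":
    # Hypothetical cross-section of one anti-aliased stroke:
    # white background, gray edge, dark core, gray edge, white background.
    stroke = [[255, 150, 40, 150, 255]]

    # Near the 50% level the gray edge pixels turn white, thinning the stroke.
    print(threshold_gray(stroke, 128))

    # At a higher level the edge pixels turn black and the stroke stays wide.
    print(threshold_gray(stroke, percent_to_level(75)))
```

With these made-up values, the mid-level threshold keeps only the dark
core of the stroke, while the higher one also keeps the gray edge
pixels. That would be consistent with thin anti-aliased serif strokes
breaking up at 50%, though I have not checked it against the actual
image.
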
We have now traced the problem to his gocr version. I ran my gocr
over his pnm file and got much better results than he did, in fact
the same results I got from his gif. So something is probably wrong
with his gocr version, because no special arguments are needed (we
are both using the same ones).

Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE2btHJQIKXnJyDxURAjnHAJ4zriGQSU4B2Sr/ii+ivMfG3QRMZwCeI/7a
lVOtMTrJPQbVSkrLpt0760g=
=spr2
-----END PGP SIGNATURE-----