On Tue, Feb 22, 2005 at 04:45:38PM -0800, Dave Margolis ([EMAIL PROTECTED]) wrote:
> Does anyone know of a program that I could run a few thousand GIF images 
> through, perform an OCR-like operation on each, and get some kind of 
> text back for putting into a database for searching purposes?
> 
> I'm looking into making my collection of daily comics searchable.  I 
> know the fonts in most comics don't lend themselves to very good OCR, 
> but I'm thinking a certain margin of error would be acceptable.
> 
> And in case you're wondering, no, I'm not planning to make this 
> public...just for me.

No specific pointers, mostly bad news...

I've looked at a few free GNU/Linux-based OCR solutions and found *very*
mixed results.  Output is *highly* dependent on input, and poor-quality,
dirty, or misaligned images dramatically degrade accuracy.
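If you try a free engine anyway, it may help to make the search itself
forgiving of misreads.  A rough, stdlib-only Python sketch, assuming
you've already run each GIF through some OCR program and collected the
output in a dict keyed by filename.  `build_index`, `fuzzy_search`, the
file names, and the sample text are all made up for illustration:

```python
import difflib
import re
import sqlite3

# Hypothetical helpers: the OCR step itself is assumed to have happened
# elsewhere; this only stores and searches the resulting text.


def build_index(ocr_text):
    """Store {filename: ocr_output} in an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE strips (filename TEXT PRIMARY KEY, body TEXT)")
    db.executemany("INSERT INTO strips VALUES (?, ?)", ocr_text.items())
    db.commit()
    return db


def fuzzy_search(db, query, cutoff=0.75):
    """Return filenames whose OCR text has a word similar to `query`.

    difflib's similarity ratio tolerates a few misread characters,
    which is the kind of error noisy OCR tends to produce.
    """
    query = query.lower()
    hits = []
    rows = db.execute("SELECT filename, body FROM strips ORDER BY filename")
    for filename, body in rows:
        # Keep digits in tokens, since OCR often confuses 0/O and 1/l.
        words = re.findall(r"[a-z0-9']+", body.lower())
        if difflib.get_close_matches(query, words, n=1, cutoff=cutoff):
            hits.append(filename)
    return hits


db = build_index({
    "2005-02-21.gif": "I NEED MORE C0FFEE",
    "2005-02-22.gif": "THE CAT ATE MY HOMEWORK",
})
print(fuzzy_search(db, "coffee"))  # matches the misread "C0FFEE"
```

The 0.75 cutoff is an arbitrary starting point: lower it to catch more
mangled words at the cost of more false hits.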

I'm not sure what the commercial options are.  One alternative that has
worked very well for Groklaw is the IGM method, that is, "Internet group
mind":  split up the material to be OCR'd, have different people type it
in, and assemble the results.  For dealing with legal faxes, it's great
(I can testify to this, having typed out a few myself).
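The bookkeeping side of that is easy to script.  A hypothetical Python
sketch (not how Groklaw actually does it, just an illustration) that
hands files out round-robin and then stitches the typed text back
together in the original order, flagging anything not yet returned.
The function names and volunteer names are made up:

```python
from itertools import cycle


def piece_out(files, volunteers):
    """Assign image files to volunteers round-robin.

    Returns {volunteer: [files...]}; each file goes to exactly one person.
    """
    if not volunteers:
        raise ValueError("need at least one volunteer")
    assignments = {v: [] for v in volunteers}
    for f, v in zip(files, cycle(volunteers)):
        assignments[v].append(f)
    return assignments


def assemble(files, transcripts):
    """Put returned transcripts back into the original file order.

    `transcripts` maps filename -> typed text, merged from everyone.
    Returns (done, missing): (filename, text) pairs in order, and the
    filenames nobody has sent back yet.
    """
    done = [(f, transcripts[f]) for f in files if f in transcripts]
    missing = [f for f in files if f not in transcripts]
    return done, missing


files = ["a.gif", "b.gif", "c.gif", "d.gif", "e.gif"]
print(piece_out(files, ["ann", "bob"]))
```

Since each file goes to one person, a missing transcript is easy to
trace back to whoever had it.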


Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
    I call bullshit on that one, sorry, no man pages no docs.  Come on
    now, what are they supposed to do?  Call up the Psychic Hotline?
    - tek, describing GNOME documentation, on linux-elitists


_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech
