On Tue, Feb 22, 2005 at 04:45:38PM -0800, Dave Margolis ([EMAIL PROTECTED]) wrote:
> Does anyone know of a program that I could run a few thousand GIF images
> through, perform an OCR-like operation on each, and get some kind of
> text back for putting into a database for searching purposes.
>
> I'm looking into making my collection of daily comics searchable. I
> know the fonts in most comics don't lend themselves to very good OCR,
> but I'm thinking a certain margin of error would be acceptable.
>
> And in case you're wondering, no, I'm not planning to make this
> public...just for me.

No specific pointers, mostly bad news...

I've looked at a few free GNU/Linux-based OCR solutions and found *very*
mixed results. Output is *highly* dependent on input: poor-quality,
dirty, or misaligned images dramatically degrade the results.

I'm not sure what the paid-up options are.

One alternative that works very well for Groklaw is the IGM method.
That's Internet group mind. Piece out the material to be OCR'd, have
different people type it up, and assemble the results. For dealing with
legal faxes, it's great (I can testify to this, having typed out a few
myself).

Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>
http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
    I call bullshit on that one, sorry, no man pages no docs. Come on
    now, what are they supposed do? Call up the Psychic Hotline?
    - tek, describing GNOME documentation, on linux-elitists
_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech