> On Feb 5, 2016, at 6:10 PM, Timothe Litt <l...@ieee.org> wrote:
> 
> Some of the PDFs on bitsavers are searchable.  It would be a good
> project to OCR the rest into searchable pdfs - as that also means that
> the text can be extracted.   OCR is getting good enough (finally) that
> it's feasible.  I'm sure that they'd be accepted back into bitsavers  -
> searchable is good for everyone.

Some disapprove of OCR for reasons I don't really understand.

A problem with OCR is that it's hard to find a good program.  I dabbled with an 
OCR plugin that Adobe once offered (free, and worth about that).  I also once 
tried an open source OCR program, which was far worse still.

But commercial OCR programs exist that do a decent job, especially if the 
scanned material is clean, as is the case for much of what is on Bitsavers.  I 
use ABBYY FineReader, which I rather like, but I expect there are other good 
ones out there too.

One key point is that you typically need to spend some time "training" the 
program on the particular type of material -- typeface etc. -- that you're 
working with.  The default settings are rarely adequate.

        paul

_______________________________________________
Simh mailing list
Simh@trailing-edge.com
http://mailman.trailing-edge.com/mailman/listinfo/simh
