> On Feb 5, 2016, at 6:10 PM, Timothe Litt <l...@ieee.org> wrote:
>
> Some of the PDFs on bitsavers are searchable. It would be a good
> project to OCR the rest into searchable pdfs - as that also means that
> the text can be extracted. OCR is getting good enough (finally) that
> it's feasible. I'm sure that they'd be accepted back into bitsavers -
> searchable is good for everyone.

Some disapprove of OCR for reasons I don't really understand.

A problem with OCR is that it's hard to find a good one. I dabbled with an OCR plugin that Adobe once offered (free, and worth about that). I also once tried an open source OCR program, which was vastly inferior still. But commercial OCR programs exist that do a decent job, especially if the scanned material is clean, as is the case for much of what is on Bitsavers. I use ABBYY FineReader, which I rather like, but I expect there are other good ones out there too.

One key point is that you typically need to spend some time "training" the program on the particular type of material -- typeface etc. -- that you're working with. The default settings are rarely adequate.

	paul

_______________________________________________
Simh mailing list
Simh@trailing-edge.com
http://mailman.trailing-edge.com/mailman/listinfo/simh