Some thoughts/questions in your text below
Michel

2009/9/3, Jed Rothwell <jedrothw...@gmail.com>:
> Steve uncovered a real treasure trove of stuff from EPRI! I urge everyone to
> read it. Good for Steve and good for EPRI.
>
> HOWEVER, the Acrobat files from the EPRI site are peculiar. They are bigger
> than they need to be, and they are not "searchable" (not text under image
> format). Some have text under the image which is all wrong. I guess someone
> did not know how to set the Acrobat parameters. Acrobat is an unfriendly
> program with lousy documentation so people often get it wrong.
>
> Anyway, I converted the papers to conventional text-under-image format and
> reduced the size. I am also eliminating some of the noise (speckles and
> dots) and correcting some of the worst OCR errors in the underlying text. I can
> display and correct the underlying text separately with ABBYY, and then
> reassemble the Acrobat file.

For PDFs whose underlying OCR text is low-fidelity, in my experience
saving them as pure image files and then re-OCRing them with the latest
version of Acrobat often improves the OCR quality considerably.

Also, did you know you can batch-OCR any number of PDFs at a time?
I think it would take only a few days of automated computer work to
make your whole collection of many thousands of CF and peripheral
papers searchable.
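
Outside Acrobat, the same batch idea can be sketched in a few lines of
Python. This is only a rough sketch under my own assumptions: it swaps
in the open-source ocrmypdf command-line tool instead of Acrobat, and
the function names and folder names are made up for illustration. The
--force-ocr option rasterizes each page and OCRs it again, which is
roughly the "save as pure image, then re-OCR" trick above; --skip-text
leaves pages that already have text alone.

```python
from pathlib import Path
import shutil
import subprocess


def build_ocr_command(src: Path, dst: Path, force: bool = True) -> list[str]:
    """Build one ocrmypdf command line (assumes ocrmypdf is installed)."""
    cmd = ["ocrmypdf"]
    # --force-ocr: rasterize every page and redo the OCR from the image.
    # --skip-text: only OCR pages that have no text layer yet.
    cmd.append("--force-ocr" if force else "--skip-text")
    cmd += [str(src), str(dst)]
    return cmd


def plan_batch(in_dir: Path, out_dir: Path, force: bool = True) -> list[list[str]]:
    """Return one command per PDF found directly in in_dir, in name order."""
    return [
        build_ocr_command(pdf, out_dir / pdf.name, force)
        for pdf in sorted(in_dir.glob("*.pdf"))
    ]


def run_batch(in_dir: Path, out_dir: Path, force: bool = True) -> list[Path]:
    """Run the planned commands; return the input files that failed."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for cmd in plan_batch(in_dir, out_dir, force):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(Path(cmd[-2]))  # cmd[-2] is the source PDF
    return failed


if __name__ == "__main__":
    # Hypothetical folder names, for illustration only.
    for bad in run_batch(Path("papers_in"), Path("papers_out")):
        print("failed:", bad)
```

The originals are never overwritten, since results go to a separate
output folder, so a bad run costs nothing but time.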

A thought regarding copyright issues: rather than seeking upload
permission for every single paper, would there be a big risk in
uploading everything and then removing only the papers that copyright
holders ask you to remove?

For the papers you do have to remove, how about functioning like a real
library, where library card holders can download copyrighted material?

> I have just about finished the last paper and I
> will upload them tomorrow.
>
> I will upload a better version of the NSF/EPRI book as well as the
> TR-series.
>
> - Jed
>
