Jonathan Morgan wrote:
On Sun, Aug 30, 2009 at 7:41 AM, Chris Little <chris...@crosswire.org> wrote:
Peter von Kaehne wrote:
I just started looking around Google Books and saw its huge collection of
public domain books, scanned, OCRed and converted to EPUB.

E.g. here Wesley's complete works:


http://books.google.co.uk/books?id=2tdhAAAAIAAJ&printsec=frontcover&dq=subject:%22+theology+%22&lr=&as_brr=1&ei=opmZSu6qDqCGygTxpIjODg&rview=1#v=onepage&q=&f=false

It is scanned, OCRed and available as EPUB. A rudimentary genbook could be
created within a couple of hours, and once references etc. are inserted it
could be a valuable resource, beyond what an EPUB reader can offer.

Peter
No doubt this is a step in the right direction, but I have the same
misgivings as Matthew regarding OCRed and unproofed content.

I popped the book you cite into Adobe Digital Editions to check the quality,
and found most of the OCR problems we would expect to see:
weird layout, non-Latin text rendered as gibberish, and one (text) page I
spotted that was just presented inline as a scanned image.

So, it's a good step, but the quality is pretty bad.

I agree too.  I am involved with a website that distributes a lot of
scanned and OCR'd works, and when I read some of them I think "How
could you seriously present that document to the world?"  For what
it's worth, Logos says that it is faster and better to type the content
in yourself than to OCR and then proofread and correct, and Logos
produces a lot of content.  I suspect that is certainly true for
reasonably complex scripts and layouts, and quite possibly true even for
reasonably simple content if you have good typists.

Jon

Project Gutenberg might be a better source for public domain texts in many different formats.
http://www.gutenberg.org/

Daniel Blake


_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
