Re: [sword-devel] Spelling (was Versification/Encoding issues)

Mike Hart Thu, 08 Jan 2009 15:01:55 -0800

On issue 4, spelling:

I've taken everyone's advice on spelling to heart; I will try to remain true to 
the original text copy.


> As for spelling, and as a fascinating learning experience, pick up your
> printed KJV Bible and examine the spelling of the word "ankle[s]" in Ezekiel
> 47:3 and Acts 3:7.
> 
> Some editions have "ancle", others have
> "ankle".
> 
> Ostensibly both streams are based on the Authorised Version
> of 1769.  So
> Peter's advice is spot on.
> 
> -- David

That's interesting, because ancle is one of the words I corrected in JSFB -- 
the OCR had ancle, but the PDF itself, my paper KJV copy, and my JPS complete 
Tanach (individual volumes) had ankle...  I can't say what verse it was; at the 
time I was hunting for e's that had been OCR'd into c's (search for the 'regular 
expression' [bcdfghjklmnpqrstvwxy]c[bcdfgjklmnpqrstvwx] in kwrite).
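
For anyone who would rather script that search than run it in an editor, here is a minimal sketch in Python. The pattern is copied verbatim from above; the function name and the sample sentence are made up for illustration. It only lists candidate words for manual review, since a hit such as "ancles" may turn out to be a legitimate older spelling.

```python
import re

# Pattern copied verbatim from the message: a consonant, then "c", then a
# consonant. The second class has no "h", so ordinary "ch" words are skipped.
SUSPECT = re.compile(r"[bcdfghjklmnpqrstvwxy]c[bcdfgjklmnpqrstvwx]")

def suspect_words(text):
    """Return words containing a consonant-c-consonant run, for manual review."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if SUSPECT.search(w.lower())]

# Hypothetical OCR output, not taken from the JSFB scans.
print(suspect_words("the waters wcre to the ancles"))
```

The pattern only fires when the "c" sits inside a word, so an "e" misread as "c" at the end of a word (e.g. "thc") would need a separate search.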

On the subject, but in an opposing view, if you look at the 1611 text of the 
KJV, you'll note that roughly 50% of the words are spelled differently from what 
we call the "King James Version" today, but it doesn't really seem to matter. 
Read, for example, the 23rd psalm: it is still (or originally) the same as what 
we know and memorize in Sunday school at age 9, regardless of the spelling. I 
don't remember the spelling when I recite. 

KJV1611 23rd pfalme

http://www.us.archive.org/GnuBook/?id=holybiblefacsimi00polluoft#804
(there's a zoom button in the upper left margin, it is readable at 50% ) 
(**)-see further note below.

Since the 1769 version is still called the "King James" and they both read 
largely the same, I'd say the spelling is not as important as the word (as 
pronounced). And even then, a good number of words have been 'updated' from the 
1611 copy in the 1769 'true' KJV. 

I've taken everyone's advice on spelling to heart; I will try to remain true to 
the original text copy. 

That said, if you look at the quality of the Jewish School and Family Bible 
scans, you will see that I'm up against a mammoth task just getting a readable 
text, much less one that is letter-exact. About 10%-20% of the words were 
misinterpreted by the OCR. I've managed to reverse engineer the OCR process 
and repair the meaning of most words. That is, an OCR interprets the same font 
the same way most of the time, so what may appear to be gibberish in the OCR 
output can be repaired by careful examination of the OCR errors. For example, 
in JSFB, the italicized words are generally simple short modifier words: the, 
of, to, etc.  The OCR did poorly at interpreting these words, but it did do a 
fair job of being repeatable in how it interpreted them ("of" turned into o/* 
or o/' or o/".)  I've done countless search and replace passes for things like 
V/ -> W, etc. to restore the characters to readable text. What I've got now 
matches the PDF for 95+% of my random checks, and most of the remaining 
mismatches are missing letters and punctuation. (And no, I'm not trying to keep 
italicized words.. plain text only.)
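
Because the errors are repeatable, the repair amounts to a substitution table applied over the whole text. A minimal sketch in Python, using only the examples mentioned above; the table and function names are made up, and a real table would be far longer and built up by inspecting the OCR output:

```python
# Substitutions taken from the examples in this message. Each entry maps a
# repeatable OCR misreading to the text it stands for.
REPAIRS = [
    ("o/*", "of"),
    ("o/'", "of"),
    ('o/"', "of"),
    ("V/", "W"),
]

def repair(text):
    """Apply every known substitution, in table order, to the OCR text."""
    for bad, good in REPAIRS:
        text = text.replace(bad, good)
    return text

# Hypothetical garbled line, not taken from the JSFB scans.
print(repair("V/hen the son o/* man"))
```

Plain string replacement is used rather than regular expressions so that characters like * and / need no escaping. Each new entry should still be spot-checked against the PDF, since a short pattern can also match text that was read correctly.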

Additionally, in the JSFB, verses are marked in the margins only. I am 
restoring the verse indicators to the verse divisions. In volume 1 this is 
easy, because the verse divisions appear as asterisks. (Don't ask me why; I 
don't see any divisions in the PDF, but they are there in the 2nd copy of 
volume 1 on the archive: http://www.archive.org/details/schoolfamilybibl01beni ) 
In the other volumes, the verse division is generally the nearest punctuation 
mark, but not always. The "not always" part gets tricky. I'm referring to the 
JSFB PDF, a hardcopy KJV, and a JPS new Tanach to work out where each verse 
breaks. 
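
For the easy volume 1 case, the renumbering can be sketched in a few lines of Python. This assumes one chapter of text at a time, that the first verse starts at the top of the chapter, and that every asterisk marks the start of the next verse; the function name and sample text are made up for illustration.

```python
def number_verses(chapter_text, first_verse=1):
    """Split one chapter on asterisk markers and pair each piece with a verse number."""
    pieces = [p.strip() for p in chapter_text.split("*")]
    pieces = [p for p in pieces if p]  # drop empty pieces around stray markers
    return [(first_verse + i, p) for i, p in enumerate(pieces)]

# Hypothetical chapter text, not copied from the JSFB scans.
sample = "In the beginning. * And the earth. * And God said."
for number, verse in number_verses(sample):
    print(number, verse)
```

The other volumes, where the break is only "generally" at the nearest punctuation mark, are not covered by this sketch and still need checking against the reference texts.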

Additionally, the JSFB has copious footnotes on each page (average 10 notes a 
page). I'm unable to devise a capture technique for the notes on this revision, 
so these are being tossed. The footnote markers present a further problem, in 
that they mess with the word they're attached to. 

After all these issues, I, by myself, will never be able to certify the correct 
spelling of each word from this witness, and that isn't my intention, because 
there is so much more to do. I'm semi-dyslexic anyway, so editing would never 
be my strong point. This work has a different (unique to me anyway) approach to 
translation (it uses "The Eternal" for the Tetragrammaton, for example) that 
seems to be interesting enough to study, and I study in BibleTime or Bible 
Desktop, so I want it there. 

The years 2002-2008 were explosive for online texts. Over 1 million books now 
reside at the Internet Archive alone, and Google was a bigger (but more recent) 
operation. However, the bubble is over. The rate of books going online will 
drop significantly due to Microsoft dropping its program, and Google settling 
the lawsuits against it by the publishing industry. 

It is my belief that these texts (especially Judaeo-Christian texts) may not 
always be readily available online, so there is a limited window while they are 
being offered for free download to snag what you can. Also, there are many 
areas of the world where 'bibles' cannot be accessed online.  

When I first started looking for downloadable bibles in 2001, the "universal 
library" ( http://www.ulib.org -- Carnegie Mellon University ) had some bibles 
on it (way more than I could fit on my huge 2G hard drive at the time.)  If you 
go search there now, they are largely missing. For the search "BIBLE", some 500 
listings come up, but try to actually view one. I don't see this as omission, 
but censorship.  Do the same search on Google (130,000 full view books) and the 
Internet Archive (11,000 texts), and note the difference in quantity of search 
results and availability of the texts. 

In years to come, with more people involved, adding footnotes, italics and 
certifying the spelling may be warranted. For now, making the words of the text 
itself available for study is my intention.  I have a very long list of texts 
to work on, so I won't be 'perfecting' any, but 'improving' many. 

______________________________________
(**) KJV1611 http://www.archive.org/details/holybiblefacsimi00polluoft
(Completely offtopic: the notes in this witness are OCR'd as separate 
columns.. meaning the text files from this work may be a good candidate for a 
module. I think I put this on the module request list, but it was removed... a 
190 page preface does tend to obscure the facsimile behind it. The old gothic 
characters make for a 95% OCR error rate, but there is a good chance that the 
OCR can be corrected through error analysis.)



      

_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
