I thought I'd let everyone know where the KJV2006 project stands. So far it has been a solo effort, not because I don't want help, but because it has all been prep work.

I have dumped the KJV2003 by books into
   www.crosswire.org/svn/modules/KJV/trunk/text
and have created a tag for it at
   www.crosswire.org/svn/modules/KJV/tags/kjv2003.

Each book is named according to its OSIS book name and suffixed with .xml. Each is a complete OSIS document, but many are not well-formed XML, and of those that are, none is valid OSIS. (In fact, almost every verse was invalid! More on that later.)

I have written a program to make the files well-formed and valid (but not necessarily good OSIS). The program also verifies that the result really is well-formed and valid. You can see the program at: http://www.crosswire.org/svn/jsword/trunk/jsword/src/main/java/org/crosswire/jsword/examples/ModToOsis.java
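
As an illustration of the kind of check involved, here is a minimal sketch of a well-formedness test using the standard JAXP parser. It is not taken from ModToOsis.java; the class name WellFormedCheck and the method isWellFormed are made up for this example, and it covers well-formedness only, not validation against the OSIS schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JSword: reports whether a string parses as XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from
            // printing its own messages to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Any parse failure means the input is not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b></b></a>"));  // properly nested
        System.out.println(isWellFormed("<a><b></a></b>"));  // overlapping elements
    }
}
```

Checking validity would additionally need the OSIS schema file and something like javax.xml.validation, which this sketch leaves out.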

I am using this program to make global changes to the files. Troy has asked me to make the global changes before checking in the files. So when I get a few more questions answered and finish Troy's global change requests, I'll check in everything. I'll also see about creating a module for the beta area.

Then we can start fixing text problems.

Here is a summary of the changes:
1) fix the <note/>...</note> problem
2) replace <p/> (not allowed under OSIS) with <pb/>
3) On <w> elements, replace x-Strongs: with strong: and x-Robinson: with robinson: (OSIS does not like the x- as a prefix to a work id)
4) On <w> elements, changed splitID="n" to type="x-split" subType="x-n"
5) On <w> elements removed the attributes without any values. (XML requires attributes to have values)
6) revert x-preverse to an enclosing <div type="section"><title>...</title>.......</div>
7) changed type="transChange" subType="type:added" to type="x-transChange" subType="x-added"
   This was used in the following construct:
<w>...<seg type="x-transChange" subType="x-added">...</seg> ... </w>
   OSIS requires x- prefix for both type and subType for <seg> elements.
8) deleted all <resp> elements as this has never been part of the OSIS standard. resp is a global attribute. I could merge it with the preceding <note type="x-strongsMarkup">....</note> However, I think these "notes" should be removed as well.
9) Fixed 81 verses that had improperly specified <w> elements, either nested or containing <transChange>.
10) Fixed a few locations where xml elements overlapped as in <a><b></a></b>
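
To make the simpler attribute rewrites concrete, here is a rough sketch of changes 3 and 4 done as plain string and regex replacements. This is only an illustration, not how ModToOsis.java does it; the class name GlobalFixes and the method fix are invented, and it assumes the strings being replaced never occur in the verse text itself.

```java
// Hypothetical sketch of changes 3 and 4; not taken from ModToOsis.java.
class GlobalFixes {
    static String fix(String osis) {
        String s = osis;
        // Change 3: drop the x- prefix from the work ids.
        s = s.replace("x-Strongs:", "strong:");
        s = s.replace("x-Robinson:", "robinson:");
        // Change 4: splitID="n" becomes type="x-split" subType="x-n".
        s = s.replaceAll("splitID=\"([^\"]*)\"", "type=\"x-split\" subType=\"x-$1\"");
        return s;
    }

    public static void main(String[] args) {
        String before = "<w lemma=\"x-Strongs:H0430\" splitID=\"1\">God</w>";
        System.out.println(fix(before));
    }
}
```

The other changes (overlapping elements, nested <w>, the <note/> problem) depend on document structure and are not a good fit for this kind of blind text substitution.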

Next steps:
1) merge empty indefinite articles to their following element as requested by Troy.
2) Get rid of <note type="x-strongsMarkup">...</note> These contain notes from the taggers of the KJV2003 project, regarding the tagging. These also contain URL escape sequences. They look really bad when they show up to an end user.
3) Create a module with these changes for beta testing. It may be that we need to change the markup to what the SWORD API expects. If so, I recommend using xslt just before making the module.
4) Open up the effort for others to identify and correct problems.
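
As a sketch of what an xslt pass could look like, the following applies an identity transform that copies everything except <note type="x-strongsMarkup"> elements, using the XSLT support in the standard Java library. The class name StripTaggerNotes and the method strip are invented for this example, and matching on local-name() is an assumption made so the stylesheet works whether or not the input declares the OSIS namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: removes the taggers' notes with an XSLT identity transform.
class StripTaggerNotes {
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        // Identity template: copy every attribute and node unchanged.
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        // Empty template: drop the tagger notes (more specific, so it wins).
        + "<xsl:template match=\"*[local-name()='note'][@type='x-strongsMarkup']\"/>"
        + "</xsl:stylesheet>";

    static String strip(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String verse = "<verse><w>In</w><note type=\"x-strongsMarkup\">tagger note</note>"
                + "<note type=\"study\">kept</note></verse>";
        System.out.println(strip(verse));
    }
}
```

Keeping such a transform outside the checked-in text means the source files can stay as clean OSIS while the module build adapts the markup as needed.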

I'm thinking that we might want to have two releases: an initial one with the fixes I have made so far plus #1 and #2 from the next steps, then a second one with the fixes for the missing 's in the OT and any other problems that are found.

I also want to experiment with using <q sID="xxx" who="Jesus"/> ... <q eID="xxx"/> to see if SWORD for Windows can handle it correctly. If so, I am inclined to change all quotes to this form. (feedback desired)





_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
