I don't have access to Thayer's; it is no longer available on CrossWire. So I have to speak "theoretically," and hopefully you can find and fix the problems.

Sean wrote:
Thanks, your detailed instructions and example (and a little puzzling about how Java works, since i'm not a Java guy) produced some useful results, as well as (of course!) a few more questions related to running this with Thayer's.

1) there are various complaints: i'm not sure if they're significant
org.crosswire.jsword.book.sword.ConfigEntryTable(INFO): Ignoring unexpected entry in orthodoxy of sMinimumVersion
org.crosswire.jsword.book.sword.ConfigEntryTable(INFO): Ignoring empty entry in orthodoxy: CopyrightHolder=
org.crosswire.jsword.book.sword.ConfigEntryTable(INFO): Ignoring empty entry in orthodoxy: CopyrightDate=
org.crosswire.jsword.book.sword.ConfigEntryTable(INFO): Ignoring empty entry in orthodoxy: DistributionNotes=
org.crosswire.jsword.book.sword.ConfigEntryTable(INFO): Ignoring empty entry in rsv: CopyrightNotes=
org.crosswire.jsword.book.sword.ConfigEntryTable(INFO): Ignoring empty entry in rsv: CopyrightContactEmail=
org.crosswire.jsword.book.sword.ConfigEntryTable(INFO): Ignoring empty entry in rsv: DistributionNotes=
org.crosswire.jsword.book.filter.thml.THMLFilter(INFO): Could not fix it by cleaning tags: Illegal character or entity reference syntax.


JSword validates the conf files against what is expected or allowed. All of these "Ignoring" messages are warnings and can be safely ignored. Most of these conf problems have been cleaned up, so the messages will disappear if you download a fresh copy from the CrossWire server.

The last message means that the input contained characters that were out of range. Sword supports only two encodings: CP1252 (often loosely called Latin-1) and UTF-8. If the encoding is UTF-8, then the conf needs to state that. Otherwise, the input will be interpreted as CP1252.

If the module is in any other encoding, you will need to re-encode it into UTF-8.
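
A minimal sketch of such a re-encoding step, in Python. The source encoding here (ISO 8859-7) is only an assumption for illustration, since the actual encoding of the Thayer's .imp file is not known, and the function name is hypothetical.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw module text from source_encoding and return UTF-8 bytes."""
    # Strict decoding raises UnicodeDecodeError on bytes that are not
    # valid in source_encoding, instead of silently substituting them.
    text = raw.decode(source_encoding, errors="strict")
    return text.encode("utf-8")


# Assumed example: Greek text stored as 8-bit ISO 8859-7 bytes.
legacy = "ωφελιμος".encode("iso8859_7")
converted = reencode_to_utf8(legacy, "iso8859_7")
print(converted.decode("utf-8"))
```

After a conversion like this, the module would have to be rebuilt from the converted text and its conf would need to declare UTF-8.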

2) the results from Thayer's seem to have lost the Greek characters. What's in the .imp file looks like some 8-bit chars
ωφελιμος
which i assume is some kind of representation of the Greek characters (haven't quite figured out what: doesn't seem to be UTF-8). But this winds up in the output as a string of '?'s.

When you see a ? or a box in the output, you should verify that you are using a Unicode font, or at least one that contains the Unicode characters in the ranges that interest you.
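
One way to tell a font problem from lost data is to look at the code points in the output string itself, before it is displayed. The following is a rough Python sketch; the function names and the three result labels are made up for this example.

```python
def count_greek(text: str) -> int:
    """Count code points in the Greek and Coptic or Greek Extended blocks."""
    return sum(
        1
        for ch in text
        if 0x0370 <= ord(ch) <= 0x03FF or 0x1F00 <= ord(ch) <= 0x1FFF
    )


def diagnose(text: str) -> str:
    """Classify a string that was expected to contain Greek."""
    if count_greek(text) > 0:
        # The Greek survived; '?' or boxes on screen point to the font.
        return "greek-present"
    if "?" in text or "\ufffd" in text:
        # Literal question marks or U+FFFD: the characters were replaced
        # during decoding or conversion, so no font can bring them back.
        return "replaced"
    return "no-greek"


print(diagnose("ωφελιμος"))
print(diagnose("????????"))
```

If the string really holds literal question marks, the problem is upstream in the encoding handling rather than in the display.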


3) entry 5207 (huios) produces bad XML: looks like a TDNT reference attribute in a sync tag doesn't get its terminating quote (after "8:400"?) and slash+angle bracket ending the sync are also missing:
AV-son(s) 85, Son of Man +<sync type="Strongs" value="G444" /> 87 (<sync type="TDNT" value="8:400, 1210), Son of God
The fault seems to exist in the .imp file as well (which has these <sync> tags embedded)

JSword assumes that the module is good ThML in the first place. If this is not the case, it will have to be fixed and the module re-created.
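
To locate faults like this in the .imp source before re-creating the module, a simple scan can flag every sync tag that is not well formed. This Python sketch is not part of JSword or Sword; the pattern and the function name are assumptions for illustration, and the sample text is the fragment quoted above.

```python
import re

# A well-formed tag: <sync, zero or more name="value" attributes, then />.
WELL_FORMED_SYNC = re.compile(r'<sync(?:\s+\w+="[^"<>]*")*\s*/>')


def find_bad_sync_tags(entry: str) -> list:
    """Return the offsets of <sync tags that are not well formed."""
    bad = []
    for start in re.finditer(r"<sync\b", entry):
        # Anchor the well-formedness check at the start of each tag.
        if not WELL_FORMED_SYNC.match(entry, start.start()):
            bad.append(start.start())
    return bad


sample = (
    'AV-son(s) 85, Son of Man +<sync type="Strongs" value="G444" /> 87 '
    '(<sync type="TDNT" value="8:400, 1210), Son of God'
)
for offset in find_bad_sync_tags(sample):
    print(offset, sample[offset:offset + 40])
```

Run over each entry of the .imp file, this would list the places that need a manual fix.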


4) there are a number of bare "&" characters in the original which seem to get dropped in the output instead of replaced with &amp; (except for one in #5207, one might suppose because of the unterminated attribute/tag issue)

If the original is ThML and the & characters are not escaped, they will need to be fixed in the original.
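
A possible way to make that fix in bulk, sketched in Python: escape every & that does not already begin an entity or character reference. The regular expression and function name are assumptions for this example. Note the limitation that text such as "AT&T;" looks like an entity reference and would be left alone.

```python
import re

# An ampersand NOT followed by a named entity (&amp;), a decimal
# reference (&#169;) or a hexadecimal reference (&#xA9;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Replace bare '&' with '&amp;' and leave existing references intact."""
    return BARE_AMP.sub("&amp;", text)


line = (
    'For Synonyms see entry <sync type="Strongs" value="G5811" /> & '
    '<sync type="Strongs" value="G5889" />'
)
print(escape_bare_ampersands(line))
```

The module would then need to be re-created from the corrected source.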


5) There are some issues with the synonym references around ampersands (whether related to #4 i can't tell): the .imp file has
For Synonyms see entry <sync type="Strongs" value="G5811" /> & <sync type="Strongs" value="G5889" />
but the OSISified output has
<w lemma='strong:G5811'>
For Synonyms see entry </w><w lemma='strong:G5889'> </w>

Hope this feedback is helpful, and thanks again for the pointers. Unless there's a solution to the problem with the Greek characters, i'll have to fall back to parsing the .imp file by hand, since getting these out is important to me. By the way, what displays in the Sword Project for Thayer's lacks accents and breathing marks, though by comparison i see them in e-Sword's version: anyone happen to know why?

His,
Sean

_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
