On 8/1/11 7:03 AM, DM Smith wrote:
David's observation has got me thinking on whether there is a way to
detect mismatches. The nature of osis2mod is to be lossless with regard
to biblical material. If a verse in the OSIS file is not in the v11n,
then it is appended to the prior verse entry (it is a bit more
complicated than that, but it gives the idea).

I think a statistical analysis of a text could find such verses. Maybe a
comparison of the word count per verse of a Greek text for the NT and a
Hebrew text for the OT could serve as the expected. For a translation,
its word count would be compared to the reference. I would guess the
ratio of words in the original compared to the translation would be
fairly tight. Anything differing significantly from the ratio (perhaps
1.5x standard deviation of the ratio) would be flagged.

Since the problem is potentially found only in the last verse of a
chapter, there could be a flag to report only those. (Analysis would
probably need to be done on the entire text to get a fair ratio and
standard deviation.)
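
A minimal sketch of the check described above might look like the following. It is only an illustration under stated assumptions: the per-verse word counts are assumed to be already available as dictionaries keyed by (book, chapter, verse), the function name flag_outliers and its parameters are hypothetical, and nothing here is part of osis2mod or any other Sword tool. The ratio's mean and standard deviation are computed over the whole text, and the optional flag restricts the report to the last verse of each chapter.

```python
from statistics import mean, stdev

def flag_outliers(reference_counts, translation_counts,
                  threshold=1.5, last_verse_only=False):
    """Flag verses whose translation/reference word-count ratio is unusual.

    Both arguments are hypothetical inputs: dicts mapping
    (book, chapter, verse) -> word count. Only verses present in both
    texts, with a non-zero reference count, are compared.
    """
    shared = sorted(set(reference_counts) & set(translation_counts))
    ratios = {key: translation_counts[key] / reference_counts[key]
              for key in shared if reference_counts[key] > 0}
    if len(ratios) < 2:
        return []  # not enough data for a standard deviation

    # Statistics are taken over the entire text, not just chapter ends.
    mu = mean(ratios.values())
    sigma = stdev(ratios.values())
    if sigma == 0:
        return []  # every verse has the same ratio; nothing stands out

    # Highest verse number seen in each (book, chapter).
    last_verse = {}
    for book, chapter, verse in ratios:
        chapter_key = (book, chapter)
        last_verse[chapter_key] = max(last_verse.get(chapter_key, 0), verse)

    flagged = []
    for (book, chapter, verse), ratio in ratios.items():
        if abs(ratio - mu) <= threshold * sigma:
            continue
        if last_verse_only and verse != last_verse[(book, chapter)]:
            continue
        flagged.append((book, chapter, verse))
    return flagged
```

Called with last_verse_only=True, it reports only chapter-final verses, which is where appended material would be expected to show up.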

I'm sure that there are problems with such an idea. And I'm not sure
whether it would serve much value beyond checking those with a KJV v11n.

In Him,
DM

Modules that were re-versified by Sword tools are rather easy to identify and then convert back to their native versification, since we include identification (within the text) of the concatenated verses and their original identity.

The excerpted Ukrainian text doesn't include these (unless David happened to remove them before posting), so there's not much chance of our being able to export the text, return it to its native versification, and re-import it, since the text came to us in a KJV-versified format.

--Chris

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
