On Jun 4, 2009, at 5:36 PM, Chris Little wrote:

Firstly, DM, let me express my appreciation for your taking on the task of improving osis2mod further. It's a daunting task and I'm glad not to be tackling it myself right now. :)

There are a couple of sections of your Wiki addition from today that concern me. Hopefully they'll be easy to address, but I'm not even sure whether these are implemented or only planned features, having not bothered to read the code.

My first concern is with the conversion of <p>...</p> to <div type="paragraph" sID="genX"/>...<div type="paragraph" eID="genX"/>. My problem with this is the use of "paragraph" here in what is essentially a private-use semantic. A <div type="paragraph"> is already defined by OSIS for use in works that have a structural division called paragraphs (as in law codes and, if I recollect correctly, the Catechism of the Catholic Church). I think osis2mod should instead translate <p>...</p> to something with an x- type, e.g. <div type="x-p" sID="genX"/>...<div type="x-p" eID="genX"/>.

I guess it is a private use semantic. Given that osis2mod also handles commentaries, I think it is important to have both allowed. I'll make the change to osis2mod, and will give you a patch for the filters.
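Here is a minimal C++ sketch of how a <p>...</p> pair might be turned into matching x-p milestones. The class and method names are hypothetical and are not osis2mod's actual code. The genN counter only imitates the genX IDs mentioned above.

```cpp
#include <stack>
#include <string>

// Hypothetical helper (not osis2mod code): turns container <p>...</p> into
// milestoned <div type="x-p" sID="genN"/> ... <div type="x-p" eID="genN"/> pairs.
class ParagraphMilestoner {
public:
    // Call when a <p> start tag is seen; returns the start milestone.
    std::string openParagraph() {
        std::string id = "gen" + std::to_string(++counter_);
        open_.push(id);
        return "<div type=\"x-p\" sID=\"" + id + "\"/>";
    }

    // Call when a </p> end tag is seen; returns the matching end milestone.
    // Returns an empty string if no paragraph is open (malformed input).
    std::string closeParagraph() {
        if (open_.empty()) return "";
        std::string id = open_.top();
        open_.pop();
        return "<div type=\"x-p\" eID=\"" + id + "\"/>";
    }

private:
    unsigned counter_ = 0;          // source of unique genN identifiers
    std::stack<std::string> open_;  // IDs of paragraphs not yet closed
};
```

The stack pairs each end milestone with the most recently opened start milestone. As a result, sID and eID always match even though the milestones themselves are empty elements.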

The problem I had with the prior implementation (which I wrote) was that it used the <lb/> element. The <lb/> element is roughly the equivalent of <br/>, and one of the problems with using it was that it does not have precisely the same semantics as a <p> or a </p>. Most web browsers, which most SWORD front-ends use for display, render the two slightly differently. For example,
<div>
  <p>
will typically result in a single new line.
<div>
  <br/>
will typically result in 2 new lines.

The advantage to changing it to <div> is that it minimizes the problem by allowing for block element semantics.


My other concern may be less easily addressed. The Wiki now states that, when using <title>, the attributes type="book" and type="chapter" are required for book and chapter titles respectively. I have no problem with osis2mod adding these type attributes, but they can't be required if we intend to maintain the policy that osis2mod should accept any valid & best practice conformant document. I don't know whether this is actually written down anywhere, but somewhere between OSIS 1.0 and 2.0, the committee decided that <title> types would be inherited from their parent element. So a <title> whose parent is <chapter> implicitly has type="chapter", and one whose parent is <div type="book"> implicitly has type="book".

The manual needs some help, then. While it might be in there, I see no mention of a title's type being implied by its parent.
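For illustration only, here is a small C++ sketch of the inheritance rule as Chris states it. It covers just the two cases he names: <chapter> implies type="chapter", and <div type="book"> implies type="book". The function name and signature are hypothetical and are not part of osis2mod.

```cpp
#include <string>

// Hypothetical: returns the effective type of a <title>.
//   explicitType: the title's own type attribute ("" if absent)
//   parentName:   element name of the parent ("chapter", "div", ...)
//   parentType:   the parent's type attribute ("" if absent)
// Only the two inherited cases named in the discussion are handled.
std::string effectiveTitleType(const std::string& explicitType,
                               const std::string& parentName,
                               const std::string& parentType) {
    if (!explicitType.empty()) return explicitType;  // an explicit type wins
    if (parentName == "chapter") return "chapter";   // <chapter><title>
    if (parentName == "div" && parentType == "book") return "book";
    return "";  // nothing implied
}
```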

That should be reworded. I was simply wrong about what goes into the book introduction. Everything between the opening of a book and the first chapter is put into the book introduction.

The problem comes with material between the start of a chapter and the first verse of the chapter. The material might be a chapter intro, a verse intro or both. The trouble is deciding where the boundary between the two should be.

Here is the comment from the code:
                // Have we found the start of pre-verse material?
                // Pre-verse material follows the following rules
                // 1) Between the opening of a book and the first chapter,
                //    all the material is handled as an introduction to the book.
                // 2) Between the opening of a chapter and the first verse,
                //    the material is split between the introduction of the chapter
                //    and the first verse of the chapter.
                //    A <div> with a type other than section will be taken as a chapter introduction.
                //    A <title> of type acrostic, psalm or no type will be taken as a title for the verse.
                //    A <title> of type main or chapter will be seen as a chapter title.
                // 3) Between verses, the material is split between the prior verse and the next verse.
                //    Basically, while end and empty tags are found, they belong to the prior verse.
                //    Once a begin tag is found, it belongs to the next verse.
                // If the title has an attribute type of "main" or "chapter",
                // it belongs to its <div> or <chapter> and is treated as part of its heading.
                // Otherwise, if it is a title in a chapter before the first verse, it
                // is put into the verse as a preverse title.
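Rule 3 can be sketched in C++ as follows. This is a hypothetical illustration, not the osis2mod implementation. It looks only at the kinds of tags (start, end, empty) that appear between two verses and ignores text nodes. Ignoring text nodes is an assumption made to keep the sketch short.

```cpp
#include <cstddef>
#include <vector>

enum class TagKind { Start, End, Empty };

// Hypothetical sketch of rule 3: returns the index of the first tag that
// belongs to the next verse. End and empty tags seen before any start tag
// stay with the prior verse; from the first start tag on, everything goes
// with the next verse.
std::size_t splitBetweenVerses(const std::vector<TagKind>& tags) {
    for (std::size_t i = 0; i < tags.size(); ++i) {
        if (tags[i] == TagKind::Start) return i;  // first begin tag starts the next verse
    }
    return tags.size();  // no begin tag: everything stays with the prior verse
}
```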

If, in this location, there is a div that has no type or a type other than section, it is seen as part of the chapter introduction. This would be something like:
<chapter>
   <div> introductory material </div>
   ...
   <verse n="1">...</verse>

The code assumes that an opening <div> element without type="section" goes into the chapter introduction. It does not assume that the div ends before the first verse. This is important to note.

If a title is seen without a type attribute, or with a type other than main or chapter, it is understood to be a title for the first verse. Otherwise it is a title for the chapter.

Once a transition is seen, the division between the chapter introduction and the pre-verse material is fixed.
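The following minimal C++ sketch combines rule 2 with this transition. It assumes the start tags between <chapter> and the first <verse> have already been collected in document order. All names are hypothetical. The sketch makes two further assumptions that the rules above do not state: a <div type="section"> goes to the verse side, and elements other than <div> and <title> stay with the chapter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One start tag seen between <chapter> and the first <verse>.
struct PreVerseTag {
    std::string name;  // element name, e.g. "div" or "title"
    std::string type;  // value of the type attribute, "" if absent
};

enum class Side { Chapter, Verse };

// Rule 2. A section div going to the verse side, and other elements
// staying with the chapter, are assumptions of this sketch.
Side classify(const PreVerseTag& t) {
    if (t.name == "div")
        return t.type == "section" ? Side::Verse : Side::Chapter;
    if (t.name == "title")
        return (t.type == "main" || t.type == "chapter") ? Side::Chapter : Side::Verse;
    return Side::Chapter;
}

// Returns the index of the first tag that goes with verse 1. Once a tag
// classified as Verse is seen, the boundary is fixed: every later tag also
// goes with the verse, whatever its own classification.
std::size_t splitChapterPreVerse(const std::vector<PreVerseTag>& tags) {
    for (std::size_t i = 0; i < tags.size(); ++i)
        if (classify(tags[i]) == Side::Verse) return i;
    return tags.size();
}
```

The early return implements the "boundary is fixed" behavior. For example, a main title that appears after an untyped title still ends up with the verse.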

The code is still not very smart, and it can be improved. For example, it does not know about parent/child or sibling relationships. If that were added, we could tell whether a title immediately follows a chapter or sits in a non-section div within the chapter introduction.

Anyway, as a real example, consider the following:
<chapter osisID="Ps.119">
<title>Chapter 119</title>
<title type="acrostic">...</title>
<verse osisID="Ps.119.1">verse text</verse>

Where should this be split? In getting feedback for the KJV most felt that the second title should be attached to the first verse.

Likewise for Psalm 3, which uses type="psalm".

So what should the rule be?

The simplest change would be that a title with no type attribute, or with a type of main or chapter, is seen as a chapter title.
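Here is a hypothetical C++ sketch that compares the current behavior with the proposed change. The proposedRule flag exists only for this illustration.

```cpp
#include <string>

enum class TitleTarget { ChapterHeading, Verse };

// Hypothetical: where a <title> found between <chapter> and the first
// <verse> is attached. proposedRule == false follows the current behavior;
// proposedRule == true also treats an untyped title as a chapter title.
TitleTarget titleTarget(const std::string& type, bool proposedRule) {
    if (type == "main" || type == "chapter") return TitleTarget::ChapterHeading;
    if (type.empty() && proposedRule) return TitleTarget::ChapterHeading;
    return TitleTarget::Verse;  // acrostic, psalm, and any other type
}
```

Consider the Psalm 119 example. Under the current rule, "Chapter 119" (no type) goes to the verse, and under the proposed rule it goes to the chapter heading. The acrostic title stays with verse 1 either way.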

In Him,
        DM



_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
