On Jun 4, 2009, at 5:36 PM, Chris Little wrote:
Firstly, DM, let me express my appreciation for your taking on the
task of improving osis2mod further. It's a daunting task and I'm
glad not to be tackling it myself right now. :)
There are a couple of sections of your Wiki addition from today that
concern me. Hopefully they'll be easy to address, but I'm not even
sure whether these are implemented or only planned features, having
not bothered to read the code.
My first concern is with the conversion of <p>...</p> to <div
type="paragraph" sID="genX"/>...<div type="paragraph" eID="genX"/>.
My problem with this is the use of "paragraph" here in what is
essentially a private-use semantic. A <div type="paragraph"> is
already defined by OSIS for use in works that have a structural
division called paragraphs (as in law codes and, if I recollect
correctly, the Catechism of the Catholic Church). I think osis2mod
should instead translate <p>...</p> to something with an x- type,
e.g. <div type="x-p" sID="genX"/>...<div type="x-p" eID="genX"/>.
I guess it is a private use semantic. Given that osis2mod also handles
commentaries, I think it is important to have both allowed. I'll make
the change to osis2mod, and will give you a patch for the filters.
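To make the proposed mapping concrete, here is a minimal sketch in Python. It is not osis2mod's code. The function name, the "genN" numbering and the assumption that <p> carries no attributes are all illustrative.

```python
import re
from itertools import count


def paragraphs_to_milestones(osis, div_type="x-p"):
    """Replace <p> and </p> with paired sID/eID milestone divs.

    Hypothetical helper: assumes attribute-free <p> tags that are
    properly closed, and generates ids of the form gen1, gen2, ...
    """
    ids = count(1)
    open_ids = []  # ids of paragraphs opened but not yet closed

    def replace(match):
        if match.group(1):  # "/" was captured, so this is </p>
            gen_id = open_ids.pop()
            return f'<div type="{div_type}" eID="{gen_id}"/>'
        gen_id = f"gen{next(ids)}"
        open_ids.append(gen_id)
        return f'<div type="{div_type}" sID="{gen_id}"/>'

    # Matches <p> and </p> only; tags such as <pb/> are left alone.
    return re.sub(r"<(/?)p\s*>", replace, osis)


print(paragraphs_to_milestones("<p>In the beginning</p>"))
```

Passing div_type="paragraph" would reproduce the current behaviour, so the only difference between the two approaches is the type value.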
The problem I had with the prior implementation (which I wrote) was
that it used the <lb/> element. The <lb/> element is roughly the
equivalent of <br/>, and one problem with using it was that it
does not have precisely the same semantics as a <p> or a </p>. Most web
browsers, which most SWORD front-ends use for display, will show
subtle differences between the two. For example,
<div>
<p>
will typically result in a single new line.
<div>
<br/>
will typically result in 2 new lines.
The advantage to changing it to <div> is that it minimizes the problem
by allowing for block element semantics.
My other concern may be less easily addressed. The Wiki now states
that, when using <title>, the attributes type="book" and
type="chapter" are required for book and chapter titles
respectively. I have no problem with osis2mod adding these type
attributes, but they can't be required if we intend to maintain the
policy that osis2mod should accept any valid & best practice
conformant document. I don't know whether this is actually written
down anywhere, but somewhere between the OSIS 1.0 and 2.0, the
committee decided that <title> types would be inherited from their
parent element. So a <title> whose parent is <chapter> implicitly
has type="chapter" and one whose parent is <div type="book">
implicitly has type="book".
The manual needs some help then. While it might be in there, I see no
mention of the type of a title being implied by its parent.
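As I understand the inheritance rule described above, it could be sketched like this in Python. This is an illustration only, not osis2mod code. The function name is made up, and the sample assumes un-namespaced OSIS with container (not milestone) chapters.

```python
import xml.etree.ElementTree as ET


def implied_title_types(osis_fragment):
    """Return (title text, effective type) for each <title>.

    An explicit type attribute wins. Otherwise the type is taken
    from the parent: <chapter> implies "chapter" and
    <div type="book"> implies "book". Anything else stays None.
    """
    root = ET.fromstring(osis_fragment)
    results = []
    for parent in root.iter():
        for child in parent:
            if child.tag != "title":
                continue
            explicit = child.get("type")
            if explicit is not None:
                effective = explicit
            elif parent.tag == "chapter":
                effective = "chapter"
            elif parent.tag == "div" and parent.get("type") == "book":
                effective = "book"
            else:
                effective = None
            results.append((child.text, effective))
    return results


sample = (
    '<div type="book"><title>Genesis</title>'
    '<chapter osisID="Gen.1"><title>Chapter 1</title></chapter></div>'
)
print(implied_title_types(sample))
```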
The Wiki should be reworded. I was simply wrong about the reference to what
goes into the book introduction. Everything between the opening of a
book and the first chapter is put into the book introduction.
The problem comes with material between the start of a chapter and the
first verse of the chapter. The material might be a chapter intro, a
verse intro or both. The trouble is deciding what the boundary between
the two should be.
Here is the comment from the code:
// Have we found the start of pre-verse material?
// Pre-verse material follows the following rules
// 1) Between the opening of a book and the first chapter, all the
//    material is handled as an introduction to the book.
// 2) Between the opening of a chapter and the first verse, the
//    material is split between the introduction of the chapter
//    and the first verse of the chapter.
//    A <div> with a type other than section will be taken as a
//    chapter introduction.
//    A <title> of type acrostic, psalm or no type, will be taken
//    as a title for the verse.
//    A <title> of type main or chapter will be seen as a chapter
//    title.
// 3) Between verses, the material is split between the prior verse
//    and the next verse.
//    Basically, while end and empty tags are found, they belong to
//    the prior verse.
//    Once a begin tag is found, it belongs to the next verse.

// If the title has an attribute type of "main" or "chapter"
// it belongs to its <div> or <chapter> and is treated as part of
// its heading.
// Otherwise, if it is a title in a chapter before the first verse,
// it is put into the verse as a preverse title.
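Rule 3 in that comment can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the osis2mod implementation. It assumes the material between two verses has already been reduced to a list of tag strings, with text nodes ignored.

```python
def split_between_verses(tags):
    """Split inter-verse tags into (prior_verse, next_verse).

    End tags ("</x>") and empty tags ("<x/>") stay with the prior
    verse. The first begin tag, and everything after it, goes to
    the next verse.
    """
    for index, tag in enumerate(tags):
        is_end = tag.startswith("</")
        is_empty = tag.endswith("/>")
        if not (is_end or is_empty):
            return tags[:index], tags[index:]
    return tags, []


prior, following = split_between_verses(
    ["</q>", "<lb/>", "</p>", "<p>", "<title>", "</title>"]
)
print(prior)
print(following)
```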
If, between the start of a chapter and its first verse, there is a div
that has no type or has a type other than section, it is seen as part
of the chapter introduction. This would be something like:
<chapter>
<div> introductory material </div>
...
<verse n="1">...</verse>
The code assumes that the begin div element without type="section"
goes into the chapter introduction. It does not assume that the div
finishes before the first verse. This is important to note.
If a title is seen without a type attribute, or with a type other than
main or chapter, it is understood to be a title for the first verse.
Otherwise it is for the chapter.
Once a transition from chapter material to verse material is seen, the
division between chapter introduction and pre-verse material is set.
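Put together, the current behaviour between the start of a chapter and the first verse could be sketched as follows in Python. This is a simplified model, not the actual code. Each opening element is reduced to a (name, type) pair, and sending a <div type="section"> to the verse side is my reading of the rule rather than something the comment states outright.

```python
CHAPTER_TITLE_TYPES = {"main", "chapter"}


def belongs_to_verse(name, type_attr):
    """Current rule: does this opening element start pre-verse material?"""
    if name == "title":
        # No type, or any type other than main/chapter, goes to the verse.
        return type_attr not in CHAPTER_TITLE_TYPES
    if name == "div":
        # Assumption: only a section div counts as verse material.
        return type_attr == "section"
    return False  # other elements do not trigger the transition


def split_pre_verse(elements):
    """Split (name, type) pairs into (chapter_intro, pre_verse).

    The first element that belongs to the verse fixes the boundary.
    Everything after it stays with the verse, whatever its type.
    """
    for index, (name, type_attr) in enumerate(elements):
        if belongs_to_verse(name, type_attr):
            return elements[:index], elements[index:]
    return elements, []


intro, pre_verse = split_pre_verse(
    [("title", "chapter"), ("div", None), ("title", "psalm"), ("div", None)]
)
print(intro)
print(pre_verse)
```

Note that the last untyped div ends up with the verse even though it would count as introduction on its own, because the boundary was already set by the psalm title.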
The code is still not very smart. It can be improved. For example, it
does not know parent/child or sibling relationships. If that were
added, we could tell whether a title immediately followed the start of
a chapter or was inside a non-section div within the chapter introduction.
Anyway, as a real example, consider the following:
<chapter osisID="Ps.119">
<title>Chapter 119</title>
<title type="acrostic">...</title>
<verse osisID="Ps.119.1">verse text</verse>
Where should this be split? In feedback on the KJV, most people felt
that the second title should be attached to the first verse.
Likewise for Psalm 3, which uses type="psalm".
So what should the rule be?
The simplest change would be that a title with no type attribute, or
with a type of main or chapter, is seen as a chapter title.
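As a rough Python sketch of that proposed rule (the function name is made up, and this is not a patch), applied to the title types from the Psalm examples above:

```python
def title_owner_proposed(type_attr):
    """Proposed rule: untyped, main and chapter titles go to the chapter."""
    if type_attr is None or type_attr in ("main", "chapter"):
        return "chapter"
    return "verse"


# Title types from the examples: untyped, acrostic (Ps.119), psalm (Ps.3).
for type_attr in (None, "acrostic", "psalm"):
    print(type_attr, "->", title_owner_proposed(type_attr))
```

With this rule the untyped "Chapter 119" title stays with the chapter, while the acrostic and psalm titles still attach to the first verse.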
In Him,
DM