Greg,

Thank you for posting the issue. I'm still really having a tough time understanding the problem. I know we've been crossing on IRC, so I'm not sure if you are seeing any of my responses to you there.

We have code to handle these divs and not pass them through, as shown here:

http://crosswire.org/svn/sword/trunk/src/modules/filters/osisxhtml.cpp

Search for "paragraph"; it should be the 2nd or 3rd hit. There is a comment there which specifically shows your construct, <div eID="" type="paragraph" />.

The end result is that this gets output as <!/P><br />

If you look below in your ./lookup output, you will see this exact output.

The <!P> was added for/by GnomeSword years ago and can be taken out if you grep through the Xiphos code and find it is no longer needed. I'm not sure why it was added.

But the end result is that we do process this construct and should never pass it through. If BibleTime gets it passed through, then they are not using our filters, either because they are using their own distinct filter set, or because their filter set overrides this processing and doesn't accept our default processing.
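
To make the handling described above concrete, here is a rough sketch in Python. It is not the code from osisxhtml.cpp, only an illustration of the behaviour seen in the ./lookup output: an empty <div type="paragraph"/> milestone is replaced by a marker instead of being passed through. The function name render_paragraph_milestones is made up for this example, and the <!P><br /> mapping for the sID (start) milestone is an assumption; only the eID case is confirmed by the output in this thread.

```python
import re

# Matches an empty (self-closing) <div .../> tag and captures its attribute text.
_EMPTY_DIV = re.compile(r'<div\b([^<>]*?)/>')
# Matches name="value" attribute pairs inside a tag.
_ATTR = re.compile(r'([A-Za-z_:][\w:.-]*)\s*=\s*"([^"]*)"')


def render_paragraph_milestones(text):
    """Replace milestoned paragraph divs with markers (hypothetical helper)."""

    def replace(match):
        attrs = dict(_ATTR.findall(match.group(1)))
        if attrs.get("type") != "paragraph":
            return match.group(0)  # not a paragraph milestone: leave untouched
        if "eID" in attrs:
            return "<!/P><br />"  # end milestone, as seen in the lookup output
        if "sID" in attrs:
            return "<!P><br />"  # start milestone: assumed counterpart
        return match.group(0)

    return _EMPTY_DIV.sub(replace, text)


raw = 'osasanyedhelaga.  <div eID="gen11" type="paragraph"/>'
print(render_paragraph_milestones(raw))
```

If a frontend shows the raw <div eID="gen11" type="paragraph"/> instead, then a step like this one is not being applied to the text on its way to the display.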

If you point me to an svn or git or whatever link to the Bibletime Render Filter which processes OSIS, I'd be happy to have a look.

Troy


On 09/15/2012 06:56 PM, Greg Hellings wrote:
To emphasize that we have an issue here, in the SWORD filters, here is
the output from diatheke with HTML, HTMLHREF and XHTML (which support
I just hacked in now in order to test).

greg@Gateway08:~/Source/sword/build (master)$ !diath
diatheke -b TKE -o h -f HTMLHREF -k Gen 1:2
Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
waviravira vadhulu va mahinje, osasanyedhelaga.  <!/P><br />
(TKE)
greg@Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f
HTML -k Gen 1:2
<meta http-equiv="content-type" content="text/html;
charset=UTF-8">Genesis 1:2: Elaboya kayawomele naari kayanna dhego.
Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa
Mulugu waviravira vadhulu va mahinje, osasanyedhelaga.  <div
eID="gen11" type="paragraph"/><br />
(TKE)
greg@Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f
XHTML -k Gen 1:2
Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
waviravira vadhulu va mahinje, osasanyedhelaga.  <div eID="gen11"
type="paragraph"/>
(TKE)

All three are outputting the same verse from the same module. HTML and
XHTML are outputting <div eID="gen11" type="paragraph"/> which is what
the module has in its rawest form. HTMLHREF outputs <!/P> which is not
valid anything. There are other, odd, differences between the three
but none of those are germane to this discussion, it would seem to me.

$ ./examples/cmdline/lookup TKE Gen.1.2
==Raw=Entry===============
Genesis 1:2:
Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu<note n="1">1.2*
<catchWord>Muneba wa Mulugu</catchWord> naari wi «pevo yuulubale.»
Mulugu ohukalana muneba mmohi oneethanihu «Muneba Woweela.» Muneba
Woweela ohukamihedha voopaddusiwa elabo. Mwaana a Mulugu, Yesu
Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3; aKolose 1.16;
aHeberi 1.2.)</note> waviravira vadhulu va mahinje, osasanyedhelaga.
<div eID="gen11" type="paragraph"/>
==Render=Entry============
                .divineName {                   font-variant: small-caps;       
        }               .wordsOfJesus {color: red;              }       
Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu waviravira vadhulu
va mahinje, osasanyedhelaga.  <!/P><br />
==========================
Entry Attributes:

[ Footnote ]
        [ 1 ]
                body = 1.2* <catchWord>Muneba wa Mulugu</catchWord> naari wi «pevo
yuulubale.» Mulugu ohukalana muneba mmohi oneethanihu «Muneba
Woweela.» Muneba Woweela ohukamihedha voopaddusiwa elabo. Mwaana a
Mulugu, Yesu Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3;
aKolose 1.16; aHeberi 1.2.)
                n = 1

On Fri, Sep 14, 2012 at 7:15 PM, Chris Little <chris...@crosswire.org> wrote:

On 09/14/2012 01:02 PM, Greg Hellings wrote:
So I've been debugging a module display problem in BibleTime. I
mentioned it on IRC with Troy the other day but we weren't able to
connect at the same time to discuss further. The issue has to do with
paragraph tags - in osis2mod these tags are being converted from <p>
to <div sID="someid" type="paragraph" />.
This is extraordinarily bad. This is a change in semantics, because <p> and
<div type="paragraph"> are not semantically equivalent.

<p> marks the type of paragraph we all probably think of first: generally, a
chunk of text with newlines before and after.

<div type="paragraph"> marks a formal division within a text that happens to
be identified as a 'paragraph' and may consist of multiple <p>-type
paragraphs. Examples of these divisions are found in many laws and the
Catechism of the Catholic Church (which does exist in OSIS form). Here's
part 1, section 1, chapter 1, article 1, paragraph 1 of the CCC:
http://www.vatican.va/archive/ENG0015/__P16.HTM. As you can see, it consists
of many <p>-type paragraphs but is a single <div type="paragraph">-type
paragraph.

Abhorrent though I consider milestoned <p/>, I think I would much prefer to
see us map <p>...</p> to <p sID=""/>...<p eID=""/> than see us clobber the
semantics of a defined <div> type.


Thus, osis2mod is in violation of the suggested XML best practice by
creating a non-EMPTY tag as self-closing but this is seemingly pretty
common in the OSIS world. Furthermore our filters are producing
invalid (or very strongly discouraged) HTML as per every still-in-use
version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
opinion that this represents a bug in SWORD - at the very least in the
filters that permit empty, self-closing div tags to slip through what
are supposedly HTML outputs. Do others agree or disagree on this?
I'm of the opinion that our OSIS is generally fine, meaning we should go
ahead and keep allowing self-closing OSIS tags if possible (as input and
output from osis2mod and as content of modules not produced by osis2mod).
This is just a recommendation and specifically a recommendation for the
purpose of aiding processing with legacy SGML tools, which I can't see us
doing and don't personally care about. (The semantic violation noted above
is a bug in my mind, but that issue is orthogonal.)

I would agree that the filter output is buggy if we're generating disallowed
tag forms. OSIS <div> and <p> would need to be translated to their correct,
non-self-closing HTML forms. Beyond those two, I can't think of any tags
that have the same form & general semantics in both OSIS & HTML.
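
One way to check filter output for the disallowed tag forms discussed here is a small script like the sketch below. It is not part of SWORD; the function name find_invalid_self_closing is made up for this example, and the regular expression is a simplification that assumes attribute values do not contain "<" or ">". It reports every self-closing tag whose element is not one of the HTML void elements (br, hr, img and so on), which are the only elements allowed to appear without an end tag.

```python
import re

# HTML void elements: these never take an end tag, so <br /> style is acceptable.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "param", "source", "track", "wbr",
}

# Matches any self-closing tag and captures the element name.
_SELF_CLOSING = re.compile(r'<([A-Za-z][\w:-]*)\b[^<>]*/>')


def find_invalid_self_closing(html):
    """Return self-closing tags whose element is not an HTML void element."""
    return [
        m.group(0)
        for m in _SELF_CLOSING.finditer(html)
        if m.group(1).lower() not in VOID_ELEMENTS
    ]


# Tails of the diatheke outputs quoted earlier in this thread.
html_output = 'osasanyedhelaga.  <div eID="gen11" type="paragraph"/><br />'
htmlhref_output = 'osasanyedhelaga.  <!/P><br />'

print(find_invalid_self_closing(html_output))
print(find_invalid_self_closing(htmlhref_output))
```

Run against the HTML output above, this flags the passed-through <div/>. It does not flag <!/P>, which is a separate validity question since that marker is not a tag at all.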

--Chris



_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page