To emphasize that we have an issue here, in the SWORD filters, here is the output from diatheke with HTML, HTMLHREF and XHTML (support for which I hacked in just now in order to test).

greg@Gateway08:~/Source/sword/build (master)$ !diath
diatheke -b TKE -o h -f HTMLHREF -k Gen 1:2
Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu waviravira vadhulu va mahinje, osasanyedhelaga. <!/P><br />
(TKE)
greg@Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f HTML -k Gen 1:2
<meta http-equiv="content-type" content="text/html; charset=UTF-8">Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu waviravira vadhulu va mahinje, osasanyedhelaga. <div eID="gen11" type="paragraph"/><br />
(TKE)
greg@Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f XHTML -k Gen 1:2
Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu waviravira vadhulu va mahinje, osasanyedhelaga. <div eID="gen11" type="paragraph"/>
(TKE)

All three are outputting the same verse from the same module. HTML and XHTML are outputting <div eID="gen11" type="paragraph"/> which is what the module has in its rawest form. HTMLHREF outputs <!/P> which is not valid anything. There are other, odd, differences between the three but none of those are germane to this discussion, it would seem to me.

$ ./examples/cmdline/lookup TKE Gen.1.2
==Raw=Entry===============
Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu<note n="1">1.2* <catchWord>Muneba wa Mulugu</catchWord> naari wi «pevo yuulubale.» Mulugu ohukalana muneba mmohi oneethanihu «Muneba Woweela.» Muneba Woweela ohukamihedha voopaddusiwa elabo. Mwaana a Mulugu, Yesu Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3; aKolose 1.16; aHeberi 1.2.)</note> waviravira vadhulu va mahinje, osasanyedhelaga.
<div eID="gen11" type="paragraph"/>
==Render=Entry============
.divineName { font-variant: small-caps; }
.wordsOfJesus {color: red; }
Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu waviravira vadhulu va mahinje, osasanyedhelaga. <!/P><br />
==========================
Entry Attributes:

[ Footnote ]
    [ 1 ]
        body = 1.2* <catchWord>Muneba wa Mulugu</catchWord> naari wi «pevo yuulubale.» Mulugu ohukalana muneba mmohi oneethanihu «Muneba Woweela.» Muneba Woweela ohukamihedha voopaddusiwa elabo. Mwaana a Mulugu, Yesu Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3; aKolose 1.16; aHeberi 1.2.)
        n = 1

On Fri, Sep 14, 2012 at 7:15 PM, Chris Little <chris...@crosswire.org> wrote:
>
>
> On 09/14/2012 01:02 PM, Greg Hellings wrote:
>> So I've been debugging a module display problem in BibleTime. I
>> mentioned it on IRC with Troy the other day but we weren't able to
>> connect at the same time to discuss further. The issue has to do with
>> paragraph tags - in osis2mod these tags are being converted from <p>
>> to <div sID="someid" type="paragraph" />.
>
> This is extraordinarily bad. This is a change in semantics, because <p> and
> <div type="paragraph"> are not semantically equivalent.
>
> <p> marks the type of paragraph we all probably think of first: generally, a
> chunk of text with newlines before and after.
>
> <div type="paragraph"> marks a formal division within a text that happens to
> be identified as a 'paragraph' and may consist of multiple <p>-type
> paragraphs. Examples of these divisions are found in many laws and the
> Catechism of the Catholic Church (which does exist in OSIS form). Here's
> part 1, section 1, chapter 1, article 1, paragraph 1 of the CCC:
> http://www.vatican.va/archive/ENG0015/__P16.HTM. As you can see, it consists
> of many <p>-type paragraphs but is a single <div type="paragraph">-type
> paragraph.
>
> Abhorrent though I consider milestoned <p/>, I think I would much prefer to
> see us map <p>...</p> to <p sID=""/>...<p eID=""/> than see us clobber the
> semantics of a defined <div> type.
>
>
>> Thus, osis2mod is in violation of the suggested XML best practice by
>> creating a non-EMPTY tag as self-closing but this is seemingly pretty
>> common in the OSIS world. Furthermore our filters are producing
>> invalid (or very strongly discouraged) HTML as per every still-in-use
>> version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
>> opinion that this represents a bug in SWORD - at the very least in the
>> filters that permit empty, self-closing div tags to slip through what
>> are supposedly HTML outputs. Do others agree or disagree on this?
>
> I'm of the opinion that our OSIS is generally fine, meaning we should go
> ahead and keep allowing self-closing OSIS tags if possible (as input and
> output from osis2mod and as content of modules not produced by osis2mod).
> This is just a recommendation and specifically a recommendation for the
> purpose of aiding processing with legacy SGML tools, which I can't see us
> doing and don't personally care about. (The semantic violation noted above
> is a bug in my mind, but that issue is orthogonal.)
>
> I would agree that the filter output is buggy if we're generating disallowed
> tag forms. OSIS <div> and <p> would need to be translated to their correct,
> non-self-closing HTML forms. Beyond those two, I can't think of any tags
> that have the same form & general semantics in both OSIS & HTML.
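
To make the translation described in the paragraph just quoted concrete, here is a rough sketch in Python. It is only an illustration, not SWORD's filter code (the real filters are C++). The name milestones_to_html is made up for this example, and the sketch assumes that a milestoned <div type="paragraph"> really does stand in for an original <p>, which is exactly the osis2mod behaviour being questioned in this thread.

```python
import re

# A self-closing <div .../> tag, with or without a space before "/>".
_SELF_CLOSING_DIV = re.compile(r'<div\b([^<>]*?)/>')
# name="value" pairs inside the tag.
_ATTR = re.compile(r'(\w+)="([^"]*)"')


def milestones_to_html(text):
    """Turn OSIS paragraph milestones into non-self-closing HTML <p> tags.

    Hypothetical helper: sID milestones become <p>, eID milestones
    become </p>, and every other <div .../> is left exactly as found.
    """

    def replace(match):
        attrs = dict(_ATTR.findall(match.group(1)))
        if attrs.get("type") != "paragraph":
            return match.group(0)  # not a paragraph milestone
        if "sID" in attrs:
            return "<p>"
        if "eID" in attrs:
            return "</p>"
        return match.group(0)  # paragraph div without a milestone id

    return _SELF_CLOSING_DIV.sub(replace, text)
```

Note that the verse shown above contains only the closing (eID) milestone, so a per-verse filter built along these lines would still emit an unbalanced </p>; pairing start and end milestones across verses is a separate problem that this sketch does not attempt.
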
>
> --Chris
>
>
>
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page