3 brief points.

The HTML filter set is old, and no one I know of uses it. HTMLHREF, WEBIF, and XHTML are the three filter sets I know to be in use today. I've started to switch SWORDWeb from WEBIF to the XHTML filter set. Once this is done, I wouldn't mind deprecating both the WEBIF and HTML filter sets. Eventually, I'd like to deprecate the HTMLHREF filter set as well, leaving a single (XHTML) filter set that we all use in common, but I know Xiphos and others still use HTMLHREF as their primary HTML output filter set.

You should be seeing <!P><br /> output from this <div type="paragraph"> construct, not simply <!P>. Again, let's remove the <!P> if xiphos no longer needs it. <br /> is certainly valid, even if not necessarily the most desirable XHTML output for a paragraph division.

BibleTime should be calling module->RenderText(preverseBuffer) to get the processed preverse material instead of the raw preverse markup. Simply stripping the tags doesn't seem like the most desirable behavior.
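
To illustrate the difference between stripping tags and rendering them, here is a minimal Python sketch. It is not the SWORD API: strip_tags and render_paragraph_breaks are hypothetical stand-ins, and the regular expressions only cover the paragraph milestone discussed in this thread.

```python
import re

# Hypothetical stand-ins for illustration only; this is NOT the SWORD API.
PARAGRAPH_MILESTONE = re.compile(r'<div\b[^>]*\btype="paragraph"[^>]*/>')
ANY_TAG = re.compile(r'<[^>]+>')


def strip_tags(raw):
    """Drop every tag, which also drops the paragraph break it encoded."""
    return ANY_TAG.sub('', raw)


def render_paragraph_breaks(raw):
    """Turn paragraph milestones into <br />, then drop the remaining tags."""
    placeholder = '\x00BR\x00'  # protects the break while other tags are removed
    marked = PARAGRAPH_MILESTONE.sub(placeholder, raw)
    return ANY_TAG.sub('', marked).replace(placeholder, '<br />')


raw = 'osasanyedhelaga. <div eID="gen11" type="paragraph"/>Next'
print(strip_tags(raw))               # the break between the sentences is lost
print(render_paragraph_breaks(raw))  # the break survives as <br />
```

With plain stripping the two sentences run together; with even a toy render step the paragraph boundary survives, which is the behavior a real filter pass would give the preverse material.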

Troy



On 09/16/2012 01:54 AM, Greg Hellings wrote:
On Sat, Sep 15, 2012 at 5:11 PM, Troy A. Griffitts <scr...@crosswire.org> wrote:
Greg,

Thank you for posting the issue.  I'm still really having a tough time
understanding the problem.  I know we've been crossing on IRC, so I'm not
sure if you are seeing any of my responses to you there.

Anything you say while my Nick is in the channel is saved by ZNC and
bounced to me the next time I login, up until I manually clear the
logs. So yes, I've been getting the messages you've sent.

We have code to handle these divs and not pass them through, as shown here:

http://crosswire.org/svn/sword/trunk/src/modules/filters/osisxhtml.cpp

search for "paragraph" and it should be like the 2nd or 3rd hit, but there
is a comment which specifically shows your construct of <div eID=""
type="paragraph" />

The end result is that this gets output as <!P><br />

If you look below in your ./lookup output, you will see this exact output.
That output is the result of FMT_WEBIF rendering. I'm not sure exactly
what that is, so I can't speak to that.

When I rebuild with HTMLHREF and XHTML I get <!/P>. That is fine
for HTMLHREF, according to what Chris has said elsewhere and what you state
below, as that is intended for use by GS/Xiphos. It does not make for
acceptable XHTML - it is not valid.

When I rebuild lookup with FMT_HTML I am still seeing the div tag
passed through untouched. That is not valid HTML as discussed earlier
in this thread unless we're hoping to target a very strongly
discouraged construct of an older version of HTML.

Strangely, I can't get the output of Diatheke and lookup to sync up on
the XHTML results.

The <!P> was added for/by gnomesword years ago and can be taken out if you
do a grep through the xiphos code and find it not needed any longer.  I'm
not sure why it was added.

But, the end result is that we do process this construct and should never
pass it through.  If BibleTime gets it passed through, then they are not
using our filters, either because they are using their own distinct
filter set, or because their filter set overrides this processing and doesn't
accept our default processing.
The issue in BibleTime has already been taken care of. This only came
to light because the offending <div> tags were in the preverse
material which BibleTime does not pass through any filters but instead
simply strips tags out of the raw text. I can't pretend to know why
that is a good idea, but I'm not interested in that - only in getting
my module looking correct.

I figured I'd point out the discrepancies between SWORD's usages and
the specs in the meantime. To that point, XHTML and HTML are still
generating invalid output according to lookup.

--Greg

If you point me to an svn or git or whatever link to the Bibletime Render
Filter which processes OSIS, I'd be happy to have a look.

Troy


On 09/15/2012 06:56 PM, Greg Hellings wrote:
To emphasize that we have an issue here, in the SWORD filters, here is
the output from diatheke with HTML, HTMLHREF and XHTML (support for which
I just hacked in now in order to test).

greg@Gateway08:~/Source/sword/build (master)$ !diath
diatheke -b TKE -o h -f HTMLHREF -k Gen 1:2
Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
waviravira vadhulu va mahinje, osasanyedhelaga.  <!/P><br />
(TKE)
greg@Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f HTML -k Gen 1:2
<meta http-equiv="content-type" content="text/html;
charset=UTF-8">Genesis 1:2: Elaboya kayawomele naari kayanna dhego.
Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa
Mulugu waviravira vadhulu va mahinje, osasanyedhelaga.  <div
eID="gen11" type="paragraph"/><br />
(TKE)
greg@Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f XHTML -k Gen 1:2
Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
waviravira vadhulu va mahinje, osasanyedhelaga.  <div eID="gen11"
type="paragraph"/>
(TKE)

All three are outputting the same verse from the same module. HTML and
XHTML are outputting <div eID="gen11" type="paragraph"/> which is what
the module has in its rawest form. HTMLHREF outputs <!/P> which is not
valid anything. There are other, odd, differences between the three
but none of those are germane to this discussion, it would seem to me.
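
A check like the one done by eye above can be sketched in a few lines of Python. This is only an illustration: leaked_markup is a hypothetical helper, not part of SWORD or diatheke, and it looks for just the two leftovers seen in this thread (milestones carrying sID/eID, and the <!P> / <!/P> pseudo-tags).

```python
import re

# Hypothetical helper for illustration; not part of SWORD or diatheke.
# Self-closing tags that still carry an OSIS milestone attribute (sID or eID).
MILESTONE = re.compile(r'<[A-Za-z][^>]*\b[se]ID="[^"]*"[^>]*/>')
# The <!P> and <!/P> markers emitted by the HTMLHREF filter.
PSEUDO_TAG = re.compile(r'<!/?P>')


def leaked_markup(rendered):
    """Return the leftover OSIS milestones and pseudo-tags found in rendered output."""
    return MILESTONE.findall(rendered) + PSEUDO_TAG.findall(rendered)


htmlhref = 'osasanyedhelaga.  <!/P><br />'
xhtml = 'osasanyedhelaga.  <div eID="gen11" type="paragraph"/>'
print(leaked_markup(htmlhref))
print(leaked_markup(xhtml))
```

An empty list means neither leftover was found; it does not mean the output is valid (X)HTML, which would need a real validator.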

$ ./examples/cmdline/lookup TKE Gen.1.2
==Raw=Entry===============
Genesis 1:2:
Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu<note n="1">1.2*
<catchWord>Muneba wa Mulugu</catchWord> naari wi «pevo yuulubale.»
Mulugu ohukalana muneba mmohi oneethanihu «Muneba Woweela.» Muneba
Woweela ohukamihedha voopaddusiwa elabo. Mwaana a Mulugu, Yesu
Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3; aKolose 1.16;
aHeberi 1.2.)</note> waviravira vadhulu va mahinje, osasanyedhelaga.
<div eID="gen11" type="paragraph"/>
==Render=Entry============
                 .divineName {                   font-variant: small-caps;
}               .wordsOfJesus {color: red;              }
Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu waviravira vadhulu
va mahinje, osasanyedhelaga.  <!/P><br />
==========================
Entry Attributes:

[ Footnote ]
         [ 1 ]
                 body = 1.2* <catchWord>Muneba wa Mulugu</catchWord> naari
wi «pevo
yuulubale.» Mulugu ohukalana muneba mmohi oneethanihu «Muneba
Woweela.» Muneba Woweela ohukamihedha voopaddusiwa elabo. Mwaana a
Mulugu, Yesu Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3;
aKolose 1.16; aHeberi 1.2.)
                 n = 1

On Fri, Sep 14, 2012 at 7:15 PM, Chris Little <chris...@crosswire.org> wrote:

On 09/14/2012 01:02 PM, Greg Hellings wrote:
So I've been debugging a module display problem in BibleTime. I
mentioned it on IRC with Troy the other day but we weren't able to
connect at the same time to discuss further. The issue has to do with
paragraph tags - in osis2mod these tags are being converted from <p>
to <div sID="someid" type="paragraph" />.
This is extraordinarily bad. This is a change in semantics, because <p> and
<div type="paragraph"> are not semantically equivalent.

<p> marks the type of paragraph we all probably think of first: generally, a
chunk of text with newlines before and after.

<div type="paragraph"> marks a formal division within a text that happens to
be identified as a 'paragraph' and may consist of multiple <p>-type
paragraphs. Examples of these divisions are found in many laws and the
Catechism of the Catholic Church (which does exist in OSIS form). Here's
part 1, section 1, chapter 1, article 1, paragraph 1 of the CCC:
http://www.vatican.va/archive/ENG0015/__P16.HTM. As you can see, it consists
of many <p>-type paragraphs but is a single <div type="paragraph">-type
paragraph.

Abhorrent though I consider milestoned <p/>, I think I would much prefer to
see us map <p>...</p> to <p sID=""/>...<p eID=""/> than see us clobber the
semantics of a defined <div> type.
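
The mapping preferred here can be sketched as follows. This is a minimal Python illustration, not osis2mod's code: milestone_paragraphs is a hypothetical name, the "gen1", "gen2" ID scheme is an assumption, and only bare, attribute-free <p> tags are handled.

```python
import re
from itertools import count


def milestone_paragraphs(osis, prefix="gen"):
    """Rewrite <p>...</p> containers as <p sID=""/>...<p eID=""/> milestone pairs.

    Illustrative only: the ID scheme (prefix plus a running number) is an
    assumption, and only bare <p> and </p> tags without attributes are handled.
    """
    counter = count(1)
    open_ids = []  # IDs of paragraphs that have been opened but not yet closed

    def replace(match):
        if match.group(1):  # a closing </p>
            return f'<p eID="{open_ids.pop()}"/>'
        marker = f'{prefix}{next(counter)}'
        open_ids.append(marker)
        return f'<p sID="{marker}"/>'

    return re.sub(r'<(/?)p>', replace, osis)


print(milestone_paragraphs('<p>In the beginning</p><p>And the earth</p>'))
```

The element name stays <p> throughout, so no <div type="paragraph"> is introduced and the semantics of that div type are left alone. An unbalanced </p> raises IndexError in this sketch rather than being silently accepted.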


Thus, osis2mod is in violation of the suggested XML best practice by
creating a non-EMPTY tag as self-closing but this is seemingly pretty
common in the OSIS world. Furthermore our filters are producing
invalid (or very strongly discouraged) HTML as per every still-in-use
version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
opinion that this represents a bug in SWORD - at the very least in the
filters that permit empty, self-closing div tags to slip through what
are supposedly HTML outputs. Do others agree or disagree on this?
I'm of the opinion that our OSIS is generally fine, meaning we should go
ahead and keep allowing self-closing OSIS tags if possible (as input and
output from osis2mod and as content of modules not produced by osis2mod).
This is just a recommendation and specifically a recommendation for the
purpose of aiding processing with legacy SGML tools, which I can't see us
doing and don't personally care about. (The semantic violation noted above
is a bug in my mind, but that issue is orthogonal.)

I would agree that the filter output is buggy if we're generating disallowed
tag forms. OSIS <div> and <p> would need to be translated to their correct,
non-self-closing HTML forms. Beyond those two, I can't think of any tags
that have the same form & general semantics in both OSIS & HTML.
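
As a rough sketch of that translation, assuming the milestones carry sID/eID attributes as in the examples in this thread (open_close_html is a hypothetical name, not a SWORD filter, and all OSIS attributes are simply dropped):

```python
import re

# Hypothetical helper for illustration; not one of the SWORD filters.
# Matches self-closing <div .../> and <p .../> tags and captures their attributes.
MILESTONE = re.compile(r'<(div|p)\b([^>]*)/>')


def open_close_html(osis):
    """Turn milestoned OSIS <div>/<p> tags into HTML open and close tags."""

    def replace(match):
        name, attrs = match.groups()
        if re.search(r'\bsID="', attrs):  # start milestone becomes an opening tag
            return f'<{name}>'
        if re.search(r'\beID="', attrs):  # end milestone becomes a closing tag
            return f'</{name}>'
        return match.group(0)             # not a milestone: leave untouched

    return MILESTONE.sub(replace, osis)


osis = '<div sID="gen11" type="paragraph"/>Elaboya<div eID="gen11" type="paragraph"/>'
print(open_close_html(osis))
```

One caveat with this approach: a single verse may contain only one half of a milestone pair (Gen 1:2 above has only the eID), so rendering verse by verse this way can emit a closing tag with no matching opening tag. A real filter would need to track open elements across verses or fall back to something like <br />.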

--Chris



_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page