You didn't address my main point: Content providers should be given a
way to have final control over how their formatted texts appear, and one
which is simple and reliable. I'll comment below, but a Bible
translation is not a web-page or an app which might need a new look
someday, or a new skin. CSS and content abstraction etc. are great
ideas, but they should not be artificially forced onto Bible publishers.
Yes, they should be offered, and even encouraged; fine. But publishers
should be able to say: "This is exactly how I want the formatting,
everywhere, any time. Period." I don't understand why this expectation
is so abhorrent. Offering a handful of content abstractions and
extensions, all of whose definitions are arguable (see below) and likely
in flux, is neither simple, nor satisfying to content providers who
desire control over the presentation of their texts.
I've worked with many, many SFM texts, and they often do not follow SFM
rules or play nice in a variety of ways. All of this greatly complicates
an already serious conversion from SFM to Sword. The proof may be in
the pudding. Simple is sometimes better in the real world. Sure, IBT
could recreate their modules using container elements, but that still
would not provide the reliability or control enjoyed by the existing
modules. I still don't see (beyond theory and arguable semantics) a good
reason to deny "customers" a sound and working solution.
On 04/12/2013 03:26 PM, Chris Little wrote:
Executive summary:
I don't have a problem with making it clear how to encode indented
paragraphs and line breaks and improving support for diverse paragraph
types.
I do have problems with the specific syntax and the rationale described
below.
On 4/11/2013 11:04 PM, John Austin wrote:
Sword should support basic indents and line breaks. Content providers
should be able to control the formatting of their texts and should not
be required to assign their content to artificial <p>...</p> or other
containers to do so. Although these containers might be useful, the
text of some translation styles cannot be fit nicely into them. But
often content providers do rightly desire their texts to appear with
formatting similar to their printed texts, since this is exactly what
the translators deemed easiest to read and understand.
People who convert texts to Sword are often not at liberty to change
the source texts to do so, and source texts in strange languages come
with many unexpected language constructs. For these reasons it is
important that Sword tries to offer content providers a simple,
reliable way of formatting their own texts, without requiring them to
fit into Sword's container scheme to do so.
IBT of Russia is already using simple osis <milestone
type="x-p-indent" /> and <lb /> to achieve all their formatting needs
for their Sword modules. Currently, only xulsword supports both of
these. But perhaps they should both be included in Sword's osis2html
filters so that all front-ends can support them. At least something
very similar should be adopted, if there is a strong reason not to
adopt IBT's well tested method.
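
As a rough illustration of how a filter might handle these two elements, here is a minimal Python sketch. It is not xulsword's or Sword's actual osis2html code; the function name, the "indent" CSS class, and the exact HTML output are assumptions made for the example.

```python
import re

# Hypothetical HTML output; a real front-end would choose its own markup.
INDENT_HTML = '<span class="indent"></span>'
BREAK_HTML = '<br/>'

# Match the two empty OSIS elements, tolerating whitespace or a line wrap
# between the element name, the attribute, and the closing "/>".
_INDENT_RE = re.compile(r'<milestone\s+type="x-p-indent"\s*/>')
_LB_RE = re.compile(r'<lb\s*/>')


def osis_fragment_to_html(fragment: str) -> str:
    """Replace indent milestones and line breaks with simple HTML."""
    html = _INDENT_RE.sub(INDENT_HTML, fragment)
    html = _LB_RE.sub(BREAK_HTML, html)
    return html
```

The front-end stylesheet would then only need one rule for the assumed class, such as an inline-block of fixed width, so that an indent is always rendered as an indent.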
So, encoders should not have to assign content to 'artificial
<p>...</p>' but they should have to encode an artificial <milestone
type="x-p-indent"/>? They shouldn't assign content to the structure that
it clearly is (<p>...</p>), rather to an imagined indentation object?
Something like " Бўаз Рутга:" is not clearly a <p> even though that
is how it appears in SFM, and that is how it would appear in the module
according to your argument. For instance, if some front-end designer
thinks it is really neat for his front-end's paragraphs to have
drop-caps, and so he modifies his CSS to add them to "paragraphs", then
my text is completely broken because, in fact, the above is NOT a
paragraph, in any language. It is, in fact, an indented line.
There's not a location or an object that represents indentation.
Indentation is a property of paragraphs, so it should be marked on
paragraphs, as is our current practice.
Indentation is a property of paragraphs, usually... but not always...
well, it depends... This is exactly why Sword also needs a simple
indent. One which is always an indent.
Here's the list of paragraph types from the USFM reference along with
the paragraph type that usfm2osis.py will generate (in the form of a
Python dict): {'pc':'x-center', 'pr':'x-right', 'm':'x-noindent',
'pmo':'x-embedded-opening', 'pm':'x-embedded',
'pmc':'x-embedded-closing', 'pmr':'x-right', 'pi':'x-indented-1',
'pi1':'x-indented-1', 'pi2':'x-indented-2', 'pi3':'x-indented-3',
'pi4':'x-indented-4', 'pi5':'x-indented-5', 'mi':'x-noindent-indented',
'nb':'x-nobreak', 'phi':'x-indented-hanging', 'ps':'x-nobreakNext',
'psi':'x-nobreakNext-indented', 'p1':'x-level-1', 'p2':'x-level-2',
'p3':'x-level-3', 'p4':'x-level-4', 'p5':'x-level-5'}.
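
To make the mapping concrete, here is a small Python sketch that looks up a USFM paragraph marker in the dict above and builds an opening OSIS <p> tag. The dict is copied from this message; the function name and the choice to emit a bare <p> for unlisted markers such as \p are assumptions for illustration, not a description of what usfm2osis.py actually does internally.

```python
# Mapping copied from the list above (USFM marker -> OSIS paragraph type).
USFM_TO_OSIS_P_TYPE = {
    'pc': 'x-center', 'pr': 'x-right', 'm': 'x-noindent',
    'pmo': 'x-embedded-opening', 'pm': 'x-embedded',
    'pmc': 'x-embedded-closing', 'pmr': 'x-right', 'pi': 'x-indented-1',
    'pi1': 'x-indented-1', 'pi2': 'x-indented-2', 'pi3': 'x-indented-3',
    'pi4': 'x-indented-4', 'pi5': 'x-indented-5', 'mi': 'x-noindent-indented',
    'nb': 'x-nobreak', 'phi': 'x-indented-hanging', 'ps': 'x-nobreakNext',
    'psi': 'x-nobreakNext-indented', 'p1': 'x-level-1', 'p2': 'x-level-2',
    'p3': 'x-level-3', 'p4': 'x-level-4', 'p5': 'x-level-5',
}


def osis_p_open_tag(marker: str) -> str:
    """Return an opening OSIS <p> tag for a USFM marker given without its backslash.

    Assumption: markers not in the mapping (for example plain 'p') get a bare <p>.
    """
    p_type = USFM_TO_OSIS_P_TYPE.get(marker)
    if p_type is None:
        return '<p>'
    return f'<p type="{p_type}">'
```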
I believe that a bare <p> should, by default, be indented. The only case
where it shouldn't would be in a translation without any paragraphs,
which should have each verse start on a new line. I would argue that the
OSIS filters should be improved to translate these OSIS <p> types to
(X)HTML <p> classes or CSS or such. But we should not be supporting an
indentation milestone and generating &nbsp;s or something similar to
simulate indentation. (Nor should we translate indentation milestones to
(X)HTML <p> classes or CSS, if that's your implementation.)
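
One possible shape for that filter improvement, sketched in Python rather than in the Sword filters' own language: carry the OSIS type attribute of a <p> over to an (X)HTML class, and leave the styling to CSS. The function name and the decision to reuse the type value unchanged as the class name are assumptions for the example.

```python
import re

# Matches an opening OSIS <p> whose only attribute is type="...".
_P_TYPE_RE = re.compile(r'<p\s+type="([^"]+)"\s*>')


def osis_p_to_html(osis: str) -> str:
    """Turn <p type="X"> into <p class="X">; bare <p> and </p> pass through."""
    return _P_TYPE_RE.sub(lambda m: f'<p class="{m.group(1)}">', osis)
```

A stylesheet could then attach, say, a text-indent value to each class, with the default for a bare <p> decided separately.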
There is a demonstrated need for an indent, and a good implementation.
Where is the serious argument for why Sword should deny support for that?
I presume you're already happy with the handling of <lb/>.
Assuming they always render (when formatting is desired of course) as
basic line breaks, and NOT as blank lines (similar to <br> in html) then
yes.
Hard spaces and other such formatting are not acceptable solutions
because they cannot be easily filtered. It is important that
unformatted text can easily be obtained from formatted text since
there are many uses for unformatted text, such as bookmark and
cross-reference verse texts etc.
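
The filtering argument can be illustrated with a short Python sketch: because the indent and line break are distinct elements, recovering unformatted text is a matter of replacing them with a single space. The function name is hypothetical, and the sketch assumes these two elements are the only formatting markup in the fragment.

```python
import re

# Each pattern also swallows surrounding whitespace so that the replacement
# leaves exactly one space between the neighbouring words.
_LB_RE = re.compile(r'\s*<lb\s*/>\s*')
_INDENT_RE = re.compile(r'\s*<milestone\s+type="x-p-indent"\s*/>\s*')


def strip_formatting(fragment: str) -> str:
    """Return the fragment with indent milestones and line breaks removed."""
    text = _LB_RE.sub(' ', fragment)
    text = _INDENT_RE.sub(' ', text)
    return text.strip()
```

Hard spaces used for indentation, by contrast, cannot be told apart from hard spaces that belong to the text itself, so no equally simple filter exists for them.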
Here is one example to show why forcing containers on a text is not a
good idea. This is a section of SFM from the book of Ruth 1:8-12:
\v 8 Йўлда давом этишаркан, Наима иккала келинига деди:
\p — Боринглар, икковингиз ҳам оналарингизнинг уйларига қайтинглар.
Менга ва марҳумларга бўлган иззат–ҳурматингиз учун Эгам сизларга ҳам
марҳамат қилсин.
\v 9 Икковларингизга ҳам яхши жойлардан ато қилсин, турмуш қуриб, ўз
эрларингиз билан бахтли бўлинглар!
\p Шундай деб Наима келинларини ўпди, иккаласи эса йиғлаб фарёд
кўтаришди:
\p
\v 10 — Йўқ, биз сиз билан кетамиз, сизнинг халқингиз орасида яшаймиз,
— дейишди.
\v 11 Наима эса яна келинларига:
\p — Қайтинглар, жон қизларим! — деди. — Мен билан кетганингиздан нима
фойда?! Қорнимда яна ўғилларим бормидики сизларга умр йўлдоши бўлса?!*
\v 12 Бўлди энди, қизларим, қайтинглар! Мен энди кексайдим, эрга
тегишга ожизман. Борди–ю, мен, ҳали умид қилсам бўлади, деб шу кеча
эрим билан қовушсаму ўғиллар туғсам,
Here is a PDF of exactly what the translators designed this SFM to
look like:
http://ibt.org.ru/russian/bible/uzb/otcyr/08%20Rut%20-%20Uzbek%20Cyrillic.pdf
And here is what it looks like in Sword format using only basic osis
indents and line breaks, rendered by xulsword's osis2html filter:
http://ibt.org.ru/en/text.htm?m=UZV&l=Ruth.1.1.1&g=0. As you can see,
the Sword module renders this strange (to us) formatting of text just
like the translators wanted.
However, now imagine trying to programmatically apply <p>...</p>,
<l>...</l> etc. constructs to the above SFM to achieve the same
effect. The designers of the SFM in this case are using the \p tag to
represent a simple indent (not a paragraph) in order to achieve their
desired non-Western layout. One might try and argue that the SFM
designers have done something wrong, but the point is that we have
what we have. So Sword should provide a simple way for content
providers to control the formatting of their texts. Basic indents and
line breaks do the trick for Central Asian languages, and probably
many others as well. Poetry is even made easy, by putting indents in
series as desired.
I disagree. Those are paragraphs. I'm not sure why you would argue that
something which looks like a paragraph, acts like a paragraph, and is
encoded using paragraph markup is nevertheless not a paragraph. You can
achieve your desired typesetting by putting the paragraphs in <p>
elements and indenting them all. (Again, I would argue that all
paragraphs should be indented, except in unparagraphed translations.)
Again, they are not paragraphs as most would understand them. Because if
they inherited any typical "paragraph" formatting, other than the
indent, they would render completely wrong. The fact that there is
serious discussion about whether they are paragraphs or not makes the
importance of point #1 clear as day: The content provider needs a simple
way to have control over their formatting. Now, forever, period.
My only guess is that you don't believe paragraph breaks can occur
within sentences, but evidently they can.
There's already defined syntax for poetry formatting using the level
attribute.
--Chris
_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page