RE: In defense of Plane 14 language tags (long)

Marco Cimarosti Tue, 05 Nov 2002 04:04:05 -0800

Doug Ewell wrote:
> [...]
> Readers are asked to consider the following arguments individually, so
> that any particular argument that seems untenable or contrary to
> consensus does not affect the validity of other arguments.
> [...]


Here are my three pence *pro* the deprecation:

> 1.  Language tags may be useful for display issues.
> 
> The most commonly suggested use, and the original impetus, 
> for Plane 14 language tags is to suggest to the display
> subsystem that “Chinese-style” or “Japanese-style” glyphs
> are preferred for unified Han characters. [...]

IMHO, there has never been any practical need to consider these glyphic
differences in plain text. It is a non-issue raised to the rank of issue
for obscure political reasons.

It is false that Japanese is unreadable if displayed with Chinese-style
glyphs, or that Polish is unreadable if displayed with Spanish-style acute
accents.

It is true that any language looks odd if displayed with an improper font,
and that these esthetic issues must be properly addressed in "rich text" and
in decent typography.

But such a level of graphical correctness does not apply to plain text: if
it did, we would also have to rule out many other typographic
simplifications which are in current use, such as fixed-width fonts for
Western scripts, fixed-height fonts for the Arabic script, horizontal display
of Japanese, etc.

> 2.  Language tags may be useful for non-display issues.
> 
> Although not frequently mentioned, plain-text language tagging could
> also be useful for applications such as speech synthesis,
> spell-checking, and grammar checking. [...]

These kinds of applications cannot rely on the presence of any kind of
language tagging because, in most real-world cases, it will not be present.

{ As a side note, the idea that a language may use "foreign" words seems
terribly naive to me. It is true that, in Italian, we use loanwords such as
"hardware", "punk", or "footing", but it would be silly to consider or tag
them as "English words". They are genuinely Italian words, as demonstrated
by the fact that their pronunciation is very different from the English
(['ɑrdwer(e)], ['pɑŋk(e)] and ['futiŋg(e)], respectively), that their
morphology is different (e.g., plural is invariable), and that their meaning
is slightly different ("hardware" only refers to computers, "punk" only
refers to music and fashion), or even totally different from the English
original ("footing" means "jogging"). }

> 3.  Conflict with HTML/XML tags need not be a problem.
> 
> A common criticism of the Plane 14 language tags is that higher-level
> protocols such as HTML and XML already provide a mechanism 
> for language tagging.  There is a concern that the language specified
> by the “lang” attribute in HTML or “xml:lang” attribute in XML could 
> conflict with the one specified in a Plane 14 language tag, [...]

As I see it, the problem is not merely that the two kinds of tags may
specify different languages. That would not be a real conflict. It is
perfectly legitimate to nest language tags inside each other: the rule is
that the innermost language tag wins. This general rule can be extended to
accommodate plain-text tags: they will always take precedence, as they
are clearly the innermost specification.

The real problem is with *overlapping* and *unpaired* tags. XML parsers have
built-in validation of the tree structure of a document (well-formedness
checking), which ensures that all tags are properly opened, closed and
nested inside each other. E.g., overlapping spans like:

        <x lang="en"> ABC <y lang="fr"> DEF </x> GHI </y>

would not pass validation because the English and French spans overlap
irregularly (as do tags <x> and <y>).
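
To make the point concrete, here is a minimal sketch using Python's standard
xml.etree.ElementTree parser (my choice for illustration; any conforming XML
parser should behave the same way). The helper name is_well_formed is
hypothetical. The overlapping markup from the example above is rejected,
while a properly nested variant is accepted:

```python
import xml.etree.ElementTree as ET

# The overlapping example from the text: <x> is closed while <y> is still open.
overlapping = '<x lang="en"> ABC <y lang="fr"> DEF </x> GHI </y>'

# A properly nested variant, for comparison (not from the original message).
nested = '<x lang="en"> ABC <y lang="fr"> DEF </y> GHI </x>'


def is_well_formed(markup: str) -> bool:
    """Return True if the parser accepts the markup, False on a parse error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


overlapping_ok = is_well_formed(overlapping)  # parser rejects the overlap
nested_ok = is_well_formed(nested)            # parser accepts proper nesting
```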

But that built-in validation cannot properly detect a situation like:

        <x lang="en"> ABC \uE0001 \uE0066 \uE0072 DEF </x> GHI \uE007F

where the English span (specified in tag <x>) overlaps with the French span
(specified with plain text tags).
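
A rough sketch of this, again assuming Python's xml.etree.ElementTree as the
parser: the Plane 14 characters are ordinary character data to it, so the
document parses without complaint even though the plain-text French span
opens inside <x> and is cancelled outside it. The <doc> wrapper is a
hypothetical addition, needed only so that the fragment has a single root
element.

```python
import xml.etree.ElementTree as ET

LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_F = "\U000E0066"         # U+E0066 TAG LATIN SMALL LETTER F
TAG_R = "\U000E0072"         # U+E0072 TAG LATIN SMALL LETTER R
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG

# <doc> is a hypothetical wrapper so that the trailing text has a parent.
mixed = (
    '<doc><x lang="en"> ABC ' + LANGUAGE_TAG + TAG_F + TAG_R
    + ' DEF </x> GHI ' + CANCEL_TAG + '</doc>'
)

root = ET.fromstring(mixed)  # no ParseError: tag characters are just text
x = root.find("x")

# The plain-text span opens in the content of <x> ...
opens_inside_x = LANGUAGE_TAG in (x.text or "")
# ... and is cancelled in the text that follows </x>.
closes_outside_x = CANCEL_TAG in (x.tail or "")
```

Detecting the overlap therefore takes extra, application-level code of this
kind; the parser itself reports nothing.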

Merely suggesting that plain text tags be ignored is no solution, because
this would waste part of the information (and the author's effort to provide
this information).

> 6.  Plane 14 tags are easy to filter out, and harmless if not
> interpreted.

If they are not processed correctly or filtered out, they are by no means
harmless.

If they are rendered as visible glyphs (such as [LNG][f][r]) or with
"missing glyph" boxes, they clutter the text, making it less readable --
i.e., they aggravate the main problem that they were supposed to solve.

If they are rendered as invisible glyphs, they make the text more difficult
to edit and to move the cursor within, because the user will have no way of
understanding why the cursor stops twice in apparently random positions.
This also exposes the information contained in language tags to
unintentional corruption by subsequent editing.
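
For what it is worth, the filtering itself is indeed simple; the trouble
arises only when nobody does it. A minimal sketch in Python, assuming that
"filtering" means dropping every code point in the Tags block
(U+E0000..U+E007F); the function names and the sample string are mine, for
illustration only:

```python
import re

# Every code point in the Unicode Tags block, U+E0000..U+E007F.
TAG_CHARS = re.compile("[\U000E0000-\U000E007F]")


def strip_plane14_tags(text: str) -> str:
    """Remove all Plane 14 tag characters from the text."""
    return TAG_CHARS.sub("", text)


def hidden_code_points(text: str) -> int:
    """Count tag characters, i.e. code points with no visible glyph of their own."""
    return len(TAG_CHARS.findall(text))


# Hypothetical sample: "DEF" tagged as French, then the tag is cancelled.
sample = "ABC \U000E0001\U000E0066\U000E0072DEF\U000E007F GHI"

visible = strip_plane14_tags(sample)   # "ABC DEF GHI"
hidden = hidden_code_points(sample)    # 4 code points the user cannot see
```

In an editor that neither filters nor interprets the tags, those hidden code
points are exactly the ones a user can stumble over or delete by accident.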

_ Marco
