Re: In defense of Plane 14 language tags (long)

Otto Stolz Mon, 04 Nov 2002 09:57:16 -0800

Doug Ewell wrote:

1.  Language tags may be useful for display issues.

...

For example, it is often said that Japanese
users prefer “Japanese-style” glyphs universally, even for Chinese text.

The Plane 14 tagging approach is not perfect, but it is sufficient to
solve this problem.  Japanese users who prefer “Japanese-style” glyphs
universally can tag all Han text as “ja”, which may be linguistically
wrong but achieves the desired effect.  Users who want Chinese glyphs
for Chinese-language text and Japanese glyphs for Japanese-language text
can tag the former as “zh” and the latter as “ja” as they see fit.


The "user" viewing the text (and preferring 'Japanese-style' glyphs)
may be a different person from the "user" authoring the text (and
inserting the plane-14 tags); in fact, the user viewing the text may not
be able to modify the plane-14 tags, or may not even be aware of them.

I guess this argument should be reworded, based on a clear distinction
between the various "users".
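
To make the author's side of this concrete, here is a minimal sketch of
how an authoring tool might insert the tags. It assumes the usual Plane 14
scheme: U+E0001 LANGUAGE TAG followed by tag characters that mirror ASCII
at an offset of U+E0000, and U+E0001 U+E007F to cancel the language tag.
The function names plane14_tag and tag_text are made up for illustration.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000         # tag characters mirror ASCII at this offset


def plane14_tag(lang: str) -> str:
    """Return U+E0001 followed by the tag characters spelling `lang`."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language identifier must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def tag_text(lang: str, text: str) -> str:
    """Wrap `text` in a language tag and a language-tag cancel sequence."""
    return plane14_tag(lang) + text + LANGUAGE_TAG + CANCEL_TAG


# The author decides the tag; a viewer who receives this string sees only
# the Han characters and cannot tell which tag was chosen without
# inspecting the code points.
tagged = tag_text("ja", "漢字")
```

Whatever the viewer prefers, the choice between "ja" and "zh" has been
made at this point, by the author.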

Other scripts besides Han can benefit from plain-text language tagging
as well. A common Latin-script example

...

A common Cyrillic example is the difference in the italic forms for,
e. g., Russian and Serbian, cf. "Rendering Serbian italics" (used to
be at <http://www.tiro.com/transfer/Serbian_Rendering.pdf> -- John,
can we have it back?).

Other examples include the different cursive (handwriting) forms,
e. g., a UK "I" is perceived as a "T" by most Germans; the Russian-
Serbian contrast mentioned above also exists in cursive.

2.  Language tags may be useful for non-display issues.

...

3.  Conflict with HTML/XML tags need not be a problem.

...

The potential disruption caused by this scenario is probably overstated.
Almost every HTML file ever created contains at least one plain-text
line separator (CR and/or LF) and at least one HTML-style line separator
(<p> and/or <br>).  Which to follow?  The HTML specification very
clearly states that the higher-level protocol takes precedence in this
case (unless <pre>preformatted text</pre> is explicitly indicated).  The
same could be said for the interaction between Plane 14 language tags
and HTML language tags.


Another possibility is a clear rule about their mutual interaction.

Paradigms to follow are

- interaction between Unicode formatting characters, such as U+200E,
  U+200F, and U+202A through U+202E, and HTML tagging, such as
  the Dir attribute and the Bdo element (cf.
  <http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2>),

- interaction between HTTP headers and the HTML Meta tag, e. g.,
  the HTTP Content-Type, including its charset parameter,
  cf.  <http://www.w3.org/TR/html401/charset.html#h-5.2.2>.

4.  The original need for language tags has not disappeared.

...

5.  “Statefulness” disadvantage is exaggerated.

...

6.  Plane 14 tags are easy to filter out, and harmless if not
interpreted.

...

Tags [...] do not affect searching,


There are indeed situations where language tags would affect searching,
if not handled properly.
Example: In my German WWW pages, I take pains to tag all English terms
in the hope of helping speech synthesizers, or other clients that depend
on the correct identification of the language. Now, German attaches
prefixes and suffixes to the word-stems, and also tends to form compounds.
Of course, I have to confine my LANG=EN span to the English word proper.
This leads to monsters such as
  <span lang="en">E-Mail</span>-Adresse
  <span lang="en">Mailing</span>listen
  ... aus den <span lang="en">Received-Header</span>n ...

A search engine should remove these tags before comparing a search argument
to this sort of text. For perfect results, this normalization should be
applied to HTML tags and Unicode tags alike. (I fear that Google is not
that smart, but I haven't tested it.)
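
The normalization I have in mind could look roughly like this. It is only
a sketch: the regular expression for HTML tags is deliberately naive (a
real search engine would use a proper HTML parser), and the function names
normalize_for_search and matches are made up for illustration. The only
fact relied upon is that the Plane 14 tag characters all lie in the range
U+E0000 through U+E007F.

```python
import re

# Naive pattern for HTML tags; assumes no ">" inside attribute values.
HTML_TAG = re.compile(r"<[^>]*>")

# All characters of the Tags block, U+E0000 through U+E007F.
PLANE14_TAG = re.compile("[\U000E0000-\U000E007F]")


def normalize_for_search(text: str) -> str:
    """Strip HTML tags and Plane 14 tag characters from `text`."""
    return PLANE14_TAG.sub("", HTML_TAG.sub("", text))


def matches(query: str, text: str) -> bool:
    """Case-insensitive substring search on the normalized forms."""
    return normalize_for_search(query).casefold() in \
        normalize_for_search(text).casefold()


html_version = '<span lang="en">Mailing</span>listen'
# Same word, with "Mailing" tagged as "en" by Plane 14 characters and the
# language tag cancelled (U+E0001 U+E007F) before "listen".
plane14_version = ("\U000E0001\U000E0065\U000E006E" "Mailing"
                   "\U000E0001\U000E007F" "listen")
```

With this, both html_version and plane14_version normalize to the same
plain string, so a search for "Mailinglisten" finds either of them.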

So the correct argument for Doug's issue #6 is probably:
Plane-14 Tags do not affect searching any more than high-level tags do.

7.  Rapid deprecation creates an image of instability.

...

8.  Other, as yet uninvented tags would be implicitly deprecated.

...



Best wishes,
  Otto Stolz
