Doug Ewell wrote:
1. Language tags may be useful for display issues.
...
For example, it is often said that Japanese users prefer “Japanese-style” glyphs universally, even for Chinese text. The Plane 14 tagging approach is not perfect, but it is sufficient to solve this problem. Japanese users who prefer “Japanese-style” glyphs universally can tag all Han text as “ja”, which may be linguistically wrong but achieves the desired effect. Users who want Chinese glyphs for Chinese-language text and Japanese glyphs for Japanese-language text can tag the former as “zh” and the latter as “ja” as they see fit.
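To show what such tagging looks like at the character level, here is a minimal sketch. It assumes the Plane 14 scheme as introduced in Unicode 3.1: U+E0001 LANGUAGE TAG, followed by tag characters U+E0020..U+E007E that mirror printable ASCII, with U+E0001 U+E007F cancelling the language tag. The helper names `make_language_tag` and `tag_text` are hypothetical and only serve to illustrate how a "ja" or "zh" tag would be spelled out.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG

def make_language_tag(lang: str) -> str:
    """Return the Plane 14 tag sequence for an ASCII language tag such as 'ja'."""
    if not lang or not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language tag must be non-empty printable ASCII")
    # Each tag character is its ASCII counterpart shifted into Plane 14.
    return LANGUAGE_TAG + "".join(chr(0xE0000 + ord(c)) for c in lang)

def tag_text(text: str, lang: str) -> str:
    """Wrap text in a language tag and close it with U+E0001 U+E007F."""
    return make_language_tag(lang) + text + LANGUAGE_TAG + CANCEL_TAG

if __name__ == "__main__":
    # Tag the Han text "\u6f22\u5b57" as Japanese, then list its code points.
    tagged = tag_text("\u6f22\u5b57", "ja")
    print([f"U+{ord(c):04X}" for c in tagged])
```

A viewer that ignores tags sees only the two Han characters. A tag-aware renderer could choose Japanese-style glyphs for them.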
The "user" viewing the text (and preferring 'Japanese-style' glyphs) may be a different person from the "user" authoring the text (and inserting the Plane 14 tags). In fact, the viewer may be unable to modify the Plane 14 tags, or may not even be aware of them. I think this argument should be reworded to distinguish clearly between the various "users".
Other scripts besides Han can benefit from plain-text language tagging as well. A common Latin-script example
... A common Cyrillic example is the difference in the italic forms for, e.g., Russian and Serbian; cf. "Rendering Serbian italics" (used to be at <http://www.tiro.com/transfer/Serbian_Rendering.pdf> -- John, can we have it back?). Other examples include differing cursive (handwritten) forms: e.g., a UK handwritten "I" is read as a "T" by most Germans. The Russian-Serbian contrast mentioned above also applies to cursive.
2. Language tags may be useful for non-display issues.
...
3. Conflict with HTML/XML tags need not be a problem.
...
The potential disruption caused by this scenario is probably overstated. Almost every HTML file ever created contains at least one plain-text line separator (CR and/or LF) and at least one HTML-style line separator (<p> and/or <br>). Which to follow? The HTML specification very clearly states that the higher-level protocol takes precedence in this case (unless <pre>preformatted text</pre> is explicitly indicated). The same could be said for the interaction between Plane 14 language tags and HTML language tags.
Other possibilities include a clear rule about their mutual interaction. Paradigms to follow include:
- the interaction between Unicode formatting characters, such as U+200E, U+200F, and U+202A through U+202E, and HTML markup, such as the dir attribute and the BDO element (cf. <http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2>);
- the interaction between HTTP headers and the HTML META element, e.g., the HTTP Content-Type header, including its charset parameter (cf. <http://www.w3.org/TR/html401/charset.html#h-5.2.2>).
4. The original need for language tags has not disappeared.
...
5. “Statefulness” disadvantage is exaggerated.
...
6. Plane 14 tags are easy to filter out, and harmless if not interpreted.
...
Tags [...] do not affect searching,
There are indeed situations where language tags would affect searching if not handled properly. Example: in my German WWW pages, I take pains to tag all English terms, in the hope of helping speech synthesizers or other clients that depend on correct identification of the language. Now, German attaches prefixes and suffixes to word stems, and also tends to form compounds. Of course, I have to confine my LANG=EN span to the English word proper. This leads to monsters such as
<span lang="en">E-Mail</span>-Adresse
<span lang="en">Mailing</span>listen
... aus den <span lang="en">Received-Header</span>n ...
A search engine should remove these tags before comparing a search argument to this sort of text. For perfect results, this normalization should be applied to HTML tags and Unicode tags alike. (I fear that Google is not that smart, but I haven't tested it.) So the correct argument for Doug's issue #6 is probably: Plane 14 tags do not affect searching any more than high-level tags do.
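The normalization step described above could look roughly like the following minimal sketch. It is not how any particular search engine works. It assumes that discarding all markup (via Python's standard `html.parser`) and deleting every character in U+E0000..U+E007F is enough to rejoin split compounds before a plain substring match. The names `normalize_for_search` and `contains` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Plane 14 tag characters live in U+E0000..U+E007F; strip the whole block.
_PLANE14_TAGS = re.compile("[\U000E0000-\U000E007F]")

class _TextExtractor(HTMLParser):
    """Collects character data, discarding markup such as <span lang="en">."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

def normalize_for_search(html_text: str) -> str:
    """Remove HTML tags and Plane 14 tags so that split word forms join up again."""
    extractor = _TextExtractor()
    extractor.feed(html_text)
    extractor.close()  # flush any buffered trailing text
    return _PLANE14_TAGS.sub("", "".join(extractor.parts))

def contains(html_text: str, query: str) -> bool:
    """Case-insensitive substring match against the normalized text."""
    return query.casefold() in normalize_for_search(html_text).casefold()
```

For example, `contains('<span lang="en">Mailing</span>listen', 'Mailinglisten')` is true. A naive search over the raw markup would miss the compound, and the same holds for text tagged with Plane 14 characters instead of `<span>`.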
7. Rapid deprecation creates an image of instability.
...
8. Other, as yet uninvented tags would be implicitly deprecated.
...

Best wishes,
Otto Stolz