Doug Ewell wrote:

> [...]
> Readers are asked to consider the following arguments individually, so
> that any particular argument that seems untenable or contrary to
> consensus does not affect the validity of other arguments.
> [...]

Here are my three pence *pro* the deprecation:

> 1. Language tags may be useful for display issues.
>
> The most commonly suggested use, and the original impetus,
> for Plane 14 language tags is to suggest to the display
> subsystem that “Chinese-style” or “Japanese-style” glyphs
> are preferred for unified Han characters. [...]

IMHO, there has never been any practical need to consider these glyphic differences in plain text. It is a non-issue raised to the rank of issue for obscure political reasons.

It is false that Japanese is unreadable if displayed with Chinese-style glyphs, or that Polish is unreadable if displayed with Spanish-style acute accents.

It is true that any language looks odd if displayed with an improper font, and that these esthetic issues must be properly addressed in "rich text" and in decent typography. But such a level of graphical correctness does not apply to plain text: if it did, we should also rule out many other typographic simplifications which are in current use, such as fixed-width fonts for Western scripts, fixed-height fonts for the Arabic script, horizontal display of Japanese, etc.

> 2. Language tags may be useful for non-display issues.
>
> Although not frequently mentioned, plain-text language tagging could
> also be useful for applications such as speech synthesis,
> spell-checking, and grammar checking. [...]

These kinds of applications cannot rely on the presence of any kind of language tagging because, in most real-world cases, it will not be present.

{ As a side note, the idea that a language may use "foreign" words seems terribly naive to me. It is true that, in Italian, we use loanwords such as "hardware", "punk", or "footing", but it would be silly to consider or tag them as "English words".
They are genuinely Italian words, as demonstrated by the fact that their pronunciation is very different from the English (['ɑrdwer(e)], ['pɑŋk(e)] and ['futiŋg(e)], respectively), that their morphology is different (e.g., the plural is invariable), and that their meaning is slightly different ("hardware" only refers to computers, "punk" only refers to music and fashion), or even totally different from the English original ("footing" means "jogging"). }

> 3. Conflict with HTML/XML tags need not be a problem.
>
> A common criticism of the Plane 14 language tags is that higher-level
> protocols such as HTML and XML already provide a mechanism
> for language tagging. There is a concern that the language specified
> by the “lang” attribute in HTML or “xml:lang” attribute in XML could
> conflict with the one specified in a Plane 14 language tag, [...]

As I see it, the problem is not merely that the two kinds of tags may specify different languages. That would not be a real conflict. It is perfectly legitimate to embed language tags into each other: the rule is that the inner language tag wins. This general rule can be extended to accommodate plain text tags: they will always take precedence, as they clearly are the innermost specification.

The real problem is with *overlapping* and *unpaired* tags. XML parsers have built-in validation of the tree structure of a document, which ensures that all tags are properly opened, closed and embedded into each other. E.g., overlapping spans like:

    <x lang="en"> ABC <y lang="fr"> DEF </x> GHI </y>

would not pass validation because the English and French spans overlap irregularly (as do tags <x> and <y>). But that built-in validation cannot properly detect a situation like:

    <x lang="en"> ABC \uE0001 \uE0066 \uE0072 DEF </x> GHI \uE007F

where the English span (specified in tag <x>) overlaps with the French span (specified with plain text tags).

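To make the two cases concrete, here is a minimal sketch in Python, assuming the standard xml.etree.ElementTree parser as a stand-in for "an XML parser". The wrapper element <doc> and the helper names is_well_formed and elements_with_open_span are my own inventions for illustration. The check for an "open" span treats U+E0001 ... U+E007F as an opening/closing pair, which is the reading used in this argument and not a rule any parser implements.

```python
import xml.etree.ElementTree as ET

LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG
# LANGUAGE TAG followed by the tag characters for "f" and "r"
TAG_FR = LANGUAGE_TAG + "\U000E0066\U000E0072"

# Both samples are wrapped in a hypothetical <doc> root so that each one
# is a single candidate XML document.
OVERLAPPING_MARKUP = (
    '<doc><x lang="en"> ABC <y lang="fr"> DEF </x> GHI </y></doc>'
)
OVERLAPPING_PLANE14 = (
    '<doc><x lang="en"> ABC ' + TAG_FR + " DEF </x> GHI " + CANCEL_TAG + "</doc>"
)


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def elements_with_open_span(xml_text):
    """List elements whose content ends with a Plane 14 span still open.

    An element is reported when the last LANGUAGE TAG inside it comes
    after the last CANCEL TAG inside it (or when there is no CANCEL TAG
    at all), i.e. the plain-text span crosses the element's end tag.
    """
    root = ET.fromstring(xml_text)
    leaking = []
    for elem in root.iter():
        inner = "".join(elem.itertext())
        if inner.rfind(LANGUAGE_TAG) > inner.rfind(CANCEL_TAG):
            leaking.append(elem.tag)
    return leaking
```

With these definitions, is_well_formed rejects the first sample and accepts the second, while elements_with_open_span on the second sample reports <x>. In other words, the irregular overlap is only found by an extra pass that the parser itself never performs.
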
Just suggesting to ignore plain text tags is no solution, because this would waste part of the information (and the author's effort to provide this information).

> 6. Plane 14 tags are easy to filter out, and harmless if not
> interpreted.

If they are not processed correctly or filtered out, they are by no means harmless.

If they are rendered as visible glyphs (such as [LNG][f][r]) or with "missing glyph" boxes, they clutter the text, making it less readable -- i.e., they worsen the main problem that they were supposed to solve.

If they are rendered as invisible glyphs, they make the text more difficult to edit and to move the cursor within, because the user will have no way of understanding why the cursor stops twice in apparently random positions. This also exposes the information contained in language tags to unintentional corruption by subsequent editing.

_ Marco