Re: The result of the plane 14 tag characters review.

Jungshik Shin Sat, 30 Nov 2002 04:18:08 -0800

On Sun, 17 Nov 2002, Doug Ewell wrote:

> John Jenkins was referring to the preference of Japanese speakers for
> reading Chinese-language text in Japanese-style glyphs.  Perhaps an
> appropriate language tag for this scenario might be "zh-JP", meaning
> "Chinese as used in Japan."  Even then, the language-country model is
> not perfect; the Japanese speaker in question could be located anywhere
> in the world, even in China.


  I think the 'country/region' in the 'language-country' model should be
interpreted as the country/region with which a person wants to be
'affiliated', wherever (s)he may live. On the other hand, there are
cases like 'default paper size' and 'measurement units' which are not
strongly correlated with the locale but can nonetheless be inferred from
where a person lives (well, en-US is about the only locale where US
Letter is the default and imperial units are still 'standard'). When the
'can be inferred from' approach is taken to the extreme, we get absurd
results.

  A few years ago, JDK 1.0 (or 1.1) mapped locales to time zones. When
the ja-JP locale was selected, the time zone was always set to UTC+09:00;
for en-US, it was set to UTC-08:00 (or UTC-07:00 during daylight saving
time). Obviously, this couldn't be right, because Japanese speakers can
live anywhere in the world and the US has several time zones besides
Pacific Time (PST/PDT).
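  To make the failure concrete, here is a minimal Python sketch
contrasting the two approaches. The LOCALE_TO_OFFSET table and the helper
names are hypothetical, modeled on the behavior described above rather
than on any actual JDK code; fixed UTC offsets are used so that the
sketch needs no time zone database.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical locale-to-offset table, modeled on the behavior described
# above: the time zone is inferred from the locale alone.
LOCALE_TO_OFFSET = {
    "ja-JP": timedelta(hours=9),   # UTC+09:00
    "en-US": timedelta(hours=-8),  # UTC-08:00, Pacific standard time only
}

def inferred_zone(locale_tag):
    """Flawed approach: derive the time zone from the locale tag."""
    return timezone(LOCALE_TO_OFFSET[locale_tag])

def local_time(utc_instant, user_offset):
    """Sounder approach: the time zone is a separate user setting."""
    return utc_instant.astimezone(timezone(user_offset))

# A ja-JP user living in New York (UTC-05:00 in late November).
instant = datetime(2002, 11, 30, 12, 0, tzinfo=timezone.utc)
print(instant.astimezone(inferred_zone("ja-JP")).hour)  # 21 -- wrong for this user
print(local_time(instant, timedelta(hours=-5)).hour)    # 7
```

The only point is that language/region affiliation and time zone are
independent settings; a real application would use named zones (e.g. via
Python's zoneinfo module) so that daylight saving time is handled.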


> > How do Chinese feel about this? They might find it objectionable to
> > have to read Chinese in Japanese glyphs in a multilingual document.
>
> You never hear this situation mentioned.  I take that to mean that
> Chinese speakers do not find it cripplingly objectionable the way some
> Japanese speakers find the opposite situation.

  I have heard some Chinese speakers complain, although not to the
extent that some Japanese speakers do.

> What about the other applications for language tagging mentioned in RFC
> 3066 and in my Plane 14 paper, like spelling and grammar checking and
> speech synthesis?  Should these be available only for fancy text?

  I just hit upon another use of Plane 14 language tags, although only
as a convenience measure that some multilingual applications may want to
adopt. Section 5 of (IETF) RFC 2231 extends RFC 2047 to allow the
language of an RFC 2047 encoded word in a mail header to be specified,
with an optional language tag following the MIME charset name and an
asterisk ('*'), as shown below.

  =?ISO-8859-15*FR?Q?......?=  =?ISO-8859-15*DE?Q?.........?=
  =?UTF-8*ZH-CN?B?........?= =?UTF-8*JA?B?............?=
  =?EUC-KR?B?......?= =?ISO-2022-JP?Q?......?=
  =?KOI8-R?B?......?=

Some implementations might find it handy to 'decode' a sequence of
RFC 2047/2231 encoded words with language specifications (explicit or
implied) into a Unicode string with Plane 14 language tags embedded. Of
course, this 'internal' use of Plane 14 language tags, as a convenient
means of preserving information that would otherwise be lost in the
conversion to Unicode, cannot be used as an argument for or against their
deprecation: even if deprecated, they could still be used this way
*internally*.
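  A minimal Python sketch of such a decoder follows, assuming well-formed
encoded words of the form shown above. The function names are
hypothetical; languages merely implied by the charset (e.g. ISO-2022-JP)
are not inferred, and words without an explicit language are marked with
the language-cancel sequence U+E0001 U+E007F. Each language is written as
U+E0001 LANGUAGE TAG followed by tag characters that mirror the ASCII
characters of the tag (U+E0000 plus the ASCII code).

```python
import base64
import quopri
import re

# An RFC 2047 encoded word with the optional RFC 2231 "*language" suffix
# after the charset name: =?charset[*language]?encoding?encoded-text?=
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*]+)(?:\*(?P<lang>[^?]+))?"
    r"\?(?P<enc>[QqBb])\?(?P<text>[^?]*)\?="
)

LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG

def tag_chars(lang):
    """Spell an ASCII language tag with Plane 14 tag characters."""
    return "".join(chr(0xE0000 + ord(c)) for c in lang)

def decode_word(charset, enc, text):
    """Decode the payload of one encoded word to a str."""
    if enc.upper() == "B":
        raw = base64.b64decode(text)
    else:
        # Q encoding: header=True makes "_" decode to a space (RFC 2047, 4.2).
        raw = quopri.decodestring(text.encode("ascii"), header=True)
    return raw.decode(charset)

def decode_with_lang_tags(header):
    """Decode encoded words, marking each change of language with tags."""
    out = []
    current = None  # language tag currently in effect, if any

    def switch_to(lang):
        nonlocal current
        if lang != current:
            out.append(LANGUAGE_TAG + (tag_chars(lang) if lang else CANCEL_TAG))
            current = lang

    pos = 0
    for m in ENCODED_WORD.finditer(header):
        gap = header[pos:m.start()]
        # Whitespace between two adjacent encoded words is ignored
        # (RFC 2047, 6.2); other unencoded text is copied with no language.
        if gap and not (pos > 0 and gap.isspace()):
            switch_to(None)
            out.append(gap)
        lang = m.group("lang")
        # Language tags are case-insensitive; normalize to lowercase.
        switch_to(lang.lower() if lang else None)
        out.append(decode_word(m.group("charset"), m.group("enc"), m.group("text")))
        pos = m.end()
    if pos < len(header):
        switch_to(None)
        out.append(header[pos:])
    return "".join(out)
```

For example, decoding "=?ISO-8859-15*FR?Q?caf=E9?= =?UTF-8*JA?B?5pel5pys?="
yields "café" preceded by the tag for "fr" and "日本" preceded by the tag
for "ja". A tag is emitted only when the language changes, so a run of
words in the same language shares a single tag.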

  Jungshik Shin
