Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

Jungshik Shin Mon, 08 Dec 2003 12:51:12 -0800

On Mon, 8 Dec 2003, Peter Kirk wrote:

> On 08/12/2003 08:37, Doug Ewell wrote:
>
> >Peter Kirk <peterkirk at qaya dot org> wrote:


> >>I may have missed or misunderstood the details, but it has been
> >>clearly stated here in the last few days that (a) there are more
> >>than 11,000 redundant Korean characters in the BMP, and (b) many
> >>precomposed Korean characters lack canonical or even compatibility
> >>decompositions which would be desirable.

  You're another 'victim'(?!) of the multi-level representability of the
Korean script. Although I consistently distinguished syllables from letters
(Jamos: complex/compound vs. simple/basic), it may not have been clear to you.

Doug, thank you for the clear summary.

> >Jungshik has been saying for years now that (a) the 11,172 precomposed
> >syllables are redundant, since they can all be easily decomposed into
> >jamos.

  Although I wasn't involved in encoding them, I wrote at least once in
the early 1990s on a public mailing list that all of them would have
to be encoded. So, I was certainly not free from the shortsightedness
of the Koreans who pushed the proposal to encode them all. (I heard that
the ballot passed by a narrow margin.) I just have respect for the
Indians behind ISCII (which was first published in 1988?).  To be fair
to my fellow Koreans, I must add that the need for encoding Hanjas (CJK
ideographs used in Korea) made things complicated for Korean character
sets (before Unicode).

> > He also said recently that (b) the jamos that represent doubled
> >sounds or "letter clusters" had compatibility equivalences in Unicode
> >2.0, but these were subsequently removed, and that this too was a
> >mistake.

  Actually, I have been saying this almost as long :-) I also have
to add that there's at least one understandable reason for the removal
(which is escaping me at the moment.)
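
  As a rough illustration of point (b), here is a minimal sketch using
Python's standard unicodedata module. It assumes the Unicode data bundled
with a current Python, and the helper name is_unchanged_by is made up for
this example. It only checks that a compound conjoining jamo such as
U+1101 is left alone by every decomposing normalization form, so it is
not equivalent to two simple jamos.

```python
import unicodedata

SSANGKIYEOK = "\u1101"          # HANGUL CHOSEONG SSANGKIYEOK (compound jamo)
DOUBLE_KIYEOK = "\u1100\u1100"  # two HANGUL CHOSEONG KIYEOK (simple jamos)

def is_unchanged_by(form, text):
    """Return True if normalizing `text` to `form` leaves it as it was."""
    return unicodedata.normalize(form, text) == text

# The compound jamo carries no decomposition mapping at all.
print(repr(unicodedata.decomposition(SSANGKIYEOK)))

# Neither canonical nor compatibility decomposition splits it.
print(is_unchanged_by("NFD", SSANGKIYEOK), is_unchanged_by("NFKD", SSANGKIYEOK))

# Conversely, two simple jamos are not composed into the compound one.
print(unicodedata.normalize("NFKC", DOUBLE_KIYEOK) == SSANGKIYEOK)
```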

> >So there are (a) 11000+ redundant Korean characters, and there are (b)
> >Korean characters without decompositions.  But there are not (a × b)
> >"11000+ redundant Korean characters without decompositions."

> Do the 11,172 precomposed syllables actually have canonical or
> compatibility decompositions? Are they composition exclusions?

  Peter, if you just open up TUS 4.0 section 3.12 (referred to by
Doug in his first reply on the issue), you will know.  They're
canonically equivalent to their jamo sequences and _not_ composition
exclusions. If even that were not the case, it'd be really disastrous.
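
  For anyone who would rather check than read, the following is a minimal
sketch of the arithmetic decomposition described in section 3.12, written
in Python. The constants are the ones given in the standard; the function
names decompose_syllable and check_all are made up for this example. The
check compares the arithmetic result against the NFD and NFC forms
produced by Python's unicodedata module, on the assumption that the
module implements the Hangul algorithm as the standard specifies.

```python
import unicodedata

# Constants from TUS section 3.12 (Hangul syllable decomposition).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 vowel/trailing combinations per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables

def decompose_syllable(ch):
    """Map one precomposed syllable to its jamo sequence (L V or L V T)."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable; leave untouched
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    jamos = chr(lead) + chr(vowel)
    if trail != T_BASE:  # T_BASE itself means "no trailing consonant"
        jamos += chr(trail)
    return jamos

def check_all():
    """Verify every syllable round-trips: NFD gives jamos, NFC gives it back."""
    for cp in range(S_BASE, S_BASE + S_COUNT):
        ch = chr(cp)
        jamos = decompose_syllable(ch)
        assert unicodedata.normalize("NFD", ch) == jamos
        assert unicodedata.normalize("NFC", jamos) == ch
    return S_COUNT

print(check_all())
```

  The NFC half of the round trip is the part that bears on composition
exclusions: if the syllables were excluded, NFC would leave the jamo
sequence decomposed instead of returning the precomposed syllable.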

  Jungshik
