Andy White wrote:

> And I today see that the precomposed character '0B71 ORIYA LETTER WA'
> has been added to the UCS4.0 charts
> http://www.unicode.org/charts/PDF/U40-0B00.pdf
> This is clearly a composition of ORIYA LETTER O and ORIYA LETTER
> VA (BA).

People on the list today are playing a little fast and loose
with the terminology of "precomposed" and "composition".

In the Unicode Standard, a character is not precomposed or
composite unless it has a formal decomposition mapping defined
in the Unicode Character Database (namely in UnicodeData.txt).
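
As a rough illustration of that criterion, here is a minimal Python
sketch that reads the decomposition mapping field through the standard
unicodedata module. It assumes the interpreter's bundled copy of the
Unicode Character Database is recent enough to include U+0B71; the
helper name has_formal_decomposition is hypothetical, chosen only for
this example. U+0B48 ORIYA VOWEL SIGN AI is included for contrast,
since it does carry a decomposition in UnicodeData.txt.

```python
import unicodedata

def has_formal_decomposition(ch: str) -> bool:
    """Hypothetical helper: True if the UCD gives ch a decomposition mapping.

    unicodedata.decomposition() returns the raw mapping field as a string
    (canonical or compatibility), or an empty string when there is none.
    """
    return unicodedata.decomposition(ch) != ""

# U+0B71 ORIYA LETTER WA and U+0B06 ORIYA LETTER AA are atomic;
# U+0B48 ORIYA VOWEL SIGN AI is a composite, shown for contrast.
for cp in (0x0B71, 0x0B06, 0x0B48):
    ch = chr(cp)
    mapping = unicodedata.decomposition(ch)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {mapping!r}")
```

Note that this check treats compatibility mappings (those tagged with
something like <compat>) the same as canonical ones; none of the three
characters above has a compatibility mapping.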

While ORIYA LETTER WA is graphically constructed of the
form for ORIYA LETTER O and the bottom half of PA (not BA),
it doesn't fit the pattern one would expect for consonant
conjuncts (C+C, not V+C). Nor is it given a formal
decomposition in UnicodeData.txt: even though it
is graphically complex, it otherwise fits into the pattern of
the regular consonant letters for Indic scripts (as an
alternate for VA). Note that the new ORIYA LETTER VA is
also graphically complex -- a dotted BA -- but likewise
is not given a decomposition.

For that matter, you could look to existing Oriya characters
such as U+0B06 ORIYA LETTER AA and claim it is just a graphic
combination of U+0B05 ORIYA LETTER A and U+0B3E ORIYA VOWEL
SIGN AA. But such decompositions are *also* not used in
the standard. So ORIYA LETTER AA is an *atomic* character
in Unicode, despite the fact that it is graphically
complex (and analyzable into parts).
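
One practical consequence is that normalization does not relate the
atomic letter to its apparent parts. The sketch below, again using
Python's standard unicodedata module, compares strings after NFD
normalization; the helper name canonically_equivalent is an assumption
made for this example, not anything defined by the standard.

```python
import unicodedata

A, SIGN_AA, AA = "\u0B05", "\u0B3E", "\u0B06"

def canonically_equivalent(s1: str, s2: str) -> bool:
    """Hypothetical helper: compare two strings after NFD normalization."""
    return unicodedata.normalize("NFD", s1) == unicodedata.normalize("NFD", s2)

# ORIYA LETTER AA versus <ORIYA LETTER A, ORIYA VOWEL SIGN AA>:
# no decomposition mapping exists, so the two spellings stay distinct.
print(canonically_equivalent(AA, A + SIGN_AA))

# Contrast: ORIYA VOWEL SIGN AI (U+0B48) has a canonical decomposition
# to <U+0B47, U+0B56>, so these two spellings do normalize to the same form.
print(canonically_equivalent("\u0B48", "\u0B47\u0B56"))
```

The first comparison comes out unequal and the second equal, which is
the atomic-versus-composite distinction seen from the point of view of
a program rather than a chart.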

If anyone wants a pointless exercise in simplification for
the benefit of complexity sometime, try working on the
Yi syllabary charts (U+A000..U+A48C) and pull these
graphically complex forms apart into all of their
duplicated constituent parts. The mere fact that such
forms are graphically complex and have identifiable parts
is not, however, what establishes their status as atomic
versus composite characters in the Unicode Standard.

--Ken

