Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

Doug Ewell Sun, 07 Dec 2003 19:24:10 -0800

Peter Kirk <peterkirk at qaya dot org> wrote:

> Well, this is W3C's problem. They seem to have backed themselves into
> a corner which they need to get out of but have no easy way of doing
> so.


They "need to get out of" this so-called corner only if applying style
to individual combining marks is considered a sufficiently important
text operation.

There are plenty of things one can do with writing that aren't supported
by computer encodings, and aren't really expected to be.  The idea of a
black "i" with a red dot was mentioned.  Here's another: the
piece-by-piece "exploded diagrams" used to teach young children how to
write the letters of the alphabet.  For "a" you first draw a "c", then a
half-height vertical stroke to the right of (and connected with) the
"c".  Two diagrams.  For "b" you draw a full-height stroke, then a
backwards "c" to its right.  Another two diagrams.  And so on.

For each letter, each diagram except the last shows an incomplete
letter, which might or might not accidentally correspond to the glyph
for a real letter.  Also, each diagram except the first might show the
"new" stroke to be drawn in a different color from the strokes already
drawn, to clarify the step-by-step process.  There might also be little
arrows with dashed lines alongside each stroke, to show the direction in
which to draw it.

None of these pedagogical elements can be represented in Unicode or any
other computer encoding, or in HTML or XML markup, yet we don't conclude
that these technologies are fatally broken because of it.  These very
specialized uses of text are, and will continue to be, handled in
graphics.

> Unicode is of course very familiar with this kind of situation e.g.
> with character name errors, combining class errors, 11000+ redundant
> Korean characters without decompositions, etc etc.

"Without decompositions"?  What about the canonical equivalence between
jamos and syllables described in Section 3.12?  What about the algorithm
to derive the canonical decomposition shown on page 88?  What am I
missing here?
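
For concreteness, here is a minimal Python sketch of that arithmetic
decomposition.  The constants are the ones I understand the standard to
give for Hangul syllables; the function name decompose_hangul is my own
invention, not anything defined by Unicode, and this is an illustration
rather than a reference implementation.

```python
import unicodedata

# Constants as given for the Hangul syllable block (assumed from the
# standard's conjoining jamo section; names here are illustrative).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT      # 11172 precomposed syllables


def decompose_hangul(ch: str) -> str:
    """Return the jamo sequence for one precomposed Hangul syllable.

    Any character outside the syllable block is returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    # t_index == 0 means the syllable has no trailing consonant.
    trail = chr(T_BASE + t_index) if t_index else ""
    return lead + vowel + trail


if __name__ == "__main__":
    # Cross-check every syllable against the library's NFD result.
    mismatches = [
        cp for cp in range(S_BASE, S_BASE + S_COUNT)
        if decompose_hangul(chr(cp)) != unicodedata.normalize("NFD", chr(cp))
    ]
    print("mismatches:", len(mismatches))
```

The point is that the decompositions are derived by calculation from
the code point rather than listed one by one, which may be why they
look "missing" to someone scanning the data files.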

> So no doubt it can extend its sympathy; and possibly even offer to
> help by encoding the kind of character I was suggesting earlier (perhaps
> in exchange for some W3C readiness to accept correction of errors in
> the normalisation data?). But really this is not a Unicode issue.

You encode a character that violates your principles and existing
encoding models, and we'll break the promises we made to users to
maintain normalization stability.  Sounds like a great political
compromise to me.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
