At 08:35 AM 3/5/2003, John Cowan wrote:

>> Then why does UnicodeData break them down as (e.g.) 0064 030C rather than
>> 0064 0315?
>
> To keep the upper case and lower case characters in sync for decomposition,
> they always have the same combining characters.

Yes. There is nothing technically or grammatically incorrect about thinking of d', l' and t' as letters with 'carons': it is only typographically incorrect to represent them with the typical caron mark. The encoding of characters and the visual representation of characters do not always directly correspond.
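
For anyone who wants to check the decompositions themselves, here is a minimal sketch using Python's standard unicodedata module. The helper name combining_marks is my own, purely for illustration; the results reflect whatever version of the Unicode Character Database ships with the interpreter.

```python
import unicodedata

def combining_marks(ch):
    """Return the combining code points (hex strings) that follow the
    base letter in the canonical decomposition mapping of ch."""
    parts = unicodedata.decomposition(ch).split()
    return parts[1:]  # parts[0] is the base letter

# Upper/lower case pairs for D, L and T with caron.
pairs = [("\u010E", "\u010F"), ("\u013D", "\u013E"), ("\u0164", "\u0165")]

for upper, lower in pairs:
    # Both cases should list the same combining mark, U+030C.
    print(unicodedata.name(upper), combining_marks(upper))
    print(unicodedata.name(lower), combining_marks(lower))
```

The lowercase forms are normally drawn with an apostrophe-like mark, yet the data still records U+030C COMBINING CARON for both cases.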


> For another example, G with cedilla gets the cedilla on top when it's
> lowercase, but it still decomposes to the ordinary combining cedilla.
> These are essentially font-ligaturing issues.

Not quite, in that the font does not necessarily require ligature substitution data for characters that are encoded in Unicode in precomposed forms. Systems and applications should take care of canonical composition, not fonts.
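
As a rough illustration of composition happening at the system level rather than in the font, the sketch below uses Python's unicodedata.normalize; the wrapper name compose is only for illustration. It assumes nothing about any particular font or rendering engine.

```python
import unicodedata

def compose(text):
    """Apply canonical composition (NFC) to text."""
    return unicodedata.normalize("NFC", text)

# Base letter followed by U+0327 COMBINING CEDILLA.
lower = compose("g\u0327")
upper = compose("G\u0327")

# After NFC each sequence is a single precomposed character,
# so the font only has to supply a glyph for that one code point.
for ch in (lower, upper):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

How the mark is drawn for each case is then purely a matter of the glyph the font assigns to the precomposed code point.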


By the way, although Unicode calls it a cedilla, the correct form to use with G is the disconnected, 'under comma' form.

John Hudson

Tiro Typeworks          www.tiro.com
Vancouver, BC           [EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.      - Michael Apostolis, 1467



