Peter Kirk posted:

Well, that's what was puzzling me about the recommendations not to use
these characters. In my opinion, there needs to be a clear statement
with each character definition (not somewhere in the text not linked to
it) of its status in such respects. Is it for compatibility use only? Is
it a presentation form not for use in general information interchange?
Is it a formatting variant of another character, which should be used if
that special formatting is to be indicated although the two might be
collated together?

Perhaps each entry could carry a cross-reference to the places in the main text where that particular character or kind of character is discussed, whenever the main text gives it some special mention.


Otherwise, the various indications of distinction, compatibility decomposition, and canonical decomposition usually indicate a lot, if the reader looks at them and learns to understand them.

But indeed the standard is somewhat inconsistent: sometimes it comes close to recommending against using compatibility characters at all, and in other cases it recommends particular ones.

For example, if I want a superscript 2 to indicate "squared" (which
someone used on this list recently), am I supposed to use U+00B2, or
should I avoid using it and instead use a higher level markup (which
implies I need to use HTML e-mail)? Maybe the text tells me somewhere,
but it certainly doesn't in the code chart.

Well, if you are using unformatted text and want a superscript 2, then you don't have much choice. I suppose I could have sent "E=mc^2", "E=mc{squared}", "E=mc<super>2", or something similar, but why would I when I have Unicode? :-)


Actually superscript 2 is also in the Latin-1 character set. :-)
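
For anyone who wants to check that claim, here is a minimal sketch in Python. The helper name fits_in is my own invention for illustration; the codec names "latin-1" and "ascii" are the standard Python ones.

```python
SUPERSCRIPT_TWO = "\u00b2"

def fits_in(ch, encoding):
    """Return True if the character can be encoded in the given charset."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# In Latin-1 the character occupies the same code point value, 0xB2.
latin1_bytes = SUPERSCRIPT_TWO.encode("latin-1")

print(fits_in(SUPERSCRIPT_TWO, "latin-1"))  # True
print(fits_in(SUPERSCRIPT_TWO, "ascii"))    # False
```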

In http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf it states:

<< Therefore, the preferred means to encode superscripted letters or digits, such as “1st” or “DC0016”, is by style or markup in rich text. >>

I would think that statement obvious since in technical writing and mathematical writing it is theoretically possible for any displayable character in Unicode to be superscripted or subscripted, and even superscripted or subscripted to an already superscript or subscript character, and so on.

Also in the code chart (http://www.unicode.org/charts/PDF/U0080.pdf) U+00B2 SUPERSCRIPT TWO is given a compatibility decomposition to "<super> *0032* 2". The other superscript characters are treated similarly.
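
The same decomposition data can be read programmatically. This is only a sketch using Python's standard unicodedata module; the helper name compat_info is hypothetical, and the values come from whatever version of the Unicode database your Python ships with.

```python
import unicodedata

SUPERSCRIPT_TWO = "\u00b2"

def compat_info(ch):
    """Return (name, raw decomposition field, NFKC form) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFKC", ch),
    )

name, decomp, nfkc = compat_info(SUPERSCRIPT_TWO)
print(name)    # SUPERSCRIPT TWO
print(decomp)  # <super> 0032
print(nfkc)    # 2

# Canonical normalization (NFC) leaves the character alone; only the
# compatibility forms (NFKC, NFKD) fold it to a plain digit, which is
# where the "squared" meaning in E=mc² would be lost.
unchanged_by_nfc = unicodedata.normalize("NFC", SUPERSCRIPT_TWO) == SUPERSCRIPT_TWO
folded_text = unicodedata.normalize("NFKC", "E=mc\u00b2")
```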

But beyond all recommendations in the Unicode standard what is done depends on what the user wants to do for a particular purpose in a particular environment with particular fonts. There is no one correct way that fits all users at all places and times, nor should there be.

If I am printing out a document on a particular system, with particular software and fonts, and the plain text superscripts look better to me than the superscripts my word processor creates by formatting regular numbers, then I will naturally use Unicode plain text superscripts in that time and place.

That Unicode gives me the choice is a benefit I should take advantage of without worrying that formatting regular numbers as superscript is theoretically better than using compatibility characters.

Unicode is messy and complex mostly because character usage is messy and complex and display technology is messy and complex and there are always edge-cases and things that don't fit well.

But it confuses me, too, that Unicode keeps deprecated individual character encodings while allowing applications to freely throw away non-deprecated, canonically decomposable encodings (which supposedly exist only because they should not be thrown away).
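
To make "throwing away" a canonically decomposable encoding concrete, here is a rough Python sketch. The function name canonically_equivalent and the choice of U+00E9 as the example are mine, not anything from the standard.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

def canonically_equivalent(a, b):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# The two spellings differ as code point sequences...
print(precomposed == decomposed)                        # False
# ...but count as the same text under canonical equivalence, so a
# process may replace one with the other without notice.
print(canonically_equivalent(precomposed, decomposed))  # True

# A compatibility character is not canonically equivalent to its
# plain counterpart, so canonical normalization preserves it.
print(canonically_equivalent("\u00b2", "2"))            # False
```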

I thought even deprecated ones were supposed to be usable, in that a
system should process them correctly.

It depends on what is meant by "usable", "system", and "correctly". No system has to support all of Unicode. Accordingly, I would not expect systems to support deprecated control characters, or fonts to go out of their way to support deprecated characters.


A system that does not support deprecated control codes (or even some of the non-deprecated control codes) and does not support particular characters (perhaps only because no font on the system contains them) can still be conformant to Unicode in what it does support.

A text editor that supports only fixed width fonts will probably not support the special-width spaces properly but may still be Unicode conformant.
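
As an illustration of what such an editor would be flattening, this Python sketch lists a few of the special-width spaces with their decomposition data. The selection of characters and the helper name space_report are my own choices for the example.

```python
import unicodedata

# A handful of the fixed-width space characters from General Punctuation.
SPACES = ["\u2002", "\u2003", "\u2009", "\u200a"]

def space_report(chars):
    """Return (code point, name, decomposition field) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.decomposition(c))
        for c in chars
    ]

for code, name, decomp in space_report(SPACES):
    print(code, name, decomp)

# Each of these carries a <compat> decomposition to U+0020 SPACE, so
# compatibility normalization reduces them all to an ordinary space,
# much as a fixed-width display would.
flattened = unicodedata.normalize("NFKC", "a\u2003b")
```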

Jim Allan

