Philippe, instead of trying to sound authoritative by making up a whole-cloth definition -- one that is completely and utterly wrong -- and thereby confusing and misleading a beginner, you should either be silent or simply point the person to the Unicode glossary:
http://www.unicode.org/glossary/#compatibility_character

Mark
__________________________________
http://www.macchiato.com

----- Original Message -----
From: "Philippe Verdy" <[EMAIL PROTECTED]>
To: "Alexandre Arcouteil" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Fri, 2003 Nov 14 03:28
Subject: Re: compatibility characters (in XML context)

> ----- Original Message -----
> From: "Alexandre Arcouteil" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Friday, November 14, 2003 10:41 AM
> Subject: compatibility characters (in XML context)
>
> > This is a beginner question :
> >
> > In the XML 1.1 Proposed Recommendation 05 November 2003
> > (http://www.w3.org/TR/xml11), it is said that "Document authors are
> > encouraged to avoid "compatibility characters", as defined in section
> > 6.8 of [Unicode]" so relating to Unicode 2.0.
> >
> > I don't see any online documentation about explicit definition of
> > "compatibility characters" according to 2.0.
>
> Compatibility characters can be defined as the characters whose canonical
> decomposition mapping is either:
>
> (1) a singleton (example the Angström symbol, canonically mapped to A
> with diaeresis, or the list of unified Han ideographs, only included for
> compatibility with legacy charsets or because of assignment errors in
> Unicode 1.0) and that are implicitly restricted from being recomposed in all
> NF* forms, or
>
> (2) two-code _canonical_ decomposition mapping, but are excluded from
> canonical composition (example the Hebrew shin letter with shin dot).
>
> These characters will never be part of any string in a normalized form (NFC,
> NFD, NFKC, NFKD).
>
> > At least I'd like to know if characters like "Ã" "Ã" or "œ" are
> > concerned.
>
> No: "Ã" and "Ã" have canonical decompositions, but are not excluded from
> recomposition.
> And the "oe ligature" has only a compatibility decomposition, and then is not
> a compatibility character.
>
> > Is somewhere a complete chart of "compatibility characters" ?
>
> Look at the Unicode data file which lists composition exclusions...
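
For anyone who wants to check individual characters rather than argue from memory, here is a minimal sketch in Python. It assumes the standard `unicodedata` module, whose data reflects whichever Unicode version ships with the interpreter. The `describe` helper and the sample list are illustrative names, not anything from the thread. The sketch only reports whether the decomposition mapping recorded for a character is canonical (untagged), compatibility (tagged, such as `<compat>`), or absent, and whether the character survives NFC. It does not decide what counts as a "compatibility character"; the glossary entry above is the place for that.

```python
import unicodedata


def describe(ch):
    """Return (kind, mapping) for the decomposition mapping of one character."""
    mapping = unicodedata.decomposition(ch)  # empty string when there is no mapping
    if not mapping:
        kind = "none"
    elif mapping.startswith("<"):
        kind = "compatibility"  # tagged mapping, e.g. '<compat> 0066 0069'
    else:
        kind = "canonical"      # untagged mapping, e.g. '0065 0301'
    return kind, mapping


# Sample characters chosen for illustration (an assumption, not a complete chart).
samples = {
    "angstrom sign": "\u212B",
    "e with acute": "\u00E9",
    "oe ligature": "\u0153",
    "fi ligature": "\uFB01",
    "shin with shin dot": "\uFB2A",
}

for label, ch in samples.items():
    kind, mapping = describe(ch)
    # A character is NFC-stable if normalizing it to NFC returns it unchanged.
    nfc_stable = unicodedata.normalize("NFC", ch) == ch
    print(f"U+{ord(ch):04X} {label}: {kind} [{mapping}] NFC-stable={nfc_stable}")
```

Running this on a few characters from the thread is a quick way to compare the claims above against the actual data: for instance, U+212B maps to U+00C5 (A with ring above), and the mapping reported for the oe ligature can be read straight off the output.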