The key term is 'open interchange'. "In effect, noncharacters can be thought of as application-internal private-use code points. Unlike the private-use characters discussed in Section 16.5, Private-Use Characters, which are assigned characters and which are intended for use in open interchange, subject to interpretation by private agreement, noncharacters are permanently reserved (unassigned) and have no interpretation whatsoever outside of their possible application-internal private uses."
For CLDR collation data - *not open interchange, but specific to use in CLDR collation data* - these characters have specified use as sentinel characters, marking the boundaries for CJK 'buckets' for use in indexes. This is described in http://unicode.org/reports/tr35/#Collation_Elements. The noncharacters are chosen specifically so that they do not overlap with publicly interchanged private use characters. Of course, implementations of LDML can tailor the collations to remove them, or replace by other mechanisms.

> NULL and the two noncharacters U+FFFE and U+FFFF are banned from XML

It is not just null, but most controls.

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Unfortunately, some restrictions that were perfectly reasonable for use in document interchange become annoying flaws in a general structured data interchange format. The inability to interchange all Unicode scalar values is one.

Mark <https://plus.google.com/114199149796022210033>

*Il meglio è l’inimico del bene*

On Fri, Jul 27, 2012 at 12:17 AM, Richard Wordingham <[email protected]> wrote:

> On Thu, 26 Jul 2012 22:52:54 -0700
> "Steven R. Loomis" <[email protected]> wrote:
>
> > On Thu, Jul 26, 2012 at 6:19 PM, Richard Wordingham <
> > [email protected]> wrote:
> >
> > > On Thu, 26 Jul 2012 17:01:53 -0700
> > > "Steven R. Loomis" <[email protected]> wrote:
>
> > I suspect it was simply an oversight and not indicative of any
> > systemic issue. UTS#35 gives the example of <cp hex="0"> for
> > representing NULL as an example of a character not to be used in XML.
> > Note that there's nothing wrong with processing non-characters in
> > memory- I have to deal with non-characters all the time. Thanks for
> > filing the bug.
>
> NULL and the two noncharacters U+FFFE and U+FFFF are banned from XML;
> the other noncharacters are allowed. It's the Unicode Standard that
> bans them from *open interchange*.
>
> Richard.
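
For anyone who wants to check the XML 1.0 Char production quoted in the reply against the set of noncharacters, here is a minimal Python sketch. The function names (is_noncharacter, is_xml10_char, is_scalar_value) are made up for this illustration and do not come from CLDR, ICU, or any XML library. It assumes the usual definition of the 66 noncharacters: U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters (hypothetical helper)."""
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in every plane n = 0..16.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_xml10_char(cp: int) -> bool:
    """True if cp matches the XML 1.0 Char production quoted in the thread."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_scalar_value(cp: int) -> bool:
    """True for Unicode scalar values: all code points except surrogates."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


all_code_points = range(0x110000)

# Every noncharacter, and the subset the Char production rejects.
noncharacters = [cp for cp in all_code_points if is_noncharacter(cp)]
banned_noncharacters = [cp for cp in noncharacters if not is_xml10_char(cp)]

# Every scalar value that cannot appear literally in an XML 1.0 document.
banned_scalars = [
    cp for cp in all_code_points
    if is_scalar_value(cp) and not is_xml10_char(cp)
]

print("noncharacters:", len(noncharacters))
print("noncharacters rejected by Char:",
      [f"U+{cp:04X}" for cp in banned_noncharacters])
print("scalar values rejected by Char:",
      [f"U+{cp:04X}" for cp in banned_scalars])
```

Under these assumptions the only noncharacters the production rejects are U+FFFE and U+FFFF. The other rejected scalar values are U+0000 and the remaining C0 controls apart from tab, line feed, and carriage return.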

