"reserved for CLDR" would be wrong in TUS, you have reached a borderline where you are no longer handling plain text (stream of scalar values assigned to code points), but binary data via a binary interface outside TUS (handling streams of collation elements, whose representation is not even bound to the ICU implementation of CLDR for its own definitions and syntax for its tailorings).
CLDR data defines its own interface and protocol. It can reserve these code points only for itself, not in TUS, and no other conforming plain-text application is expected to accept these reservations: such applications can **freely** mark them as errors, replace them, filter them out, or interpret them differently for their own usage, using their own specification, their own encapsulation mechanisms and specific **non-plain-text** data types. CLDR data transmitted in binary form that embeds these code points is not transporting plain text; it is still a binary datatype specific to this application. CLDR data must remain isolated in its scope without forcing other protocols or TUS to follow its practices. Other applications may develop "gateway" interfaces to convert such data and be interoperable with ICU, but they are not required to do that. If they do, they will follow the ICU specifications, not TUS, and this should not influence their own way of handling what TUS describes as plain text.
To make it clear, it is preferable to just say in TUS that the behavior of applications with non-characters is completely undefined and unpredictable without an external specification, and that these entities should not even be considered as encodable in any standard UTF (any of which can freely be replaced by another one without causing any loss or modification of the represented plain text). It should be possible to define other (non-standard) conforming UTFs which are completely unable to represent these non-characters (as well as any unpaired surrogate). A conforming UTF just needs to be able to represent streams of scalar values in their full standard range (even without knowing whether they are assigned, and without knowing their character properties).
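To illustrate the three options named above (mark as error, replace, filter out), here is a minimal Python sketch of what a plain-text application could do at its own boundary. The names `is_noncharacter` and `handle_noncharacters` and the `policy` strings are hypothetical, chosen only for this example; they are not part of TUS, CLDR or ICU. The only fact taken from the standard is the list of the 66 code points it designates as noncharacters (U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes).

```python
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def is_noncharacter(cp: int) -> bool:
    """Return True for the 66 code points designated as noncharacters."""
    in_fdd0_range = 0xFDD0 <= cp <= 0xFDEF
    # U+nFFFE and U+nFFFF for every plane n = 0..16
    last_two_of_plane = 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE
    return in_fdd0_range or last_two_of_plane


def handle_noncharacters(text: str, policy: str = "replace") -> str:
    """Apply one of three hypothetical policies to noncharacters in text.

    policy: "replace" (substitute U+FFFD), "remove" (filter out),
            or "error" (raise ValueError on the first one found).
    """
    if policy not in ("replace", "remove", "error"):
        raise ValueError(f"unknown policy: {policy!r}")
    out = []
    for ch in text:
        if not is_noncharacter(ord(ch)):
            out.append(ch)
        elif policy == "replace":
            out.append(REPLACEMENT)
        elif policy == "error":
            raise ValueError(f"noncharacter U+{ord(ch):04X} in plain text")
        # policy == "remove": drop the code point silently
    return "".join(out)
```

For example, `handle_noncharacters("a\uFFFFb", "remove")` returns `"ab"`, while the default policy returns `"a\uFFFDb"`. Which policy is appropriate is the application's own decision; the sketch only shows that none of them needs any knowledge of CLDR.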
You can/should even design CLDR to completely avoid the use of non-characters: it is up to CLDR to define an encapsulation/escaping mechanism that clearly separates what is standard plain text in the content from what is not and is used for specific purposes in CLDR or ICU implementations.

2014-06-03 0:07 GMT+02:00 Shawn Steele <shawn.ste...@microsoft.com>:

> Except that, particularly the max-weight ones, mean that developers can
> be expected to use this as sentinels in code using ICU, which would
> preclude their use for other things?
>
> Which makes them more like “reserved for use in CLDR” than “noncharacters”?
>
> -Shawn
>
> *From:* Unicode [mailto:unicode-boun...@unicode.org] *On Behalf Of *Markus
> Scherer
> *Sent:* Monday, June 2, 2014 2:53 PM
> *To:* David Starner
> *Cc:* Unicode Mailing List
> *Subject:* Re: Corrigendum #9
>
> On Mon, Jun 2, 2014 at 1:32 PM, David Starner <prosfil...@gmail.com>
> wrote:
>
> I would especially discourage any web browser from handling
> these; they're noncharacters used for unknown purposes that are
> undisplayable and if used carelessly for their stated purpose, can
> probably trigger serious bugs in some lamebrained utility.
>
> I don't expect "handling these" in web browsers and lamebrained utilities.
> I expect "treat like unassigned code points".
>
> markus
>
> _______________________________________________
> Unicode mailing list
> Unicode@unicode.org
> http://unicode.org/mailman/listinfo/unicode