Throughout all of this, I had completely missed the fact that the Tech
Note for CESU-8 had been upgraded to a Tech Report, two and a half years
ago, in fact. Perhaps I was in denial. Anyway, that ... invalidates many of my comments...
Noted.
CESU-8 is the documentation of someone's internal, non-standard implementation of UTF-8. Of course, the "someone" is large and important and their implementation affects a lot of users. If nobody else is motivated by the presence of UTR #26 to adopt this non-standard version, good.
There are some UTF-8/UTF-16 interoperability aspects that are addressed by CESU-8. These concerns are real, and affect multi-component architectures that must interchange data across component boundaries. Therefore a standard specification serves a useful purpose.
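To make the interoperability point concrete, here is a minimal Python sketch of what CESU-8 does differently from UTF-8. The function name cesu8_encode is my own and hypothetical, not part of any library, and this is an illustration under those assumptions rather than a reference implementation. BMP characters are encoded exactly as in UTF-8, while each supplementary character is first split into its UTF-16 surrogate pair and each surrogate is then written as a three-byte sequence. One consequence is that binary comparison of CESU-8 byte strings orders text the same way as binary comparison of UTF-16 code units, which UTF-8 does not.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (illustrative sketch; name is hypothetical)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP code points use the same bytes as UTF-8.
            # "surrogatepass" lets lone surrogates in the input through.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Supplementary code points: split into a UTF-16 surrogate
            # pair, then encode each surrogate as three bytes.
            v = cp - 0x10000
            high = 0xD800 | (v >> 10)
            low = 0xDC00 | (v & 0x3FF)
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


supp = "\U00010400"   # a supplementary character
bmp = "\uFFFD"        # a BMP character above the surrogate range

# UTF-8 sorts the supplementary character after U+FFFD;
# UTF-16 and CESU-8 both sort it before.
print(supp.encode("utf-8") > bmp.encode("utf-8"))
print(supp.encode("utf-16-be") < bmp.encode("utf-16-be"))
print(cesu8_encode(supp) < cesu8_encode(bmp))
```

All three comparisons should print True, which is the ordering mismatch that a component exchanging data between UTF-8 and UTF-16 stores has to deal with.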
What worries me is that there might be other people in the world like Philippe
Philippe doesn't worry me ;-)
While we're on the subject of UTNs, I think it's a shame that BOCU-1, a genuinely novel and potentially useful compression scheme that was invented from scratch, is only documented in a "no-endorsement" UTN, when a draft UTR-upgrade that adds a white-box algorithm was written almost a year ago but has not been approved. This places BOCU-1 *below* CESU-8 in the food chain, which seems badly wrong.
You realize that the choice of material for a UTN rests with the authors.
Occasionally that will mean that material that could be a formal specification is placed into a UTN by an author who is uninterested in getting UTC endorsement, or one who lacks the time to pursue it.
In the case of BOCU-1 it's the latter, as the UTC has welcomed the idea of putting this on a standards track.
So, your remarks should be directed at the authors of the UTN, and/or the owners of the relevant technology.
A./