On Fri, 27 Jul 2012 09:01:13 -0700
Mark Davis ☕ <m...@macchiato.com> wrote:

> The key term is 'open interchange'.

XML documents are textual objects.  It is therefore reasonable to look
at them using tools for displaying textual objects.  However,
> "<snip> noncharacters are <snip>
> permanently reserved (unassigned) and have no interpretation
> whatsoever outside of their possible application-internal private
> uses."

> For CLDR collation data - *not open interchange, but specific to use
> in CLDR collation data* - these characters have specified use as
> sentinel characters, marking the boundaries for CJK 'buckets' for use
> in indexes.

I hope you're addressing a complaint I haven't made.  I haven't
complained about tailoring involving noncharacters, though it
does strike me as the least evil. Are you perhaps arguing that I become
part of some CLDR application when I read CLDR XML files?

> This is described in
> http://unicode.org/reports/tr35/#Collation_Elements. The
> noncharacters are chosen specifically so that they do not overlap
> with publicly interchanged private use characters. Of course,
> implementations of LDML can tailor the collations to remove them, or
> replace by other mechanisms.

I was going to ask when the LDML element suppress_contractions took
effect.  At least I now have some idea of the answer.

> Unfortunately, some restrictions that were perfectly reasonable for
> use in document interchange become annoying flaws in a general
> structured data interchange format. The inability to interchange all
> Unicode scalar values is one.

The restrictions improve legibility.  As it is, many of the
character-level elements in CLDR XML files tend to be unreadable.  It
would be better for them not to require genuinely complex text
rendering.  In a related matter, it was very inconvenient to have to
treat collation test files as binary data because they could not be DOS
text files - ctrl/Z in the comments cut the files short.
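
A check for this sort of trouble can be sketched in a few lines of
Python.  This is only an illustration, not anything shipped with CLDR:
the function names are hypothetical, and it assumes the file has
already been decoded to a string.  It reports ctrl/Z (U+001A, which DOS
text mode treats as end-of-file) and the 66 noncharacters (U+FDD0 to
U+FDEF, plus the last two code points of each of the 17 planes).

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF for every plane n.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def find_awkward_code_points(text: str):
    """Yield (offset, code point, reason) for characters that
    plain-text tools are likely to handle badly."""
    for offset, ch in enumerate(text):
        cp = ord(ch)
        if cp == 0x1A:
            yield offset, cp, "ctrl/Z (DOS end-of-file marker)"
        elif is_noncharacter(cp):
            yield offset, cp, "noncharacter"


if __name__ == "__main__":
    sample = "# comment with \x1a inside\n<reset>\ufdd1</reset>\n"
    for offset, cp, reason in find_awkward_code_points(sample):
        print(f"offset {offset}: U+{cp:04X} {reason}")
```

Run over a file before committing it, something like this would at
least show where a text-mode tool is going to stop reading.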

Richard.

