Re: What is the principle?

Ernest Cline Mon, 29 Mar 2004 16:08:01 -0800

> [Original Message]
> From: Kenneth Whistler <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Date: 3/29/2004 2:28:25 PM
> Subject: Re: What is the principle?
>
> Ernest Cline stated:
>
> > The standard is quite clear that if a Variation Selector is recognized,
> > but not the sequence it is in, then it should be treated the same as if
> > no selector was present.
>
> Which is true.
>
> > 
> > This is one reason why transferring some or all of the Variation
> > Selectors on the SSP to Private Use is a possibility if they are not
> > going to have any official uses.
>
> This, however, is distinctly inadvisable, for several reasons.
>
> First, the 240 Variation Selector characters on Plane 14 were added
> *explicitly* to deal with Han variation issues, which involve
> many, many more possible variants, in some cases, than the
> typical numerosity for the occasional variants noted in other
> scripts.

Well, I said that was only a possibility if they weren't going to
have official uses.  Since some are planned, it would be reasonable
to retain them for that use, although it does seem a bit strange that
they were assigned before they were needed for official uses, and in
a manner that leaves an empty row at U+E01F0 to U+E01FF when
there apparently wasn't a scheme being planned that would need
exactly 256 variation selectors.
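
For anyone who wants to check the arithmetic, here is a rough Python
sketch.  It assumes a Python whose unicodedata tables are newer than
Unicode 4.0, and the constant and function names are my own, not
anything defined by the standard.

```python
import unicodedata

# Supplementary variation selectors VS17..VS256 on Plane 14 (the SSP).
VS_SUPP_FIRST, VS_SUPP_LAST = 0xE0100, 0xE01EF
# The code points immediately after them, up to U+E01FF.
GAP_FIRST, GAP_LAST = 0xE01F0, 0xE01FF

assigned = VS_SUPP_LAST - VS_SUPP_FIRST + 1
gap = GAP_LAST - GAP_FIRST + 1


def category_counts(first, last):
    """Count General_Category values over an inclusive code point range."""
    counts = {}
    for cp in range(first, last + 1):
        cat = unicodedata.category(chr(cp))
        counts[cat] = counts.get(cat, 0) + 1
    return counts


print(assigned, gap, assigned + gap)
print(category_counts(VS_SUPP_FIRST, VS_SUPP_LAST))
print(category_counts(GAP_FIRST, GAP_LAST))
```

The selectors report as nonspacing marks (Mn) and the trailing code
points as unassigned (Cn), which is the gap I am describing.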

> Second, the UTC is considering a scheme for dealing with existing
> large collections of Han variants by explicitly dedicating 128
> of those 240 to a preexisting glyph variant registration scheme,
> to move the Han variation problem off dead center (given that the
> task of spelling out exactly what *are* the variants is an enormous
> problem for Han).

Is there a pointer you could provide for this glyph variant registration
scheme?

> Third, the proposal to "transfer ... some or all of the Variation
> Selectors on the SSP to Private Use" is unclear on the concept of
> Private Use. The UTC will make *no* semantic encoding commitment
> regarding what a private use character is to be used for. That would
> include *not* specifying that some range of Private Use characters
> be dedicated to use as variation selectors (privately defined).
> Anyone who wanted to put in place their own private Idaho of
> two-character encoding for Mende or whatever, could simply define
> that private use space as they wish. Of course they cannot then
> expect automatic rendering (or other) support from standard OS
> interfaces, but that is the fundamental nature of Private Use
> characters.
>
> Essentially what you seem to be asking for is for the UTC to
> relax the restriction of definition of *variation sequences* --
> i.e. let some of the variation selectors be used on an ad hoc
> basis by consenting adults. But that was *explicitly* ruled out
> by the UTC as a potential barrier to interoperability and because
> it would be an invitation to chaotic glyph encoding.

For Variation Selectors that are used or contemplated
for official sequences, I agree that they should not be used
for ad hoc sequences.  What I was asking for was for there to be
Private Variation Selectors whose private use would not
interfere with official variation sequences, either those that
are or might be assigned in the future.  Since it appeared
that there were many more variation selectors than would
likely be needed for official variation sequences,
transferring some of those to being used ONLY for private
variation sequences seemed to be a possibility that would
make use of the excess selectors instead of adding new
characters that would have the same function.  Since an
official use is contemplated for the existing variation
selectors, then a transfer to private use is not desirable.
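
To make the fallback rule quoted at the top concrete, here is a
minimal Python sketch of how a process might handle selectors it
recognizes inside sequences it does not.  Modelling "treated the same
as if no selector was present" by dropping the selector is my own
simplification, and the function names and the one-entry table of
known sequences are illustrative only (U+2268 followed by U+FE00 is
one of the standardized sequences).

```python
def is_variation_selector(ch):
    """True for VS1..VS16 (BMP) and VS17..VS256 (Plane 14)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def drop_unrecognized_selectors(text, known_sequences):
    """Remove selectors that do not form a known (base, selector) pair."""
    out = []
    for ch in text:
        if is_variation_selector(ch):
            base = out[-1] if out else None
            if base is None or (base, ch) not in known_sequences:
                continue  # behave as if the selector were absent
        out.append(ch)
    return "".join(out)


# Illustrative table only; a real one would come from the standard's data.
KNOWN = {("\u2268", "\ufe00")}

print(ascii(drop_unrecognized_selectors("\u2268\ufe00", KNOWN)))
print(ascii(drop_unrecognized_selectors("A\ufe00", KNOWN)))
```

A private selector range would amount to a second table of this kind
that official sequences are guaranteed never to populate.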

However, if I am understanding you correctly, when it comes
to the idea of even new Private Use only Variation Selectors,
too bad.  If one needs Private Use characters with anything
other than the default characteristics, Unicode intentionally
makes using them as difficult as possible.  You consider this
a boon; I feel this is a fundamental flaw.

Unicode would benefit from having ranges of Private Use
characters that would be known to have certain character
properties, such as being a Variation Selector or, to take
a topic from a recent thread, having a default strong RTL
property for the Bidirectional Algorithm.  I can appreciate the desire to
further interoperability, but I don't think that unhelpfulness
on the part of Unicode towards such uses helps achieve
interoperability. It merely moves the interoperability problem
to a different place rather than doing anything to solve it,
and does so in a manner that impairs usability.
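
As an illustration of the defaults I mean, this small Python sketch
queries the properties that the standard library's unicodedata module
reports for a few Private Use code points.  The sample list and the
helper name are my own choices.

```python
import unicodedata

# First and last BMP private use code points, plus the first code
# points of the Plane 15 and Plane 16 private use areas.
PRIVATE_USE_SAMPLES = [0xE000, 0xF8FF, 0xF0000, 0x100000]


def default_properties(cp):
    """Return (General_Category, Bidi_Class) for a code point."""
    ch = chr(cp)
    return unicodedata.category(ch), unicodedata.bidirectional(ch)


for cp in PRIVATE_USE_SAMPLES:
    print(f"U+{cp:04X}", default_properties(cp))

# For comparison, a strong right-to-left letter (HEBREW LETTER ALEF).
print("U+05D0", default_properties(0x05D0))
```

Every private use code point comes back as category Co with a strong
left-to-right bidi class, so a privately encoded RTL script gets no
help from the defaults.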

The chaotic glyph encoding you fear is exactly why,
once official characters replacing private use characters
(of any variety) are established and supported, new
documents tend to use the official characters and
pre-existing documents are migrated to the official
version, just as now happens with the various ad hoc
8-bit pseudo-encodings when a version of Unicode
that supports their characters becomes available.  I fail
to see why supporting robust private use characters
would either impair the adoption rate of official characters,
or cause a delay in the addition of new scripts or
characters to the official Unicode registry.  Indeed, one
could argue that by making it easier to experiment with
private use characters, one might see faster adoption
of new scripts and characters with something other
than the default characteristics now assigned to all
private use characters, as it would be easier to test the
utility of such scripts and characters.