Re: What is the principle?

Peter Kirk Mon, 29 Mar 2004 15:25:46 -0800

On 29/03/2004 11:28, Kenneth Whistler wrote:

...

Third, the proposal to "transfer ... some or all of the Variation Selectors on the SSP to Private Use" is unclear on the concept of Private Use. The UTC will make *no* semantic encoding commitment regarding what a private use character is to be used for. That would include *not* specifying that some range of Private Use characters be dedicated to use as variation selectors (privately defined). ...

The problem here is that, despite what you say, the UTC has already specified the character properties of all of the existing PUA characters, in a way which rules out their use as variation selectors, or as combining marks, or as right-to-left characters.
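The default properties Peter refers to can be inspected with Python's standard `unicodedata` module. The minimal sketch below is not part of the original exchange. It prints General_Category, Bidi_Class and canonical combining class for two private-use code points and two existing variation selectors. The module reports the UCD version bundled with the interpreter (`unicodedata.unidata_version`), not Unicode 4.0, although these particular values should be the same. It does not expose Default_Ignorable_Code_Point.

```python
# Sketch: compare the default properties of private-use code points with
# those of the existing variation selectors, using only the standard library.
import unicodedata

samples = {
    "U+E000 (BMP private use)": "\ue000",
    "U+F0000 (Supplementary PUA-A)": "\U000F0000",
    "U+FE00 (VARIATION SELECTOR-1)": "\ufe00",
    "U+E0100 (VARIATION SELECTOR-17)": "\U000E0100",
}

print("UCD version:", unicodedata.unidata_version)
for label, ch in samples.items():
    print(f"{label:33} category={unicodedata.category(ch):3} "
          f"bidi={unicodedata.bidirectional(ch):4} "
          f"ccc={unicodedata.combining(ch)}")
```

The private-use code points come out as `Co` with bidi class `L`, so a renderer treats them as spacing, left-to-right symbols. The variation selectors come out as `Mn` with bidi class `NSM`, so they attach to the preceding character.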

As an alternative to adjusting the definitions of the existing variation selectors, might it be possible for the UTC to adjust the character properties of parts of the Supplementary Private Use Areas? For example, one range of characters could be defined as default ignorable, with default collation weight [.0000.0000.0000.0000] etc. These could then be used as private variation selectors, or as private diacritical marks (which would simply disappear if viewed with a regular font; since they would be in combining class 0, there would be no normalisation issues). Another range could be defined as RTL, and so on for whatever other ranges might be required. Alternatively, an additional PUA could be defined to avoid changing the properties of existing characters. This cannot conflict with the principle that "The UTC will make *no* semantic encoding commitment regarding what a private use character is to be used for", because the UTC has already specified these kinds of properties for the existing PUA.
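Why a weight of [.0000.0000.0000.0000] keeps such characters out of the way in sorting can be seen from how Unicode Collation Algorithm sort keys are built: at each level, zero weights are simply skipped. The toy sketch below assumes a hypothetical weight table (not DUCET), maps each character to exactly one collation element (no normalisation or contractions), and uses U+F0000 as a hypothetical private variation selector.

```python
# Toy sketch of UCA-style sort-key formation: for each level, append the
# non-zero weights of every collation element, separate levels with 0x0000,
# and skip zero weights. The weight table is HYPOTHETICAL (not DUCET).
from typing import Dict, List, Tuple

Weights = Tuple[int, int, int, int]  # primary, secondary, tertiary, quaternary

TABLE: Dict[str, Weights] = {
    "a": (0x0100, 0x0020, 0x0002, 0x0061),
    "b": (0x0101, 0x0020, 0x0002, 0x0062),
    # Hypothetical private variation selector with weight [.0000.0000.0000.0000]
    "\U000F0000": (0x0000, 0x0000, 0x0000, 0x0000),
}


def sort_key(s: str) -> List[int]:
    elements = [TABLE[ch] for ch in s]  # one collation element per character here
    key: List[int] = []
    for level in range(4):
        if level > 0:
            key.append(0x0000)  # level separator
        key.extend(w[level] for w in elements if w[level] != 0)
    return key


# The fully ignorable character leaves the sort key unchanged.
print(sort_key("ab") == sort_key("a\U000F0000b"))  # True
print(sort_key("a"))  # [256, 0, 32, 0, 2, 0, 97]
```

Because every level of the private character's weight is zero, it contributes nothing to any level of the key, so strings with and without it sort identically.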

...

Peter Kirk said:

Surely Variation Selectors are "default ignorable" characters, which implies that if a process (including collation?) doesn't know what to do with them they should be ignored, i.e. treated as not present rather than as undefined characters.

From DerivedCoreProperties.txt in the Unicode Character Database:
    FE00..FE0F    ; Default_Ignorable_Code_Point # Mn  [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
    E0100..E01EF  ; Default_Ignorable_Code_Point # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
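The two data lines from DerivedCoreProperties.txt use the UCD's usual layout: a code point or range, a semicolon, the property name, then a `#` comment. A minimal sketch that parses lines in this format and tests whether a code point falls in a Default_Ignorable_Code_Point range; it assumes only these two lines, whereas a real check would load the whole file.

```python
# Sketch: parse UCD-style property lines and test range membership.
# Only the two variation-selector lines are included (an assumption for
# brevity); the full DerivedCoreProperties.txt has many more ranges.
from typing import List, Tuple

UCD_LINES = """\
FE00..FE0F    ; Default_Ignorable_Code_Point # Mn  [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
E0100..E01EF  ; Default_Ignorable_Code_Point # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
"""


def parse_ranges(text: str, prop: str) -> List[Tuple[int, int]]:
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        cps, name = (field.strip() for field in data.split(";"))
        if name != prop:
            continue
        lo, _, hi = cps.partition("..")  # a single code point has no ".."
        ranges.append((int(lo, 16), int(hi or lo, 16)))
    return ranges


DI = parse_ranges(UCD_LINES, "Default_Ignorable_Code_Point")


def is_default_ignorable(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in DI)


print(is_default_ignorable(0xFE0F))   # True  (VARIATION SELECTOR-16)
print(is_default_ignorable(0xE0100))  # True  (VARIATION SELECTOR-17)
print(is_default_ignorable(0xE000))   # False (a BMP private-use character)
```

Membership in these ranges is a statement about rendering fallback; by itself it says nothing about collation weights.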
Please read the standard carefully regarding what "default ignorable" means. TUS 4.0, p. 142:

"Default ignorable code points are those that should be ignored
 by default in rendering unless explicitly supported. ..."
               ^^^^^^^^^

Some, like U+00AD SOFT HYPHEN, don't necessarily get the zeroes treatment in the default collation table. Some, like U+034F COMBINING GRAPHEME JOINER, while getting zero weights in the default table, were added explicitly in order to make a potential distinction for collation.

Thanks for the clarification.

The *essential* concept of default ignorable characters is that they consist of the class of characters which, if you don't know what their impact on visual rendering is, you are better off displaying *nothing* for them, rather than displaying the black box (or other blort) indicating the presence of a nondisplayable character.

--Ken
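Ken's rule of thumb amounts to a simple fallback step in a renderer. A minimal sketch, assuming a hypothetical `render_fallback` function, a hypothetical `supported` set standing in for font coverage, U+25A1 WHITE SQUARE standing in for the black box, and only the variation-selector ranges treated as default ignorable:

```python
# Minimal sketch of the "display nothing rather than a box" fallback.
# Assumptions (hypothetical, for illustration only): `supported` stands in
# for what the font/renderer can display, and only the variation-selector
# ranges are treated as Default_Ignorable_Code_Point.
from typing import Set

MISSING_GLYPH = "\u25A1"  # WHITE SQUARE, standing in for the "black box"
IGNORABLE_RANGES = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]


def is_default_ignorable(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in IGNORABLE_RANGES)


def render_fallback(text: str, supported: Set[str]) -> str:
    out = []
    for ch in text:
        if ch in supported:
            out.append(ch)             # renderer knows how to display it
        elif is_default_ignorable(ord(ch)):
            continue                   # unknown but ignorable: show nothing
        else:
            out.append(MISSING_GLYPH)  # unknown: show the missing-glyph box
    return "".join(out)


print(render_fallback("a\uFE00b\uE000", {"a", "b"}))                     # ab□
print(render_fallback("a\uFE00b", {"a", "b", "\uFE00"}) == "a\uFE00b")  # True
```

The selector is dropped only at the fallback step, so a renderer that does support it still receives it. That differs from stripping the character before it ever reaches the rendering engine.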

This, as I see it, is also the *essential* concept of the private variation selectors which Ernest and I are suggesting. It seems that some further properties need to be defined. These would probably be similar to the default properties of the existing variation selectors, but not to those of CGJ: something in the properties of CGJ has led at least some implementers to assume that it is ALWAYS ignored in rendering, and so is not even passed to the rendering engine.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
