Re: holes (unassigned code points) in the code charts

Asmus Freytag Fri, 04 Jan 2013 03:54:08 -0800

On 1/4/2013 2:36 AM, Stephan Stiller wrote:

All,
There are plenty of unassigned code points within blocks that are in use; these often come at the end of a block, but there are plenty of holes as well.
I have a cluster of interrelated questions:
1. What sorts of reasons are there (or have there been) for leaving holes? Code page conversion and changes to casing by simple arithmetic? What else?

There are a number of reasons why a code chart may not be contiguous besides the reasons you give. Sometimes a character gets removed from the draft at the last minute. In those cases, a hole may be left. In general, the possible reasons for leaving a hole cannot be enumerated in a fixed list. It's more of a case-by-case thing.
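The "casing by simple arithmetic" reason from the question can be seen in the Greek capitals, where each lowercase letter sits exactly 0x20 above its capital. The commonly cited example is U+03A2: it is unassigned, and the code point 0x20 above it holds the final sigma, which has no capital form of its own. The sketch below is only an illustration and assumes Python's bundled unicodedata module; the scan range U+0391..U+03AB and the name OFFSET are choices made for this example.

```python
import unicodedata

# Offset between a Greek capital letter and its lowercase form in this range.
OFFSET = 0x20


def greek_case_offsets(start=0x0391, end=0x03AB):
    """Scan the Greek capitals and report holes and offset mismatches."""
    holes = []       # unassigned code points (General_Category Cn)
    mismatches = []  # assigned capitals whose lowercase is not cp + OFFSET
    for cp in range(start, end + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cn":
            holes.append(cp)
        elif ord(ch.lower()) != cp + OFFSET:
            mismatches.append(cp)
    return holes, mismatches


holes, mismatches = greek_case_offsets()
print([f"U+{cp:04X}" for cp in holes])        # ['U+03A2']
print(mismatches)                             # []
print(unicodedata.name(chr(0x03A2 + OFFSET)))  # GREEK SMALL LETTER FINAL SIGMA
```

Leaving U+03A2 empty keeps the +0x20 rule valid for every other capital in the run, so a simple offset-based case mapping never has to special-case the letters after sigma.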

1.1 The rationale for particular holes is not documented in the code charts I looked at; is there documentation? (Yes, in some instances the answer can be guessed.)


In general, no. Sometimes there's an explanation in the text.

1.2 How is the number of holes determined? It seems like multiples of 16 are used for block sizes merely for practical reasons.

Blocks end on a value ending in "F" in hexadecimal notation.
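Because a block starts on a multiple of 16 and ends on a value ending in hex F, every block size is a whole number of 16-code-point chart columns. The following is a minimal sketch that checks that alignment and lists the unassigned code points (General_Category Cn) inside one block. It assumes Python's unicodedata module, and the block bounds are copied by hand for the Greek and Coptic block rather than read from the UCD's Blocks.txt. The hole count it prints depends on the Unicode version Python ships, since holes get filled over time.

```python
import unicodedata


def block_holes(start, end):
    """Return the unassigned code points in the block start..end (inclusive)."""
    # Blocks start on a multiple of 16 and end on a value ending in hex F,
    # so every block size is a multiple of 16.
    if start % 16 != 0 or end % 16 != 0xF:
        raise ValueError(f"U+{start:04X}..U+{end:04X} is not 16-aligned")
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) == "Cn"]


# Greek and Coptic block, bounds entered by hand for this example.
GREEK_START, GREEK_END = 0x0370, 0x03FF
holes = block_holes(GREEK_START, GREEK_END)
print((GREEK_END - GREEK_START + 1) // 16, "columns,", len(holes), "holes")
print([f"U+{cp:04X}" for cp in holes])
```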

2. I notice that ranges are often used to describe where scripts are found. Do holes have properties? Are there other block-related policies that give holes a certain semantics?

There are default values for some properties that can be applied to unassigned characters in order to make an algorithm "do the best" with as-yet-unassigned characters (so that if a new character is created, the algorithm doesn't necessarily have to be reimplemented but still gives good results).


There's no distinction between "holes" and other unassigned characters.
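The default values are specified by code point range, so a hole and an unassigned code point at the end of a block within the same range get the same default. For example, the UCD's derived Bidi_Class data gives R as the default for unassigned code points in the Hebrew block (U+0590..U+05FF) and L as the general fallback. The sketch below assumes Python's unicodedata module, which returns an empty string for bidi_class when no value is defined. BIDI_DEFAULTS is a hypothetical, abbreviated table written for this example and is not the full set of UCD defaults.

```python
import unicodedata

# Hypothetical, abbreviated table of range-based defaults for unassigned
# code points; the authoritative defaults are in the UCD data files.
BIDI_DEFAULTS = [
    (0x0590, 0x05FF, "R"),   # Hebrew block
    (0x0600, 0x07BF, "AL"),  # Arabic through Thaana
]
GLOBAL_DEFAULT = "L"


def bidi_class(ch):
    """Bidi class of ch, falling back to a range default if ch is unassigned."""
    value = unicodedata.bidirectional(ch)
    if value:  # unicodedata returns '' when no value is defined
        return value
    cp = ord(ch)
    for lo, hi, default in BIDI_DEFAULTS:
        if lo <= cp <= hi:
            return default
    return GLOBAL_DEFAULT


print(bidi_class("\u05d0"))  # R  (HEBREW LETTER ALEF, assigned)
print(bidi_class("\u05ff"))  # R  (unassigned, Hebrew-range default)
print(bidi_class("\u0378"))  # L  (unassigned Greek hole, global default)
```

If a character is later assigned at U+05FF with class R, a bidi implementation using these defaults already handled that code point correctly before the assignment.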

2.1 If not, how likely is it that Unicode assigns script-external characters to holes?

It's generally not desirable, but there's no firm policy that blocks must have a single script value (and in fact, no such restriction exists in existing blocks).

2.2 If yes, how does the number of assigned code points differ if holes that are assumed to be filled only by certain types of characters are counted?

???

2.2.1 Would this make much of a difference wrt the question (this comes up from time to time, it seems) of how much of Unicode will eventually fill up?

If strong technical reasons exist for placing a character into the BMP, there will be temptation to fill a "hole" if the BMP is otherwise full. Likewise, many, many years (decades) from now, similar pressure might exist should the rest of the code space become filled.

However, the most likely scenario is that Unicode will continue for an indefinite period with sufficient "open" space (and the occasional hole).

3. Have there been "mistakes" wrt hole assignment?


Unicode doesn't make mistakes. :)

A,.


Stephan
