On 1/4/2013 2:36 AM, Stephan Stiller wrote:
All,

There are plenty of unassigned code points within blocks that are in use; these often come at the end of a block but there are plenty of holes as well.

I have a cluster of interrelated questions:
1. What sorts of reasons are there (or have there been) for leaving holes? Code page conversion and changes to casing by simple arithmetic? What else?

There are a number of reasons why a code chart may not be contiguous besides the ones you give. Sometimes a character gets removed from the draft at the last minute; in those cases, a hole may be left. In general, the possible reasons for leaving a hole cannot be enumerated in a fixed list. It's more of a case-by-case thing.
1.1 The rationale for particular holes is not documented in the code charts I looked at; is there documentation? (Yes, in some instances the answer can be guessed.)

In general, no. Sometimes there is an explanation in the text of the standard.
1.2 How is the number of holes determined? It seems like multiples of 16 are used for block sizes merely for practical reasons.

Blocks start on a value ending in "0" and end on a value ending in "F" in hexadecimal notation, so block sizes are always multiples of 16.
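
To make the arithmetic concrete, here is a minimal Python sketch. The handful of block ranges is copied by hand from Blocks.txt as an illustrative subset, and the helper name is_aligned is made up for this example, not part of any Unicode tooling.

```python
# A few block ranges copied by hand from Blocks.txt (illustrative subset only).
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Greek and Coptic": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Hiragana": (0x3040, 0x309F),
}

def is_aligned(start, end):
    """Hypothetical helper: True if the range starts on ...0 and ends on ...F."""
    return start % 16 == 0 and end % 16 == 15

for name, (start, end) in BLOCKS.items():
    size = end - start + 1
    # A range that starts on ...0 and ends on ...F always has a size divisible by 16.
    print(f"{name}: {start:04X}..{end:04X} size={size} aligned={is_aligned(start, end)}")
```

Any range that passes this check spans whole 16-column rows of a code chart, which is why trailing holes tend to pad a block out to the end of a row.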
2. I notice that ranges are often used to describe where scripts are found. Do holes have properties? Are there other block-related policies that give holes a certain semantics?

There are default values for some properties that can be applied to unassigned characters in order to make an algorithm "do the best" with as-yet-unassigned characters (so that if a new character is created, the algorithm doesn't necessarily have to be reimplemented but still gives good results).

There's no distinction between "holes" and other unassigned characters.
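
As a rough illustration, one can list the holes in a block by looking for code points whose General_Category is Cn (unassigned). This is only a sketch using Python's standard unicodedata module; the helper name unassigned_in is made up here, the result depends on the Unicode version bundled with the Python in use (see unicodedata.unidata_version), and Cn also covers noncharacters, although none fall in the ranges used below.

```python
import unicodedata

def unassigned_in(start, end):
    """Hypothetical helper: code points in [start, end] whose category is Cn."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) == "Cn"]

# Holes inside the Greek and Coptic block (0370..03FF).
holes = unassigned_in(0x0370, 0x03FF)
print("Unicode data version:", unicodedata.unidata_version)
print([f"U+{cp:04X}" for cp in holes])

# A hole inside a block and a code point outside any block look the same
# from the point of view of this property: both report Cn.
# Assumption: U+2FE0 still lies in a range that belongs to no block.
print(unicodedata.category(chr(0x0378)), unicodedata.category(chr(0x2FE0)))
```

Nothing in the output marks a code point as a "hole" as opposed to any other unassigned code point, which is the point made above.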
2.1 If not, how likely is it that Unicode assigns script-external characters to holes?

It's generally not desirable, but there's no firm policy that blocks must have a single script value (and in fact, no such restriction exists in existing blocks).
2.2 If yes, how does the number of assigned code points differ, if holes that are assumed to be filled only by certain types of characters are counted?

???
2.2.1 Would this make much of a difference wrt the question (this comes up from time to time it seems) of how much of Unicode will eventually fill up?

If strong technical reasons exist for placing a character into the BMP, there will be temptation to fill a "hole" if the BMP is otherwise full. Likewise, many, many years (decades) from now, similar pressure might exist should the rest of the code space become filled.

However, the most likely scenario is that Unicode will continue for an indefinite period with sufficient "open" space (and the occasional hole).
3. Have there been "mistakes" wrt hole assignment?

Unicode doesn't make mistakes. :)

A./

Stephan




