RE: holes (unassigned code points) in the code charts

Whistler, Ken Fri, 04 Jan 2013 10:33:30 -0800

Stephan Stiller continued:

> Occasionally the question is asked how many characters Unicode has. This
> question has an answer in section D.1 of the Unicode Standard. I
> suspect, however, that once in a while the motivation for asking this
> question is to find out how much of Unicode has been "used up". As
> filling in holes would be dispreferred, it might be interesting to know
> how much of Unicode has been filled if one counts partially filled
> blocks as full. I have no reason to disagree with the (stated and
> reiterated) opinion that our codespace won't be used up in the
> foreseeable future, but it's simply a fun question to ask.
>


The editors maintain some statistical information relevant to this fun question 
at:

http://www.unicode.org/alloc/CurrentAllocation.html

Feel free to reference those fun facts the next time Unicode comes up in 
conversation at the bar. ;-)

There have been a few notable cases where particularly egregious holes in 
blocks, holes that seemed unlikely to be filled with like material in the 
future, were "reprogrammed", as it were, and grabbed for the encoding of 
unrelated sets of characters. The most notable of these is the range 
U+FDD0..U+FDEF in the middle of the Arabic Presentation Forms-A block. There 
was a clear consensus in both committees that nobody wanted to add any more 
encodings for presentation forms of Arabic ligatures. So, when a need arose to 
add another range of noncharacters, the UTC simply decided that the otherwise 
unused range U+FDD0..U+FDEF could serve for that, without requiring the 
addition of a new two-column block in BMP space that could otherwise be used. 
There are several symbol blocks on the BMP which have also had a somewhat 
colorful and creative history of "hole-filling" over time.
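
For anyone who wants to poke at the noncharacter case in code, here is a 
minimal sketch, not anything normative. The function name is_noncharacter is 
made up for illustration. It also assumes the rest of the noncharacter set as 
defined in the Unicode Standard (the last two code points of each of the 17 
planes), which is not discussed above.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is a Unicode noncharacter code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # The range taken from the hole in Arabic Presentation Forms-A.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # Assumption from the Unicode Standard, not from this thread:
    # U+nFFFE and U+nFFFF at the end of every plane are noncharacters too.
    return (cp & 0xFFFE) == 0xFFFE


# The "reprogrammed" range is two columns of 16 code points each.
reprogrammed = range(0xFDD0, 0xFDEF + 1)

# Count noncharacters across the whole codespace, U+0000..U+10FFFF.
total = sum(is_noncharacter(cp) for cp in range(0x110000))

print(len(reprogrammed), total)
```

The two numbers printed are the size of the U+FDD0..U+FDEF range and the 
total count of noncharacters in the codespace under the assumption above.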

--Ken
