At 10:49 PM 2/3/03 -0800, Doug Ewell wrote:
> Can you please explain the best practice for handling unassigned
> code points, so that applications can easily become forward compatible?
> If we just ignore unassigned code points, will that make it easier
> for applications to migrate to later versions of Unicode?
In many circumstances, the best approach for unassigned character
codes is to treat them like the characters around them.

An implementation might choose to interpolate the property values
of assigned characters bordering a range of unassigned characters,
using the following rules:

* Look at the nearest assigned characters in both directions.
If they are in the same block and have the same property value,
then use that value.
* For the code points between a block boundary and the nearest assigned
character inside the block, use the property value of that character.
* For all code points entirely in empty or unassigned blocks, use the
default property value for that property as given in the Unicode Character
Database.
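
The rules above can be sketched in Python roughly as follows. This is a
minimal illustration under stated assumptions, not a reference
implementation: the function names, the (first, last) block tuples, the
dictionary of assigned values and the sample data are all hypothetical.
The rules do not say what to do when the two nearest assigned neighbors
inside a block have different values; the sketch assumes the default value
in that case. "Rule 1/2/3" in the comments refers to the three bullets in
order.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data model: blocks are inclusive (first, last) code point ranges,
# and `assigned` maps each assigned code point to its value for one property.
Block = Tuple[int, int]


def block_of(cp: int, blocks: List[Block]) -> Optional[Block]:
    """Return the block containing cp, or None if cp lies outside every block."""
    for first, last in blocks:
        if first <= cp <= last:
            return (first, last)
    return None


def interpolated_value(cp: int, blocks: List[Block],
                       assigned: Dict[int, str], default: str) -> str:
    """Guess a property value for cp by interpolating from assigned neighbors."""
    if cp in assigned:
        return assigned[cp]
    block = block_of(cp, blocks)
    if block is None:
        return default  # rule 3: not inside any block
    first, last = block
    # Nearest assigned code points below and above cp, searching only within its block.
    below = next((c for c in range(cp - 1, first - 1, -1) if c in assigned), None)
    above = next((c for c in range(cp + 1, last + 1) if c in assigned), None)
    if below is None and above is None:
        return default  # rule 3: empty block
    if below is None:
        return assigned[above]  # rule 2: between block start and first assigned char
    if above is None:
        return assigned[below]  # rule 2: between last assigned char and block end
    if assigned[below] == assigned[above]:
        return assigned[below]  # rule 1: both neighbors agree
    # Assumption: the rules leave this case open; fall back to the default value.
    return default


# Made-up sample data (values loosely modeled on General_Category codes).
BLOCKS = [(0x10, 0x1F), (0x20, 0x2F), (0x30, 0x3F)]
ASSIGNED = {0x12: "Lo", 0x13: "Lo", 0x16: "Lo", 0x18: "Nd", 0x1A: "Nd", 0x22: "Lo"}
DEFAULT = "Cn"

for cp in (0x10, 0x14, 0x17, 0x19, 0x1F, 0x2F, 0x35, 0x50):
    print(f"U+{cp:04X} -> {interpolated_value(cp, BLOCKS, ASSIGNED, DEFAULT)}")
```

With this sample data, U+0014 gets Lo (both neighbors are Lo), U+0017 falls
back to Cn (its neighbors are Lo and Nd), U+001F gets Nd (it extends the
last assigned character to the block end), and U+0035 and U+0050 get Cn
(empty block, no block). A real table builder would more likely compute
whole ranges in one pass than scan per code point, but the per-code-point
form keeps each rule easy to check.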

There are two important benefits to using this approach in implementations.
First, property values become much more contiguous, which allows better
compaction of property tables. Second, because similar characters are often
encoded near each other, chances are good that the interpolated values
will match the actual property values once characters are later assigned
to those code points.
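
To make the compaction point concrete, here is a small Python sketch that
run-length encodes a per-code-point property table into (first, last, value)
ranges. The two rows are made-up values for one hypothetical 16-code-point
block: the first gives every unassigned position the default "Cn"; the
second fills those positions by interpolating from their assigned
neighbors, leaving position 7 (between differing neighbors) at the default,
which is an assumption since the rules do not cover that case.

```python
from typing import List, Tuple

Range = Tuple[int, int, str]  # inclusive (first_index, last_index, value)


def to_ranges(values: List[str]) -> List[Range]:
    """Run-length encode a per-code-point list of property values into ranges."""
    ranges: List[Range] = []
    start = 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the list or when the value changes.
        if i == len(values) or values[i] != values[start]:
            ranges.append((start, i - 1, values[start]))
            start = i
    return ranges


# Hypothetical block: Lo assigned at 2, 3, 6; Nd assigned at 8, 10; the rest unassigned.
RAW_ROW = ["Cn", "Cn", "Lo", "Lo", "Cn", "Cn", "Lo", "Cn",
           "Nd", "Cn", "Nd", "Cn", "Cn", "Cn", "Cn", "Cn"]
# Same block after interpolation (position 7 kept at the default by assumption).
INTERPOLATED_ROW = ["Lo", "Lo", "Lo", "Lo", "Lo", "Lo", "Lo", "Cn",
                    "Nd", "Nd", "Nd", "Nd", "Nd", "Nd", "Nd", "Nd"]

print(len(to_ranges(RAW_ROW)), len(to_ranges(INTERPOLATED_ROW)))  # prints: 9 3
```

In this toy case the interpolated row needs three ranges instead of nine;
how much a real table shrinks depends on the property and the repertoire.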

Of course, many important properties may well not be predictable, but on
the whole, the approach has proven successful.

A./
