At this time about 160 different character properties are defined in the
UCD. In practice most applications probably use only a limited set of
them. Nevertheless, applications should be able to look up all the
properties of a code point. Compiling in lookup tables for all defined
properties (including Unihan) makes even small applications rather big.
This led me to create a binary file format for storing character
properties and to initialize property lookup tables on demand.

Benefits of using run-time loadable lookup tables initialized from binary
files are:

  - no worries about total table size, since data will only be loaded
    on demand

  - initializing lookup tables from a binary file is relatively fast

  - property lookup files can be locale specific (useful for character
    names and case mappings for example)

  - new properties can be added quickly and never affect the layout or
    content of other tables

  - any number of properties can be supported including custom
    (non-Unicode) properties

  - by initializing a lookup table from two sources (UCD-based and
    vendor-based), applications can overload the default property
    values assigned to PUA characters with private property values
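To illustrate the on-demand loading and the two-source overlay, here is a
minimal C sketch. It is not the UPR library's API: the names `prop_table`,
`prop_get`, the loader callbacks and the demo values are hypothetical, and
a loader stands in for "read a binary property file". A plain byte per
code point is used only to keep the sketch short; a real table would
likely use a more compact structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CP_LIMIT 0x110000u /* one past U+10FFFF */

/* Hypothetical loader: writes property values into a table indexed by
   code point. In practice it would read a binary property file. */
typedef void (*prop_loader)(uint8_t *values);

typedef struct {
    prop_loader base;    /* UCD-based default values */
    prop_loader overlay; /* optional vendor values (e.g. for PUA), may be NULL */
    uint8_t *values;     /* NULL until the first lookup */
} prop_table;

/* Look up the value for cp, initializing the table on first use.
   Returns 0 on success, -1 for an invalid code point or allocation failure. */
static int prop_get(prop_table *t, uint32_t cp, uint8_t *out) {
    assert(t != NULL && t->base != NULL);
    if (cp >= CP_LIMIT) return -1;
    if (t->values == NULL) {
        uint8_t *v = calloc(CP_LIMIT, 1);
        if (v == NULL) return -1;
        t->base(v);                    /* defaults first */
        if (t->overlay) t->overlay(v); /* vendor values override them */
        t->values = v;
    }
    *out = t->values[cp];
    return 0;
}

/* Demo loaders with made-up values, for illustration only. */
static int load_count = 0;

static void demo_ucd(uint8_t *v) {
    load_count++;
    for (uint32_t cp = 0xE000; cp <= 0xF8FF; cp++) v[cp] = 1; /* PUA default */
    v[0x41] = 2;
}

static void demo_vendor(uint8_t *v) {
    v[0xE000] = 7; /* private value for one PUA code point */
}
```

Nothing is loaded until the first `prop_get` call, and because the vendor
loader runs after the UCD loader, its values replace the defaults only for
the code points it sets.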

The file format I've implemented can store any type of property. Each
file contains the values for one property only (no more squeezing as
many property values as possible into as few bits as possible). The
format is called UPR (Unicode PRoperties).
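As an illustration of storing one property per file, here is a minimal C
sketch that parses such a file from memory and looks up values. The
layout (a "PROP" magic, a record count, then sorted, non-overlapping
{first, last, value} ranges, all little-endian) is invented for this
example and is NOT the actual UPR format; `prop_parse` and `prop_lookup`
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-property layout (not the real UPR format):
   "PROP", uint32 count, then count records { first, last, value },
   each field a little-endian uint32, sorted by first code point. */
typedef struct { uint32_t first, last, value; } prop_range;

static uint32_t rd32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Parse buf into a newly allocated range array.
   Returns the number of ranges, or -1 if the data is malformed. */
static long prop_parse(const unsigned char *buf, size_t len, prop_range **out) {
    if (len < 8 || memcmp(buf, "PROP", 4) != 0) return -1;
    uint32_t n = rd32(buf + 4);
    if (n > 0x110000u) return -1;       /* more ranges than code points */
    if ((len - 8) / 12 < n) return -1;  /* truncated data */
    prop_range *r = malloc(n ? n * sizeof *r : 1);
    if (r == NULL) return -1;
    for (uint32_t i = 0; i < n; i++) {
        const unsigned char *p = buf + 8 + (size_t)i * 12;
        r[i].first = rd32(p);
        r[i].last  = rd32(p + 4);
        r[i].value = rd32(p + 8);
    }
    *out = r;
    return (long)n;
}

/* Binary search over the ranges; uncovered code points get dflt. */
static uint32_t prop_lookup(const prop_range *r, long n, uint32_t cp,
                            uint32_t dflt) {
    assert(r != NULL || n == 0);
    long lo = 0, hi = n - 1;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        if (cp < r[mid].first) hi = mid - 1;
        else if (cp > r[mid].last) lo = mid + 1;
        else return r[mid].value;
    }
    return dflt;
}
```

Storing ranges keeps such a file small for properties whose values come
in long runs (the PUA blocks, for example), and the binary search gives
O(log n) lookups without touching any other property's data.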

I have written a tool to generate the necessary UPR files from the UCD.
Also available are a small C library for reading a UPR file into a
property lookup table, and a high-level library which provides lookup
functions for *all* Unicode 4.0.0 properties.

For more information on the file format and related software see:
http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/. My primary
development platform is UNIX/Linux, but you can compile and run the
software under Windows as well (though it is less tested there). The
current version supports UCD 4.0.0; I will add support for 4.0.1 soon.

Please check it out. Feedback is welcome.


Regards,

Theo Veenker
