> > Hi,
> >
> > I'm currently experimenting with various trade-offs for Unicode
> > normalisation code. Any comments on these (particularly of the "that's
> > insane, here's why, stop now!" variety) would be welcome.

> You might want to look at, if not even use, the ICU open-source
> implementation:
>
> http://oss.software.ibm.com/icu/
> http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/common/unorm.cpp

I did, but when I started this I was more interested in simply comparing
various optimisations as a study of the related techniques. However I
recently hit a practical need for such code for another task, and while
it's nice that I've already got a bunch of the "work" code done as "fun"
code, maybe I should just use ICU...

> > The second is an optimisation of both speed and size, with the
> > disadvantage that data cannot be shared between NFC and NFD operations
> > (which is perhaps a reasonable trade in the case of web code which might
> > only need NFC code to be linked). In this version decompositions of
> > stable codepoints are omitted from the decomposition data. For example,
> > since following the decomposition <U+0104> -> <U+0041, U+0328> there can
> > be no character that is unblocked from the U+0041 that will combine with
> > it, there is no circumstance in which they will not be recombined to
> > U+0104, and hence dropping that decomposition from the data will not
> > affect NFC (the relevant data would still have to be in the composition
> > table, as the sequence <U+0041, U+0328> might occur in the source code).

> Sounds possible and clever. As far as I remember, ICU uses the
> normalization quick check flags (Unicode properties) to determine much of
> this, and should achieve the same in most cases.

The above would supplement use of quick check - indeed it would be a way of
implementing the concept of "stable codepoints" that the UTR suggests using
with quick check.
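
Roughly what I have in mind, as a sketch only: ccc() and nfc_qc() below are
made-up names standing in for whatever table lookup the implementation ends
up using (they are not ICU's API).

    #include <cstddef>

    enum QcResult { QC_YES, QC_NO, QC_MAYBE };

    // Hypothetical UCD property lookups:
    //   ccc(c)    - Canonical_Combining_Class of codepoint c
    //   nfc_qc(c) - NFC_Quick_Check value of codepoint c
    extern unsigned char ccc(unsigned int c);
    extern QcResult nfc_qc(unsigned int c);

    // NFC quick-check scan along the lines of UAX #15.  A codepoint with
    // ccc == 0 and nfc_qc == QC_YES is "stable": nothing that follows can
    // interact with anything before it, so it bounds the span that would
    // need re-normalising after a QC_MAYBE, and (as argued above) its own
    // decomposition need never be consulted when producing NFC.
    QcResult quick_check_nfc(const unsigned int *text, std::size_t len)
    {
        unsigned char last_ccc = 0;
        QcResult result = QC_YES;

        for (std::size_t i = 0; i < len; ++i) {
            unsigned int c = text[i];
            unsigned char cc = ccc(c);

            // Combining marks out of canonical order: definitely not NFC.
            if (cc != 0 && last_ccc > cc)
                return QC_NO;

            switch (nfc_qc(c)) {
            case QC_NO:
                return QC_NO;       // definitely not NFC
            case QC_MAYBE:
                result = QC_MAYBE;  // only a full normalisation pass can decide
                break;
            default:                // QC_YES: nothing to do
                break;
            }
            last_ccc = cc;
        }
        return result;
    }

A string that comes back QC_MAYBE should then only need re-normalising
between the nearest stable codepoints on either side of the offending
characters, and it's the same notion of stability that makes it safe to
drop those codepoints' decompositions from NFC-only data.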

