Jill Ramonsky wrote:
I had to write an API for my employer last year to handle some aspects of Unicode. We normalised everything to NFD, not NFC (but that's easier, not harder). Nonetheless, all the string handling routines were not allowed to assume that the input was in NFD, but they had to guarantee that the output was. These routines, therefore, had to do a "convert to NFD" on every input, even if the input were already in NFD. This did have a significant performance hit, since we were handling (Unicode) strings throughout the app.

Note that, in addition to keeping "is normalized" flags, it is much faster to check whether a string is normalized and to normalize it only if it is not. This holds at least when there is a good chance that the string is already normalized, as appears to be true in your application, and as is usually true in other applications that check for NFC on input. See UAX #15 for details. ICU has quick check and normalization functions.
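
To illustrate the check-first idea, here is a minimal sketch in Python rather than ICU. It assumes Python 3.8 or later, where the standard library provides unicodedata.is_normalized; the helper name ensure_nfd is hypothetical and not part of any API discussed above.

```python
import unicodedata


def ensure_nfd(s: str) -> str:
    """Return s in NFD, normalizing only when the check says it is needed.

    ensure_nfd is a hypothetical helper name used for illustration.
    """
    # Check first: for input that is already normalized, this avoids
    # building a new string at all.
    if unicodedata.is_normalized("NFD", s):
        return s  # already in NFD: hand back the same object
    # Only strings that fail the check pay for full normalization.
    return unicodedata.normalize("NFD", s)
```

A routine that must guarantee NFD output could call ensure_nfd on each input; how much time this saves depends on how often the input is already normalized.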


markus

