glibc is no more broken than any other C library implementing toupper and tolower from the legacy "ctype" standard library. These are old APIs that are still widely used, and there are still valid contexts where they are simple and safe to use. But they are not meant to convert text.
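As a minimal illustration of why the ctype functions cannot convert text on their own, the sketch below upper-cases a byte string the way much legacy code does. The name ctype_upper_inplace is hypothetical. The behaviour described after the code assumes the default "C" locale (no setlocale call) and UTF-8 encoded input.

```c
#include <ctype.h>

/* Hypothetical helper: upper-case a NUL-terminated byte string in place,
   one char at a time, as code built on the legacy <ctype.h> API does.
   Each byte maps to exactly one byte, so the length never changes. */
void ctype_upper_inplace(char *s)
{
    for (; *s != '\0'; ++s) {
        /* The cast avoids undefined behaviour for negative char values. */
        *s = (char)toupper((unsigned char)*s);
    }
}
```

With the UTF-8 bytes of "straße" as input, the ASCII letters become upper case. The two bytes of ß pass through unchanged, so the result is "STRAßE" rather than "STRASSE". toupper takes one character and returns exactly one, so it has no way to express a 1-to-2 mapping.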
The i18n data just shows the mappings used for tolower, toupper (and totitle). That is clearly not enough to implement strtolower and strtoupper, which require more rules. In particular they need 1-to-2 and 2-to-1 mappings, and support for normalisation (composition and decomposition) and for recognizing canonical equivalents in all possible reorderings. They also need more data for contextual rules, such as the final form of sigma. In some cases such data may be easily expressible in this kind of tabular format. In other cases it could be implemented by locale-specific code, for example to handle dictionary lookups. Such lookups are required for word breaking in some Asian scripts. They are also implicitly needed for the Korean script, whose normalisation is not handled by table lookups but algorithmically, by code within the normalizer.

I don't see anything wrong with the existing glibc "i18n" data. Glibc would be wrong, however, if it *only* used tolower/toupper to implement strtolower/strtoupper. That is what was done in the past, from the creation of the "standard" C library on Unix and later on DOS, MacOS, Windows and most other systems, until Unicode was created and developed to support more languages, scripts, and orthographic systems.

Modern i18n libraries (for various programming languages) contain more advanced APIs for correct case mapping of full strings, including M-to-N mappings, contextual rules, and support for canonical equivalences. These APIs no longer assume that the output string will be the same length as the input, or that only 1:1 mappings will be performed on each character. Such 1:1 mapping is still what happens in the "C" root locale, which works only for a few languages. It also works only for simple texts using restricted alphabets, without the Unicode extensions that are now needed beyond the native language. Those extensions are needed for many proper names and "foreign" toponyms, for texts containing short quotations in another language, and for any multilingual document.
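To make the difference concrete, here is a minimal C sketch of string-level case mapping over UTF-32 code points. The names str_upper_sketch, str_lower_sketch and is_greek_letter are hypothetical. The built-in data covers only ASCII, ß and the basic Greek letters, not the Unicode Character Database, and normalisation and canonical equivalence are not handled at all.

The sketch shows two of the rules mentioned above. The first is a 1-to-2 mapping: U+00DF ß upper-cases to "SS", as listed in SpecialCasing.txt. This forces the function to report the output length instead of assuming it equals the input length. The second is a contextual rule for sigma, using a simplified test rather than the full Final_Sigma condition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper covering only basic Greek capitals and small letters. */
static int is_greek_letter(uint32_t c)
{
    return (c >= 0x0391 && c <= 0x03A9) || (c >= 0x03B1 && c <= 0x03C9);
}

/* Full upper-casing sketch (ASCII and U+00DF only). One input code point may
   yield two output code points, so the function writes at most cap code
   points to out and returns the length the full result needs. A first call
   with cap == 0 (out may then be NULL) tells the caller how much to allocate. */
size_t str_upper_sketch(const uint32_t *in, size_t n, uint32_t *out, size_t cap)
{
    size_t len = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t c = in[i];
        uint32_t m[2] = { c, 0 };  /* default: copy unchanged */
        size_t k = 1;
        if (c == 0x00DF) {         /* sharp s: 1-to-2 mapping to "SS" */
            m[0] = 'S';
            m[1] = 'S';
            k = 2;
        } else if (c >= 'a' && c <= 'z') {
            m[0] = c - ('a' - 'A');
        }
        for (size_t j = 0; j < k; ++j, ++len) {
            if (len < cap)
                out[len] = m[j];
        }
    }
    return len;
}

/* Lower-casing sketch (ASCII and basic Greek) with one contextual rule:
   capital sigma U+03A3 becomes final sigma U+03C2 when it is preceded by a
   Greek letter and not followed by one, otherwise U+03C3. This is a
   simplification of the Unicode Final_Sigma condition. out receives n
   code points. */
void str_lower_sketch(const uint32_t *in, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t c = in[i];
        if (c == 0x03A3) {
            int after_letter = i > 0 && is_greek_letter(in[i - 1]);
            int before_letter = i + 1 < n && is_greek_letter(in[i + 1]);
            out[i] = (after_letter && !before_letter) ? 0x03C2 : 0x03C3;
        } else if ((c >= 'A' && c <= 'Z') ||
                   (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2)) {
            out[i] = c + 0x20;     /* capital -> small in these ranges */
        } else {
            out[i] = c;
        }
    }
}
```

Called on "straße" (6 code points), str_upper_sketch returns 7. This "preflighting" convention is similar to the one used by ICU's u_strToUpper, which returns the required length and reports an overflow error when the destination buffer is too small. For the Greek "ΟΔΟΣ", str_lower_sketch yields "οδος" with a final ς. A leading Σ, as in "ΣΑ", becomes σ.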
2014-11-09 1:45 GMT+01:00 Christopher Vance <cjsva...@gmail.com>:
> So glibc is broken. This doesn't make it a Unicode problem.
_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode