Peter Kirk wrote:
So, should n equal four or five? The answer would appear to depend on whether or not the source file was saved in NFC or NFD format.
No, surely not. If the wcslen() function is fully Unicode conformant, it should give the same output whatever the canonically equivalent form of its input.
That more or less implies that it should normalise its input.
Standards and fantasy are both good things, provided you don't mix them up.
The "wcslen" has nothing whatsoever to do with the Unicode standard, but it has everything to do with the *C* standard. And, according to the C standard, "wcslen" must simply count the number of "wchar_t" array elements from the location pointed to by its argument up to the first "wchar_t" element whose value is L'\0'. Full stop.
OK, as a C function handling wchar_t arrays, it is not expected to conform to Unicode. But if it is presented as a function available to users for handling Unicode text, for determining how many characters (as defined by Unicode) are in a string, it should conform to Unicode, including C9.
The Unicode standard does allow for special display modes in which the exact underlying string, including control characters, is made visible.
Can you please cite the passage where the Unicode standard would not allow this?
...TUS 4.0 p.60 (part of C9):
Even processes that normally do not distinguish between canonical-equivalent character sequences can have reasonable exception behavior. Some examples of this behavior include ... “Show Hidden Text” modes that reveal memory representation structure; ...
I think there is more detail elsewhere.
-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/