Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

Peter Kirk Tue, 09 Dec 2003 13:15:41 -0800

On 09/12/2003 10:22, Marco Cimarosti wrote:

> Peter Kirk wrote:
>>> So, should n equal four or five? The answer would appear to
>>> depend on whether or not the source file was saved in NFC
>>> or NFD format.
>> No, surely not. If the wcslen() function is fully Unicode conformant, it should give the same output whatever the canonically equivalent form of its input. That more or less implies that it should normalise its input.
> Standards and fantasy are both good things, provided you don't mix them up.
> The "wcslen" has nothing whatsoever to do with the Unicode standard, but it
> has all to do with the *C* standard. And, according to the C standard,
> "wcslen" must simply count the number of "wchar_t" array elements from the
> location pointed to by its argument up to the first "wchar_t" element whose
> value is L'\0'. Full stop.

OK, as a C function handling wchar_t arrays it is not expected to conform to Unicode. But if it is presented as a function available to users for handling Unicode text, for determining how many characters (as defined by Unicode) are in a string, it should conform to Unicode, including C9.
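
To make the two positions concrete, here is a minimal sketch in C. The test strings (the word "café" in precomposed and decomposed form) and both helper names are hypothetical and not taken from the original example. The first helper does only what the C standard asks of wcslen(). The second is a toy that skips code points in the Combining Diacritical Marks block (U+0300..U+036F); it is not real normalisation and covers only that one block, but it shows the kind of result a count that ignores the choice between canonically equivalent forms would give for this pair.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical test strings: "café" in two canonically equivalent forms. */
static const wchar_t cafe_nfc[] = { L'c', L'a', L'f', 0x00E9, L'\0' };       /* precomposed U+00E9 */
static const wchar_t cafe_nfd[] = { L'c', L'a', L'f', L'e', 0x0301, L'\0' }; /* 'e' + U+0301 */

/* Hand-written equivalent of what the C standard requires of wcslen():
   count wchar_t elements up to, but not including, the first L'\0'. */
static size_t count_wchar_elements(const wchar_t *s)
{
    size_t n = 0;
    while (s[n] != L'\0')
        n++;
    return n;
}

/* Toy count (an assumption for illustration, NOT Unicode normalisation):
   skip code points in U+0300..U+036F so that a base letter followed by
   one of those marks counts once. */
static size_t count_skipping_combining_marks(const wchar_t *s)
{
    size_t n = 0;
    for (; *s != L'\0'; s++) {
        if (*s < 0x0300 || *s > 0x036F)
            n++;
    }
    return n;
}
```

With these strings, count_wchar_elements() (like wcslen() itself) returns 4 for cafe_nfc and 5 for cafe_nfd, while count_skipping_combining_marks() returns 4 for both. A function that really honoured canonical equivalence would need a full normalisation step, which plain C does not provide.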

...

>> The Unicode standard does allow for special display modes in which the exact underlying string, including control characters, is made visible.
> Can you please cite the passage where the Unicode standard would not allow
> this?

TUS 4.0 p.60 (part of C9):

Even processes that normally do not distinguish between canonical-equivalent character sequences can have reasonable exception behavior. Some examples of this behavior include ... “Show Hidden Text” modes that reveal memory representation structure; ...

Somewhere else I think there is more detail.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
