Re: discovering code points with embedded nulls

2003-02-06 Thread Jim Allan
Doug Ewell posted: The use of NULL to terminate strings is a basic part of the Standard C library, not just certain APIs. As such, it doesn't seem right to call this a "misuse" of the character. But ISO 646, in defining ASCII, states as the definition of the control character NULL: "A control

RE: discovering code points with embedded nulls

2003-02-06 Thread Erik.Ostermueller
> Subject: Re: discovering code points with embedded nulls > > > What is that strange file (winmail.dat) attached to > your mail? I really > hope that it isn't a virus. > > Stefan > > Kent Karlsson wrote:

RE: discovering code points with embedded nulls

2003-02-06 Thread Marco Cimarosti
Stefan Persson wrote: > What is that strange file (winmail.dat) attached to your > mail? I really hope that it isn't a virus. http://support.microsoft.com/default.aspx?scid=KB;en-us;q241538 (Whether MS Outlook is a virus or not, is still a debated issue. :-) _ Marco

RE: discovering code points with embedded nulls

2003-02-06 Thread Marco Cimarosti
Doug Ewell wrote: > Kent Karlsson wrote: > > >> From what I'm hearing from you all is that a null in UTF-8 is > >> for termination and termination only. > >> Is this correct? > > > > No, NULL is a character (actually a control character) among many > > others. However, many C/C++ APIs (mis)use NU

Re: discovering code points with embedded nulls

2003-02-06 Thread Stefan Persson
What is that strange file (winmail.dat) attached to your mail? I really hope that it isn't a virus. Stefan Kent Karlsson wrote: From what I'm hearing from you all is that a null in UTF-8 is for termination and termination only. Is this correct? No, NULL is a character (actually a contro

Re: discovering code points with embedded nulls

2003-02-06 Thread Doug Ewell
Kent Karlsson wrote: >> From what I'm hearing from you all is that a null in UTF-8 is >> for termination and termination only. >> Is this correct? > > No, NULL is a character (actually a control character) among many > others. However, many C/C++ APIs (mis)use NULL as a string terminator > since

RE: discovering code points with embedded nulls

2003-02-06 Thread Kent Karlsson
> From what I'm hearing from you all is that a null in UTF-8 is > for termination and termination only. > Is this correct? No, NULL is a character (actually a control character) among many others. However, many C/C++ APIs (mis)use NULL as a string terminator since NULL isn't very useful for othe

RE: discovering code points with embedded nulls

2003-02-05 Thread Kenneth Whistler
Erik followed up: > From what I'm hearing from you all is that a null > in UTF-8 is for termination and termination only. > Is this correct? Not quite. A null byte (0x00) in UTF-8 is only a representation of the NULL character (U+0000). It can be present in UTF-8 for whatever purposes one might
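The claim that a 0x00 byte in UTF-8 can only stand for U+0000 can be checked by brute force. The following is a minimal sketch, not anything taken from the thread: the function names are made up for illustration, and it relies on Python's built-in "utf-8" codec as the encoder.

```python
def utf8_contains_nul(code_point):
    """Return True if the UTF-8 encoding of this code point has a 0x00 byte."""
    return 0 in chr(code_point).encode("utf-8")


def code_points_with_nul_in_utf8():
    """List every Unicode scalar value whose UTF-8 form contains 0x00.

    Surrogate code points (U+D800..U+DFFF) are skipped because they are
    not scalar values and cannot be encoded as UTF-8.
    """
    return [
        cp
        for cp in range(0x110000)
        if not 0xD800 <= cp <= 0xDFFF and utf8_contains_nul(cp)
    ]


if __name__ == "__main__":
    offenders = code_points_with_nul_in_utf8()
    print([f"U+{cp:04X}" for cp in offenders])
```

The scan reports U+0000 alone, because every byte of a multi-byte UTF-8 sequence is 0x80 or above.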

RE: discovering code points with embedded nulls

2003-02-05 Thread Erik.Ostermueller
I'm replying to myself, here. Thank you all for so many quick and helpful responses. As most of you pointed out, I misread the documentation -- which is doc for multi-byte strings only (and not wide strings). So I was brain dead when I asked about encodings other than UTF-8. The doc states (in

Re: discovering code points with embedded nulls

2003-02-05 Thread Otto Stolz
[EMAIL PROTECTED] wrote: I'm dealing with an API that claims it doesn't support unicode characters with embedded nulls. ... Test all constituent bytes for 0x00. This depends on the encoding form you are using (and the API is expecting): - UTF-8 encodes a Unicode string into a sequence of by
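The advice to test all constituent bytes for 0x00 might look like the sketch below. It assumes Python's codec names ("utf-8", "utf-16-le", "utf-32-le") stand in for the encoding form the API expects; the helper name contains_nul_byte is hypothetical.

```python
def contains_nul_byte(text, encoding):
    """Encode text in the given encoding form and report any 0x00 byte.

    The little-endian codec names are used so that no byte order mark is
    prepended to the output.
    """
    return 0 in text.encode(encoding)


if __name__ == "__main__":
    for sample in ("A", "\u4e2d", "A\x00B"):
        for enc in ("utf-8", "utf-16-le", "utf-32-le"):
            print(repr(sample), enc, contains_nul_byte(sample, enc))
```

The same text gives different answers per encoding form: "A" has no zero byte in UTF-8 but has one in UTF-16 and UTF-32, so the check only means something once the encoding the API expects is known.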

RE: discovering code points with embedded nulls

2003-02-05 Thread Marco Cimarosti
Erik Ostermueller wrote: > I'm dealing with an API that claims it doesn't support > unicode characters with embedded nulls. > I'm trying to figure out how much of a liability this is. If by "embedded nulls" they mean bytes of value zero, that library can *only* work with UTF-8. The other two UTF'
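To illustrate why a byte-oriented API that stops at the first zero byte is a problem for anything but UTF-8, here is a small simulation. It is only a sketch: c_strlen is a hypothetical stand-in that mimics how a C-style length function scans a buffer, and it does not call any real C library.

```python
def c_strlen(buffer):
    """Mimic a C-style strlen: count bytes up to the first 0x00 byte."""
    index = buffer.find(b"\x00")
    return len(buffer) if index == -1 else index


if __name__ == "__main__":
    text = "Hi"
    # Each buffer ends with a terminator of the width its encoding would use.
    utf8 = text.encode("utf-8") + b"\x00"
    utf16 = text.encode("utf-16-le") + b"\x00\x00"
    utf32 = text.encode("utf-32-le") + b"\x00\x00\x00\x00"
    for name, buf in (("UTF-8", utf8), ("UTF-16LE", utf16), ("UTF-32LE", utf32)):
        print(name, len(buf), c_strlen(buf))
```

In UTF-8 the scan sees both bytes of "Hi"; in UTF-16LE and UTF-32LE it stops after the first byte, because the code unit for "H" already contains a zero byte.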

RE: discovering code points with embedded nulls

2003-02-05 Thread Rick Cameron
Are you sure the API doesn't support Unicode _characters_ with embedded NULs? Or does it fail to support Unicode _strings_ with embedded NULs? If it really is the former, no character in UTF-8 (except, of course, U+0000) will include a NUL byte. In UTF-16, it will be any character of the form U+00
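For the per-character reading of the question, a sketch like the following can report which characters carry a NUL byte in their UTF-16 form. It assumes Python's "utf-16-le" codec, and the function names are hypothetical; nothing here comes from the API under discussion.

```python
def utf16_has_nul_byte(ch):
    """Return True if any byte of the character's UTF-16 form is 0x00."""
    return 0 in ch.encode("utf-16-le")


def count_bmp_chars_with_nul():
    """Count BMP characters (surrogates excluded) with a 0x00 byte in UTF-16."""
    return sum(
        1
        for cp in range(0x10000)
        if not 0xD800 <= cp <= 0xDFFF and utf16_has_nul_byte(chr(cp))
    )


if __name__ == "__main__":
    for ch in ("A", "\u0100", "\u4e2d", "\U0001F600"):
        print(f"U+{ord(ch):04X}", utf16_has_nul_byte(ch))
    print(count_bmp_chars_with_nul())
```

A BMP character is affected when either byte of its code unit is zero, and a supplementary character can be affected too when one of its surrogate code units has a zero low byte.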