RE: My Querry

2004-11-27 Thread Mike Ayers
Title: RE: My Querry From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Mark E. Shoulson Sent: Tuesday, November 23, 2004 8:43 PM Why is it that even simple questions asked about straightforward aspects of Unicode somehow mutate into There is nothing straightforward

Re: Misuse of 8th bit [Was: My Querry]

2004-11-26 Thread Antoine Leca
On Thursday, November 25th, 2004 08:05Z Philippe Verdy wrote: In ASCII, or in all other ISO 646 charsets, code positions are ALL in the range 0 to 127. Nothing is defined outside of this range, exactly like Unicode does not define or mandate anything for code points larger than

Re: Misuse of 8th bit [Was: My Querry]

2004-11-26 Thread Philippe Verdy
From: Antoine Leca [EMAIL PROTECTED] On Thursday, November 25th, 2004 08:05Z Philippe Verdy wrote: In ASCII, or in all other ISO 646 charsets, code positions are ALL in the range 0 to 127. Nothing is defined outside of this range, exactly like Unicode does not define or mandate anything for

Re: Misuse of 8th bit [Was: My Querry]

2004-11-26 Thread Asmus Freytag
The fact is, once you dedicate the top bits in a pipe to some purposes, you've narrowed the width of the pipe. That's what happened to those systems that implemented a 7-bit pipe for ASCII by using the top bit for other purposes. And everybody seems to agree that when you serialize such an

Re: Misuse of 8th bit [Was: My Querry]

2004-11-26 Thread John Cowan
Antoine Leca wrote: In a similar vein, I cannot be in agreement that it could be advisable to use the 22nd, 23rd, 32nd, 63rd, etc., the upper bits of the storage of a Unicode codepoint. Right now, nobody is seeing any use for them as part of characters, but history should have taught us

Re: Misuse of 8th bit [Was: My Querry]

2004-11-26 Thread Doug Ewell
John Cowan jcowan at reutershealth dot com wrote: No, I don't agree with this part. Unicode just isn't going to expand past 0x10FFFF unless Earth joins the Galactic Empire. So the upper bits are indeed free for private uses. A few years ago there was the Whistler Constant, which basically

Misuse of 8th bit [Was: My Querry]

2004-11-25 Thread Antoine Leca
On Wednesday, November 24th, 2004 22:16Z Asmus Freytag wrote: I'm not seeing a lot in this thread that adds to the store of knowledge on this issue, but I see a number of statements that are easily misconstrued or misapplied, including the thoroughly discredited practice of storing

Re: Misuse of 8th bit [Was: My Querry]

2004-11-25 Thread Philippe Verdy
From: Antoine Leca [EMAIL PROTECTED] On Wednesday, November 24th, 2004 22:16Z Asmus Freytag wrote: I'm not seeing a lot in this thread that adds to the store of knowledge on this issue, but I see a number of statements that are easily misconstrued or misapplied, including the thoroughly

Re: Misuse of 8th bit [Was: My Querry]

2004-11-25 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: Whenever an application chooses to use the 8th (or even 9th...) bit of a storage or memory or networking byte used also to store an ASCII-coded character, as a zero, or as an even or odd parity bit, or for other purposes is the choice of

Re: My Querry

2004-11-24 Thread Asmus Freytag
At 04:23 PM 11/23/2004, Chris Jacobs wrote: Now, this implies that UTF-8 does interpret U+0000 as an ASCII NULL control char. This is incompatible with using it as a string terminator. Except that it's up to you how to interpret the C0 control codes in Unicode. You can do it according to ISO 6429

My Querry

2004-11-23 Thread Harshal Trivedi
How can i make sure that UTF-8 format string has terminated while encoding it, as compared to C program string which ends with '\0' (NULL) character? - Is there any special symbol or procedure to determine end of UTF-8 string OR just ASCII NULL '\0' is used as it is to indicate that. -- Harshal

RE: My Querry

2004-11-23 Thread Addison Phillips [wM]
Internationalization is an architecture. It is not a feature. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Harshal Trivedi Sent: 20041123 3:42 To: [EMAIL PROTECTED] Subject: My Querry How can i make sure that UTF-8 format string has terminated while

RE: My Querry

2004-11-23 Thread Mike Ayers
Title: RE: My Querry From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Addison Phillips [wM] Sent: Tuesday, November 23, 2004 9:14 AM One of the nice things about UTF-8 is that the ASCII bytes from 0 to 7F hex (including the C0 control characters from \x00 through

Re: My Querry

2004-11-23 Thread Kenneth Whistler
Harshal Trivedi asked: How can i make sure that UTF-8 format string has terminated while encoding it, as compared to C program string which ends with '\0' (NULL) character? You don't need to do anything special at all when using UTF-8 in C programs, as far as string termination goes. UTF-8

RE: My Querry

2004-11-23 Thread Mike Ayers
Title: RE: My Querry From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Harshal Trivedi Sent: Tuesday, November 23, 2004 3:42 AM How can i make sure that UTF-8 format string has terminated while encoding it, as compared to C program string which ends with '\0' (NULL

RE: My Querry

2004-11-23 Thread Addison Phillips [wM]
Title: RE: My Querry Hi Mike, You misread my sentence, I think. I did NOT say that C language strings are compatible with UTF-8, but rather that the UTF-8 was designed with compatibility with C language "strings" (char*) in mind. The point of UTF-8 wa

Re: My Querry

2004-11-23 Thread Philippe Verdy
From: Antoine Leca [EMAIL PROTECTED] I do not know what fully compatible means in such a context. For example, ASCII as designed allowed (please note I did not write was designed to allow) the use of the 8th bit as parity bit when transmitted as octet on a telecommunication line; I doubt such

Re: My Querry

2004-11-23 Thread Antoine Leca
Philippe Verdy wrote: From: Antoine Leca [EMAIL PROTECTED] For example, ASCII as designed allowed (please note I did not write was designed to allow) the use of the 8th bit as parity bit when transmitted as octet on a telecommunication line; I doubt such use is compatible with UTF-8. The

Re: My Querry

2004-11-23 Thread Chris Jacobs
RE: My Querry - Original Message - From: Addison Phillips [wM] To: Mike Ayers Cc: [EMAIL PROTECTED] Sent: Tuesday, November 23, 2004 8:15 PM Subject: RE: My Querry Hi Mike, You misread my sentence, I think. I did NOT say that C language strings

RE: My Querry

2004-11-23 Thread D. Starner
Mike Ayers [EMAIL PROTECTED] writes: What is wrong? That UTF-8 (born FSS-UTF) was designed to be compatible with C language strings?' Yes. A character encoding can be compatible with ASCII or C language strings, but not both, as those two were not compatible to begin with.

Re: My Querry

2004-11-23 Thread Mark E. Shoulson
Why is it that even simple questions asked about straightforward aspects of Unicode somehow mutate into hairsplitting arguments about who exactly meant what and which version does which...? I'm glad I didn't ask this question here! ~mark

Re: My Querry

2004-11-23 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: By saying UTF-8 is fully compatible with ASCII, it says that any ASCII-only encoded file needs no reencoding of its bytes to make it UTF-8. Note that this is only true for the US version of ASCII (well, ASCII is normally designating

Re: My Querry

2004-11-23 Thread John Cowan
Antoine Leca wrote: Sorry, no: there is no requirement to clear it. You are assuming something about the way data are handled. When you handle ASCII data using octets, you can perfectly, and conformantly, keep some other data (being parity or whatever) inside the 8th bit; so with even