Title: RE: My Querry
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Mark E. Shoulson
Sent: Tuesday, November 23, 2004 8:43 PM
Why is it that even simple questions asked about
straightforward aspects of Unicode somehow mutate into
There is nothing straightforward
From: Antoine Leca [EMAIL PROTECTED]
On Thursday, November 25th, 2004 08:05Z Philippe Verdy wrote:
In ASCII, or in any other ISO 646 charset, code positions are ALL in
the range 0 to 127. Nothing is defined outside of this range, just as
Unicode does not define or mandate anything for code points
larger than
The fact is, once you dedicate the top bits in a pipe to some purposes,
you've narrowed the width of the pipe. That's what happened to those
systems that implemented a 7-bit pipe for ASCII by using the top bit for
other purposes.
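A minimal C sketch of the narrowed pipe described above, assuming a hypothetical channel function strip_top_bit() that reserves bit 7 of every octet for its own use and clears it. ASCII text passes through untouched, but a UTF-8 multibyte sequence does not.

```c
#include <stddef.h>

/* Hypothetical 7-bit channel: the top bit of every octet is reserved
   by the transport and cleared before delivery. */
void strip_top_bit(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] &= 0x7F;   /* keep only the low 7 bits */
}
```

For example, U+00E9 (é) is encoded in UTF-8 as C3 A9; after strip_top_bit() it arrives as 43 29, which a receiver reads as the two ASCII characters "C)".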
And everybody seems to agree that when you serialize such an
Antoine Leca wrote:
In a similar vein, I cannot agree that it would be advisable to
use the 22nd, 23rd, 32nd, 63rd, etc. bits, i.e. the upper bits of the storage of a
Unicode code point. Right now, nobody sees any use for them as part of
characters, but history should have taught us
John Cowan jcowan at reutershealth dot com wrote:
No, I don't agree with this part. Unicode just isn't going to expand
past 0x10 unless Earth joins the Galactic Empire. So the upper
bits are indeed free for private uses.
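To illustrate what "upper bits free for private uses" means in practice, here is a hedged C sketch of one hypothetical layout (the names pack_cp, unpack_cp and unpack_flags are illustrative, not from any library): since no code point exceeds U+10FFFF, 21 bits suffice, and the remaining 11 bits of a uint32_t can carry application-defined flags.

```c
#include <stdint.h>

/* Hypothetical private layout: bits 0-20 hold the code point
   (U+10FFFF needs only 21 bits), bits 21-31 hold application flags.
   Flags wider than 11 bits are silently truncated by the shift. */
#define CP_BITS 21u
#define CP_MASK ((UINT32_C(1) << CP_BITS) - 1u)   /* 0x1FFFFF */

static inline uint32_t pack_cp(uint32_t cp, uint32_t flags)
{
    return (cp & CP_MASK) | (flags << CP_BITS);
}

/* Recover the code point by masking off the flag bits. */
static inline uint32_t unpack_cp(uint32_t v) { return v & CP_MASK; }

/* Recover the flags from the upper 11 bits. */
static inline uint32_t unpack_flags(uint32_t v) { return v >> CP_BITS; }
```

This is exactly the kind of practice Antoine Leca cautions against: the packed value is no longer a code point, so the flags must be masked off before the value is handed to anything that expects Unicode.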
A few years ago there was the Whistler Constant, which basically
From: Antoine Leca [EMAIL PROTECTED]
On Wednesday, November 24th, 2004 22:16Z Asmus Freytag wrote:
I'm not seeing a lot in this thread that adds to the store of
knowledge on this issue, but I see a number of statements that are
easily misconstrued or misapplied, including the thoroughly
discredited practice of storing
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
Whether an application chooses to use the 8th (or even 9th...) bit of a
storage, memory, or networking byte that also stores an ASCII-coded
character as a zero, as an even or odd parity bit, or for some other
purpose is the choice of
At 04:23 PM 11/23/2004, Chris Jacobs wrote:
Now, this implies that UTF-8 does interpret U+0000 as an ASCII NULL
control char.
This is incompatible with using it as a string terminator.
Except that it's up to you how to interpret the C0 control codes in Unicode.
You can do it according to ISO 6429
How can I make sure that a UTF-8 format string has terminated while
encoding it, as compared to a C program string, which ends with the '\0'
(NULL) character?
- Is there any special symbol or procedure to determine the end of a UTF-8
string, OR is the ASCII NULL '\0' used as it is to indicate that?
--
Harshal
Internationalization is an architecture.
It is not a feature.
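One way to see why no special end marker is needed: in UTF-8, only U+0000 is encoded as a 0x00 byte; every byte of a two-, three-, or four-byte sequence has its top bit set. The following C sketch of an encoder (utf8_encode is a hypothetical name; it assumes its input is a valid Unicode scalar value and does no error checking) makes that visible, so encoded bytes can be appended to a char buffer and terminated with '\0' exactly as for ASCII.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point, assumed to be a valid Unicode scalar value
   (<= 0x10FFFF and not a surrogate), as UTF-8 into out[].
   Returns the number of bytes written (1-4). Only U+0000 yields a
   0x00 byte: lead and continuation bytes all have bit 7 set. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx + 2 continuation bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx + 3 continuation bytes */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For instance, U+20AC (the euro sign) becomes E2 82 AC and U+1F600 becomes F0 9F 98 80; none of these bytes can be mistaken for the terminating '\0'.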
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Harshal Trivedi
Sent: 2004-11-23 3:42
To: [EMAIL PROTECTED]
Subject: My Querry
How can I make sure that a UTF-8 format string has terminated while
Title: RE: My Querry
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Addison Phillips [wM]
Sent: Tuesday, November 23, 2004 9:14 AM
One of the nice things about UTF-8 is that the ASCII bytes
from 0 to 7F hex (including the C0 control characters from
\x00 through
Harshal Trivedi asked:
How can I make sure that a UTF-8 format string has terminated while
encoding it, as compared to a C program string, which ends with the '\0'
(NULL) character?
You don't need to do anything special at all when using UTF-8
in C programs, as far as string termination goes. UTF-8
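A short C sketch of the point above, assuming well-formed, NUL-terminated UTF-8 input: strlen() works unchanged but counts bytes, not characters. Counting code points means skipping continuation bytes, which always have the bit pattern 10xxxxxx. The helper utf8_count_code_points is hypothetical.

```c
#include <stddef.h>
#include <string.h>   /* strlen(), used alongside for the byte count */

/* Count code points in a NUL-terminated, well-formed UTF-8 string by
   counting every byte that is not a continuation byte (10xxxxxx). */
size_t utf8_count_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For "café" stored as 63 61 66 C3 A9, strlen() returns 5 (bytes) while utf8_count_code_points() returns 4 (code points); both stop at the same '\0'.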
Title: RE: My Querry
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Harshal Trivedi
Sent: Tuesday, November 23, 2004 3:42 AM
How can I make sure that a UTF-8 format string has terminated
while encoding it, as compared to a C program string, which ends
with the '\0'
(NULL
Title: RE: My Querry

Hi Mike,

You misread my sentence, I think. I did NOT say that C language strings
are compatible with UTF-8, but rather that UTF-8 was designed with
compatibility with C language "strings" (char*) in mind. The
point of UTF-8 wa
From: Antoine Leca [EMAIL PROTECTED]
I do not know what "fully compatible" means in such a context. For
example, ASCII as designed allowed (please note I did not write "was
designed to allow") the use of the 8th bit as a parity bit when transmitted
as an octet on a telecommunication line; I doubt such
Philippe Verdy wrote:
From: Antoine Leca [EMAIL PROTECTED]
For example, ASCII as designed allowed (please note I did not write
"was designed to allow") the use of the 8th bit as a parity bit when
transmitted as an octet on a telecommunication line; I doubt such use is
compatible with UTF-8.
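For comparison, here is a minimal C sketch of the classic use of the 8th bit that Antoine describes, assuming a hypothetical helper with_even_parity() that sets bit 7 so that each transmitted octet has an even number of 1 bits. It only makes sense for 7-bit data.

```c
/* Even parity over 7-bit ASCII: count the 1 bits in bits 0-6 and set
   bit 7 if that count is odd, so the whole octet has an even count. */
unsigned char with_even_parity(unsigned char c7)
{
    unsigned ones = 0;
    for (int i = 0; i < 7; i++)
        ones += (c7 >> i) & 1u;
    return (unsigned char)((c7 & 0x7F) | ((ones & 1u) << 7));
}
```

Note that 'C' (0x43, three 1 bits) goes out as 0xC3, which is also the UTF-8 lead byte of é, so a receiver cannot tell parity-tagged ASCII from UTF-8; conversely, any UTF-8 byte passed through this function loses its original top bit.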
The
RE: My Querry
----- Original Message -----
From: Addison Phillips [wM]
To: Mike Ayers
Cc: [EMAIL PROTECTED]
Sent: Tuesday, November 23, 2004 8:15 PM
Subject: RE: My Querry
Mike Ayers [EMAIL PROTECTED] writes:
What is wrong? That UTF-8 (born FSS-UTF) was designed to be
compatible with C language strings?
Yes. A character encoding can be compatible with ASCII or C
language strings, but not both, as those two were not compatible to begin
with.
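Why the two are incompatible: in ASCII, NUL (0x00) is an ordinary control character that may appear inside text, and UTF-8 encodes U+0000 as the same single 0x00 byte, but a C string cannot contain that byte because it marks the end. One workaround outside standard UTF-8 is the "modified UTF-8" used by Java (for example in DataOutput.writeUTF and JNI), which writes U+0000 as the overlong pair C0 80. The C sketch below, with the hypothetical function name to_modified_utf8, applies only that NUL convention to a length-counted buffer (Java's variant also differs in how it encodes supplementary characters, which this sketch ignores).

```c
#include <stddef.h>

/* Copy len bytes of UTF-8 (which may contain U+0000 as a 0x00 byte)
   into dst as a NUL-terminated C string, writing each 0x00 as the
   two-byte sequence C0 80. dst must hold at least 2*len + 1 bytes.
   Returns the number of bytes written, excluding the terminator. */
size_t to_modified_utf8(const unsigned char *src, size_t len, char *dst)
{
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == 0x00) {
            dst[j++] = (char)0xC0;   /* overlong encoding of U+0000 */
            dst[j++] = (char)0x80;
        } else {
            dst[j++] = (char)src[i];
        }
    }
    dst[j] = '\0';                   /* the only 0x00 byte in the output */
    return j;
}
```

Standard UTF-8 decoders are required to reject C0 80 as an overlong form, so this output is only useful between components that agree on the convention.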
Why is it that even simple questions asked about straightforward aspects
of Unicode somehow mutate into hairsplitting arguments about who exactly
meant what and which version does which...? I'm glad I didn't ask this
question here!
~mark
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
Saying that UTF-8 is fully compatible with ASCII means that any
ASCII-only encoded file needs no re-encoding of its bytes to become
UTF-8.
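A minimal C check for the property Philippe describes, assuming the input is a byte buffer of known length (is_ascii_only is a hypothetical name): if every byte is below 0x80, the bytes are already valid UTF-8 with the same meaning and need no conversion.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if every byte is in the 7-bit range 0x00-0x7F. Such a buffer
   is US-ASCII and, byte for byte, also valid UTF-8. */
bool is_ascii_only(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] >= 0x80)
            return false;
    return true;
}
```

As the message goes on to say, this holds for US-ASCII only: a national ISO 646 variant assigns some of these positions to other characters, so the same bytes would decode to different characters.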
Note that this is only true for the US version of ASCII (well, ASCII
is normally designating
Antoine Leca wrote:
Sorry, no: there is no requirement to clear it.
You are assuming something about the way data are handled. When you handle
ASCII data using octets, you can perfectly well, and conformantly, keep some
other data (being parity or whatever) inside the 8th bit; so with even