At Mon, 11 Jun 2001 15:43:42 -0700,
Carl W. Brown <[EMAIL PROTECTED]> wrote:
> At first I thought the same thing but I have changed my mind. There are
> problems but the problems are with UTF-16 not UTF-8.
I don't think your new UTF-16 proposal solves any problem. It's yet
another encoding. It wo
On 06/11/2001 10:45:46 PM Mark Davis wrote:
[earlier]
> - Oracle could probably make a case for their name for UTF8 simply being an
> anachronism. After all, the original definition of UTF-8 did convert
> surrogate pairs as they are doing in what they call UTF8.
[now]
>UTF-8 was defined before
>In other words, Oracle has an alternate solution here for 9i -- they can
>simply explain that the old product defined the old pre-surrogate UTF-8 and
>the new product is now surrogate aware and uses the current definition.
There's a mistake being made here that has been made repeatedly througho
>Will the Unicode version of UTF-8 be registered with IANA and, if so, what
>will be its "charset" designation?
Firstly, please let's halt the confusion right here. I'll repeat myself
once more. We're not redefining UTF-8. What we're proposing is not a
"Unicode version of UTF-8". It is anoth
Marco Cimarosti <[EMAIL PROTECTED]> wrote:
>My assumption was that, in the first case (no sort order requested by the
>client), a server could in theory provide a result set randomly shuffled. Of
>course, I know that this won't normally happen but, however, the server is
>allowed to provide whate
At 18:43 -0700 2001-06-11, Rick McGowan wrote:
>Everson wrote:
>
>> Lots of people with names like McGowan like to have the "c",
>> ostensibly an abbreviation for "ac" superscripted and underlined. ;-)
>
>(Sound of retching...)
You mean "Ack!"?
>Uh, no. I like it just fine as-is. If I
>actu
Antoine Leca shcrissi [wrote] (Sicilian, this time):
> Marco Cimarosti écrivit [wrote] (!):
> That is true. It is as true as the fact that when we French
> are to write the oe digraph, we *type* it as two separate
> letters, for lack of better solutions.
The two issues are quite different.
- The lack of French
Peter Constable wrote:
> >The point is that encodings currently used for French have
> >none of these.
>
> Well, then, just do what the French do: don't use any of
> them, even though you may be tempted to use some.
> [...]
> >The ideal for me, rather than adding the missing "e" and
> >"i", woul
Tuesday, June 12, 2001
Did the Lion dip his thorn in ink?
Jim Agenbroad (disclaimer and addresses at bottom)
On Mon, 11 Jun 2001, John Hudson wrote:
> At 15:56 6/11/2001 +0100, Michael Everson wrote:
>
> >Shaw, Bernard. 1962. Androcles & th
- Original Message -
From: "System Administrator" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, June 11, 2001 21:41
Subject: Undeliverable: Re: UTF8 vs AL32UTF8
> Your message
>
> To: Misha Wolf
> Cc: [EMAIL PROTECTED]
> Subject: Re: UTF8 vs AL32UTF8
> Sent
We will respond more fully later, but I want to make it very clear that
despite the unfortunate and confusing choice of name, "xICU" is not
connected to the ICU product or team in any way.
Mark
- Original Message -
From: "Bill Kurmey" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Mon
On 06/12/2001 10:29:26 AM "Mark Davis" wrote:
>When applying UTF-8 -- as originally designed -- the sequence D800
>DC00 would transform into a 6-byte sequence. Transforming back would
>result with the original sequence D800 DC00. When applying this to
>Unicode (16 bit only, at th
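The transformation Mark describes can be reproduced with a minimal Python sketch, assuming the original RFC 2279-style scheme (up to six bytes, values up to 0x7FFFFFFF, no special case for D800..DFFF). The function names are hypothetical. Encoding each 16-bit unit of D800 DC00 separately gives two 3-byte sequences (the "6-byte sequence"), whereas encoding the scalar value U+10000 gives a single 4-byte sequence.

```python
def original_utf8_encode(value: int) -> bytes:
    """Encode one value with the original (RFC 2279 style) UTF-8 scheme.

    Any value up to 0x7FFFFFFF is accepted; a surrogate code unit such as
    D800 is encoded like any other 16-bit value.
    """
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError(f"value out of range: {value:#x}")
    if value < 0x80:
        return bytes([value])
    # (exclusive upper bound, total bytes, lead-byte marker)
    forms = [
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ]
    for limit, nbytes, lead in forms:
        if value < limit:
            break
    trail = []
    for _ in range(nbytes - 1):
        trail.append(0x80 | (value & 0x3F))  # low 6 bits go into a continuation byte
        value >>= 6
    return bytes([lead | value] + trail[::-1])


def encode_units(units) -> bytes:
    """Encode each 16-bit code unit independently, without pairing surrogates."""
    return b"".join(original_utf8_encode(u) for u in units)


print(encode_units([0xD800, 0xDC00]).hex(" "))  # ed a0 80 ed b0 80
print(original_utf8_encode(0x10000).hex(" "))   # f0 90 80 80
```

Decoding the six bytes back unit by unit returns D800 DC00, which is the round trip Mark refers to; the surrogate pair is only recognized if something later interprets those two values as UTF-16.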
>What do you plan to propose for phonetic modifier letters "a",
>"o" and "i":
>
>1) Will you propose three new code points?
>
>2) Will you propose to unify them with U+00AA, U+00BA and U+2071?
If I were to propose new code points, the only differences might be between
Ll and Lm, and that 00AA and
In a message dated 2001-06-12 1:07:17 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
> There's a mistake being made here that has been made repeatedly throughout
> our discussion: that's to assume that there are two kinds of UTF-8: the
> original, in which the code unit sequence < ED A0 80
Bill Kurmey wrote:
> Will the Unicode version of UTF-8 be registered with IANA and, if so, what
> will be its "charset" designation?
I believe this question is based on a misunderstanding:
"6-byte sequences" have been mentioned in this discussion. The intended meaning was
"pairs of 3-byte seque
Toby,
I agree that there is a need to preserve standards. Oracle did not support
surrogates. If you passed it a UTF-16 data stream it would not be converted
into proper UTF-8 encoding. At this juncture it should have fixed UTF8.
This would have worked with the old data because it had no non-pl
That would be viewing history in the prism of present thought.
When applying UTF-8 -- as originally designed -- the sequence D800
DC00 would transform into a 6-byte sequence. Transforming back would
result with the original sequence D800 DC00. When applying this to
Unicode (16 bit
Bill,
This product is not developed by the ICU development team. We at X.Net are
making this code available for people who are interested in implementing
ICU. This is designed to simplify ICU implementation if you choose to use
it. Even if you use it you can always invoke ICU directly.
I have
Mark,
I thought hard about the name. I thought that maybe it should be totally
different. This would imply that it is a product in and of itself that
happens to use ICU. Instead it is more like a sample program to be used to
implement ICU, not a product in and of itself.
It provides functional
On 12/06/2001 06:43:10 Bill Kurmey wrote:
[...]
> Is the following an accurate statement of the present situation?
>
> Currently, if an email client receives a message with "Content Type:"
> containing "charset=UTF-8" and accepts up to 6 octets for each scalar
> value, it would be considered "U
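Bill's question turns on how many octets a decoder accepts per value. In the original layout the lead byte alone announces the sequence length, so a decoder that "accepts up to 6 octets" classifies lead bytes roughly as in this sketch (hypothetical function name, assuming the RFC 2279 bit patterns):

```python
def original_sequence_length(lead: int) -> int:
    """Return the sequence length announced by a lead byte under the
    original (RFC 2279 style) UTF-8 layout, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if lead < 0xC0:
        return 0  # 10xxxxxx: continuation byte, never a lead byte
    if lead < 0xE0:
        return 2  # 110xxxxx
    if lead < 0xF0:
        return 3  # 1110xxxx
    if lead < 0xF8:
        return 4  # 11110xxx
    if lead < 0xFC:
        return 5  # 111110xx
    if lead < 0xFE:
        return 6  # 1111110x
    return 0      # 0xFE and 0xFF never occur


for b in (0x41, 0xED, 0xF0, 0xFC):
    print(hex(b), original_sequence_length(b))
```

Note that the byte layout alone makes ED A0 80 a well-formed 3-byte sequence; whether the value it carries (D800) is acceptable is a separate rule that the lead byte does not express.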
Lisa Moore wrote:
> Jianping wrote:
>
> only Oracle provides fully UTF-8 and
> UTF-16 support for RDBMS
>
> Whoa...let me interject, DB2 for OS/390 supports UTF-8 and UTF-16. And DB2
> for Intel, Unix, supported both much earlier. I cannot speak to Jianping's
> interpretation of "fully"
>
Th
On 12/06/2001 04:16:50 Peter Constable wrote:
[...]
> I agree. I scheduled a week-long engagement mid-Sept. expecting, from the
> past few years, IUC to be held in the first week of Sept. This has resulted in
> a conflict requiring me to adjust travel plans. It also wouldn't hurt to
> advertise futu
On 06/12/2001 01:13:48 PM Jianping Yang wrote:
>If you convert < ED A0 80 ED B0 80 > into UTF-16, what does it mean then?
>I think definitely it means U-0001.
I'd say not if that 6-byte sequence is interpreted in terms of *UTF-8*.
UTF-8 has no 6-byte sequences. It must be something else, li
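The point that < ED A0 80 ED B0 80 > is not the UTF-8 form of U+10000 can be checked with Python's standard codecs; the helper name below is hypothetical. Python's strict UTF-8 decoder rejects encoded surrogates. The `surrogatepass` error handler decodes the bytes as two separate code points, D800 and DC00, and only a further reinterpretation of those two values as UTF-16 code units yields U+10000.

```python
def decode_variants(data: bytes):
    """Return (strict_utf8_ok, code_points, utf16_reading) for a byte string."""
    try:
        data.decode("utf-8")  # strict UTF-8: encoded surrogates are rejected
        strict_ok = True
    except UnicodeDecodeError:
        strict_ok = False
    # Lenient reading: let encoded surrogates through as code points.
    lenient = data.decode("utf-8", errors="surrogatepass")
    code_points = [ord(c) for c in lenient]
    # Reinterpret those values as UTF-16 code units (pairs are combined).
    as_utf16 = lenient.encode("utf-16-be", errors="surrogatepass").decode("utf-16-be")
    return strict_ok, code_points, [ord(c) for c in as_utf16]


print(decode_variants(bytes.fromhex("eda080edb080")))  # (False, [55296, 56320], [65536])
print(decode_variants(bytes.fromhex("f0908080")))      # (True, [65536], [65536])
```

55296 and 56320 are 0xD800 and 0xDC00; 65536 is 0x10000.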
[EMAIL PROTECTED] wrote:
> On 06/11/2001 10:45:46 PM Mark Davis wrote:
>
> [earlier]
> > - Oracle could probably make a case for their name for UTF8 simply being an
> > anachronism. After all, the original definition of UTF-8 did convert
> > surrogate pairs as they are doing in what they cal
[EMAIL PROTECTED] wrote:
> On 06/12/2001 01:13:48 PM Jianping Yang wrote:
>
> >If you convert < ED A0 80 ED B0 80 > into UTF-16, what does it mean then?
> >I think definitely it means U-0001.
>
> I'd say not if that 6-byte sequence is interpreted in terms of *UTF-8*.
So UTF-8 is not comp
Mark said:
> UTF-8 was defined before UTF-16. At the time it was first defined, there
> were no surrogates, so there was no special handling of the D800..DFFF code
> points.
Technically, the first statement is not true.
UTF-2 and FSS-UTF *were* defined well before UTF-16. FSS-UTF was
defined on
On Monday, June 11, 2001 4:14 AM, Vadim Snurnikov wrote:
> How can I read a text in Unicode (Russian) where every Russian letter
> is represented like that: D=B6 (or similar)? Unfortunately, all these
> four characters that stand for one Russian letter are of one byte each,
> so that I am getting
On 06/12/2001 02:05:38 PM Jianping Yang wrote:
>[EMAIL PROTECTED] wrote:
>
>> On 06/12/2001 01:13:48 PM Jianping Yang wrote:
>>
>> >If you convert < ED A0 80 ED B0 80 > into UTF-16, what does it mean then?
>> >I think definitely it means U-0001.
>>
>> I'd say not if that 6-byte sequence i
On 06/12/2001 01:13:48 PM Jianping Yang wrote:
>If you convert < ED A0 80 ED B0 80 > into UTF-16, what does it mean then?
>I think definitely it means U-0001.
Please read the definitions and tell me how you support that.
The only way I can see to support that is to assume that the mapping
Case I. Code points U-D800..U-DFFF excluded
from the UTFs. "The way God intended it to be"

   code point      UTF-8       UTF-16   UTF-32
a.            <=>  00
b. D7FF       <=>  ED 9F BF    D7FF     D7FF
g. E000       <=>  E
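Rows of the Case I table can be regenerated with Python's built-in codecs, which behave like Case I by refusing to encode D800..DFFF. This is a sketch with a hypothetical function name; it needs Python 3.8+ for the `bytes.hex` separator arguments and prints UTF-32 with all eight hex digits.

```python
def case_one_row(cp: int) -> str:
    """Return one Case I row: code point, UTF-8, UTF-16 and UTF-32 (big-endian hex)."""
    ch = chr(cp)
    # Strict codecs raise UnicodeEncodeError for D800..DFFF,
    # which is exactly the Case I exclusion.
    utf8 = ch.encode("utf-8").hex(" ").upper()
    utf16 = ch.encode("utf-16-be").hex(" ", 2).upper()  # group by 16-bit unit
    utf32 = ch.encode("utf-32-be").hex().upper()
    return f"{cp:04X}  <=>  {utf8:<12}{utf16:<11}{utf32}"


for cp in (0x0000, 0xD7FF, 0xE000, 0x10000):
    print(case_one_row(cp))
```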
The Unicode Standard 3.0 (page 149) says that U+007E can be used as a
Spacing Clone of Combining Tilde. But isn't this the function of U+02DC
(the so-called "SMALL TILDE")? Why suggest this usage then and not point to
U+02DC?
Could one say (as some typographers see it) that U+007E should fo
One thing needs to clarify here is that there is no four byte encoding in
UTF-8S proposal and four byte encoding is illegal but not irregular. As
everything in UTF-8S is perfect match to UTF-16, any blame to this proposal
also applies to UTF-16 encoding form.
Regards,
Jianping.
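One concrete reading of "everything in UTF-8S is a perfect match to UTF-16" is binary ordering, which is commonly cited as the motivation for the proposal. The sketch assumes UTF-8S encodes each UTF-16 code unit, surrogates included, as its own 1- to 3-byte sequence; the helper name `utf8s` is hypothetical. Comparing U+FF61 with U+10000 shows UTF-8 byte order following code point order, while UTF-8S byte order follows UTF-16 code unit order.

```python
def utf8s(text: str) -> bytes:
    """UTF-8S as assumed here: every UTF-16 code unit, surrogates included,
    gets its own 1- to 3-byte UTF-8-style sequence."""
    units = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(units), 2):
        u = int.from_bytes(units[i:i + 2], "big")
        if u < 0x80:
            out.append(u)
        elif u < 0x800:
            out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
        else:
            out += bytes([0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)])
    return bytes(out)


samples = ["\uFF61", "\U00010000"]
by_utf8 = sorted(samples, key=lambda s: s.encode("utf-8"))
by_utf16 = sorted(samples, key=lambda s: s.encode("utf-16-be"))
by_utf8s = sorted(samples, key=utf8s)
print([hex(ord(s)) for s in by_utf8])   # ['0xff61', '0x10000']
print([hex(ord(s)) for s in by_utf16])  # ['0x10000', '0xff61']
print([hex(ord(s)) for s in by_utf8s])  # ['0x10000', '0xff61']
```

For BMP text without surrogates the two byte strings are identical; they diverge only for supplementary characters, where UTF-8 uses F0 90 80 80 and the assumed UTF-8S uses ED A0 80 ED B0 80.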
Kenneth Whistler
Patrick Andries asked:
> The Unicode Standard 3.0 (page 149) says that U+007E can be used as a
> Spacing Clone of Combining Tilde. But isn't this the function of U+02DC
> (the so-called "SMALL TILDE")? Why suggest this usage then and not point to
> U+02DC?
>
> Could one say (as some typogra
Jianping wrote:
> One thing needs to clarify here is that there is no four byte encoding in
> UTF-8S proposal and four byte encoding is illegal but not irregular. As
> everything in UTF-8S is perfect match to UTF-16, any blame to this proposal
> also applies to UTF-16 encoding form.
Well after a
Kenneth Whistler wrote:
> This rampant failure to edit reply-to's is threatening
> to bring the wrath of Sarasvati back down on the list, folks.
Sarasvati is indeed listening and taking note.
Please do edit your replies and exercise some
intelligent snipping behaviour.
Cheery regards from your,
The Unicode Standard 3.0 (page 150) says that "U+2011 NON-BREAKING HYPHEN is
present for compatibility with existing standards" as if it shouldn't really
be encoded. But isn't its relation to U+2010 the same as the one that
opposes SPACE to NO-BREAK SPACE, i.e. a semantic (behavioural) one?
Pat
Kenneth Whistler wrote:
> Jianping wrote:
>
> > One thing needs to clarify here is that there is no four byte encoding in
> > UTF-8S proposal and four byte encoding is illegal but not irregular. As
> > everything in UTF-8S is perfect match to UTF-16, any blame to this proposal
> > also applies
Kenneth Whistler wrote:
> Jianping responded:
>
> > Kenneth Whistler wrote:
> >
> > > Jianping wrote:
> > >
> > > > One thing needs to clarify here is that there is no four byte encoding in
> > > > UTF-8S proposal and four byte encoding is illegal but not irregular. As
> > > > everything in UTF
Jianping responded:
> Kenneth Whistler wrote:
>
> > Jianping wrote:
> >
> > > One thing needs to clarify here is that there is no four byte encoding in
> > > UTF-8S proposal and four byte encoding is illegal but not irregular. As
> > > everything in UTF-8S is perfect match to UTF-16, any blame t
UTF-8s is reminiscent of a problem that I had installing a certain vendor's
terminals. Each screen was about 2K of data. The terminal communications
protocol broke the data into 128 byte chunks. Each block had a small header
and the terminal would wait for a response before the next block was s
Patrick Andries asked:
> The Unicode Standard 3.0 (page 150) says that "U+2011 NON-BREAKING HYPHEN is
> present for compatibility with existing standards" as if it shouldn't really
> be encoded. But isn't its relation to U+2010 the same as the one that
> opposes SPACE to NO-BREAK SPACE, i.e. a s
Vadim Snurnikov wrote:
> How can I read a text in Unicode (Russian) where every Russian letter
> is represented like that: D=B6 (or similar)? (The e-mail got
> transferred to this format.)
What software do you use to read e-mail?
I recommend Outlook Express 5.0 and above. It allows yo
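If the message was sent with quoted-printable transfer encoding and the mail client did not undo it, each byte of the UTF-8 text appears as "=XX". A minimal Python sketch using the standard `quopri` module, assuming the body is UTF-8; the sample string is illustrative, not Vadim's actual text:

```python
import quopri

# "=D0=B6" is the quoted-printable form of the UTF-8 bytes D0 B6, i.e. U+0436.
encoded = b"=D0=B6=D0=B8=D0=B7=D0=BD=D1=8C"
raw = quopri.decodestring(encoded)  # step 1: undo quoted-printable
text = raw.decode("utf-8")          # step 2: undo UTF-8
print(text)  # жизнь
```

The two steps have to happen in this order: quoted-printable is a transfer encoding layered on top of the UTF-8 bytes.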
Sarasvati,
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Sarasvati
Sent: Tuesday, June 12, 2001 3:55 PM
To: [EMAIL PROTECTED]
Subject: Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads
>Kenneth Whistler wrote:
>> This rampant failure to edit
Jianping said:
> > What you finally stated today is that is flat-out
> > *illegal* in UTF-8s. That was a missing piece of the puzzle for anyone
> > trying to interpret what you are proposing.
> >
>
> In the UTF-8S, there should be no irregular forms, should we repeat the history
> again?
> Nobo
I would urge Oracle and friends to move this to a different venue, specifically
[EMAIL PROTECTED] As far as I can see, UTF-8S does not need either
the approval or the disapproval of the Unicode Consortium. If it is
actually in use, it needs a label -- and IANA is in the business of assigning
suc
>In fact, in this particular case, if I recall, the distinctions were
>probably considered to be good practice, and not something to be mapped
>away. XCCS was often a *model* for early Unicode, rather than a character
>encoding that forced the grudging inclusion of many icky "characters"
>that we
From: <[EMAIL PROTECTED]>
> Out of curiosity, is there documentation on XCCS available anywhere?
Check out google.com: it will get about 120+ hits on the words "XCCS
standard" and several of them seem vaguely relevant. :-)
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.
On 06/12/2001 09:21:06 PM Jianping Yang wrote:
>Nobody except you though that 4-byte is allowed in UTF-8S.
Not so! That was under discussion just a few days ago:
On 06/07/2001 12:34:49 AM DougEwell2 wrote:
[snip]
>But definition D29 says that a UTF must round-trip these invalid code point