date:20010612

RE: UTF-16 problems

2001-06-12 Thread Shigemichi Yazawa

At Mon, 11 Jun 2001 15:43:42 -0700, Carl W. Brown <[EMAIL PROTECTED]> wrote: > At first I thought the same thing but I have changed my mind. There are > problems but the problems are with UTF-16 not UTF-8. I don't think your new UTF-16 proposal solves any problem. It's yet another encoding. It wo

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Peter_Constable

On 06/11/2001 10:45:46 PM Mark Davis wrote: [earlier] > - Oracle could probably make a case for their name for UTF8 simply being >an > anachronism. After all, the original definition of UTF-8 did convert > surrogate pairs as they are doing in what they call UTF8. [now] >UTF-8 was defined before

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Peter_Constable

>In other words, Oracle has an alternate solution here for 9i -- they can >simply explain that the old product defined the old pre-surrogate UTF-8 and >the new product is now surrogate aware and uses the current definition. There's a mistake being made here that has been made repeatedly througho

Re: UTF-8

2001-06-12 Thread toby_phipps

>Will the Unicode version of UTF-8 be registered with IANA and, if so, what >will be its "charset" designation? Firstly, please let's halt the confusion right here. I'll repeat myself once more. We're not redefining UTF-8. What we're proposing is not a "Unicode version of UTF-8". It is anoth

RE: UTF-8 Syntax

2001-06-12 Thread toby_phipps

Marco Cimarosti <[EMAIL PROTECTED]> wrote: >My assumption was that, in the first case (no sort order requested by the >client), a server could in theory provide a result set randomly shuffled. Of >course, I know that this won't normally happen but, however, the server is >allowed to provide whate

Re: Missing characters for Italian

2001-06-12 Thread Michael Everson

At 18:43 -0700 2001-06-11, Rick McGowan wrote: >Everson wrote: > >> Lots of people with names like McGowan like to have the "c", >> ostensibly an abbreviation for "ac" superscripted and underlined. ;-) > >(Sound of retching...) You mean "Ack!"? >Uh, no. I like it just fine as-is. If I >actu

RE: Missing characters for Italian

2001-06-12 Thread Marco Cimarosti

Antoine Leca shcrissi (Sicilian, this time): > Marco Cimarosti écrivit (!): > That is true. It is as true as the fact that when we French > are to write the oe digraph, we *type* it as two separate > letters, for lack of better solutions. The two issues are quite different. - The lack of French

RE: Missing characters for Italian

2001-06-12 Thread Marco Cimarosti

Peter Constable wrote: > >The point is that encodings currently used for French have > none of these. > > Well, then, just do what the French do: don't use any of > them, even though you may be tempted to use some. > [...] > >The ideal for me, rather than adding the missing "e" and > "i", woul

Re: New acquisition

2001-06-12 Thread James E. Agenbroad

Tuesday, June 12, 2001 Did the Lion dip his thorn in ink? Jim Agenbroad (disclaimer and addresses at bottom) On Mon, 11 Jun 2001, John Hudson wrote: > At 15:56 6/11/2001 +0100, Michael Everson wrote: > > >Shaw, Bernard. 1962. Androcles & th

Fw: Undeliverable: Re: UTF8 vs AL32UTF8

2001-06-12 Thread Mark Davis

- Original Message - From: "System Administrator" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Monday, June 11, 2001 21:41 Subject: Undeliverable: Re: UTF8 vs AL32UTF8 > Your message > > To: Misha Wolf > Cc: [EMAIL PROTECTED] > Subject: Re: UTF8 vs AL32UTF8 > Sent

Re: xICU 3.0 Status - (Simplified Unicode Implementation)

2001-06-12 Thread Mark Davis

We will respond more fully later, but I want to make it very clear that despite the unfortunate and confusing choice of name, "xICU" is not connected to the ICU product or team in any way. Mark - Original Message - From: "Bill Kurmey" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Mon

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Peter_Constable

On 06/12/2001 10:29:26 AM "Mark Davis" wrote: >When applying UTF-8 -- as originally designed -- the sequence D800 >DC00 would transform into a 6-byte sequence. Transforming back would >result with the original sequence D800 DC00. When applying this to >Unicode (16 bit only, at th

RE: Missing characters for Italian

2001-06-12 Thread Peter_Constable

>What do you plan to propose for phonetic modifier letters "a", >"o" and "i": > >1) Will you propose three new code points? > >2) will you propose to unify them with U+00AA, U+00BA and U+2071? If I were to propose new code points, the only differences might be between Ll and Lm, and that 00AA and

Re: UTF8 vs AL32UTF8

2001-06-12 Thread DougEwell2

In a message dated 2001-06-12 1:07:17 Pacific Daylight Time, [EMAIL PROTECTED] writes: > There's a mistake being made here that has been made repeatedly throughout > our discussion: that's to assume that there are two kinds of UTF-8: the > original, in which the code unit sequence < ED A0 80

Re: UTF-8

2001-06-12 Thread Markus Scherer

Bill Kurmey wrote: > Will the Unicode version of UTF-8 be registered with IANA and, if so, what > will be its "charset" designation? I believe this question is based on a misunderstanding: "6-byte sequences" have been mentioned in this discussion. The intended meaning was "pairs of 3-byte seque

RE: UTF-16 problems

2001-06-12 Thread Carl W. Brown

Toby, I agree that there is a need to preserve standards. Oracle did not support surrogates. If you passed it a UTF-16 data stream it would not be converted into proper UTF-8 encoding. At this juncture it should have fixed UTF8. This would have worked with the old data because it had no non-pl

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Mark Davis

That would be viewing history in the prism of present thought. When applying UTF-8 -- as originally designed -- the sequence D800 DC00 would transform into a 6-byte sequence. Transforming back would result with the original sequence D800 DC00. When applying this to Unicode (16 bit
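[Archive note] The two byte sequences contrasted in this thread can be reproduced with a minimal Python sketch. The helper names `utf8_standard` and `utf8_via_surrogates` are hypothetical and exist only for illustration; the second one assumes that "converting surrogate pairs" means encoding each UTF-16 code unit as its own 3-byte sequence, which is how the 6-byte form is described in the messages above.

```python
def utf8_standard(cp: int) -> bytes:
    """Encode a code point with Python's built-in (current-definition) UTF-8 codec."""
    return chr(cp).encode("utf-8")


def utf8_via_surrogates(cp: int) -> bytes:
    """Hypothetical helper: encode each UTF-16 code unit as its own 3-byte sequence.

    For supplementary code points this yields the 6-byte form discussed in the
    thread; for BMP code points it matches ordinary UTF-8.
    """
    if cp < 0x10000:
        return chr(cp).encode("utf-8", "surrogatepass")
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # high (leading) surrogate
    low = 0xDC00 + (v & 0x3FF)   # low (trailing) surrogate
    # "surrogatepass" lets the codec emit 3-byte sequences for lone surrogates.
    return (chr(high) + chr(low)).encode("utf-8", "surrogatepass")


if __name__ == "__main__":
    print(utf8_standard(0x10000).hex())        # 4-byte form
    print(utf8_via_surrogates(0x10000).hex())  # 6-byte form
```

For U+10000 the first call gives the 4-byte sequence F0 90 80 80, and the second gives ED A0 80 ED B0 80, the sequence quoted elsewhere in this thread.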

RE: xICU 3.0 Status - (Simplified Unicode Implementation)

2001-06-12 Thread Carl W. Brown

Bill, This product is not developed by the ICU development team. We at X.Net are making this code available for people who are interested in implementing ICU. This is designed to simplify ICU implementation if you choose to use it. Even if you use it you can always invoke ICU directly. I have

RE: xICU 3.0 Status - (Simplified Unicode Implementation)

2001-06-12 Thread Carl W. Brown

Mark, I thought hard about the name. I thought that maybe it should be totally different. This would imply that it is a product in and of itself that happens to use ICU. Instead it is more like a sample program to be used to implement ICU, not a product in and of itself. It provides functional

Re: UTF-8

2001-06-12 Thread Misha Wolf

On 12/06/2001 06:43:10 Bill Kurmey wrote: [...] > Is the following an accurate statement of the present situation? > > Currently, if an email client receives a message with "Content Type:" > containing "charset=UTF-8" and accepts up to 6 octets for each scalar > value, it would be considered "U

Re: UTF-16 problems

2001-06-12 Thread Jianping Yang

Lisa Moore wrote: > Jianping wrote: > > only Oracle provides fully UTF-8 and > UTF-16 support for RDBMS > > Whoa...let me interject, DB2 for OS/390 supports UTF-8 and UTF-16. And DB2 > for Intel, Unix, supported both much earlier. I cannot speak to Jianping's > interpretation of "fully" > Th

Re: 19th Unicode Conference, September 2001, San Jose, CA, USA

2001-06-12 Thread Misha Wolf

On 12/06/2001 04:16:50 Peter Constable wrote: [...] > I agree. I scheduled a week-long engagement mid-Sept. expecting from the > past few years IUC to be held the first week of Sept. This has resulted in > a conflict requiring me to adjust travel plans. It also wouldn't hurt to > advertise futu

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Peter_Constable

On 06/12/2001 01:13:48 PM Jianping Yang wrote: >If you convert < ED A0 80 ED B0 80 > into UTF-16, what does it mean then? I >think definitely it means U-00010000. I'd say not if that 6-byte sequence is interpreted in terms of *UTF-8*. UTF-8 has no 6-byte sequences. It must be something else, li
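[Archive note] Both readings of < ED A0 80 ED B0 80 > can be tried out in a short Python sketch. It relies on an assumption: Python's built-in codec follows the present-day definition of UTF-8, which is not necessarily the 2001 wording under debate here. The function names are hypothetical.

```python
from typing import Optional

DATA = bytes.fromhex("ED A0 80 ED B0 80")


def decode_strict(b: bytes) -> Optional[str]:
    """Decode as UTF-8 under the current definition; return None if rejected."""
    try:
        return b.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_pairwise(b: bytes) -> str:
    """Hypothetical reading: decode two 3-byte sequences, then join the surrogates."""
    units = b.decode("utf-8", "surrogatepass")          # two surrogate code units
    utf16 = units.encode("utf-16-le", "surrogatepass")  # same units as UTF-16 bytes
    return utf16.decode("utf-16-le")                    # pair combined into one character


if __name__ == "__main__":
    print(decode_strict(DATA))
    print(hex(ord(decode_pairwise(DATA))))
```

Under these assumptions the strict decoder rejects the bytes outright, while the pairwise reading yields the single code point U+10000. That is the gap between the two positions in this exchange.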

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Jianping Yang

[EMAIL PROTECTED] wrote: > On 06/11/2001 10:45:46 PM Mark Davis wrote: > > [earlier] > > - Oracle could probably make a case for their name for UTF8 simply being > >an > > anachronism. After all, the original definition of UTF-8 did convert > > surrogate pairs as they are doing in what they cal

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Jianping Yang

[EMAIL PROTECTED] wrote: > On 06/12/2001 01:13:48 PM Jianping Yang wrote: > > >If you convert < ED A0 80 ED B0 80 > into UTF-16, what does it mean then? > I > >think definitely it means U-00010000. > > I'd say not if that 6-byte sequence is interpreted in terms of *UTF-8*. So UTF-8 is not comp

FSS-UTF, UTF-2, UTF-8, and UTF-16

2001-06-12 Thread Kenneth Whistler

Mark said: > UTF-8 was defined before UTF-16. At the time it was first defined, there > were no surrogates, so there was no special handling of the D800..DFFF code > points. Technically, the first statement is not true. UTF-2 and FSS-UTF *were* defined well before UTF-16. FSS-UTF was defined on

Re: FW: Russian Unicode Conversion

2001-06-12 Thread Otto Stolz

On Monday, June 11, 2001 4:14 AM, Vadim Snurnikov wrote: > How can I read a text in Unicode (Russian) where every Russian letter > is represented like that: D=B6 (or similar)? Unfortunately, all these > four characters that stand for one Russian letter are of one byte each, > so that I am getting
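[Archive note] Sequences such as "=D0=B6" look like the MIME quoted-printable transfer encoding, in which each byte is written as "=" followed by two hexadecimal digits. The sketch below is only an illustration under that assumption, and is not necessarily the answer given in the truncated reply above. The charset of the original message is not shown in this archive, so UTF-8 is assumed purely for the example; `decode_qp` is a hypothetical helper name.

```python
import quopri


def decode_qp(text: str, charset: str) -> str:
    """Undo quoted-printable, then interpret the resulting bytes in `charset`."""
    raw = quopri.decodestring(text.encode("ascii"))
    return raw.decode(charset)


if __name__ == "__main__":
    # Assumed sample: the two bytes D0 B6, read as UTF-8, are one Cyrillic letter.
    print(decode_qp("=D0=B6", "utf-8"))
```

If the message was actually sent in another charset (for example KOI8-R or windows-1251), the same first step applies and only the `charset` argument changes.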

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Peter_Constable

On 06/12/2001 02:05:38 PM Jianping Yang wrote: >[EMAIL PROTECTED] wrote: > >> On 06/12/2001 01:13:48 PM Jianping Yang wrote: >> >> >If you convert < ED A0 80 ED B0 80 > into UTF-16, what does it mean then? >> I >> >think definitely it means U-00010000. >> >> I'd say not if that 6-byte sequence i

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Peter_Constable

On 06/12/2001 01:13:48 PM Jianping Yang wrote: >If you convert < ED A0 80 ED B0 80 > into UTF-16, what does it mean then? I >think definitely it means U-00010000. Please read the definitions and tell me how you support that. The only way I can see to support that is to assume that the mapping

And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Kenneth Whistler

Case I. Code points U-D800..U-DFFF excluded from the UTF's. "The way God intended it to be" code point UTF-8 UTF-16 UTF-32 a. <=> 00 b. D700 <=> ED 9F BF D7FF D7FF g. E000 <=> E
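[Archive note] The table in this message is flattened and truncated in the archive. As a rough aid, the sketch below prints the UTF-8, UTF-16 and UTF-32 forms of a few boundary code points using Python's built-in codecs, which exclude D800..DFFF in the way "Case I" describes. The choice of rows is an assumption and does not reproduce the original table; `forms` is a hypothetical helper.

```python
def forms(cp: int) -> tuple:
    """Return (UTF-8 bytes, UTF-16 code units, UTF-32 code unit) as hex strings."""
    ch = chr(cp)
    utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    be = ch.encode("utf-16-be")
    utf16 = " ".join(be[i:i + 2].hex().upper() for i in range(0, len(be), 2))
    utf32 = f"{cp:08X}"
    return utf8, utf16, utf32


if __name__ == "__main__":
    # Assumed sample rows: the edges around the surrogate range and the BMP.
    for cp in (0x0000, 0xD7FF, 0xE000, 0xFFFF, 0x10000, 0x10FFFF):
        print(f"{cp:06X}", *forms(cp), sep="  |  ")
```

Calling `forms(0xD800)` raises `UnicodeEncodeError`, because the strict codecs refuse to encode a lone surrogate.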

U+007E and U+02DC

2001-06-12 Thread Patrick Andries

The Unicode Standard 3.0 (page 149) says that U+007E can be used as a Spacing Clone of Combining Tilde. But isn't it this the function of U+02DC (the so called "SMALL TILDE") ? Why suggest this usage then and not point to U+02DC ? Could one say (as some typographers see it) that U+007E should fo

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Jianping Yang

One thing needs to clarify here is that there is no four byte encoding in UTF-8S proposal and four byte encoding is illegal but not irregular. As everything in UTF-8S is perfect match to UTF-16, any blame to this proposal also applies to UTF-16 encoding form. Regards, Jianping. Kenneth Whistler

Re: U+007E and U+02DC

2001-06-12 Thread Kenneth Whistler

Patrick Andries asked: > The Unicode Standard 3.0 (page 149) says that U+007E can be used as a > Spacing Clone of Combining Tilde. But isn't it this the function of U+02DC > (the so called "SMALL TILDE") ? Why suggest this usage then and not point to > U+02DC ? > > Could one say (as some typogra

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Kenneth Whistler

Jianping wrote: > One thing needs to clarify here is that there is no four byte encoding in > UTF-8S proposal and four byte encoding is illegal but not irregular. As > everything in UTF-8S is perfect match to UTF-16, any blame to this proposal > also applies to UTF-16 encoding form. Well after a

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Sarasvati

Kenneth Whistler wrote: > This rampant failure to edit reply-to's is threatening > to bring the wrath of Sarasvati back down on the list, folks. Sarasvati is indeed listening and taking note. Please do edit your replies and exercise some intelligent snipping behaviour. Cheery regards from your,

U+2011 and U+2010

2001-06-12 Thread Patrick Andries

The Unicode Standard 3.0 (page 150) says that "U+2011 NON-BREAKING HYPHEN is present for compatibility with existing standards" as if it shouldn't really be encoded. But isn't its relation to U+2010, the same as the one that opposes SPACE to NO-BREAK SPACE, i.e. a semantic (behavioural) one ? Pat

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Jianping Yang

Kenneth Whistler wrote: > Jianping wrote: > > > One thing needs to clarify here is that there is no four byte encoding in > > UTF-8S proposal and four byte encoding is illegal but not irregular. As > > everything in UTF-8S is perfect match to UTF-16, any blame to this proposal > > also applies

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Jianping Yang

Kenneth Whistler wrote: > Jianping responded: > > > Kenneth Whistler wrote: > > > > > Jianping wrote: > > > > > > > One thing needs to clarify here is that there is no four byte encoding in > > > > UTF-8S proposal and four byte encoding is illegal but not irregular. As > > > > everything in UTF

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Kenneth Whistler

Jianping responded: > Kenneth Whistler wrote: > > > Jianping wrote: > > > > > One thing needs to clarify here is that there is no four byte encoding in > > > UTF-8S proposal and four byte encoding is illegal but not irregular. As > > > everything in UTF-8S is perfect match to UTF-16, any blame t

UTF-8s programming problems

2001-06-12 Thread Carl W. Brown

UTF-8s is reminiscent of a problem that I had installing a certain vendor's terminals. Each screen was about 2K of data. The terminal communications protocol broke the data into 128 byte chunks. Each block had a small header and the terminal would wait for a response before the next block was s

Re: U+2011 and U+2010

2001-06-12 Thread Kenneth Whistler

Patrick Andries asked: > The Unicode Standard 3.0 (page 150) says that "U+2011 NON-BREAKING HYPHEN is > present for compatibility with existing standards" as if it shouldn't really > be encoded. But isn't its relation to U+2010, the same as the one that > opposes SPACE to NO-BREAK SPACE, i.e. a s

Russian Unicode Conversion

2001-06-12 Thread Vladimir Ivanov

Vadim Snurnikov wrote: > How can I read a text in Unicode (Russian) where every Russian letter > is represented like that: D=B6 (or similar)? (The e-mail got > transferred to this format.) What kind of software is used to get E-mail? I recommend Outlook Express 5.0 and above. It allows yo

[OT] Reply editing

2001-06-12 Thread Carl W. Brown

Sarasvati, -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Sarasvati Sent: Tuesday, June 12, 2001 3:55 PM To: [EMAIL PROTECTED] Subject: Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads >Kenneth Whistler wrote: >> This rampant failure to edit

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Kenneth Whistler

Jianping said: > > What you finally stated today is that is flat-out > > *illegal* in UTF-8s. That was a missing piece of the puzzle for anyone > > trying to interpret what you are proposing. > > > > In the UTF-8S, there should be no irregular forms, should we repeat the history >again? > Nobo

UTF-8S: a modest proposal

2001-06-12 Thread John Cowan

I would urge Oracle and friends to move this to a different venue, specifically [EMAIL PROTECTED] As far as I can see, UTF-8S does not need either the approval or the disapproval of the Unicode Consortium. If it is actually in use, it needs a label -- and IANA is in the business of assigning suc

Re: U+2011 and U+2010

2001-06-12 Thread Peter_Constable

>In fact, in this particular case, if I recall, the distinctions were >probably considered to be good practice, and not something to be mapped >away. XCCS was often a *model* for early Unicode, rather than a character >encoding that forced the grudging inclusion of many icky "characters" >that we

Re: U+2011 and U+2010

2001-06-12 Thread Michael \(michka\) Kaplan

From: <[EMAIL PROTECTED]> > Out of curiosity, is there documentation on XCCS available anywhere? Check out google.com: it will get about 120+ hits on the words "XCCS standard" and several of them seem vaguely relevant. :-) MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Peter_Constable

On 06/12/2001 09:21:06 PM Jianping Yang wrote: >Nobody except you thought that 4-byte is allowed in UTF-8S. Not so! That was under discussion just a few days ago: On 06/07/2001 12:34:49 AM DougEwell2 wrote: [snip] >But definition D29 says that a UTF must round-trip these invalid code point

