Doug Ewell wrote:
Markus Scherer wrote:
"claim"? That hurts...
I did measure these things, and the numbers in the table are all from
my measurements. I also included the type of machine I used, etc.
(http://www.unicode.org/notes/tn6/#Performance)
Certainly I would never accuse Markus of falsifyin
Quoting Philippe Verdy <[EMAIL PROTECTED]>:
> From: "Jon Hanna" <[EMAIL PROTECTED]>
> > Quoting Marco Cimarosti <[EMAIL PROTECTED]>:
> >
> > > Jon Hanna wrote:
> > > > I refuse to rename my UTF-81920!
> > >
> > > Doug, Shlomi, there's a new one out there!
> > > Jon, would you mind describing it?
>
From: "Jon Hanna" <[EMAIL PROTECTED]>
> Quoting Marco Cimarosti <[EMAIL PROTECTED]>:
>
> > Jon Hanna wrote:
> > > I refuse to rename my UTF-81920!
> >
> > Doug, Shlomi, there's a new one out there!
> > Jon, would you mind describing it?
>
> There are two different UTF-81920s (the resultant ambiguit
Quoting Marco Cimarosti <[EMAIL PROTECTED]>:
> Jon Hanna wrote:
> > I refuse to rename my UTF-81920!
>
> Doug, Shlomi, there's a new one out there!
>
> Jon, would you mind describing it?
There are two different UTF-81920s (the resultant ambiguity is very much in the
spirit of UTF-81920).
The f
Jon Hanna wrote:
> I refuse to rename my UTF-81920!
Doug, Shlomi, there's a new one out there!
Jon, would you mind describing it?
_ Marco
> By the way, I don't think that there's an official reference that attributes
> the acronym "UTF-9" to any of these encoding forms. I think that if "UTF-9"
> is used it should be agreed by Unicode as being an official unique
> representation.
I refuse to rename my UTF-81920!
--
Jon Hanna
Markus Scherer wrote:
>> BOCU-1 might solve this problem, but multiplying and dividing by 243
>> doesn't sound faster than UTF-8 bit-shifting. (I'm still amazed by
>> the claim in UTN #6 that converting Hindi text between UTF-16 and
>> BOCU-1 took only 45% as long as converting it between UTF-16
Kenneth Whistler wrote:
>> I have seen several other informal proposals for "UTF-*" forms/
>> schemes. All this is just confusive, and their authors should imagine
>> their own names for reference. What do you think of this idea?
>
> It is, indeed, "confusive". Some of us have deliberately contri
Philippe suggested:
> I don't object proposals to define new "UTF-*" forms,
> but this should still be
> proposals for an otherwise distinctly named encoding form, chosen by the
> proposal author out of the "UTF-*" naming space.
The UTC clearly does object to proposals to define "new 'UTF-*' for
RE: Unicode forms for internal storage - BOCU-1 speed
From: Mike Ayers
> The author called it "UTF-9". Therefore we call it the same thing so anyone
> knows what we're talking about. It may not be ideal, but it's intelligible.
> Why should anyone assume that something is an international standard
Title: RE: Unicode forms for internal storage - BOCU-1 speed
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of Philippe Verdy
> Sent: Thursday, January 22, 2004 1:49 PM
> I think then that
> "UTF-9" is a bad
> acronym to refer to a speci
From: <[EMAIL PROTECTED]>
To: "Philippe Verdy" <[EMAIL PROTECTED]>
Cc: "Markus Scherer" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, January 22, 2004 10:26 PM
Subject: Re: Unicode forms for internal storage - BOCU-1 speed
> Philippe Ver
Philippe Verdy scripsit:
> Is the other competing UTF-9 from Jerome Abela this one:
No. Abela's version preserves all of 00-7F and A0-FF, packing all the rest
of Unicode into sequences beginning with any of 80-9F.
--
XQuery Blueberry DOM        John Cowan
Entity parser dot-c
From: <[EMAIL PROTECTED]>
> Mark Crispin's UTF-9 (not to be confused with Jerome Abela's) is also
> excellent, although most of us don't have 36-bit systems, for which it
> makes sense. A precis:
>
> Code points (base 2)    UTF-9 code units (base 2)
> 0abcdefgh               0abcdefgh
> 0abcdefghij
From: <[EMAIL PROTECTED]>
To: "Markus Scherer" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thu, 2004 Jan 22 10:50
Subject: Re: Unicode forms for internal storage - BOCU-1 speed
> Markus Scherer scripsit:
>
> > UTF-8 is useful because it's sim
Markus Scherer scripsit:
> UTF-8 is useful because it's simple, and supported just about everywhere -
> but it's otherwise hardly optimal for anything.
You entirely omit its principal advantage, sine qua non: it's maximally
ASCII-compatible, using bytes 0x00 to 0x7F to represent ASCII character
Doug Ewell wrote:
BOCU-1 might solve this problem, but multiplying and dividing by 243
doesn't sound faster than UTF-8 bit-shifting. (I'm still amazed by the
claim in UTN #6 that converting Hindi text between UTF-16 and BOCU-1
took only 45% as long as converting it between UTF-16 and UTF-8.)
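
For readers unfamiliar with the "UTF-8 bit-shifting" mentioned above, here is a minimal sketch of encoding one code point with shifts and masks only. The function name utf8_encode is made up for this illustration and is not taken from the thread or from UTN #6. It says nothing about how fast either encoding is in practice, and it does not attempt BOCU-1's base-243 arithmetic.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 using only shifts and masks.

    Illustrative helper (hypothetical name), not code from the thread.
    """
    # Reject values outside the Unicode range and the surrogate block.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # ASCII passes through as a single byte 0x00-0x7F.
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx (Devanagari lands here)
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# U+0915 DEVANAGARI LETTER KA takes three bytes in UTF-8.
print(utf8_encode(0x0915).hex())  # e0a495
```

Each Devanagari letter costs three bytes this way, which is the size side of the Hindi comparison; the timing figures themselves are in the UTN #6 performance table.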
"clai