Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Markus Scherer
Doug Ewell wrote: Markus Scherer wrote: "claim"? That hurts... I did measure these things, and the numbers in the table are all from my measurements. I also included the type of machine I used, etc. (http://www.unicode.org/notes/tn6/#Performance) Certainly I would never accuse Markus of falsifyin

Re: [OT] UTF-81920 was RE: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Jon Hanna
Quoting Philippe Verdy <[EMAIL PROTECTED]>: > From: "Jon Hanna" <[EMAIL PROTECTED]> > > Quoting Marco Cimarosti <[EMAIL PROTECTED]>: > > > > > Jon Hanna wrote: > > > > I refuse to rename my UTF-81920! > > > > > > Doug, Shlomi, there's a new one out there! > > > Jon, would you mind describing it? >

Re: [OT] UTF-81920 was RE: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Philippe Verdy
From: "Jon Hanna" <[EMAIL PROTECTED]> > Quoting Marco Cimarosti <[EMAIL PROTECTED]>: > > > Jon Hanna wrote: > > > I refuse to rename my UTF-81920! > > > > Doug, Shlomi, there's a new one out there! > > Jon, would you mind describing it? > > There are two different UTF-81920s (the resultant ambiguit

[OT] UTF-81920 was RE: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Jon Hanna
Quoting Marco Cimarosti <[EMAIL PROTECTED]>: > Jon Hanna wrote: > > I refuse to rename my UTF-81920! > > Doug, Shlomi, there's a new one out there! > > Jon, would you mind describing it? There are two different UTF-81920s (the resultant ambiguity is very much in the spirit of UTF-81920). The f

RE: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Marco Cimarosti
Jon Hanna wrote: > I refuse to rename my UTF-81920! Doug, Shlomi, there's a new one out there! Jon, would you mind describing it? _ Marco

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Jon Hanna
> By the way, I don't think that there's an official reference that attributes > the acronym "UTF-9" to any of these encoding forms. I think that if "UTF-9" > is used it should be agreed by Unicode as being an official unique > representation. I refuse to rename my UTF-81920! -- Jon Hanna

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Doug Ewell
Markus Scherer wrote: >> BOCU-1 might solve this problem, but multiplying and dividing by 243 >> doesn't sound faster than UTF-8 bit-shifting. (I'm still amazed by >> the claim in UTN #6 that converting Hindi text between UTF-16 and >> BOCU-1 took only 45% as long as converting it between UTF-16

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Doug Ewell
Kenneth Whistler wrote: >> I have seen several other informal proposals for "UTF-*" forms/ >> schemes. All this is just confusive, and their authors should imagine >> their own names for reference. What do you think of this idea? > > It is, indeed, "confusive". Some of us have deliberately contri

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-22 Thread Kenneth Whistler
Philippe suggested: > I don't object proposals to define new "UTF-*" forms, > but this should still be > proposals for an otherwise distinctly named encoding form, chosen by the > proposal author out of the "UTF-*" naming space. The UTC clearly does object to proposals to define "new 'UTF-*' for

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-22 Thread Philippe Verdy
From: Mike Ayers > The author called it "UTF-9". Therefore we call it the same thing so anyone > knows what we're talking about. It may not be ideal, but it's intelligible. > Why should anyone assume that something is an international standard

RE: Unicode forms for internal storage - BOCU-1 speed

2004-01-22 Thread Mike Ayers
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Philippe Verdy > Sent: Thursday, January 22, 2004 1:49 PM > I think then that > "UTF-9" is a bad > acronym to refer to a speci

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-22 Thread Philippe Verdy
From: <[EMAIL PROTECTED]> To: "Philippe Verdy" <[EMAIL PROTECTED]> Cc: "Markus Scherer" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Thursday, January 22, 2004 10:26 PM Subject: Re: Unicode forms for internal storage - BOCU-1 speed > Philippe Ver

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-22 Thread jcowan
Philippe Verdy scripsit: > Is the other competing UTF-9 from Jerome Abela this one: No. Abela's version preserves all of 00-7F and A0-FF, packing all the rest of Unicode into sequences beginning with any of 80-9F. -- XQuery Blueberry DOMJohn Cowan Entity parser dot-c

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-22 Thread Philippe Verdy
From: <[EMAIL PROTECTED]> > Mark Crispin's UTF-9 (not to be confused with Jerome Abela's) is also > excellent, although most of us don't have 36-bit systems, for which it > makes sense. A precis: > > Code points (base 2) UTF-9 code units (base 2) > 0abcdefgh 0abcdefgh > 0abcdefghij

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-22 Thread Mark Davis
From: <[EMAIL PROTECTED]> To: "Markus Scherer" <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Thu, 2004 Jan 22 10:50 Subject: Re: Unicode forms for internal storage - BOCU-1 speed > Markus Scherer scripsit: > > > UTF-8 is useful because it's sim

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-22 Thread jcowan
Markus Scherer scripsit: > UTF-8 is useful because it's simple, and supported just about everywhere - > but it's otherwise hardly optimal for anything. You entirely omit its principal advantage, sine qua non: it's maximally ASCII-compatible, using bytes 0x00 to 0x7F to represent ASCII character
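The ASCII-compatibility property described in this message can be checked directly. The following is a minimal sketch, not part of the original thread; the helper name `ascii_bytes_match` and the sample string are hypothetical. It verifies that the bytes below 0x80 in the UTF-8 output are exactly the ASCII characters of the input, in order, so no non-ASCII character ever produces a byte in the ASCII range.

```python
def ascii_bytes_match(text: str) -> bool:
    """Return True if the bytes below 0x80 in the UTF-8 encoding of `text`
    are exactly the ASCII characters of `text`, in the same order.

    Hypothetical helper for illustration only.
    """
    encoded = text.encode("utf-8")
    # ASCII characters present in the input, as code point values.
    ascii_in = [ord(ch) for ch in text if ord(ch) < 0x80]
    # Bytes in the ASCII range present in the encoded output.
    ascii_out = [b for b in encoded if b < 0x80]
    return ascii_in == ascii_out


# Mixed ASCII and Devanagari: the '/' and '.' bytes survive untouched,
# and the Devanagari letters contribute only bytes >= 0x80.
sample = "path/to/\u0939\u093f.txt"
print(ascii_bytes_match(sample))

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
print("plain ASCII".encode("utf-8") == "plain ASCII".encode("ascii"))
```

This is why byte-oriented tools that only look for ASCII delimiters, such as a path separator, can process UTF-8 without decoding it.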

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-22 Thread Markus Scherer
Doug Ewell wrote: BOCU-1 might solve this problem, but multiplying and dividing by 243 doesn't sound faster than UTF-8 bit-shifting. (I'm still amazed by the claim in UTN #6 that converting Hindi text between UTF-16 and BOCU-1 took only 45% as long as converting it between UTF-16 and UTF-8.) "clai
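To make the contrast in Doug's remark concrete, here is a rough sketch of the two kinds of arithmetic being compared. It is not a BOCU-1 implementation and it says nothing about measured speed; the function names are hypothetical, and the only detail taken from the thread is the divisor 243. The first function builds a three-byte UTF-8 sequence with shifts and masks; the second splits a value into a quotient and a base-243 digit with a division.

```python
def utf8_three_byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF (excluding surrogates)
    as three UTF-8 bytes using only shifts and masks."""
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point is not in the three-byte UTF-8 range")
    return bytes([
        0xE0 | (cp >> 12),          # lead byte: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # first trail byte: middle 6 bits
        0x80 | (cp & 0x3F),         # second trail byte: low 6 bits
    ])


def split_base_243(value: int) -> tuple[int, int]:
    """Split a non-negative value into (quotient, remainder) modulo 243.

    Illustrates the divide/modulo step only; the mapping of digits to
    byte values in a real encoder is not shown here.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    return divmod(value, 243)


# U+0939 DEVANAGARI LETTER HA, a character from Hindi text.
print(utf8_three_byte(0x0939).hex())
print(split_base_243(1000))
```

The shift-based encoder can be cross-checked against Python's own codec with `"\u0939".encode("utf-8")`.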