Re: UTF8 vs. Unicode (UTF16) in code

2001-03-16 Thread William Overington
Quickly and initially stating that I am a relative novice in matters of unicode and have no knowledge of the details of the other encodings, I am unable to understand the (part) post copied below. I am looking at the possibility of having hypercode, ranging from H+11 to H+3FFF, that is,

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-16 Thread Kenneth Whistler
In response to my posting on this thread, William Overington asked: Yet the posting appended would seem to imply that at least one standard has reserved some (all?) of these codes either as "never to be used" codes or as "might someday be used" codes. So, my question is this. Which of the
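The question here is which ranges of code values the standards set aside as "never to be used" and which are merely unassigned. The following is a minimal sketch that sorts an integer into the relevant buckets. It uses the definitions in current versions of the Unicode Standard, which may differ in detail from those in force in March 2001. The categories are the codespace limit (U+10FFFF), surrogates, noncharacters, and private use. The function name is hypothetical. Telling assigned code points apart from ones still reserved for future use would require the Unicode Character Database, which this sketch does not consult.

```python
def classify_code_point(cp: int) -> str:
    """Rough classification of an integer value (hypothetical helper)."""
    if cp < 0:
        raise ValueError("negative value")
    if cp > 0x10FFFF:
        return "outside Unicode codespace"
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # U+FDD0..U+FDEF, plus the last two code points of every plane (xxFFFE, xxFFFF)
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # BMP Private Use Area, plus planes 15 and 16 (their noncharacters were caught above)
    if 0xE000 <= cp <= 0xF8FF or cp >= 0xF0000:
        return "private use"
    return "other (assigned, or reserved for future assignment)"


for value in (0x41, 0xD800, 0xFFFE, 0xE000, 0x10FFFD, 0x110000):
    print(f"{value:#08x}: {classify_code_point(value)}")
```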

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-15 Thread Michael Everson
At 19:30 -0800 2001-03-14, John Jenkins wrote: A bigger consideration than the ones I've mentioned was that having to rework Extension B to divide it into a BMP portion and a non-BMP portion would have delayed part 2 of 10646, and that was not acceptable. Moreover, the Japanese National Body

Re: Pentagrams: (was: RE: UTF8 vs. Unicode (UTF16) in code)

2001-03-14 Thread Michael Everson
At 19:50 -0800 2001-03-13, Pierpaolo BERNARDI wrote: Now that you mention them, someone will make a fuss over their absence. 8-) They have already been noticed. -- Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland

RE: Pentagrams: (was: RE: UTF8 vs. Unicode (UTF16) in code)

2001-03-14 Thread Marco Cimarosti
First of all, sorry for having started this demoniac thread. It was against my will -- I was probably possessed. :-) As someone correctly inferred, what I meant was the Italian "pentagramma": the five horizontal lines used in musical notation or, by extension, "musical notation". BTW,

RE: Pentagrams: (was: RE: UTF8 vs. Unicode (UTF16) in code)

2001-03-14 Thread Marco Cimarosti
I wrote: F666;ANTICHRISTIAN ALTERNATIVE LATIN CAPITAL LETTER T;Lu;0;L;<font> 0054;;;;N;;;;0074; Ooops! It's lowercase t, not uppercase. As I must correct myself, I take the occasion to make a better decomposition: F666;ANTICHRISTIAN ALTERNATIVE LATIN SMALL LETTER T;Ll;0;L;<compat>
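The joke entries above follow the semicolon-separated record format of UnicodeData.txt. The following is a minimal parser sketch that assumes the standard 15-field layout. The field labels are descriptive names chosen for this example and are not official property aliases. It also splits off a leading decomposition tag such as `<font>` or `<compat>`.

```python
# Descriptive labels (chosen for this sketch) for the 15 fields of a UnicodeData.txt record
FIELDS = (
    "code", "name", "general_category", "canonical_combining_class",
    "bidi_class", "decomposition", "decimal_digit", "digit", "numeric",
    "bidi_mirrored", "unicode_1_name", "iso_comment",
    "simple_uppercase", "simple_lowercase", "simple_titlecase",
)


def parse_unicode_data_line(line: str) -> dict:
    parts = line.rstrip("\n").split(";")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    record["code"] = int(record["code"], 16)
    # A compatibility decomposition starts with a <tag>; a canonical one has no tag.
    items = record["decomposition"].split()
    if items and items[0].startswith("<"):
        record["decomposition_tag"] = items.pop(0)
    else:
        record["decomposition_tag"] = None
    record["decomposition_mapping"] = [int(item, 16) for item in items]
    return record


rec = parse_unicode_data_line("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")
print(hex(rec["code"]), rec["general_category"], rec["simple_lowercase"])
```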

Re: Pentagrams: (was: RE: UTF8 vs. Unicode (UTF16) in code)

2001-03-14 Thread Bertrand Laidain
A lot of other religions managed to make it into Miscellaneous Symbols (although if Solomon's Seal/Mogen David is there, I'm not seeing it). No, it is there: from the ITC Zapf Dingbats series 100, "Stars, asterisks and snowflakes": 2721 STAR OF DAVID. Bertrand

Re: Pentagrams: (was: RE: UTF8 vs. Unicode (UTF16) in code)

2001-03-14 Thread Daniel Biddle
On Tue, 13 Mar 2001, Curtis Clark wrote: At 07:50 PM 3/13/01, Pierpaolo BERNARDI wrote: And no, the Unicode Standard hasn't encoded any pentagrams yet -- or hexagrams or baphomets, for that matter. Now that you mention them, someone will make a fuss over their absence. 8-) A lot of

RE: Pentagrams: (was: RE: UTF8 vs. Unicode (UTF16) in code)

2001-03-14 Thread Carl W. Brown
Doug, U+235F (APL is a rather demonic language, isn't it?) In more ways than one. His pitch fork U+2366 is there too. You wonder why IBM had so much trouble selling its first PCs. The 5101 cost more than $30K, came with one tape drive, plus 8K of memory and did not support floppies. The

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-14 Thread John Jenkins
On Tuesday, March 13, 2001, at 05:39 PM, Christopher John Fynn wrote: Some of the characters in Extension B are required for JIS X 0213 support, which is going to be a sine qua non in Japan within a few years. There was a push a little while ago to put these characters on the BMP for

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-14 Thread John Jenkins
On Wednesday, March 14, 2001, at 09:01 AM, John Jenkins wrote: In any event, it was a politically impossible decision to make. It was extremely difficult to get agreement to add Vertical Extension A to the BMP; in the end, that agreement was secured only by promising that no future

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-13 Thread Christopher John Fynn
John H. Jenkins [mailto:[EMAIL PROTECTED]] Some of the characters in Extension B are required for JIS X 0213 support, which is going to be a sine qua non in Japan within a few years. There was a push a little while ago to put these characters on the BMP for precisely this

Pentagrams: (was: RE: UTF8 vs. Unicode (UTF16) in code)

2001-03-13 Thread Kenneth Whistler
Pentagrams? I haven't seen those... where are they? Hmmm... This is possibly an Italian word badly Anglicized. I just meant "musical notation". Okay. I thought perhaps there were additions to "Misc Symbols" U+2600 .. U+267F or elsewhere that I had missed. In Italian,

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-13 Thread Kenneth Whistler
Keld surmised: On Fri, Mar 09, 2001 at 10:56:30AM -0800, Yves Arrouye wrote: Since the U in UTF stands for Unicode, UTF-32 cannot represent more than what Unicode encodes, which is 1+ million code points. Otherwise, you're talking about UCS-4. But I thought that one of the latest

Re: Pentagrams: (was: RE: UTF8 vs. Unicode (UTF16) in code)

2001-03-13 Thread Pierpaolo BERNARDI
On Tue, 13 Mar 2001, Kenneth Whistler wrote: In Italian, "Pentagramma" is a musical term. Cf. "Pentagramma per voce sola", etc. Pentagramma = stave, staff (the five horizontal lines on which the notes are written) But in English, a pentagram is an occult symbol-- a pentacle (5-pointed

Re: Pentagrams: (was: RE: UTF8 vs. Unicode (UTF16) in code)

2001-03-13 Thread DougEwell2
In a message dated 2001-03-13 18:29:12 Pacific Standard Time, [EMAIL PROTECTED] writes: But in English, a pentagram is an occult symbol-- a pentacle (5-pointed star), usually inscribed inside a circle, and associated with witchcraft, sorcery, and (by some) Satanism. See

Re: Pentagrams: (was: RE: UTF8 vs. Unicode (UTF16) in code)

2001-03-13 Thread Curtis Clark
At 07:50 PM 3/13/01, Pierpaolo BERNARDI wrote: And no, the Unicode Standard hasn't encoded any pentagrams yet -- or hexagrams or baphomets, for that matter. Now that you mention them, someone will make a fuss over their absence. 8-) A lot of other religions managed to make it into

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-12 Thread Michael Everson
At 10:13 -0800 2001-03-11, John H. Jenkins wrote: Au contraire, Deseret is quite well-known in LDS circles. Most Mormons who grew up in the Church have at least heard of it. It isn't extensively used, by any means, but there really are people out there who do want to use it on their computers.

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-12 Thread Thomas Chan
On Mon, 12 Mar 2001, Marco Cimarosti wrote: Thomas Chan wrote: How about the case of a retailer who needs to deal with parts for elevators and needs U+282E2, lip 'elevator'? Or neckties, requiring U+27639, taai 'tie'. I am not seeking excuses to not implement UTF-16 -- rather examples
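Both of Thomas Chan's examples lie outside the BMP, so UTF-16 code has to store each of them as a surrogate pair. The following is a minimal sketch of the standard surrogate arithmetic. The function names are illustrative.

```python
def to_surrogate_pair(cp: int):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000             # a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("invalid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


for cp in (0x282E2, 0x27639):
    high, low = to_surrogate_pair(cp)
    print(f"U+{cp:04X} -> {high:04X} {low:04X}")
```

This prints `U+282E2 -> D860 DEE2` and `U+27639 -> D85D DE39`. These are the same code units Python's own `utf-16-be` codec produces for these characters.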

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-12 Thread John Jenkins
On Sunday, March 11, 2001, at 12:26 PM, Lars Marius Garshol wrote: How will the Japanese encode these JIS X 0213 characters? That is, what effect, if any, will this have on the legacy Japanese character encodings? Are there plans to extend ISO 2022-JP or EUC-JP, or for some entirely new

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-11 Thread John H. Jenkins
At 9:59 AM -0800 3/9/01, Marco Cimarosti wrote: Well, I guess that Chữ Nôm and Deseret are hardly known out of this mailing list. Au contraire, Deseret is quite well-known in LDS circles. Most Mormons who grew up in the Church have at least heard of it. It isn't extensively used, by any means,

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-11 Thread John H. Jenkins
At 1:17 AM -0800 3/9/01, Marco Cimarosti wrote: I am wondering especially about the CJK characters in Extension B. We all know that the majority of them are rare, ancient or idiosyncratic characters, but I am not quite sure that this is true for *all* of them. Some of the characters in

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-09 Thread Marco Cimarosti
Addison P. Phillips wrote: [...] currently there are no characters "up there" this isn't a really big deal. Shortly, when Unicode 3.1 is official, there will be 40K or so characters in the supplemental planes... but they'll be relatively rare. This reminds me of a question that I wanted to

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-09 Thread Peter_Constable
On 03/08/2001 07:40:25 PM "Ayers, Mike" wrote: If you really want to finish the job, there's always UTF-32, which should do rather nicely until we meet the space aliens with the 4,293,853,186 character alphabet! Um... no. The 1,113,023 character alphabet (one more than the encodable scalar

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-09 Thread Thomas Chan
On Fri, 9 Mar 2001, Marco Cimarosti wrote: Addison P. Phillips wrote: [...] currently there are no characters "up there" this isn't a really big deal. Shortly, when Unicode 3.1 is official, there will be 40K or so characters in the supplemental planes... but they'll be relatively

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-09 Thread Marco Cimarosti
Thomas Chan wrote: Does there exist at least one character U+ that is commonly used in at least one modern language? How about music and math notation? About the music symbols in Unicode 3.1, they are just the basic building blocks for it. So I assume that handling surrogates (or
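Whether surrogate handling matters in practice shows up wherever code assumes one 16-bit unit equals one character. The following is a minimal sketch that counts code points in a sequence of UTF-16 code units and treats a well-formed high/low pair as a single character. The helper names are hypothetical. Counting an unpaired surrogate as one unit, rather than rejecting it, is a policy choice made for this sketch.

```python
def count_code_points_utf16(units):
    """Count code points in a list of 16-bit UTF-16 code units."""
    count = 0
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            i += 2  # surrogate pair: one supplementary-plane code point
        else:
            i += 1  # BMP code point (or an unpaired surrogate)
        count += 1
    return count


def utf16_units(text):
    """Return the UTF-16 code units of a Python string (no BOM)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[k:k + 2], "little") for k in range(0, len(data), 2)]


units = utf16_units("tie \U00027639")
print(len(units), count_code_points_utf16(units))
```

Code that is ignorant of UTF-16 would report 6 characters here. The string actually has 5 code points, because U+27639 occupies two code units.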

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-09 Thread Keld Jørn Simonsen
On Fri, Mar 09, 2001 at 10:56:30AM -0800, Yves Arrouye wrote: Since the U in UTF stands for Unicode, UTF-32 cannot represent more than what Unicode encodes, which is 1+ million code points. Otherwise, you're talking about UCS-4. But I thought that one of the latest revs of ISO 10646

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-09 Thread Allan Chau
Yves Arrouye wrote: On 03/08/2001 07:40:25 PM "Ayers, Mike" wrote: If you really want to finish the job, there's always UTF-32, which should do rather nicely until we meet the space aliens with the 4,293,853,186 character alphabet! Um... no. The 1,113,023 character

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-09 Thread Ayers, Mike
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On 03/09/2001 12:53:57 PM "Ayers, Mike" wrote: Um... no. The UTF-32 CES can handle much more than the current space of the Unicode CCS. As far as I can tell, it's good to go until we need more than 32 bits to represent the ACR.

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-09 Thread Antoine Leca
Ienup Sung wrote: Well, contrary to what you said, it is a very good option, since you don't have to know anything about what's inside the character bytes: by using mblen/mbrlen, you can achieve codeset-independent programming that will support not only Unicode/UTF-8
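mblen() and mbrlen() are C library functions whose results depend on the current locale, and Python has no direct equivalent. The following is therefore a hypothetical UTF-8-only analogue (`utf8_char_len` and `count_chars` are illustrative names) of the same walk-one-character-at-a-time loop. It loosely mirrors mblen's conventions: 0 at the end, -1 for an invalid or incomplete sequence. The validity checks follow the well-formed UTF-8 byte sequence table in chapter 3 of the Unicode Standard, so they reject overlong forms, encoded surrogates, and values above U+10FFFF.

```python
def utf8_char_len(buf: bytes, i: int) -> int:
    """Byte length of the UTF-8 sequence at buf[i]; 0 at end of buffer; -1 if invalid."""
    if i >= len(buf):
        return 0
    b0 = buf[i]
    if b0 < 0x80:
        return 1
    if 0xC2 <= b0 <= 0xDF:
        n, lo, hi = 2, 0x80, 0xBF
    elif 0xE0 <= b0 <= 0xEF:
        n = 3
        lo = 0xA0 if b0 == 0xE0 else 0x80  # E0: reject overlong forms
        hi = 0x9F if b0 == 0xED else 0xBF  # ED: reject encoded surrogates
    elif 0xF0 <= b0 <= 0xF4:
        n = 4
        lo = 0x90 if b0 == 0xF0 else 0x80  # F0: reject overlong forms
        hi = 0x8F if b0 == 0xF4 else 0xBF  # F4: reject values above U+10FFFF
    else:
        return -1  # stray continuation byte, C0/C1, or F5..FF
    if i + n > len(buf):
        return -1  # truncated sequence
    if not lo <= buf[i + 1] <= hi:
        return -1
    for k in range(i + 2, i + n):
        if not 0x80 <= buf[k] <= 0xBF:
            return -1
    return n


def count_chars(buf: bytes) -> int:
    """Count characters by stepping through the buffer one sequence at a time."""
    i = count = 0
    while (n := utf8_char_len(buf, i)) > 0:
        i += n
        count += 1
    if n < 0:
        raise ValueError(f"invalid UTF-8 at byte offset {i}")
    return count


print(count_chars("a\u00e9\u4e2d\U000282E2".encode("utf-8")))
```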

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-09 Thread Ienup Sung
have today, but even then the mblen implementations were not clumsy at all, but as elegant and lean as they can be in most cases. With regards, Ienup. Quoted header: Date: Fri, 09 Mar 2001 12:09:08 -0800 (GMT-0800); From: Antoine Leca [EMAIL PROTECTED]; Subject: Re: UTF8 vs. Unicode (UTF16) in code; To: Unicode L

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-09 Thread Thomas Chan
On Fri, 9 Mar 2001, Marco Cimarosti wrote: It is not very clear to me what is included in Extension B: how is it possible to know something more about it? Look at DUTR #27[1] (2001.2.23), section 10.1, and see if any of those sources are ones that contain characters that are important to you.

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-09 Thread Yves Arrouye
Since the U in UTF stands for Unicode, UTF-32 cannot represent more than what Unicode encodes, which is 1+ million code points. Otherwise, you're talking about UCS-4. But I thought that one of the latest revs of ISO 10646 explicitly specified that UCS-4 will never encode more
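The point that UTF-32 is limited to what Unicode encodes has a practical consequence: a 32-bit code unit has room for far more values, so a UTF-32 decoder has to check ranges. The following is a minimal sketch (the function name is illustrative) of a big-endian UTF-32 decoder that accepts only Unicode scalar values, meaning U+0000..U+10FFFF excluding the surrogate range.

```python
import struct

MAX_CODE_POINT = 0x10FFFF


def decode_utf32be(data: bytes) -> str:
    """Decode big-endian UTF-32, rejecting values that are not Unicode scalar values."""
    if len(data) % 4:
        raise ValueError("UTF-32 data length must be a multiple of 4")
    out = []
    for (unit,) in struct.iter_unpack(">I", data):
        if unit > MAX_CODE_POINT or 0xD800 <= unit <= 0xDFFF:
            raise ValueError(f"0x{unit:08X} is not a Unicode scalar value")
        out.append(chr(unit))
    return "".join(out)


# Scalar values: the 0x110000 code points of the codespace minus the 2,048 surrogates.
print(0x110000 - 0x800)  # 1112064
```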

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-08 Thread addison
Generally, UTF-8 is a quicker-and-dirtier method of getting Unicode support into a legacy product. The work that goes into supporting UTF-8 in 8-bit clean code is analogous to multibyte enabling: you have to provide functions for moving the pointer about, searching, etc. This *can* be less work
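As an example of the "moving the pointer about" functions that UTF-8 enabling needs, here is a minimal sketch of two such helpers (hypothetical names). Both rely on UTF-8 being self-synchronizing: continuation bytes always have the form 10xxxxxx, so code can back up to a character boundary from any position. The sketch assumes the input is already well-formed UTF-8.

```python
def prev_char_start(buf: bytes, i: int) -> int:
    """Index of the first byte of the character ending just before position i."""
    if i <= 0:
        raise ValueError("no character before position 0")
    i -= 1
    while i > 0 and (buf[i] & 0xC0) == 0x80:  # skip continuation bytes
        i -= 1
    return i


def truncate_utf8(buf: bytes, max_bytes: int) -> bytes:
    """Cut buf to at most max_bytes without splitting a multi-byte character."""
    if len(buf) <= max_bytes:
        return buf
    end = max_bytes
    # If the cut lands on a continuation byte, back up to the start of that
    # character so the partial character is dropped entirely.
    while end > 0 and (buf[end] & 0xC0) == 0x80:
        end -= 1
    return buf[:end]


print(truncate_utf8("a\u00e9".encode("utf-8"), 2))  # b'a'
```

A naive `buf[:max_bytes]` on a single-byte-string code path would leave a dangling lead byte in this case. That is the kind of bug a UTF-8 port of 8-bit code must audit for.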

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-08 Thread Ienup Sung
[EMAIL PROTECTED]; Subject: UTF8 vs. Unicode (UTF16) in code; To: Unicode List [EMAIL PROTECTED]. Quoted message: We've got an English-language only product which makes use of single-byte character strings throughout the code. For our next release, we'd like to internationalize

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-08 Thread Ayers, Mike
Subject: Re: UTF8 vs. Unicode (UTF16) in code; X-Sender: [EMAIL PROTECTED]; To: Ienup Sung [EMAIL PROTECTED]; Cc: Unicode List [EMAIL PROTECTED]. Quoted message: Well... Actually, there is a significant difference between being "UTF-8 ignorant" and "UTF-16 ignor

RE: UTF8 vs. Unicode (UTF16) in code

2001-03-08 Thread Ienup Sung
From: "Ayers, Mike" [EMAIL PROTECTED]; Subject: RE: UTF8 vs. Unicode (UTF16) in code; To: 'Ienup Sung' [EMAIL PROTECTED], Unicode List [EMAIL PROTECTED]. Quoted message: If you really want to finish the job, there's always UTF-32, which should do rather nicely until we meet the space a

Re: UTF8 vs. Unicode (UTF16) in code

2001-03-08 Thread Michael \(michka\) Kaplan
MichKa, Michael Kaplan, Trigeminal Software, Inc., http://www.trigeminal.com/. Original message from "Ienup Sung" [EMAIL PROTECTED] to "Unicode List" [EMAIL PROTECTED], sent Thursday, March 08, 2001 5:21 PM, Subject: Re: UTF8 vs. Unicode (UTF16) in code: I think we

UTF8 vs. Unicode (UTF16) in code

2001-03-07 Thread Allan Chau
We've got an English-language only product which makes use of single-byte character strings throughout the code. For our next release, we'd like to internationalize it (Unicode) and be able to store data in UTF8 format (a requirement for data exchange). We're deciding between using UTF8 within
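Storage size is one input to the UTF-8 versus UTF-16 decision posed here, though the thread makes clear that it is far from the only one. The following minimal sketch measures how many bytes a few sample strings take in each encoding form. The sample strings and the `encoded_sizes` helper are illustrative. The little-endian codecs are used so that no byte order mark is counted.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict:
    """Byte length of text in each encoding form (no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


samples = {
    "English": "Hello, world",
    "French": "d\u00e9j\u00e0 vu",
    "Japanese": "\u65e5\u672c\u8a9e",
    "Extension B": "\U00020000",
}
for label, text in samples.items():
    sizes = encoded_sizes(text)
    print(f"{label:12} code points={len(text):2}  "
          + "  ".join(f"{enc}={n}" for enc, n in sizes.items()))
```

ASCII-heavy text is half the size in UTF-8 that it is in UTF-16, and BMP CJK text is larger in UTF-8. Supplementary-plane characters take 4 bytes in every form. Code that keeps byte-oriented strings internally and converts only at the data-exchange boundary pays none of the conversion cost for the English-only data it already handles.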