Let me state up front that I am a relative novice in matters of
Unicode and have no knowledge of the details of the other encodings, so I am
unable to understand the (partial) post copied below.
I am looking at the possibility of having hypercode, ranging from H+11
to H+3FFF, that is,
In response to my posting on this thread, William Overington asked:
Yet the posting appended would seem to imply that at least one standard has
reserved some (all?) of these codes either as "never to be used" codes or as
"might someday be used" codes.
So, my question is this. Which of the
At 19:30 -0800 2001-03-14, John Jenkins wrote:
A bigger consideration than the ones I've mentioned was that having
to rework Extension B to divide it into a BMP portion and a non-BMP
portion would have delayed part 2 of 10646, and that was not
acceptable. Moreover, the Japanese National Body
At 19:50 -0800 2001-03-13, Pierpaolo BERNARDI wrote:
Now that you mention them, someone will make a fuss over their absence.
8-)
They have already been noticed.
--
Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
First of all, sorry for having started this demoniac thread. It was against
my will -- I was probably possessed. :-)
As someone correctly inferred, what I meant was the Italian "pentagramma":
the five horizontal lines used in musical notation or, by extension,
"musical notation".
BTW,
I wrote:
F666;ANTICHRISTIAN ALTERNATIVE LATIN CAPITAL LETTER T;Lu;0;L;font 0054N0074;
Ooops! It's lowercase t, not uppercase. As I must correct myself, I take the
occasion to make a better decomposition:
F666;ANTICHRISTIAN ALTERNATIVE LATIN SMALL LETTER T;Ll;0;L;compat
A lot of other religions managed to make it into Miscellaneous Symbols
(although if Solomon's Seal/Mogen David is there, I'm not seeing it).
No, it is there:
From ITC Zapf Dingbats series 100
Stars, asterisks and snowflakes
2721 STAR OF DAVID
Bertrand
On Tue, 13 Mar 2001, Curtis Clark wrote:
At 07:50 PM 3/13/01, Pierpaolo BERNARDI wrote:
And no, the Unicode Standard hasn't encoded any pentagrams yet -- or
hexagrams or baphomets, for that matter.
Now that you mention them, someone will make a fuss over their absence.
8-)
A lot of
Doug,
U+235F
(APL is a rather demonic language, isn't it?)
In more ways than one. His pitchfork U+2366 is there too.
You wonder why IBM had so much trouble selling its first PCs. The 5101 cost
more than $30K, came with one tape drive, plus 8K of memory and did not
support floppies. The
On Tuesday, March 13, 2001, at 05:39 PM, Christopher John Fynn wrote:
Some of the characters in Extension B are required for JIS X 0213
support, which is going to be a sine qua non in Japan within a few
years. There was a push a little while ago to put these characters
on the BMP for
On Wednesday, March 14, 2001, at 09:01 AM, John Jenkins wrote:
In any event, it was a politically impossible decision to make. It was
extremely difficult to get agreement to add Vertical Extension A to the
BMP; in the end, that agreement was secured only by promising that no
future
John H. Jenkins [mailto:[EMAIL PROTECTED]]
Some of the characters in Extension B are required for JIS X 0213
support, which is going to be a sine qua non in Japan within a few
years. There was a push a little while ago to put these characters
on the BMP for precisely this
Pentagrams? I haven't seen those... where are they?
Hmmm... This is possibly an Italian word badly Anglicized. I just meant
"musical notation".
Okay. I thought perhaps there were additions to "Misc Symbols" U+2600 ..
U+267F or elsewhere that I had missed.
In Italian,
Keld surmised:
On Fri, Mar 09, 2001 at 10:56:30AM -0800, Yves Arrouye wrote:
Since the U in UTF stands for Unicode, UTF-32 cannot represent more than
what Unicode encodes, which is 1+ million code points. Otherwise, you're
talking about UCS-4. But I thought that one of the latest
On Tue, 13 Mar 2001, Kenneth Whistler wrote:
In Italian, "Pentagramma" is a musical term. Cf. "Pentagramma per voce
sola", etc.
Pentagramma = stave, staff (the five horizontal lines on which the notes
are written)
But in English, a pentagram is an occult symbol-- a pentacle (5-pointed
In a message dated 2001-03-13 18:29:12 Pacific Standard Time, [EMAIL PROTECTED]
writes:
But in English, a pentagram is an occult symbol-- a pentacle (5-pointed
star), usually inscribed inside a circle, and associated with witchcraft,
sorcery, and (by some) Satanism.
See
At 07:50 PM 3/13/01, Pierpaolo BERNARDI wrote:
And no, the Unicode Standard hasn't encoded any pentagrams yet -- or
hexagrams or baphomets, for that matter.
Now that you mention them, someone will make a fuss over their absence.
8-)
A lot of other religions managed to make it into
At 10:13 -0800 2001-03-11, John H. Jenkins wrote:
Au contraire, Deseret is quite well-known in LDS circles. Most
Mormons who grew up in the Church have at least heard of it. It
isn't extensively used, by any means, but there really are people out
there who do want to use it on their computers.
On Mon, 12 Mar 2001, Marco Cimarosti wrote:
Thomas Chan wrote:
How about the case of a retailer who needs to deal with parts for
elevators and needs U+282E2, lip 'elevator'? Or neckties, requiring
U+27639, taai 'tie'.
I am not seeking excuses to not implement UTF-16 -- rather examples
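The two code points cited above, U+282E2 and U+27639, lie outside the BMP, so in UTF-16 each one takes a surrogate pair. A minimal Python sketch of the standard mapping follows; the function name to_utf16_units is made up for this illustration and comes from no library.

```python
def to_utf16_units(code_point: int) -> list:
    """Return the UTF-16 code unit(s) for one Unicode scalar value."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not scalar values")
    if code_point < 0x10000:
        # BMP characters fit in a single 16-bit code unit.
        return [code_point]
    # Supplementary characters: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


for cp in (0x282E2, 0x27639):
    units = " ".join(f"{u:04X}" for u in to_utf16_units(cp))
    print(f"U+{cp:04X} -> {units}")
```

Code that treats every 16-bit unit as a whole character would see two "characters" for each of these, which is the practical cost being weighed in this thread.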
On Sunday, March 11, 2001, at 12:26 PM, Lars Marius Garshol wrote:
How will the Japanese encode these JIS X 0213 characters? That is,
what effect, if any, will this have on the legacy Japanese character
encodings? Are there plans to extend ISO 2022-JP or EUC-JP, or for
some entirely new
At 9:59 AM -0800 3/9/01, Marco Cimarosti wrote:
Well, I guess that Chu-Nôm and Deseret are hardly known outside this mailing
list.
Au contraire, Deseret is quite well-known in LDS circles. Most
Mormons who grew up in the Church have at least heard of it. It
isn't extensively used, by any means,
At 1:17 AM -0800 3/9/01, Marco Cimarosti wrote:
I am wondering especially about the CJK characters in Extension B. We all
know that the majority of them are rare, ancient or idiosyncratic
characters, but I am not quite sure that this is true for *all* of them.
Some of the characters in
Addison P. Phillips wrote:
[...]
currently there are no characters "up there" this isn't a really big
deal. Shortly, when Unicode 3.1 is official, there will be 40K or so
characters in the supplemental planes... but they'll be
relatively rare.
This reminds me of a question that I wanted to
On 03/08/2001 07:40:25 PM "Ayers, Mike" wrote:
If you really want to finish the job, there's always UTF-32, which
should do rather nicely until we meet the space aliens with the
4,293,853,186 character alphabet!
Um... no. The 1,113,023 character alphabet (one more than the encodable
scalar
On Fri, 9 Mar 2001, Marco Cimarosti wrote:
Addison P. Phillips wrote:
[...]
currently there are no characters "up there" this isn't a really big
deal. Shortly, when Unicode 3.1 is official, there will be 40K or so
characters in the supplemental planes... but they'll be
relatively
Thomas Chan wrote:
Is there at least one character U+ that is
commonly used in at least one modern language?
How about music and math notation?
About the music symbols in Unicode 3.1, they are just the basic building
blocks for it. So I assume that handling surrogates (or
On Fri, Mar 09, 2001 at 10:56:30AM -0800, Yves Arrouye wrote:
Since the U in UTF stands for Unicode, UTF-32 cannot represent more than
what Unicode encodes, which is 1+ million code points. Otherwise, you're
talking about UCS-4. But I thought that one of the latest revs of ISO 10646
Yves Arrouye wrote:
On 03/08/2001 07:40:25 PM "Ayers, Mike" wrote:
If you really want to finish the job, there's always UTF-32, which
should do rather nicely until we meet the space aliens with the
4,293,853,186 character alphabet!
Um... no. The 1,113,023 character
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
On 03/09/2001 12:53:57 PM "Ayers, Mike" wrote:
Um... no. The UTF-32 CES can handle much more than the current
space of the Unicode CCS. As far as I can tell, it's good
to go until we need more than 32 bits to represent the ACR.
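For comparison with the sizes being traded in this exchange, here is a small Python sketch of the arithmetic. It relies only on the published bounds of the Unicode code space (U+0000..U+10FFFF, with surrogates at U+D800..U+DFFF) and on UCS-4 having originally been defined as a 31-bit space; the variable names are illustrative.

```python
MAX_CODE_POINT = 0x10FFFF

# 17 planes of 65,536 code points each.
code_points = MAX_CODE_POINT + 1

# The surrogate range U+D800..U+DFFF is reserved for UTF-16 and
# does not count toward encodable scalar values.
surrogates = 0xDFFF - 0xD800 + 1
scalar_values = code_points - surrogates

# Raw capacity of a 32-bit code unit versus the original 31-bit UCS-4 space.
utf32_unit_values = 2 ** 32
ucs4_31bit_values = 2 ** 31

print(f"code points:        {code_points:,}")
print(f"surrogates:         {surrogates:,}")
print(f"scalar values:      {scalar_values:,}")
print(f"31-bit UCS-4 space: {ucs4_31bit_values:,}")
print(f"32-bit unit values: {utf32_unit_values:,}")
```

The gap between the last two lines and the first is the point under dispute: a 32-bit unit can hold far more values than the Unicode code space defines.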
Ienup Sung wrote:
Well, contrary to what you said, it is a very good option since you
don't have to know anything about what's inside the character bytes, which
means that by using mblen/mbrlen you can achieve codeset-independent
programming that will support not only Unicode/UTF-8
have today but even then the mblen
implementations were not clumsy at all, but as elegant and lean as they can be in
most cases.
With regards,
Ienup
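The codeset-independent idea described above can be roughly illustrated in Python, which has no mblen of its own. The sketch below stands in for it with the standard library's incremental decoders: the caller asks how many bytes each character occupies without knowing the encoding's byte layout. The helper name char_byte_lengths is hypothetical, and the approach assumes an encoding whose decoder emits output only once a character is complete; stateful encodings with escape sequences would need more care.

```python
import codecs
from typing import List


def char_byte_lengths(data: bytes, encoding: str) -> List[int]:
    """Return the byte length of each character in data, for any codec."""
    decoder = codecs.getincrementaldecoder(encoding)()
    lengths = []
    pending = 0
    for i in range(len(data)):
        pending += 1
        # The decoder buffers incomplete sequences and returns "" until
        # the bytes fed so far complete a character.
        if decoder.decode(data[i:i + 1]):
            lengths.append(pending)
            pending = 0
    # Raises UnicodeDecodeError if the data stops in mid-character.
    decoder.decode(b"", final=True)
    return lengths


for enc in ("utf-8", "euc_jp", "shift_jis"):
    sample = "a\u3042".encode(enc)  # LATIN SMALL LETTER A + HIRAGANA LETTER A
    print(enc, char_byte_lengths(sample, enc))
```

The same loop handles UTF-8 and the legacy Japanese encodings unchanged, which is the property being argued for.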
] Date: Fri, 09 Mar 2001 12:09:08 -0800 (GMT-0800)
] From: Antoine Leca [EMAIL PROTECTED]
] Subject: Re: UTF8 vs. Unicode (UTF16) in code
] To: Unicode L
On Fri, 9 Mar 2001, Marco Cimarosti wrote:
It is not very clear to me what is included in Extension B: how is it
possible to know something more about it?
Look at DUTR #27[1] (2001.2.23), section 10.1, and see if any of those
sources are ones that contain characters that are important to you.
Since the U in UTF stands for Unicode, UTF-32 cannot represent more than
what Unicode encodes, which is 1+ million code points. Otherwise, you're
talking about UCS-4. But I thought that one of the latest revs of ISO 10646
explicitly specified that UCS-4 will never encode more
Generally, UTF-8 is a quicker-and-dirtier method of getting Unicode
support into a legacy product. The work that goes into supporting UTF-8
in 8-bit clean code is analogous to multibyte enabling: you have to
provide functions for moving the pointer about, searching, etc.
This *can* be less work
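The "functions for moving the pointer about" mentioned above might look something like the following Python sketch. It assumes the buffer already holds well-formed UTF-8, and the names utf8_seq_len, next_char and prev_char are invented for this illustration.

```python
def utf8_seq_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"0x{lead:02X} cannot start a UTF-8 sequence")


def next_char(buf: bytes, i: int) -> int:
    """Offset of the character following the one that starts at offset i."""
    return i + utf8_seq_len(buf[i])


def prev_char(buf: bytes, i: int) -> int:
    """Offset of the character that ends just before offset i."""
    i -= 1
    # Continuation bytes have the bit pattern 10xxxxxx; skip back over them.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


data = "a\u00e9\u20ac\U00020000".encode("utf-8")  # 1-, 2-, 3- and 4-byte characters
offset = 0
while offset < len(data):
    print(offset, data[offset:next_char(data, offset)])
    offset = next_char(data, offset)
```

Because lead bytes and continuation bytes never overlap in value, both directions work from any character boundary without scanning from the start of the buffer.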
[EMAIL PROTECTED]
] Subject: UTF8 vs. Unicode (UTF16) in code
] To: Unicode List [EMAIL PROTECTED]
] MIME-version: 1.0
]
] We've got an English-language only product which makes use of
] single-byte character strings throughout the code. For our next
] release, we'd like to internationalize
: Re: UTF8 vs. Unicode (UTF16) in code
] X-Sender: [EMAIL PROTECTED]
] To: Ienup Sung [EMAIL PROTECTED]
] Cc: Unicode List [EMAIL PROTECTED]
] MIME-version: 1.0
]
] Well
]
] Actually, there is a significant difference between being "UTF-8
] ignorant" and "UTF-16 ignor
Mike" [EMAIL PROTECTED]
] Subject: RE: UTF8 vs. Unicode (UTF16) in code
] To: 'Ienup Sung' [EMAIL PROTECTED], Unicode List [EMAIL PROTECTED]
] MIME-version: 1.0
]
]
] If you really want to finish the job, there's always UTF-32, which
] should do rather nicely until we meet the space a
-(
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
- Original Message -
From: "Ienup Sung" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Thursday, March 08, 2001 5:21 PM
Subject: Re: UTF8 vs. Unicode (UTF16) in code
I think we
We've got an English-language only product which makes use of
single-byte character strings throughout the code. For our next
release, we'd like to internationalize it (Unicode) and be able to store
data in UTF8 format (a requirement for data exchange).
We're deciding between using UTF8 within