Pardon my ignorance, but what is an astral character? I can't find a
definition on the Unicode site, and Google mostly comes up with hits that
seem to have to do with Tarot! (Does this confirm the long-held suspicion
that Macs run on magic? ;^)
Thanks!
- rick cameron
-----Original Message-----
Um, would those characters then dwell in the astral plane?
No question, a much more appealing term!
Thanks
- rick cameron
-----Original Message-----
From: John H. Jenkins [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 18 December 2001 8:09
To: Unicode List
Subject: Re: Plane One use, was Re: HTML
There is no such thing as an astral character in Unicode or 10646.
But someone did suggest that as a name for non-BMP characters before
one settled on the term supplementary character.
/kent k
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of
The allusion to Tarot isn't entirely specious! Code Planes higher than the
BMP were referred to as higher planes... or astral planes. Therefore,
these Code planes were obviously populated with astral characters.
But I never did figure out if everything above Code Plane 16 was above or
still
Rick Cameron suggested:
Would it be useful to have one term for planes 1-16 and another for all
planes above the BMP? Perhaps the former are astral and the latter
celestial. ;^)
(I'm half-serious, since according to my suggestion UTF-16 can encode all
astral characters, but not all
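
For anyone who wants to check the claim that UTF-16 reaches planes 1-16 and nothing higher, here is a minimal sketch of the surrogate-pair arithmetic. The function names are made up for illustration; only the constants come from the standard.

```python
# Surrogate ranges used by UTF-16.
HIGH_MIN, HIGH_MAX = 0xD800, 0xDBFF
LOW_MIN, LOW_MAX = 0xDC00, 0xDFFF


def from_surrogates(high: int, low: int) -> int:
    """Combine a high/low surrogate pair into one code point."""
    if not (HIGH_MIN <= high <= HIGH_MAX and LOW_MIN <= low <= LOW_MAX):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_MIN) << 10) + (low - LOW_MIN)


def plane(code_point: int) -> int:
    """Plane number: each plane holds 0x10000 code points."""
    return code_point >> 16


# Smallest and largest values any surrogate pair can express.
lowest = from_surrogates(HIGH_MIN, LOW_MIN)
highest = from_surrogates(HIGH_MAX, LOW_MAX)

print(hex(lowest), plane(lowest))    # 0x10000 1
print(hex(highest), plane(highest))  # 0x10ffff 16
```

So a surrogate pair can name anything in planes 1 through 16, and there is simply no pair left over for plane 17 or beyond.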
Clive said:
But I never did figure out if everything above Code Plane 16 was above or
still below the Heaviside Layer... ;-)}
As for the cats, in Up up up past the Russell Hotel, up up up to the Heaviside layer?
Actually, my surmise, given the fact that the code points past U+10FFFF are
That's interesting - I had assumed that there was no maximum to the scalar
values in Unicode, just that each encoding had its limits.
In my copy of The Unicode Standard Version 3.0, I can't find an explicit
statement that scalar values in Unicode are only in the range U+0 to
U+10FFFF - but this
The region beyond U+10FFFF contains photos of the editors
of The Unicode Standard, Version 3.0.
Misha
On 18/12/2001 18:09:11 Kenneth Whistler wrote:
Clive said:
But I never did figure out if everything above Code Plane 16 was above or
still below the Heaviside Layer... ;-)}
As for the
At 10:09 -0800 2001-12-18, Kenneth Whistler wrote:
Perhaps, however, from the murmur of chilled microwaves emanating from the
vicinity of the noncharacters U+10FFFE and U+10FFFF, at the far nether
reaches of the astral planes, we can find patterns that will allow us to
interpret the earliest
The Java 2 Platform SE v1.4 Regular Expressions package
(java.util.regex) which is in beta supports this and other characters
mentioned in UTR #13.
cf. http://java.sun.com/j2se/1.4/docs/api/java/util/regex/Pattern.html.
Yes, I am aware of this UTR.
Is it implemented in any common programming
At 10:38 AM 12/18/01 -0800, Rick Cameron wrote:
It looks like UCS-2 and UCS-4 are defined in ISO 10646. Does that standard
restrict the valid range of UCS-4 to 0..10FFFF?
It will with AMD1 to ISO/IEC 10646-1:2000 which is expected to pass final
balloting and head for publication in 2002.
If
Rick Cameron asked:
Are you planning to add an explicit statement to the Unicode standard that
the valid range for scalar values is 0..10FFFF? (Or is such a statement
there, and I've just missed it?)
Unicode 3.0, p. 45, D28:
Unicode scalar value: a number N from 0 to 10FFFF₁₆...
and
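
As a rough illustration of the range being discussed, the check below tests whether an integer is a Unicode scalar value. It is a sketch, not text from the standard: the function name is hypothetical, and excluding the surrogate range D800..DFFF is an assumption that follows how later versions of the standard word the definition.

```python
MAX_SCALAR = 0x10FFFF                        # upper bound of the range
SURROGATE_MIN, SURROGATE_MAX = 0xD800, 0xDFFF


def is_scalar_value(n: int) -> bool:
    """True if n is in 0..10FFFF and is not a surrogate code point."""
    if n < 0 or n > MAX_SCALAR:
        return False
    # Assumption: surrogates are code points but not scalar values.
    return not (SURROGATE_MIN <= n <= SURROGATE_MAX)


for candidate in (0x0, 0xD800, 0x10FFFF, 0x110000):
    print(hex(candidate), is_scalar_value(candidate))
```

Python's own chr() enforces the same upper bound: chr(0x10FFFF) succeeds, while chr(0x110000) raises ValueError.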
At 03:38 PM 12/18/01 -0800, Rick Cameron wrote:
Are you planning to add an explicit statement to the Unicode standard that
the valid range for scalar values is 0..10FFFF? (Or is such a statement
there, and I've just missed it?)
see below:
In particular, as the use of 32-bit variables to hold
OK, so it is there in 3.0. But in the section on Surrogates? And on
Transformations? A little obscure.
I expected to find it in section 2.3, for example, where the major encoding
forms are being described; or even earlier - say in 1.1 Coverage. Surely the
range of valid scalar values is an
Tex,
Thanks for this and the several private responses.
For anyone interested, in addition to the Microsoft page:
http://www.microsoft.com/hk/hkscs/
The HK Gov't has a web page, fonts and mapping tables:
http://www.info.gov.hk/digital21/eng/hkscs/introduction.html
And to add to the
Thanks for this and the several private responses.
For anyone interested, in addition to the Microsoft page:
http://www.microsoft.com/hk/hkscs/
The HK Gov't has a web page, fonts and mapping tables:
http://www.info.gov.hk/digital21/eng/hkscs/introduction.html
Oracle gave a nice paper at a
Rick continued:
OK, so it is there in 3.0. But in the section on Surrogates? And on
Transformations? A little obscure.
But you need to keep in mind that Chapter 3 is the Conformance chapter,
the key part of the formal definition of the standard.
I expected to find it in section 2.3, for
Ken,
Thanks for commiserating.
Yes, I noticed the differences in mapping tables.
I am glad Sybase gave different character sets different names.
I am curious how you deal with Unicode and HKSCS in the private use
area, sometimes
For that matter I wonder what a user in HK does when their
On Tue, 18 Dec 2001, Tex Texin wrote:
I am glad Sybase gave different character sets different names.
There's a Big5-HKSCS tag[1]--is anyone using that?
[1] http://www.iana.org/assignments/character-sets (see MIBenum 2101;
I don't understand why it's in the vendor range, though)
For that
On Tue, 18 Dec 2001, Kenneth Whistler wrote:
And to add to the chaos and confusion, note that the HKSCS
patch for Windows Code Page 950 does not map exactly the
same as the HK Government mapping table. And that the HK
And that's in addition to the confusion caused by the semi-official,
On top of that, it looks like 950 maps a bogus symbol or punctuation
character to U+2574. (2574 is one of a set of 4, and only 1 is mapped for
starters. Fonts covering CP950 give a way different image for that
character than you'd expect from either the charts or the names...
I let some
-----BEGIN PGP SIGNED MESSAGE-----
Rick Cameron wrote:
From: Asmus Freytag [mailto:[EMAIL PROTECTED]]
Of course, the Unicode Standard 3.0 doesn't even mention a 32-bit
encoding - but that's not stopping uniphiles from storing Unicode data
in their wchar_t's!
The only way such use is