RE: Plane One use, was Re: HTML Validation

2001-12-18 Thread Rick Cameron
Pardon my ignorance, but what is an astral character? I can't find a definition on the Unicode site, and Google mostly comes up with hits that seem to have to do with Tarot! (Does this confirm the long-held suspicion that Macs run on magic? ;^) Thanks! - rick cameron -Original Message-

RE: Plane One use, was Re: HTML Validation

2001-12-18 Thread Rick Cameron
Um, would those characters then dwell in the astral plane? No question, a much more appealing term! Thanks - rick cameron -Original Message- From: John H. Jenkins [mailto:[EMAIL PROTECTED]] Sent: Tuesday, 18 December 2001 8:09 To: Unicode List Subject: Re: Plane One use, was Re: HTML

RE: Plane One use, was Re: HTML Validation

2001-12-18 Thread Kent Karlsson
There is no such thing as an "astral character" in Unicode or 10646. But someone did suggest that as a name for non-BMP characters before the term "supplementary character" was settled on. /kent k -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of

RE: Plane One use, was Re: HTML Validation

2001-12-18 Thread Hohberger, Clive
The allusion to Tarot isn't entirely specious! Code Planes higher than the BMP were referred to as higher planes... or astral planes. Therefore, these Code planes were obviously populated with astral characters. But I never did figure out if everything above Code Plane 16 was above or still
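The plane arithmetic behind "higher planes" can be sketched in a few lines of Python. This is a minimal illustration assuming the 17-plane codespace U+0000..U+10FFFF discussed later in this thread; the function names `plane_of` and `is_supplementary` are made up for the example.

```python
# Sketch: classify code points by plane, assuming the 17-plane
# codespace U+0000..U+10FFFF discussed in this thread.
MAX_CODE_POINT = 0x10FFFF


def plane_of(cp: int) -> int:
    """Return the plane number 0-16 (each plane spans 0x10000 code points)."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"U+{cp:X} is outside U+0000..U+10FFFF")
    return cp >> 16


def is_supplementary(cp: int) -> bool:
    """True for planes 1-16, the non-BMP ("astral") code points."""
    return plane_of(cp) >= 1


for cp in (0x0041, 0xFFFD, 0x10000, 0x1D11E, 0x10FFFF):
    print(f"U+{cp:04X}: plane {plane_of(cp)}, supplementary={is_supplementary(cp)}")
```

Shifting right by 16 bits works because each plane is exactly 0x10000 code points, so the plane number is simply the part of the value above the low 16 bits.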

Astral planes again (was: RE: Plane One use, was Re: HTML Validation)

2001-12-18 Thread Kenneth Whistler
Rick Cameron suggested: Would it be useful to have one term for planes 1-16 and another for all planes above the BMP? Perhaps the former are astral and the latter celestial. ;^) (I'm half-serious, since according to my suggestion UTF-16 can encode all astral characters, but not all
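Why UTF-16 reaches exactly planes 1-16 and no further can be seen from the surrogate-pair arithmetic. The sketch below is illustrative (the function names are hypothetical) and cross-checks itself against Python's built-in `utf-16-be` codec.

```python
import struct


def to_utf16_units(cp: int) -> list:
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF:
        # Beyond plane 16 there are no surrogate pairs left to use.
        raise ValueError(f"U+{cp:X} cannot be represented in UTF-16")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not encoded on their own")
    if cp <= 0xFFFF:
        return [cp]                      # BMP: a single unit
    v = cp - 0x10000                     # 20 bits, 0x00000..0xFFFFF
    return [0xD800 + (v >> 10),          # high surrogate: top 10 bits
            0xDC00 + (v & 0x3FF)]        # low surrogate: bottom 10 bits


def via_codec(cp: int) -> list:
    """Cross-check against Python's built-in UTF-16 (big-endian) codec."""
    data = chr(cp).encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))


for cp in (0x0041, 0x1D11E, 0x10FFFF):
    print(f"U+{cp:04X} ->", " ".join(f"{u:04X}" for u in to_utf16_units(cp)))
```

U+1D11E comes out as D834 DD1E, and U+10FFFF as DBFF DFFF, the last pair available: 1024 high surrogates times 1024 low surrogates gives exactly the 16 planes above the BMP.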

Astral planes (was: RE: Plane One use, was Re: HTML Validation)

2001-12-18 Thread Kenneth Whistler
Clive said: "But I never did figure out if everything above Code Plane 16 was above or still below the Heaviside Layer... ;-)}" As for the cats, in "Up up up past the Russell Hotel, up up up to the Heaviside layer"? Actually, my surmise, given the fact that the code points past U+10FFFF are

RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

2001-12-18 Thread Rick Cameron
That's interesting - I had assumed that there was no maximum to the scalar values in Unicode, just that each encoding had its limits. In my copy of The Unicode Standard Version 3.0, I can't find an explicit statement that scalar values in Unicode are only in the range U+0 to U+10FFFF - but this

Re: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

2001-12-18 Thread Misha . Wolf
The region beyond U+10FFFF contains photos of the editors of The Unicode Standard, Version 3.0. Misha On 18/12/2001 18:09:11 Kenneth Whistler wrote: Clive said: But I never did figure out if everything above Code Plane 16 was above or still below the Heaviside Layer... ;-)} As for the

Re: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

2001-12-18 Thread Michael Everson
At 10:09 -0800 2001-12-18, Kenneth Whistler wrote: Perhaps, however, from the murmur of chilled microwaves emanating from the vicinity of the noncharacters U+10FFFE and U+10FFFF, at the far nether reaches of the astral planes, we can find patterns that will allow us to interpret the earliest
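The noncharacters mentioned here follow a simple pattern: the last two code points of every plane, plus the block U+FDD0..U+FDEF. The Python sketch below assumes that current definition; `is_noncharacter` is a hypothetical helper name.

```python
def is_noncharacter(cp: int) -> bool:
    """Sketch: U+FDD0..U+FDEF, plus the last two code points of every
    plane (U+xxFFFE and U+xxFFFF, e.g. U+10FFFE and U+10FFFF)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the codespace")
    # Keeping only bits 1-15 maps both ...FFFE and ...FFFF to 0xFFFE.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


print(sum(is_noncharacter(cp) for cp in range(0x110000)))  # 66
```

The count of 66 is 32 from the U+FDD0 block plus 2 for each of the 17 planes.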

Re: AW: U+2028

2001-12-18 Thread Christian Cooke
The Java 2 Platform SE v1.4 Regular Expressions package (java.util.regex), which is in beta, supports this and other characters mentioned in UTR #13. Cf. http://java.sun.com/j2se/1.4/docs/api/java/util/regex/Pattern.html. Yes, I am aware of this UTR. Is it implemented in any common programming
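For comparison, here is a minimal Python (not Java) sketch of why UTR #13 support matters: Python's `str.splitlines()` counts U+2028 LINE SEPARATOR as a line boundary, while its `re` module treats only "\n" as a line terminator. This shows Python's behavior only; it says nothing about java.util.regex itself.

```python
import re

text = "first\u2028second\nthird"

# str.splitlines() counts U+2028 LINE SEPARATOR as a line boundary.
print(text.splitlines())                            # ['first', 'second', 'third']

# re treats only "\n" as a line terminator: MULTILINE "^" does not match
# after U+2028, and "." happily matches it.
print(re.findall(r"^\w+", text, re.MULTILINE))      # ['first', 'third']
print(re.fullmatch(r".+", "a\u2028b") is not None)  # True
```

The same string therefore has three lines according to one API and two according to another, which is the kind of inconsistency the newline guidelines try to remove.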

RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

2001-12-18 Thread Asmus Freytag
At 10:38 AM 12/18/01 -0800, Rick Cameron wrote: It looks like UCS-2 and UCS-4 are defined in ISO 10646. Does that standard restrict the valid range of UCS-4 to 0..10FFFF? It will with AMD1 to ISO/IEC 10646-1:2000, which is expected to pass final balloting and head for publication in 2002. If
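What a 0..10FFFF restriction on UCS-4 means for a reader of 32-bit data can be sketched as a range check on each unit. This is an illustrative Python sketch assuming big-endian input; `decode_ucs4_be` is a hypothetical name, and it deliberately checks only the upper bound, not surrogates.

```python
import struct


def decode_ucs4_be(data: bytes) -> list:
    """Split big-endian 32-bit units and reject anything above 0x10FFFF."""
    if len(data) % 4:
        raise ValueError("length is not a multiple of 4 bytes")
    values = [v for (v,) in struct.iter_unpack(">I", data)]
    for i, v in enumerate(values):
        if v > 0x10FFFF:
            raise ValueError(f"unit {i}: 0x{v:08X} is above 0x10FFFF")
    return values


ok = struct.pack(">2I", 0x41, 0x10FFFF)
bad = struct.pack(">I", 0x7FFFFFFF)   # legal in the old 31-bit UCS-4 space
print(decode_ucs4_be(ok))             # [65, 1114111]
try:
    decode_ucs4_be(bad)
except ValueError as err:
    print("rejected:", err)
```

Python's own `utf-32-be` decoder applies the same ceiling and raises `UnicodeDecodeError` on the `bad` input.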

RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

2001-12-18 Thread Kenneth Whistler
Rick Cameron asked: Are you planning to add an explicit statement to the Unicode standard that the valid range for scalar values is 0..10FFFF? (Or is such a statement there, and I've just missed it?) Unicode 3.0, p. 45, D28: Unicode scalar value: a number N from 0 to 10FFFF₁₆... and
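The D28 range translates directly into a check, sketched below in Python. The stricter `is_scalar_value` reflects later versions of the standard, which also exclude the surrogate code points; that refinement is not part of the Unicode 3.0 text quoted here. All three function names are hypothetical.

```python
def in_codespace(n: int) -> bool:
    """The range quoted from Unicode 3.0 D28: 0 to 10FFFF (hex)."""
    return 0 <= n <= 0x10FFFF


def is_scalar_value(n: int) -> bool:
    """Stricter rule of later versions: also exclude surrogates D800..DFFF."""
    return in_codespace(n) and not 0xD800 <= n <= 0xDFFF


def chr_accepts(n: int) -> bool:
    """Python's chr() enforces the same 0..0x10FFFF ceiling (surrogates allowed)."""
    try:
        chr(n)
        return True
    except ValueError:
        return False


for n in (0x41, 0xD800, 0x10FFFF, 0x110000):
    print(f"0x{n:X}: codespace={in_codespace(n)}, "
          f"scalar={is_scalar_value(n)}, chr={chr_accepts(n)}")
```

Python's `chr()` matches `in_codespace` rather than `is_scalar_value`, since it will happily build a string holding a lone surrogate.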

RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

2001-12-18 Thread Asmus Freytag
At 03:38 PM 12/18/01 -0800, Rick Cameron wrote: Are you planning to add an explicit statement to the Unicode standard that the valid range for scalar values is 0..10FFFF? (Or is such a statement there, and I've just missed it?) see below: In particular, as the use of 32-bit variables to hold

RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

2001-12-18 Thread Rick Cameron
OK, so it is there in 3.0. But in the section on Surrogates? And on Transformations? A little obscure. I expected to find it in section 2.3, for example, where the major encoding forms are being described; or even earlier - say in 1.1 Coverage. Surely the range of valid scalar values is an

Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Kenneth Whistler
Tex, Thanks for this and the several private responses. For anyone interested, in addition to the Microsoft page: http://www.microsoft.com/hk/hkscs/ The HK Gov't has a web page, fonts and mapping tables: http://www.info.gov.hk/digital21/eng/hkscs/introduction.html And to add to the

Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Tex Texin
Thanks for this and the several private responses. For anyone interested, in addition to the Microsoft page: http://www.microsoft.com/hk/hkscs/ The HK Gov't has a web page, fonts and mapping tables: http://www.info.gov.hk/digital21/eng/hkscs/introduction.html Oracle gave a nice paper at a

RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

2001-12-18 Thread Kenneth Whistler
Rick continued: OK, so it is there in 3.0. But in the section on Surrogates? And on Transformations? A little obscure. But you need to keep in mind that Chapter 3 is the Conformance chapter, the key part of the formal definition of the standard. I expected to find it in section 2.3, for

Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Tex Texin
Ken, Thanks for commiserating. Yes, I noticed the differences in mapping tables. I am glad Sybase gave different character sets different names. I am curious how you deal with Unicode and HKSCS in the private use area, sometimes. For that matter I wonder what a user in HK does when their
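When some HKSCS characters land in the Private Use Area, one practical first step is simply locating them in converted text. A minimal Python sketch follows. Which HKSCS characters end up in the PUA depends on the mapping table used; the sample string is hypothetical, and `private_use_positions` is an illustrative name.

```python
import unicodedata


def private_use_positions(text: str) -> list:
    """Return (index, code point) for each Private Use character (category Co)."""
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Co"]


sample = "A\uE000B\U000F0000"   # hypothetical: PUA code points standing in for HKSCS
print([f"U+{cp:04X} at {i}" for i, cp in private_use_positions(sample)])
```

Testing the general category rather than a hard-coded range also catches the supplementary PUA planes 15 and 16.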

Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Thomas Chan
On Tue, 18 Dec 2001, Tex Texin wrote: I am glad Sybase gave different character sets different names. There's a Big5-HKSCS tag[1]--is anyone using that? [1] http://www.iana.org/assignments/character-sets (see MIBenum 2101; I don't understand why it's in the vendor range, though) For that

Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Thomas Chan
On Tue, 18 Dec 2001, Kenneth Whistler wrote: And to add to the chaos and confusion, note that the HKSCS patch for Windows Code Page 950 does not map exactly the same as the HK Government mapping table. And that the HK And that's in addition to the confusion caused by the semi-official,
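Mapping-table disagreements like these can be surveyed mechanically by decoding every double-byte sequence with two converters and listing where they differ. The Python sketch below compares the stock `cp950` and `big5hkscs` codecs that ship with Python. That is an assumption for illustration: it is not the same comparison described above (the HKSCS-patched CP950 versus the HK Government table), and the helper names are hypothetical.

```python
def _decode_or_none(seq: bytes, codec: str):
    """Decode seq, or return None if the codec rejects it."""
    try:
        return seq.decode(codec)
    except UnicodeDecodeError:
        return None


def table_differences(codec_a: str, codec_b: str) -> dict:
    """Map each two-byte Big5-range sequence to (a, b) where the codecs disagree."""
    trail_bytes = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    diffs = {}
    for lead in range(0x81, 0xFF):
        for trail in trail_bytes:
            seq = bytes((lead, trail))
            a = _decode_or_none(seq, codec_a)
            b = _decode_or_none(seq, codec_b)
            if a != b:
                diffs[seq] = (a, b)
    return diffs


diffs = table_differences("cp950", "big5hkscs")
print(len(diffs), "byte sequences decode differently")
for seq, (a, b) in list(diffs.items())[:5]:
    print(seq.hex(), repr(a), repr(b))
```

A `None` on one side means that codec rejects the sequence outright. Some HKSCS entries decode to a two-code-point sequence, so values are compared as strings rather than single characters.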

Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Asmus Freytag
On top of that, it looks like 950 maps a bogus symbol or punctuation character to U+2574. (2574 is one of a set of 4, and only 1 is mapped, for starters.) Fonts covering CP950 give a way different image for that character than you'd expect from either the charts or the names... I let some
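Checking which of the four characters U+2574..U+2577 a given converter maps is a quick lookup. The Python sketch below reports what Python's bundled `cp950` codec does. That table may or may not match the Windows mapping being described, so treat its output as information about Python, not about Windows. `cp950_bytes_for` is a hypothetical helper.

```python
import unicodedata


def cp950_bytes_for(ch: str):
    """Bytes Python's cp950 codec produces for ch, or None if it is unmapped."""
    try:
        return ch.encode("cp950")
    except UnicodeEncodeError:
        return None


# U+2574..U+2577: the set of four half-line box-drawing characters.
for cp in range(0x2574, 0x2578):
    ch = chr(cp)
    seq = cp950_bytes_for(ch)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {seq.hex() if seq else 'unmapped'}")
```

Printing the character names next to the byte sequences makes it easy to compare what the mapping claims against the glyph a CP950 font actually draws.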

Re: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

2001-12-18 Thread David Hopwood
Rick Cameron wrote: From: Asmus Freytag [mailto:[EMAIL PROTECTED]] Of course, the Unicode Standard 3.0 doesn't even mention a 32-bit encoding - but that's not stopping uniphiles from storing Unicode data in their wchar_t's! The only way such use is
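Whether "Unicode in a wchar_t" means 32-bit units depends on the platform. The Python sketch below uses `ctypes` to report the local `wchar_t` size and shows the 32-bit values such an array would hold. `to_32bit_units` is a hypothetical helper, and the 2-versus-4-byte split is the common case, not a rule.

```python
import ctypes
import struct

# ctypes reports the platform's wchar_t size: typically 4 bytes on Linux and
# macOS, 2 on Windows, so "Unicode in a wchar_t" is UTF-32 on some systems
# and UTF-16 on others.
print("sizeof(wchar_t) =", ctypes.sizeof(ctypes.c_wchar))


def to_32bit_units(text: str) -> list:
    """The values a 32-bit wchar_t array would hold for text (UTF-32 units)."""
    data = text.encode("utf-32-le")  # "-le" variant: no byte order mark
    return [v for (v,) in struct.iter_unpack("<I", data)]


print([hex(v) for v in to_32bit_units("A\U0001D11E")])  # ['0x41', '0x1d11e']
```

With a 2-byte `wchar_t`, the same U+1D11E would instead occupy two units, the surrogate pair D834 DD1E.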