Cc: "Kenneth Whistler" <[EMAIL PROTECTED]>
Sent: Tuesday, July 23, 2002 19:44
Subject: Re: Abstract character?
> Kenneth Whistler wrote:
>
> >> UTF-16 does not allow the representation of an unpaired surrogate
> >> 0xD800 followed by another, coincidental
I typo'd:
> I suggest that UAX #18 be revised to
> state this unambiguously.
s/#18/#19/
-Doug
Kenneth Whistler wrote:
>> UTF-16 does not allow the representation of an unpaired surrogate
>> 0xD800 followed by another, coincidental unpaired surrogate 0xDC00.
>> (It maps the two to U+10000.) Among the standard UTFs, only UTF-32
>> allows the two to be treated as unpaired surrogates.
>
> A
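The mapping Ken describes can be checked with the usual surrogate-pair arithmetic. What follows is only a minimal sketch in Python; the helper name combine_surrogates is hypothetical and does not come from the thread.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Map a UTF-16 surrogate pair to the supplementary code point it encodes.

    Hypothetical helper for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    # 10 payload bits from each code unit, offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The pair from the message: 0xD800 followed by 0xDC00.
code_point = combine_surrogates(0xD800, 0xDC00)

# Python's UTF-16 decoder reads the same two code units as a single character,
# so the two values cannot survive as separate unpaired surrogates in UTF-16.
decoded = bytes.fromhex("D800DC00").decode("utf-16-be")
```

Here code_point comes out as 0x10000 and decoded is a one-character string, which is the sense in which the two "coincidental" surrogates collapse into one supplementary character.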
Lars Marius Garshol followed up:
> Hmmmm. OK. So combining diacritics are also abstract characters?
Yes, clearly.
Each encoded character in the Unicode CCS, ipso facto, associates
an abstract character with a code point.
So U+0300 COMBINING GRAVE ACCENT associates the code point U+0300
w
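As a small illustration of an encoded character tying a code point to an abstract character, the Unicode Character Database can be queried for U+0300. This is a sketch using Python's standard unicodedata module, not something taken from the thread.

```python
import unicodedata

grave = "\u0300"  # the code point discussed above

name = unicodedata.name(grave)                  # character name from the UCD
category = unicodedata.category(grave)          # general category
combining_class = unicodedata.combining(grave)  # canonical combining class

# The combining mark composes with a base letter under NFC normalization.
composed = unicodedata.normalize("NFC", "a" + grave)
```

The name reported is COMBINING GRAVE ACCENT, the category is Mn (nonspacing mark), and "a" followed by U+0300 composes to the single character U+00E0.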
succeed iff the input is valid in the source UTF).
I also think this is of paramount importance.
> I.e. U+D800..DFFF, like U+110000, should be undesignated and
> unrepresentable.
However, you can't go quite this far. As Markus pointed out, code points
themselves may have properties -- even code points which cannot, in
principle, be assigned to characters. And there are already existing
APIs which handle these code points. Their function is clearly
*designated* by the standard, normatively; that, however, is different
from saying that an abstract character could ever be assigned to them.
--Ken
So far, the Unicode Standard has defined code points to be from the contiguous range
of 0..0x10FFFF.
Some definitions are fuzzy in the standard, with hopes of clarification in Unicode 4.0.
It is true that UTF-16 cannot encode , but it can encode .
There are at least three reasons why not to for
On 07/22/2002 03:38:50 PM Kenneth Whistler wrote:
>Abstract character
>
> that which is encoded; an element of the repertoire (existing
> independent of the character encoding standard, and often
> identifiable in other character encoding standards, as well
> as th
Mark Davis wrote:
> A small correction to Ken's message:
>
> >The Unicode scalar value
> >definitionally excludes D800..DFFF, which are only code unit
> >values used in UTF-16, and which are not code points associated
> >with any well-formed UTF
* Kenneth Whistler
|
| Abstract character
|
|that which is encoded; an element of the repertoire (existing
|independent of the character encoding standard, and often
|identifiable in other character encoding standards, as well
|as the Unicode Standard); the implicit basis of
Mark Davis wrote:
> The UTC in has decided to make scalar value mean unambiguously the
> code points 0000..D7FF, E000..10FFFF, i.e., everything but surrogate
> code points. While surrogate code points cannot be represented in
> UTF-8 (as of Unicode 3.2), the UTC has not decided that the surrogat
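The distinction between "code point" and "scalar value" in this message can be sketched as a range check. The function names below are hypothetical, and the UTF-8 behaviour shown is that of Python's built-in codec, used here only as one example of a conformant encoder.

```python
def is_scalar_value(cp: int) -> bool:
    """True for code points 0000..D7FF and E000..10FFFF (no surrogates)."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def utf8_encodable(cp: int) -> bool:
    """Report whether Python's UTF-8 codec accepts this code point.

    Only call this with cp in 0..0x10FFFF; chr() rejects anything larger.
    """
    try:
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Raised for surrogate code points D800..DFFF.
        return False


# A surrogate is a code point, but not a scalar value and not UTF-8 encodable.
surrogate_is_scalar = is_scalar_value(0xD800)
surrogate_in_utf8 = utf8_encodable(0xD800)
```

Both results are False for 0xD800, while 0xE000 passes both checks.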
----- Original Message -----
From: "Kenneth Whistler" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, July 22, 2002 13:38
Subject: Re: Abstract character?
> Lars Marius Garshol asked:
>
> > I'm try
I usually define an abstract character in talks I give as "an element of a writing
system that you care about, independent of glyphs, and certainly independent of
encodings or specific code points".
If it could be described more precisely than that, it wouldn't be "
Lars Marius Garshol asked:
> I'm trying to find out what an abstract character is. I've been
> looking at chapter 3 of Unicode 3.0, without really achieving
> enlightenment.
>
> The term Unicode scalar value (apparently synonymous with code point)
> seems clear.
Lars Marius Garshol wrote:
> I'm trying to find out what an abstract character is.
http://www.unicode.org/reports/tr17/
http://oss.software.ibm.com/icu/docs/papers/forms_of_unicode/
markus
I'm trying to find out what an abstract character is. I've been
looking at chapter 3 of Unicode 3.0, without really achieving
enlightenment.
The term Unicode scalar value (apparently synonymous with code point)
seems clear. It is the identifying number assigned to assigned
Unicode