Re: Abstract character?

2002-07-24 Thread Mark Davis
Sent: Tuesday, July 23, 2002 19:44 > Kenneth Whistler wrote: > > >> UTF-16 does not allow the representation of an unpaired surrogate > >> 0xD800 followed by another, coincidental

Re: Abstract character?

2002-07-23 Thread Doug Ewell
I typo'd: > I suggest that UAX #18 be revised to > state this unambiguously. s/#18/#19/ -Doug

Re: Abstract character?

2002-07-23 Thread Doug Ewell
Kenneth Whistler wrote: >> UTF-16 does not allow the representation of an unpaired surrogate >> 0xD800 followed by another, coincidental unpaired surrogate 0xDC00. >> (It maps the two to U+10000.) Among the standard UTFs, only UTF-32 >> allows the two to be treated as unpaired surrogates. > > A
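The mapping mentioned in the quoted text, where the code units 0xD800 and 0xDC00 taken together become U+10000, follows from the standard UTF-16 surrogate arithmetic. A minimal Python sketch, assuming a hypothetical helper name decode_surrogate_pair that does not come from the thread:

```python
def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point.

    Hypothetical helper for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 payload bits; the pair is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    print(f"U+{decode_surrogate_pair(0xD800, 0xDC00):04X}")
```

Because every high surrogate followed by a low surrogate decodes this way, a UTF-16 sequence has no way to say "these two are unpaired", which is the point being made above.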

Re: Abstract character?

2002-07-23 Thread Kenneth Whistler
Lars Marius Garshol followed up: > H. OK. So combining diacritics are also abstract characters? Yes, clearly. Each encoded character in the Unicode CCS, ipso facto, associates an abstract character with a code point. So U+0300 COMBINING GRAVE ACCENT associates the code point U+0300 w

Re: Abstract character?

2002-07-23 Thread Kenneth Whistler
succeed iff the input is valid in the source UTF). I also think this is of paramount importance. > I.e. U+D800..DFFF, like U+110000, should be undesignated and > unrepresentable. However, you can't go quite this far. As Markus pointed out, code points themselves may have properties -- even code points which cannot, in principle, be assigned to characters. And there are already existing APIs which handle these code points. Their function is clearly *designated* by the standard, normatively; that, however, is different from saying that an abstract character could ever be assigned to them. --Ken
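The requirement that a conversion succeed iff the input is valid in the source UTF can be illustrated with strict codecs. A minimal sketch in Python, relying on the behavior of CPython 3's built-in strict UTF-8 and UTF-16 decoders; the function names convert_utf and is_valid are hypothetical and not from the thread:

```python
def convert_utf(data: bytes, source: str, target: str) -> bytes:
    """Convert between UTFs, failing if the input is ill-formed.

    Hypothetical helper; strict decoding raises UnicodeDecodeError
    instead of silently repairing bad input.
    """
    text = data.decode(source, errors="strict")
    return text.encode(target, errors="strict")


def is_valid(data: bytes, encoding: str) -> bool:
    """Return True iff data is well-formed in the given encoding."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # U+10000 in UTF-8 converts cleanly to a surrogate pair in UTF-16BE.
    print(convert_utf(b"\xf0\x90\x80\x80", "utf-8", "utf-16-be").hex())
    # A UTF-8-style encoding of the surrogate D800 is rejected.
    print(is_valid(b"\xed\xa0\x80", "utf-8"))
```

The design choice here is to reject rather than repair: nothing ill-formed in the source UTF can reach the target UTF.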

Re: Abstract character?

2002-07-23 Thread Markus Scherer
So far, the Unicode Standard has defined code points to be from the contiguous range of 0..0x10FFFF. Some definitions are fuzzy in the standard, with hopes of clarification in Unicode 4.0. It is true that UTF-16 cannot encode , but it can encode . There are at least three reasons why not to for

Re: Abstract character?

2002-07-23 Thread Peter_Constable
On 07/22/2002 03:38:50 PM Kenneth Whistler wrote: >Abstract character > > that which is encoded; an element of the repertoire (existing > independent of the character encoding standard, and often > identifiable in other character encoding standards, as well > as th

Re: Abstract character?

2002-07-23 Thread David Hopwood
Mark Davis wrote: > A small correction to Ken's message: > > >The Unicode scalar value > >definitionally excludes D800..DFFF, which are only code unit > >values used in UTF-16, and which are not code points associated > >with any well-formed UTF

Re: Abstract character?

2002-07-23 Thread Lars Marius Garshol
* Kenneth Whistler | | Abstract character | |that which is encoded; an element of the repertoire (existing |independent of the character encoding standard, and often |identifiable in other character encoding standards, as well |as the Unicode Standard); the implicit basis of

Re: Abstract character?

2002-07-22 Thread Doug Ewell
Mark Davis wrote: > The UTC in has decided to make scalar value mean unambiguously the > code points 0..D7FF, E000..10FFFF, i.e., everything but surrogate > code points. While surrogate code points cannot be represented in > UTF-8 (as of Unicode 3.2), the UTC has not decided that the surrogat
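The distinction quoted above, between code points (0..10FFFF) and scalar values (the code points other than the surrogates D800..DFFF), can be written as simple range checks. A minimal Python sketch with hypothetical function names; the final lines rely on CPython 3's strict UTF-8 encoder, which refuses surrogate code points:

```python
def is_code_point(value: int) -> bool:
    """True for any value in the Unicode code space 0..10FFFF."""
    return 0 <= value <= 0x10FFFF


def is_surrogate(value: int) -> bool:
    """True for surrogate code points D800..DFFF."""
    return 0xD800 <= value <= 0xDFFF


def is_scalar_value(value: int) -> bool:
    """True for code points that are not surrogates."""
    return is_code_point(value) and not is_surrogate(value)


if __name__ == "__main__":
    for v in (0x0041, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000):
        print(f"{v:06X} code point={is_code_point(v)} scalar={is_scalar_value(v)}")
    try:
        chr(0xD800).encode("utf-8")
    except UnicodeEncodeError:
        print("surrogate code point is not representable in UTF-8")
```

Note that D800 is still a code point under these checks; it is only excluded from the scalar values.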

Re: Abstract character?

2002-07-22 Thread Mark Davis
Kenneth Whistler wrote (Monday, July 22, 2002 13:38): > Lars Marius Garshol asked: > > > I'm try

Re: Abstract character?

2002-07-22 Thread Barry Caplan
I usually define an abstract character in talks I give as "an element of a writing system that you care about, independent of glyphs, and certainly independent of encodings or specific code points". If it could be described more precisely than that, it wouldn't be "

Re: Abstract character?

2002-07-22 Thread Kenneth Whistler
Lars Marius Garshol asked: > I'm trying to find out what an abstract character is. I've been > looking at chapter 3 of Unicode 3.0, without really achieving > enlightenment. > > The term Unicode scalar value (apparently synonymous with code point) > seems clear.

Re: Abstract character?

2002-07-22 Thread Markus Scherer
Lars Marius Garshol wrote: > I'm trying to find out what an abstract character is. http://www.unicode.org/reports/tr17/ http://oss.software.ibm.com/icu/docs/papers/forms_of_unicode/ markus

Abstract character?

2002-07-22 Thread Lars Marius Garshol
I'm trying to find out what an abstract character is. I've been looking at chapter 3 of Unicode 3.0, without really achieving enlightenment. The term Unicode scalar value (apparently synonymous with code point) seems clear. It is the identifying number assigned to assigned Unicode