David Hopwood wrote:
(about range U+FDD0..U+FDEF)
> It's for Arabic presentation forms internal to a rendering
> implementation,
> I assume (although it's not clear why existing private-use characters
> couldn't have been used for that).
Where could I find more information about this range and
David,
>
> It's for Arabic presentation forms internal to a rendering implementation,
> I assume (although it's not clear why existing private-use characters
> couldn't have been used for that).
>
Now I remember.
Thanks,
Carl
David Hopwood said:
> >
> > With Unicode 3.2 (in the works), the 32 additional code points
> > at U+FDD0..U+FDEF go from unallocated status to noncharacters
> > as well.
>
> Those are non-characters in Unicode 3.1 (see D7b in UAX #27).
Yes, I stand corrected. They are *already* approved by the
On Mon, Sep 10, 2001 at 12:22:20AM +0100, David Hopwood wrote:
> It's for Arabic presentation forms internal to a rendering implementation,
> I assume (although it's not clear why existing private-use characters
> couldn't have been used for that).
Because if the implementation uses them, then th
Kenneth Whistler wrote:
> Carl,
> > \xEF\xBF\xBE and \xEF\xBF\xBF are invalid Unicode characters.
>
> In current parlance (see Unicode 3.1, UAX #27), these are
> "noncharacters", and you must account for the fact that
> U+1FFFE..U+1FFFF
> U+2FFFE..U+2FFFF
> ...
> Also, if you're converting to, say, UTF-16, then non-character sequences
> like \xEF\xBF\xBE and \xEF\xBF\xBF should probably be converted to the
> corresponding UTF-16 non-characters (\uFFFE and \uFFFF), rather than being
> rejected. (Note: Unicode 3.1 and ISO/IEC 10646-1:2000 differ on this p
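The conversion suggested above can be tried out directly. This is only a small sketch, assuming Python's built-in "utf-8" and "utf-16-be" codecs, which pass the noncharacters U+FFFE and U+FFFF through rather than rejecting them; the variable names are illustrative and not from the thread.

```python
# Sketch: carry the UTF-8 noncharacter sequences over to UTF-16
# instead of rejecting them. Variable names are illustrative only.
raw = b"\xEF\xBF\xBE\xEF\xBF\xBF"

# Python's UTF-8 decoder accepts noncharacters (it rejects only
# ill-formed byte sequences and encoded surrogates).
text = raw.decode("utf-8")

# Encode as big-endian UTF-16 without a byte order mark.
utf16 = text.encode("utf-16-be")

print([hex(ord(ch)) for ch in text])
print(utf16.hex())
```

Whether a converter should pass these through or reject them is a policy decision; the sketch only shows that the round trip is mechanically well defined.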
"Carl W. Brown" wrote:
> I am checking out my UTF-8 validation rules to see if they are correct.
>
> Check each character to be a valid UTF-8 initial character.
>
> \x00 to \x7f or \xC2 to \xF4
>
> Allow invalid forms su
Ken,
>
> With Unicode 3.2 (in the works), the 32 additional code points
> at U+FDD0..U+FDEF go from unallocated status to noncharacters
> as well.
>
Interesting. I have seen some of the proposed characters but nothing on
non-characters. It seems like an interesting range for non-characters.
C
Ken,
> -Original Message-
> From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
> Sent: Monday, September 10, 2001 12:48 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: UTF-8 validation rules
>
>
> Carl,
>
> >
> > \xEF\xB
Carl,
>
> \xEF\xBF\xBE and \xEF\xBF\xBF are invalid Unicode characters.
In current parlance (see Unicode 3.1, UAX #27), these are
"noncharacters", and you must account for the fact that
U+1FFFE..U+1FFFF
U+2FFFE..U+2FFFF
...
U+10FFFE..U+10FFFF
all have the same status as noncharacters.
With Un
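The noncharacter ranges mentioned in this thread (the last two code points of every plane, plus U+FDD0..U+FDEF) can be tested with a few lines. A minimal sketch; the function name is_noncharacter is hypothetical and not taken from any library.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the Unicode noncharacters.

    Hypothetical helper, not part of any library.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # U+FDD0..U+FDEF: the 32-code-point range discussed in this thread.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF for each of the 17 planes (n = 0..0x10).
    return (cp & 0xFFFE) == 0xFFFE
```

For example, is_noncharacter(0x1FFFE) and is_noncharacter(0xFDD0) are both true, while is_noncharacter(0xFFFD) is false.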
Misha,
> You seem to be using the word "character" in some places where
> you (probably) mean "byte", eg:
>
I am getting fuzzy headed these days. Thanks for pointing it out. It
should read:
> > I am checking out my UTF-8 validation rules to see if t
Carl,
You seem to be using the word "character" in some places where
you (probably) mean "byte", eg:
> All UTF-8 characters must be followed by the proper number of valid
> continuation characters, if any.
Misha
On 10/09/2001 18:21:48 Carl W. Brown wrote:
>
I am checking out my UTF-8 validation rules to see if they are correct.
Check each character to be a valid UTF-8 initial character.
\x00 to \x7f or \xC2 to \xF4
Allow invalid forms such as \xC0 & \xC1 to decode but consider them invalid.
A first byte of \xE0 or \xF0 with a second byte
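Rules of this kind can be sketched as a byte-level check. The following is only an illustration, not Carl's actual code: the function name is_valid_utf8 is hypothetical, and the second-byte bounds (including those for \xE0 and \xF0, where the message above is cut off) are taken from the well-formed UTF-8 byte sequence table in the current Unicode Standard, which is an assumption about where the rules were heading. Unlike the rule above, it rejects \xC0 and \xC1 outright instead of decoding them first, and it checks structure only, so noncharacters such as \xEF\xBF\xBE pass.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence.

    Hypothetical helper; structural check only (noncharacters pass).
    """
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:            # single-byte (ASCII)
            i += 1
            continue
        # lo..hi bounds the second byte; extra counts the trail bytes after it.
        if 0xC2 <= b0 <= 0xDF:
            lo, hi, extra = 0x80, 0xBF, 0
        elif b0 == 0xE0:
            lo, hi, extra = 0xA0, 0xBF, 1   # excludes overlong 3-byte forms
        elif b0 == 0xED:
            lo, hi, extra = 0x80, 0x9F, 1   # excludes surrogates D800..DFFF
        elif 0xE1 <= b0 <= 0xEF:
            lo, hi, extra = 0x80, 0xBF, 1
        elif b0 == 0xF0:
            lo, hi, extra = 0x90, 0xBF, 2   # excludes overlong 4-byte forms
        elif 0xF1 <= b0 <= 0xF3:
            lo, hi, extra = 0x80, 0xBF, 2
        elif b0 == 0xF4:
            lo, hi, extra = 0x80, 0x8F, 2   # nothing above U+10FFFF
        else:
            return False          # \x80..\xC1 or \xF5..\xFF as a first byte
        end = i + 2 + extra
        if end > n:
            return False          # sequence is cut short
        if not lo <= data[i + 1] <= hi:
            return False
        for b in data[i + 2:end]:
            if not 0x80 <= b <= 0xBF:
                return False
        i = end
    return True
```

For instance, is_valid_utf8(b"\xe0\xa0\x80") is true, while b"\xc0\x80" and b"\xe0\x80\x80" are both rejected as overlong forms.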