On Thu, 20 Nov 2003 21:02:49 -0800, "Doug Ewell" wrote:
> 
> An invalid GB18030 sequence, like <FE 40>, or a valid but out-of-range
> sequence, like <E3 32 9A 36>, should be treated just like an invalid or
> out-of-range UTF-8 sequence.  Issue an error message, format the hard
> disk, whatever; just don't try to treat it like a normal character.
> 

Hmm, surely <FE 40> is a valid GB-18030 sequence (= U+FA0C, by my reckoning).
Word fails to convert <FE 40> correctly when told to open a file as GB-18030,
although it does save U+FA0C as <FE 40> when told to save as GB-18030.

In BabelPad I convert any invalid GB-18030 characters to U+FFFD ("used to
replace an incoming character whose value is unknown or unrepresentable in
Unicode"), and notify the user that the file has been opened with errors, which
I think is a compliant and sensible implementation. (Unfortunately I've just
noticed that BabelPad has a slight bug with out-of-range GB-18030 values such as
<E3 32 9A 36> = U+110000.)
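
For anyone wanting to check the arithmetic behind <E3 32 9A 36> = U+110000,
here is a minimal Python sketch. It is not BabelPad's code; the function names
are hypothetical, and it assumes the usual linear mapping for four-byte
sequences with a lead byte of 0x90 or above, where <90 30 81 30> = U+10000.
Anything that lands above U+10FFFF is replaced with U+FFFD and flagged so the
caller can tell the user.

```python
REPLACEMENT = "\uFFFD"      # U+FFFD REPLACEMENT CHARACTER
MAX_CODE_POINT = 0x10FFFF   # last valid Unicode code point


def four_byte_to_scalar(seq: bytes) -> int:
    """Return the linear value of a four-byte GB-18030 sequence.

    Only handles the supplementary-plane form (lead byte >= 0x90).
    The result may exceed U+10FFFF; the caller must check the range.
    """
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    well_formed = (
        0x90 <= b1 <= 0xFE
        and 0x30 <= b2 <= 0x39
        and 0x81 <= b3 <= 0xFE
        and 0x30 <= b4 <= 0x39
    )
    if not well_formed:
        raise ValueError("not a four-byte sequence with lead byte >= 0x90")
    # Bytes 2 and 4 take 10 values each, byte 3 takes 126 values.
    linear = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    return 0x10000 + linear


def decode_four_byte(seq: bytes) -> tuple[str, bool]:
    """Return (text, ok); out-of-range input gives (U+FFFD, False)."""
    value = four_byte_to_scalar(seq)
    if value > MAX_CODE_POINT:
        return REPLACEMENT, False
    return chr(value), True


if __name__ == "__main__":
    for raw in (bytes([0xE3, 0x32, 0x9A, 0x35]), bytes([0xE3, 0x32, 0x9A, 0x36])):
        text, ok = decode_four_byte(raw)
        print(raw.hex(" "), hex(four_byte_to_scalar(raw)), "ok" if ok else "replaced")
```

The ok flag is what would drive the "file has been opened with errors" notice;
a real decoder would also need the two-byte forms and the four-byte BMP ranges,
which are table-driven and left out here.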

Andrew
