People really should be using UTF-8 or another Unicode encoding :)  IMO these
are legacy encodings and should be deprecated.

-Shawn

-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Doug Ewell
Sent: Friday, November 16, 2012 4:11 PM
To: Buck Golemon; unicode
Subject: Re: cp1252 decoder implementation

Buck Golemon wrote:

> Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and 
> to map it to the equally-non-semantic U+0081?
>
> This would allow systems that follow the html5 standard and use cp1252 
> in place of latin1 to continue to be binary-faithful and reversible.

This isn't quite as black-and-white as the question about Latin-1. If you are
targeting HTML5, you are probably safe in treating an incoming 0x81 (for
example) as either U+0081 or U+FFFD, or throwing some kind of error. HTML5
insists that you treat 8859-1 as if it were CP1252, so it no longer matters
what the byte is in 8859-1.
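
For illustration, here is a rough sketch of those three options in Python,
whose built-in "cp1252" codec leaves 0x81 unmapped. The handler name
"cp1252_c1" and the function c1_passthrough are made up for this example and
are not part of any standard; the encode branch is an assumption about how one
might make the mapping reversible, as Buck asked.

```python
import codecs

def c1_passthrough(err):
    """Hypothetical error handler: pass unmapped bytes through as the
    code points of the same value (0x81 -> U+0081), and back again."""
    if isinstance(err, UnicodeDecodeError):
        bad = err.object[err.start:err.end]        # the undecodable bytes
        return "".join(chr(b) for b in bad), err.end
    if isinstance(err, UnicodeEncodeError):
        bad = err.object[err.start:err.end]        # the unencodable characters
        # Only reverse the C1 range; anything else is still an error.
        if all(0x80 <= ord(c) <= 0x9F for c in bad):
            return bytes(ord(c) for c in bad), err.end
    raise err

codecs.register_error("cp1252_c1", c1_passthrough)

data = b"caf\xe9 \x81 \x80"   # 0xE9 and 0x80 are defined in cp1252, 0x81 is not

# Option 1: throw some kind of error (the codec's default, strict mode).
try:
    data.decode("cp1252")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Option 2: substitute U+FFFD REPLACEMENT CHARACTER (lossy).
replaced = data.decode("cp1252", errors="replace")

# Option 3: map to U+0081, which can be reversed on the way out.
passed = data.decode("cp1252", errors="cp1252_c1")
round_trip = passed.encode("cp1252", errors="cp1252_c1")
```

Only the third option keeps the original bytes recoverable; U+FFFD discards
which byte was seen.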

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell  






