People really should be using UTF-8 or something else :) IMO these are legacy encodings and should be deprecated.
-Shawn

-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of Doug Ewell
Sent: Friday, November 16, 2012 4:11 PM
To: Buck Golemon; unicode
Subject: Re: cp1252 decoder implementation

Buck Golemon wrote:

> Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
> to map it to the equally-non-semantic U+81 ?
>
> This would allow systems that follow the html5 standard and use cp1252
> in place of latin1 to continue to be binary-faithful and reversible.

This isn't quite as black-and-white as the question about Latin-1.

If you are targeting HTML5, you are probably safe in treating an
incoming 0x81 (for example) as either U+0081 or U+FFFD, or throwing
some kind of error.

HTML5 insists that you treat 8859-1 as if it were CP1252, so it no
longer matters what the byte is in 8859-1.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
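For illustration, here is a minimal Python sketch of the three options Doug lists for the five bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D, the ones Python's built-in `cp1252` codec refuses to decode). The function names `decode_cp1252` and `encode_cp1252` and the `policy` values are hypothetical, not from any library. It assumes "binary-faithful" means each byte decodes to exactly one code point that re-encodes to the same byte. The "passthrough" policy maps byte 0xNN to U+00NN, which is also how the WHATWG Encoding Standard's windows-1252 index (the one HTML5 uses) treats these bytes.

```python
# Sketch only: function names and policy strings are hypothetical.
# The five bytes Python's cp1252 codec treats as undefined.
UNDEFINED_CP1252 = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def decode_cp1252(data: bytes, policy: str = "passthrough") -> str:
    """Decode cp1252 bytes, handling undefined bytes according to `policy`.

    passthrough: byte 0xNN -> U+00NN (reversible, matches WHATWG windows-1252)
    replace:     byte -> U+FFFD (lossy, not reversible)
    strict:      raise UnicodeDecodeError
    """
    out = []
    for i, b in enumerate(data):
        if b in UNDEFINED_CP1252:
            if policy == "passthrough":
                out.append(chr(b))
            elif policy == "replace":
                out.append("\ufffd")
            elif policy == "strict":
                raise UnicodeDecodeError("cp1252", data, i, i + 1, "undefined byte")
            else:
                raise ValueError(f"unknown policy: {policy!r}")
        else:
            # cp1252 is single-byte, so decoding one byte at a time is safe.
            out.append(bytes([b]).decode("cp1252"))
    return "".join(out)


def encode_cp1252(text: str) -> bytes:
    """Inverse of the passthrough policy: U+0081 etc. go back to their byte.

    Characters that cp1252 cannot represent still raise UnicodeEncodeError.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in UNDEFINED_CP1252:
            out.append(cp)
        else:
            out += ch.encode("cp1252")
    return bytes(out)
```

With the passthrough policy every byte string round-trips, so `encode_cp1252(decode_cp1252(bytes(range(256))))` gives back all 256 bytes. The replace policy gives up that property in exchange for never emitting C1 control characters.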