Quoting Buck Golemon <b...@yelp.com>:
> cp1252 (aka windows-1252) defines 27 characters which iso-8859-1 does not.
> This leaves five bytes with undefined semantics.
> Currently the python cp1252 decoder allows us to ignore/replace/error on
> these bytes, but there's no facility for allowing these unknown bytes to
> round-trip through the codec, as the latin1 codec does.
That's not true: there are actually *two* facilities that allow exactly that.
1. You can write a new codec which round-trips these bytes through
some characters,
or
2. You can write an error handler that does such round-tripping. The
surrogateescape error handler was specifically designed to allow such
round-tripping (not just for this codec, but for any codec); see
http://www.python.org/dev/peps/pep-0383/
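To illustrate, here is a minimal sketch of both approaches in Python 3. The
codec name "cp1252rt" and the helper names are made up for this example, and
mapping each undefined byte to the control character with the same number is
only one possible choice of "some characters", not an established convention
of the standard library.

```python
import codecs
from encodings import cp1252

# The stock cp1252 decoding table marks undefined bytes with U+FFFE.
UNDEFINED = [b for b, ch in enumerate(cp1252.decoding_table) if ch == "\ufffe"]

raw = bytes(range(256))

# Approach 2: use the existing surrogateescape error handler (PEP 383).
# Undecodable bytes become lone surrogates U+DC80..U+DCFF on decoding
# and are turned back into the original bytes on encoding.
text = raw.decode("cp1252", errors="surrogateescape")
roundtrip_handler = text.encode("cp1252", errors="surrogateescape")

# Approach 1: a new codec (hypothetical name "cp1252rt") that maps each
# undefined byte to the control character with the same code point.
RT_DECODING_TABLE = "".join(
    chr(b) if ch == "\ufffe" else ch
    for b, ch in enumerate(cp1252.decoding_table)
)
RT_ENCODING_TABLE = codecs.charmap_build(RT_DECODING_TABLE)


def rt_encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, RT_ENCODING_TABLE)


def rt_decode(data, errors="strict"):
    return codecs.charmap_decode(data, errors, RT_DECODING_TABLE)


def _search(name):
    # Codec search function: answer only for our made-up codec name.
    if name == "cp1252rt":
        return codecs.CodecInfo(rt_encode, rt_decode, name="cp1252rt")
    return None


codecs.register(_search)

roundtrip_codec = raw.decode("cp1252rt").encode("cp1252rt")
```

With the error handler, the unknown bytes show up as lone surrogates, so the
resulting string cannot be encoded to UTF-8 without using the same handler
again. With the custom codec, they show up as ordinary control characters,
at the price of no longer being distinguishable from control characters
that were really meant.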
Regards,
Martin