Quoting Buck Golemon <b...@yelp.com>:

> cp1252 (aka windows-1252) defines 27 characters which iso-8859-1 does not.
> This leaves five bytes with undefined semantics.
>
> Currently the python cp1252 decoder allows us to ignore/replace/error on
> these bytes, but there's no facility for allowing these unknown bytes to
> round-trip through the codec, as the latin1 codec does.

That's not true: there are actually *two* facilities that allow exactly that.
1. You can write a new codec that round-trips these bytes through some
   characters, or
2. You can write an error handler that does such round-tripping. The
   surrogateescape error handler was specifically designed to allow such
   round-tripping for any codec, not just this one; see
   http://www.python.org/dev/peps/pep-0383/
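
For illustration, here is a minimal sketch of the second option, assuming
Python 3. The first half uses the built-in surrogateescape handler. The second
half registers a custom handler that maps the undefined bytes to the same code
points the latin1 codec would use. The handler name "c1-roundtrip" and the
function c1_roundtrip are made up for this example; they are not part of the
standard library.

```python
import codecs

# The five byte values that Python's cp1252 codec leaves undefined.
UNDEFINED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})

# Sample input: ordinary cp1252 text followed by the five undefined bytes.
data = b"caf\xe9 \x93quoted\x94 " + bytes(sorted(UNDEFINED))

# Built-in handler: each undefined byte 0xNN decodes to the lone surrogate
# U+DCNN, and encoding with the same handler restores the original byte.
text = data.decode("cp1252", errors="surrogateescape")
assert text.encode("cp1252", errors="surrogateescape") == data


def c1_roundtrip(exc):
    """Hypothetical handler: map undefined bytes to U+0080..U+009F and back."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        # Decode the offending bytes the way latin1 would.
        return bad.decode("latin1"), exc.end
    if isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]
        # Only handle the characters this handler itself could have produced.
        if all(ord(ch) in UNDEFINED for ch in bad):
            return bad.encode("latin1"), exc.end
    raise exc


codecs.register_error("c1-roundtrip", c1_roundtrip)

text2 = data.decode("cp1252", errors="c1-roundtrip")
assert text2.encode("cp1252", errors="c1-roundtrip") == data
```

Both variants keep the 27 cp1252-specific characters intact; they differ only
in which code points stand in for the five undefined bytes.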

Regards,
Martin


