On Wed, 4 Jul 2007, Kent Johnson wrote:

> Terry Carroll wrote:
> > Now, superficially, s and e8 are equal, because for plain old ascii
> > characters (which is all I've used in this example), UTF-8 is equivalent
> > to ascii. And they compare the same:
> >
> > >>> s == e8
> > True
>
> They are equal in every sense, I don't know why you consider this
> superficial. And if your original string was not ascii the encode()
> would fail with a UnicodeDecodeError.

Superficial in the sense that I was using only characters in the ascii
character set, so they have the same byte encoding in UTF-8. So:

>>> 'abc'.decode("UTF-8")
u'abc'

works. But UTF-8 can hold other characters, too; for example:

>>> '\xe4\xba\xba'.decode("UTF-8")
u'\u4eba'

(Chinese character for "person")

I'm just saying that UTF-8 encodes ascii characters to themselves; but
UTF-8 is not the same as ascii.

I think we're ultimately saying the same thing. To merge both our ways of
putting it: ascii will always map to UTF-8 identically, but UTF-8 may map
back to ascii, or it may raise UnicodeDecodeError.

I just didn't want to leave the impression "Yeah, UTF-8 & ascii, they're
the same thing."

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
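
P.S. For anyone who wants to see the asymmetry in one place, here is a rough sketch. It assumes Python 3, where byte strings are `bytes` and text is `str`, so the syntax differs from the Python 2 session earlier in this message. The helper name decodes_as_ascii is made up for illustration.

```python
# Sketch only: Python 3 syntax, where bytes and str are distinct types.
ascii_bytes = b"abc"
person_bytes = b"\xe4\xba\xba"  # UTF-8 encoding of U+4EBA ("person")

# ASCII-only bytes decode to the same text under both codecs.
assert ascii_bytes.decode("utf-8") == ascii_bytes.decode("ascii") == "abc"

# UTF-8 can also hold characters outside ASCII.
person = person_bytes.decode("utf-8")
print(repr(person))


def decodes_as_ascii(data):
    """Hypothetical helper: True if data is valid ASCII, else False."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# ASCII always maps into UTF-8, but UTF-8 does not always map back.
print(decodes_as_ascii(ascii_bytes))
print(decodes_as_ascii(person_bytes))
```

The first check succeeds because every ascii byte is also a valid one-byte UTF-8 sequence. The second fails because bytes at or above 0x80 have no meaning in ascii.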