On Wed, 4 Jul 2007, William O'Higgins Witteman wrote:

> >It is nonsense to talk about 'recasting' an ascii string as UTF-8; an
> >ascii string is *already* UTF-8 because the representation of the
> >characters is identical. OTOH it makes sense to talk about converting an
> >ascii string to a unicode string.
>
> Then what does mystring.encode("UTF-8") do?
I'm pretty iffy on this stuff myself, but as I see it, you basically have
three kinds of things here.

First, an ascii string:

  s = 'abc'

In hex, this is 616263: 61 for 'a', 62 for 'b', 63 for 'c'.

Second, a unicode string:

  u = u'abc'

I can't say what this is "in hex" because that's not meaningful. A Unicode
character is a code point, which can be represented in a variety of ways,
depending on the encoding used. So, moving on....

Finally, you can have a sequence of bytes, stored in a string as a buffer,
that shows the particular encoding of a particular string:

  e8 = s.encode("UTF-8")
  e16 = s.encode("UTF-16")

Now, e8 and e16 are each strings (of bytes), and their content tells you
how the encoded string of characters is represented in that particular
encoding. In hex, they look like this:

  e8:  616263            (61 for 'a', 62 for 'b', 63 for 'c')
  e16: FFFE6100 62006300 (FFFE for the BOM, 6100 for 'a', 6200 for 'b',
                          6300 for 'c')

(Those UTF-16 bytes come from a little-endian machine; Python's "UTF-16"
codec uses the native byte order, so a big-endian machine would give
FEFF0061 00620063.)

Now, superficially, s and e8 are equal, because for plain old ascii
characters (which is all I've used in this example), UTF-8 is identical to
ascii. And they compare the same:

  >>> s == e8
  True

But that's not true of the UTF-16:

  >>> s == e16
  False
  >>> e8 == e16
  False

So (and I'm open to correction on this), I think of the encode() method as
returning a string of bytes that represents a string value in a particular
encoding, and that byte string can't be used as the string value itself.
But you can get that string value back (assuming all the characters map to
ascii):

  >>> s8 = e8.decode("UTF-8")
  >>> s16 = e16.decode("UTF-16")
  >>> s == s8 == s16
  True

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
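For anyone following along on Python 3, here is a rough sketch of the same
experiment. It assumes a Python 3 interpreter, whereas the session in the
post is Python 2, where 'abc' is a byte string. In Python 3, 'abc' is
already a unicode string and encode() returns a bytes object, so a str
never compares equal to bytes; the ascii-vs-UTF-8 comparison has to be made
between two bytes values instead. The 'utf-16-le' codec is used here only
to get output that doesn't depend on the machine's byte order.

```python
# Sketch of the post's experiment in Python 3 (assumption: the original
# session was Python 2, where 'abc' was a byte string, not text).
s = 'abc'                      # Python 3 str: a sequence of code points

e8 = s.encode('utf-8')         # bytes b'abc'
e16 = s.encode('utf-16')       # BOM (native byte order) + 2 bytes per char
e16le = s.encode('utf-16-le')  # fixed little-endian order, no BOM

print(e8.hex())      # 616263
print(e16le.hex())   # 610062006300
print(e16[:2] in (b'\xff\xfe', b'\xfe\xff'))  # True: e16 starts with a BOM

# In Python 3 text and bytes are different types and never compare equal.
print(s == e8)                   # False
# Comparing bytes with bytes shows UTF-8 matches ASCII for these characters.
print(e8 == s.encode('ascii'))   # True
print(e8 == e16)                 # False

# Decoding recovers the original text from either encoding.
s8 = e8.decode('utf-8')
s16 = e16.decode('utf-16')       # the BOM tells the decoder the byte order
print(s == s8 == s16)            # True
```

The main difference from the Python 2 session is the s == e8 line: what
was True there is False here, because Python 3 keeps text and bytes apart.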