Dave Angel wrote:
> On 11/20/2011 04:45 PM, Steven D'Aprano wrote:
>> <snip>
>> Something in the tool chain before it reached Python has saved it
>> using a wide (four byte) encoding, most likely UTF-16 as that is
>> widely used by Windows and Java. With the right settings, it could
>> take as little as opening the file in Notepad, then clicking Save.
>
> UTF-16 is a two byte format. That's typically what Windows uses for
> Unicode. It's Unices that are more likely to use a four-byte format.

Oops, you're right of course, two bytes, not four:
py> u'M'.encode('utf-16BE')
'\x00M'
I was thinking of four hex digits:
py> u'M'.encode('utf-16BE').encode('hex')
'004d'
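
For anyone following along on Python 3, where str.encode('hex') is no longer available, a rough equivalent of the session above might look like the sketch below. The variable names are only illustrative, and the UTF-32 and non-BMP lines are extra comparisons that were not part of the original session.

```python
# Python 3 sketch of the byte-width comparison (names are illustrative).
text = "M"

# UTF-16 (big-endian, no BOM): one 16-bit code unit for this character.
utf16 = text.encode("utf-16-be")
print(utf16)          # two bytes
print(utf16.hex())    # the same two bytes shown as four hex digits

# UTF-32 (big-endian, no BOM): always four bytes per code point.
utf32 = text.encode("utf-32-be")
print(len(utf32))

# Characters outside the Basic Multilingual Plane need a surrogate
# pair in UTF-16, so they take four bytes there as well.
non_bmp = "\U0001F600"
print(len(non_bmp.encode("utf-16-be")))
```

So "two bytes" holds for 'M' and for most everyday text, but UTF-16 is really a variable-width encoding: two bytes per code unit, one or two code units per character.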
--
Steven