On Fri, Nov 30, 2012 at 11:43 AM, Albert-Jan Roskam <fo...@yahoo.com> wrote: > > How can I pack a unicode string using the struct module?
struct.pack is for packing an arbitrary sequence of data into a C-like struct. You have to manually add pad bytes. Alternatively you can use a ctypes.Structure. The struct module supports plain byte strings, not Unicode. UTF-8 was designed to encode all of Unicode in a way that can seamlessly pass through libraries that process C strings (i.e. an array of non-null bytes terminated by a null byte). Byte values less than 128 are ASCII; beyond ASCII, UTF-8 uses 2-4 bytes, and all byte values are greater than 127, with standardized byte order. In contrast, UTF-16 and UTF-32 have null bytes in the string and platform-determined byte order. The length and order of the optional byte order mark (BOM) distinguishes UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE. There's also a UTF-8 BOM used on Windows. Python calls this encoding "utf-8-sig". > fmt = endianness + str(len(hello)) + "s" That's the wrong length. Use the length of the encoded string. _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor