On Fri, Nov 30, 2012 at 11:43 AM, Albert-Jan Roskam <fo...@yahoo.com> wrote:
>
> How can I pack a unicode string using the struct module?

struct.pack is for packing an arbitrary sequence of data into a C-like
struct. You have to manually add pad bytes. Alternatively you can use
a ctypes.Structure.

The struct module supports plain byte strings, not Unicode. UTF-8 was
designed to encode all of Unicode in a way that can seamlessly pass
through libraries that process C strings (i.e. an array of non-null
bytes terminated by a null byte). Byte values less than 128 are ASCII;
beyond ASCII, UTF-8 uses 2-4 bytes, and all byte values are greater
than 127, with standardized byte order. In contrast, UTF-16 and UTF-32
have null bytes in the string and platform-determined byte order. The
length and order of the optional byte order mark (BOM) distinguishes
UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE. There's also a UTF-8 BOM
used on Windows. Python calls this encoding "utf-8-sig".

> fmt = endianness + str(len(hello)) + "s"

That's the wrong length. Use the length of the encoded string.
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to