
How can I pack a unicode string using the struct module? If I simply use packed 
= struct.pack(fmt, hello) in the code below (where 'hello' is a unicode string), 
I get this: "error: argument for 's' must be a string". I keep reading that I 
have to encode it to a UTF-8 bytestring first, but this does not work (it yields 
mojibake and tofu output for some of the languages). It would be annoying to have 
to know the encoding in which each individual language should be represented. I 
was hoping "unicode-internal" was the way to do it, but this does not reproduce 
the original string when I unpack it. :-(
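
One thing that may be worth checking here: the count in an 's' format is a number of bytes, not a number of characters, and for most encodings the two differ. The snippet below is only a minimal sketch of that effect, using the Thai greeting from this post and UTF-8. The `unichr` shim is an addition so the snippet is intended to run on both Python 2 and Python 3.

```python
import struct

try:
    unichr          # Python 2
except NameError:
    unichr = chr    # Python 3: chr() already returns a unicode character

# Thai greeting, same code points as in the list in this post.
chars = [3626, 3623, 3633, 3626, 3604, 3637]
hello = u"".join(unichr(i) for i in chars)

encoded = hello.encode("utf-8")
n_chars = len(hello)     # number of code points
n_bytes = len(encoded)   # number of bytes after encoding

# A format built from the character count keeps only the first n_chars bytes.
truncated = struct.pack("<%ds" % n_chars, encoded)
# A format built from the byte count keeps the whole encoded string.
complete = struct.pack("<%ds" % n_bytes, encoded)

print(n_chars, n_bytes, len(truncated), len(complete))
```

If the truncation cuts the encoded data short, decoding the unpacked bytes can give a shortened string or fail outright, which might account for some of the garbled output.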

# Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32

import sys
import struct

greetings = \
        [['Arabic', [1575, 1604, 1587, 1604, 1575, 1605, 32, 1593, 1604, 1610, 
                     1605], 'cp1256'], # 'cp864' 'iso8859_6'
         ['Assamese', [2472, 2478, 2488, 2509, 2453, 2494, 2544], 'utf-8'],
         ['Bengali', [2438, 2488, 2488, 2494, 2482, 2494, 2478, 2497, 32, 2438,
                      2482, 2494, 2439, 2453, 2497, 2478], 'utf-8'],
         ['Georgian', [4306, 4304, 4315, 4304, 4320, 4335, 4317, 4305, 4304],
          'utf-8'],  # encoding assumed
         ['Kazakh', [1057, 1241, 1083, 1077, 1084, 1077, 1090, 1089, 1110,
                     1079, 32, 1073, 1077], 'utf-8'],
         ['Russian', [1047, 1076, 1088, 1072, 1074, 1089, 1090, 1074, 1091, 1081,
                      1090, 1077], 'utf-8'],
         ['Spanish', [161, 72, 111, 108, 97, 33], 'cp1252'],
         ['Swiss German', [71, 114, 252, 101, 122, 105], 'cp1252'],
         ['Thai', [3626, 3623, 3633, 3626, 3604, 3637], 'cp874'],
         ['Walloon', [66, 111, 110, 100, 106, 111, 251], 'cp1252']]     
for greet in greetings:
    language, chars, encoding = greet
    hello = "".join([unichr(i) for i in chars])
    #print language, hello, encoding  # prints everything as it should look
    endianness = "<" if sys.byteorder == "little" else ">"
    fmt = endianness + str(len(hello)) + "s"
    #packed = struct.pack(fmt, hello.encode('utf_32_le'))
    #packed = struct.pack(fmt, hello.encode(encoding))
    #packed = struct.pack(fmt, hello.encode('utf_8'))
    packed = struct.pack(fmt, hello.encode("unicode-internal"))
    print struct.unpack(fmt, packed)[0].decode("unicode-internal")
    # UnicodeDecodeError: 'unicode_internal' codec can't decode byte 0x00 in
    # position 12: truncated input
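
A possible way to get a clean round trip, sketched under some assumptions: encode everything as UTF-8, size the 's' field from the encoded byte length, and store that length in front of the data so the reader can rebuild the format. The names `pack_text` and `unpack_text` are hypothetical helpers, not part of the struct module, and the 4-byte little-endian length prefix is just an illustrative choice.

```python
import struct

try:
    unichr          # Python 2
except NameError:
    unichr = chr    # Python 3: chr() already returns a unicode character


def pack_text(text, encoding="utf-8"):
    """Hypothetical helper: pack text as <4-byte length><encoded bytes>."""
    data = text.encode(encoding)
    # The 's' count is the number of encoded bytes, not len(text).
    return struct.pack("<I%ds" % len(data), len(data), data)


def unpack_text(packed, encoding="utf-8"):
    """Hypothetical helper: inverse of pack_text."""
    (size,) = struct.unpack_from("<I", packed, 0)
    (data,) = struct.unpack_from("<%ds" % size, packed, 4)
    return data.decode(encoding)


# A few greetings from the list in this post, as code points.
samples = {
    "Arabic": [1575, 1604, 1587, 1604, 1575, 1605, 32, 1593, 1604, 1610, 1605],
    "Thai": [3626, 3623, 3633, 3626, 3604, 3637],
    "Walloon": [66, 111, 110, 100, 106, 111, 251],
}

for language, chars in samples.items():
    hello = u"".join(unichr(i) for i in chars)
    restored = unpack_text(pack_text(hello))
    assert restored == hello, language
```

With this layout no per-language encoding table is needed, since UTF-8 can represent every code point. Whether the result then displays correctly in a Windows console is a separate matter that depends on the console code page and font.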

Thank you in advance!


