>> Thrift's string and binary types are represented as "str" (8-bit strings),
>> and you are expected to use "str" when populating your Thrift structures.
>> From your email, I assume you are using "unicode" strings in Thrift
>> structures.
>> The str type in Python (like the data a cStringIO buffer holds) is essentially binary blob data.
>> If you want to use utf-8 encoded Unicode data in a string field from Python,
>> you should manually encode your unicode object into utf-8 as a str.
>> This situation is pretty weak, but it is representative of all Python 2 code
>> that deals with Unicode strings.
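For comparison, here is a minimal sketch of the encode-before-populating pattern in Python 3 terms, where "bytes" plays the role of Python 2's "str" and "str" plays the role of "unicode". The Person class is a hypothetical stand-in for a Thrift-generated struct (not real generated code), and to_utf8 is an illustrative helper, not part of the Thrift library.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical stand-in for a Thrift-generated struct with a binary field."""
    name: bytes = b""


def to_utf8(value):
    """Return value as UTF-8 encoded bytes; bytes input passes through unchanged."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        # Explicit encode: the struct only ever sees 8-bit data.
        return value.encode("utf-8")
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


person = Person(name=to_utf8("\u03a9mega"))
print(person.name)  # b'\xce\xa9mega'
```

Encoding at the boundary, right where the struct is populated, keeps every field holding 8-bit data, so nothing downstream ever has to guess a codec.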
>
> Yes, I'm aware of that. That's how I first tried, and it failed: I encoded
> my unicode objects to str (with UTF-8 encoding), and
> cStringIO.StringIO.write (or getvalue) failed because internally it tries to
> convert the string you give it with the 'ascii' codec (and some of my
> chars cannot be represented in 1 byte), raising a Unicode
> exception. The cStringIO docs say:
>
> "Unlike the memory files implemented by the StringIO module, those
> provided by this module are not able to accept Unicode strings that
> cannot be encoded as plain ASCII strings."
Are you sure you encoded your unicode object to a str before trying to
serialize it?
cStringIO seems to work fine for me until the last command, where I try
to write a unicode object directly into it:
In [1]: from cStringIO import StringIO
In [2]: buf = StringIO()
In [3]: my_uni = u'\u03a9'
In [4]: my_uni
Out[4]: u'\u03a9'
In [5]: len(my_uni)
Out[5]: 1
In [6]: my_bin = my_uni.encode('utf-8')
In [7]: my_bin
Out[7]: '\xce\xa9'
In [8]: len(my_bin)
Out[8]: 2
In [9]: buf.write(my_bin)
In [10]: buf.getvalue()
Out[10]: '\xce\xa9'
In [11]: buf.getvalue().decode('utf8')
Out[11]: u'\u03a9'
In [12]: buf.getvalue().decode('utf8') == my_uni
Out[12]: True
In [13]: buf.write(my_uni)
---------------------------------------------------------------------------
<type 'exceptions.UnicodeEncodeError'> Traceback (most recent call last)
/home/dreiss/<ipython console> in <module>()
<type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character
u'\u03a9' in position 0: ordinal not in range(128)
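The same experiment translated to Python 3, as a sketch: io.BytesIO stands in for cStringIO, and instead of attempting an implicit ASCII encode it rejects text outright with a TypeError. This is an illustrative analogue, not a reproduction of the Python 2 session above.

```python
import io

buf = io.BytesIO()
my_uni = "\u03a9"                # one code point (GREEK CAPITAL LETTER OMEGA)
my_bin = my_uni.encode("utf-8")  # two bytes: b'\xce\xa9'

buf.write(my_bin)                # encoded bytes are accepted
assert buf.getvalue().decode("utf-8") == my_uni

rejected = False
try:
    buf.write(my_uni)            # text is rejected before anything is written
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```

Either way the lesson is the same: encode to UTF-8 yourself, write the bytes, and decode when you read them back.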