Thrift's string and binary types are represented as "str" (8-bit strings),
and you are expected to use "str" when populating your Thrift structures.
From your email, I assume you are using "unicode" strings in Thrift structures.
The str type in Python 2 (like the contents of a cStringIO buffer) is
essentially binary blob data.
If you want to use utf-8 encoded Unicode data in a string field from Python,
you should manually encode your unicode object into utf-8 as a str.
This situation is pretty weak, but it is representative of all Python 2 code
that deals with Unicode strings.
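
As a rough sketch of that manual step (not Thrift code; the helper names
encode_for_field and decode_from_field are made up for illustration, and
the assumption is simply that the field carries UTF-8 bytes):

```python
# Hypothetical helpers, not part of the Thrift library.
# They only show the explicit encode/decode step around a string field.

def encode_for_field(text):
    """Turn a unicode object into UTF-8 encoded bytes (a plain str in Python 2)."""
    return text.encode("utf-8")


def decode_from_field(raw):
    """Turn UTF-8 encoded bytes read from a field back into a unicode object."""
    return raw.decode("utf-8")


name = u"caf\u00e9"                       # unicode text, 4 characters
wire_value = encode_for_field(name)       # 5 bytes: the accented e takes two
round_tripped = decode_from_field(wire_value)
```

You would assign wire_value to the Thrift string field, and decode again
after reading the field back.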

In Python 3, "str" is actually a unicode string and there are separate types
for binary blobs.  When we create a Python 3 mapping for Thrift, I suspect that
we will use a Python binary type for the Thrift binary type and use str
for the Thrift string type, and we will automatically encode/decode strings
as utf-8.
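
To make that guess concrete, here is a small Python 3 sketch of what such a
mapping might do. None of this exists in Thrift; to_wire, from_wire, and the
"string"/"binary" type tags are purely hypothetical names for illustration:

```python
# Hypothetical Python 3 sketch, not an actual Thrift API.
# Thrift "string" fields take str and are encoded/decoded as UTF-8;
# Thrift "binary" fields take bytes and pass through unchanged.

def to_wire(value, thrift_type):
    """Convert a Python value to the bytes that would be written."""
    if thrift_type == "string":
        if not isinstance(value, str):
            raise TypeError("string fields expect str")
        return value.encode("utf-8")
    if thrift_type == "binary":
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("binary fields expect bytes")
        return bytes(value)
    raise ValueError("unknown type tag: %r" % (thrift_type,))


def from_wire(raw, thrift_type):
    """Convert bytes that were read back into the matching Python value."""
    if thrift_type == "string":
        return raw.decode("utf-8")
    if thrift_type == "binary":
        return bytes(raw)
    raise ValueError("unknown type tag: %r" % (thrift_type,))


text_bytes = to_wire("caf\u00e9", "string")   # encoded automatically
blob_bytes = to_wire(b"\x00\xff", "binary")   # left untouched
```

With a mapping like this, passing bytes for a string field (or str for a
binary field) would fail loudly instead of silently mixing the two.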

--David

Ted Dunning wrote:
> Yes.  This difficulty is exactly why fixing this had to be rolled into the
> major incompatibilities introduced by 3.0.
> 
> http://docs.python.org/dev/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
> 
> On Wed, Jan 21, 2009 at 12:11 PM, Mark Slee wrote:
> 
>> My personal experience dealing with Python string-handling has created a
>> lot of headaches. The distinction between the primitive unicode vs. string
>> types is subtle but can cause a lot of weird foibles like this.
>>
> 
> 
> 
