In Python, Thrift's string and binary types are both represented as "str" (8-bit strings), and you are expected to use "str" when populating your Thrift structures. From your email, I assume you are using "unicode" strings in Thrift structures. The str type in Python 2 (like cStringIO) is essentially binary blob data. If you want to put utf-8 encoded Unicode data in a string field from Python, you should manually encode your unicode object into utf-8 as a str. This situation is pretty weak, but it is representative of all Python 2 code that deals with Unicode strings.
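
As a minimal sketch of the manual encoding step described above: the helper names to_thrift_str and from_thrift_str are hypothetical and not part of Thrift, and the final assignment to a struct field is left as a comment because the struct depends on your own IDL. The code is written so that it behaves the same way on Python 2 and Python 3.

```python
# Hypothetical helpers for illustration; not part of the Thrift library.
text_type = type(u"")  # "unicode" on Python 2, "str" on Python 3


def to_thrift_str(value):
    """Return value as utf-8 encoded 8-bit data for a Thrift string field."""
    if isinstance(value, text_type):
        return value.encode("utf-8")
    return value  # already 8-bit data; pass it through unchanged


def from_thrift_str(value):
    """Decode utf-8 data read back from a Thrift string field."""
    return value.decode("utf-8")


name = u"caf\u00e9"
encoded = to_thrift_str(name)
# Assumed usage: my_struct.name = encoded
```

On the reading side, from_thrift_str(encoded) gives back the original unicode object.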

In Python 3, "str" is actually a unicode string and there are separate types for binary blobs. When we create a Python 3 mapping for Thrift, I suspect that we will use a Python binary type for the Thrift binary type and use str for the Thrift string type, and we will automatically encode/decode strings as utf-8.

--David

Ted Dunning wrote:
> Yes. This difficulty is exactly why fixing this had to be rolled into the
> major incompatibilities introduced by 3.0.
>
> http://docs.python.org/dev/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
>
> On Wed, Jan 21, 2009 at 12:11 PM, Mark Slee <[email protected]> wrote:
>
>> My personal experience dealing with Python string-handling has created a
>> lot of headaches. The distinction between the primitive unicode vs. string
>> types is subtle but can cause a lot of weird foibles like this.
>>
