Hi guys, I'm using Thrift with the Python library and I have a problem
with serializing UTF-8 strings (calling a server function with some
string params). The problem is with characters whose codes are 128 and
above. From hacking through the Thrift Python code (TTransport.py), it
seems that StringIO (from cStringIO) stores only 8-bit byte strings, so
English (ASCII) text is fine, but when the string has other characters
it fails.

One solution is to use the slower equivalent StringIO (from StringIO),
but then you need to convert your strings to Python unicode (it accepts
either ASCII or unicode, no encodings like UTF-8 etc.). But then
socket.send (from TSocket.py) fails, since as we all know unicode
objects cannot be sent over the wire due to the risk of a different
unicode representation on the other side (it may be 4 bytes per
character, for example), so we really need to use something like UTF-8
for sockets.

So my question is: does Thrift really support UTF-8 (like the wiki
says), meaning all characters that can be represented, not just the
ASCII subset, or am I missing something? Has any other user run into
this kind of problem? I did not find anything on the subject on the
internet. Maybe other languages (Java, PHP) do not have this problem?
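
To make the idea concrete, here is a minimal sketch of encoding text to
UTF-8 bytes before it ever reaches the byte buffer, and decoding it on
the way back. It is written for Python 3, where io.BytesIO stands in for
the cStringIO buffer; the helper names (write_utf8_string,
read_utf8_string, roundtrip) and the 4-byte length prefix are
assumptions for illustration only, not Thrift's actual code.

```python
import io
import struct


def write_utf8_string(buf, text):
    """Write text as a 4-byte big-endian length followed by UTF-8 bytes."""
    data = text.encode("utf-8")  # non-ASCII characters become multi-byte
    buf.write(struct.pack("!i", len(data)))  # length in bytes, not characters
    buf.write(data)


def read_utf8_string(buf):
    """Read one length-prefixed UTF-8 string back from the buffer."""
    (length,) = struct.unpack("!i", buf.read(4))
    return buf.read(length).decode("utf-8")


def roundtrip(text):
    """Hypothetical helper: write text into a byte buffer and read it back."""
    buf = io.BytesIO()
    write_utf8_string(buf, text)
    buf.seek(0)
    return read_utf8_string(buf)
```

The buffer and the socket only ever see bytes here, so the width of the
in-memory unicode representation on either side should not matter. On
Python 2 the equivalent step would presumably be calling
.encode('utf-8') on the unicode object before passing it in.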
Emil
