Shaun Lindsay created THRIFT-2948:
-------------------------------------

             Summary: Python TJSONProtocol doesn't handle structs with binary 
fields containing invalid unicode.
                 Key: THRIFT-2948
                 URL: https://issues.apache.org/jira/browse/THRIFT-2948
             Project: Thrift
          Issue Type: Bug
          Components: Python - Library
    Affects Versions: 0.9.2
         Environment: python 2.7.6, mac OSX yosemite
            Reporter: Shaun Lindsay
            Priority: Minor


Serializing a struct to JSON using TJSONProtocol fails with a 
UnicodeDecodeError if the struct contains a binary field whose bytes are not 
valid UTF-8 (for example '\xff').
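
The underlying failure can be shown without Thrift. This is a minimal sketch, assuming the JSON encoder tries to decode the byte string as UTF-8 before escaping it, as the last line of the traceback in this report suggests. `try_decode` is a hypothetical helper for illustration, not part of the Thrift library.

```python
def try_decode(raw):
    """Return None if raw is valid UTF-8, otherwise the decode error message."""
    try:
        raw.decode('utf-8')
        return None
    except UnicodeDecodeError as exc:
        return str(exc)


# Same bytes as the binary field in this report.
blob = b'\xff\xff\x00\xaa'
error = try_decode(blob)
print(error)
```

Plain ASCII bytes such as `b'hello'` decode cleanly, so only binary payloads that are not valid UTF-8 trigger the error.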

To recreate:
Assume you have a TestStruct defined as {1: optional binary blob}.

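from thrift.protocol import TJSONProtocol
from thrift.transport import TTransport
from teststruct.ttypes import TestStruct  # assumes gen-py is on sys.path

def test_json_serialization():
  # '\xff' can never start a valid UTF-8 sequence
  thrift_obj = TestStruct('\xff\xff\x00\xaa')
  transport = TTransport.TMemoryBuffer()
  protocol = TJSONProtocol.TJSONProtocol(transport)
  thrift_obj.write(protocol)  # raises UnicodeDecodeError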

Running this will give the following exception:
Traceback (most recent call last):
  File "/Users/shaunlindsay/sona/simplethrift/test_suite.py", line 32, in test_json_serialize_deserialize
    serialized = simplethrift.serialize_json(original)
  File "/Users/shaunlindsay/sona/simplethrift/simplethrift.py", line 71, in serialize_json
    thrift_obj.write(protocol)
  File "testfiles/gen-py/teststruct/ttypes.py", line 84, in write
    oprot.writeString(self.blob)
  File "/Library/Python/2.7/site-packages/thrift/protocol/TJSONProtocol.py", line 473, in writeString
    self.writeJSONString(string)
  File "/Library/Python/2.7/site-packages/thrift/protocol/TJSONProtocol.py", line 177, in writeJSONString
    self.trans.write(json.dumps(string))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 201, in encode
    return encode_basestring_ascii(o)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
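
One possible way around this is to write the binary field as base64 text instead of passing the raw byte string to `json.dumps`. The sketch below assumes that approach is acceptable for the Python library. The helper names `binary_to_json` and `binary_from_json` are hypothetical and are not part of the Thrift API.

```python
import base64
import json


def binary_to_json(raw):
    """Encode raw bytes as a JSON string that holds base64 text."""
    # base64 output is pure ASCII, so json.dumps never sees invalid UTF-8.
    return json.dumps(base64.b64encode(raw).decode('ascii'))


def binary_from_json(text):
    """Reverse binary_to_json: parse the JSON string, then decode base64."""
    return base64.b64decode(json.loads(text))


blob = b'\xff\xff\x00\xaa'
encoded = binary_to_json(blob)
decoded = binary_from_json(encoded)
print(encoded)
```

The round trip returns the original bytes unchanged, at the cost of the field no longer being human-readable in the JSON output.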





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)