[ https://issues.apache.org/jira/browse/THRIFT-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jens Geyer resolved THRIFT-5303.
--------------------------------
    Fix Version/s: 0.14.0
         Assignee: Quanlong Huang
       Resolution: Fixed

> Unicode decode errors in _fast_decode
> -------------------------------------
>
>                 Key: THRIFT-5303
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5303
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>    Affects Versions: 0.11.0
>         Environment: Ubuntu 16.04.6 LTS
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>             Fix For: 0.14.0
>
>
> Impala currently uses thrift-0.11.0 on the client side and thrift-0.9.3 on the
> server side (the server-side upgrade is blocked by some issues). We encountered
> an issue when decoding UTF-8 bytes on the client side: the result contains a
> partial UTF-8 code point, but Thrift does not handle the error gracefully. The
> stack trace:
> {code:java}
> Traceback (most recent call last):
>   File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1210, in _do_beeswax_rpc
>     ret = rpc()
>   File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1113, in <lambda>
>     self.fetch_size))
>   File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 254, in fetch
>     return self.recv_fetch()
>   File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 275, in recv_fetch
>     result.read(iprot)
>   File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 1410, in read
>     iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xe6 in position 3: unexpected end of data
> {code}
> This is similar to THRIFT-2087, but the error happens at the boundary between
> Python and C++ code. Just as in THRIFT-2087, we need to provide error-handling
> behavior for decoding UTF-8 bytes in
> {{TBinaryProtocolAccelerated._fast_decode}}.
> The related code is at
> [https://github.com/apache/thrift/blob/0.11.0/lib/py/src/ext/protocol.tcc#L708]:
> {code:c++}
>     case T_STRING: {
>       char* buf = NULL;
>       int len = impl()->readString(&buf);
>       if (len < 0) {
>         return NULL;
>       }
>       if (isUtf8(typeargs)) {
>         // <--- Needs to provide an error handling method here
>         return PyUnicode_DecodeUTF8(buf, len, 0);
>       } else {
>         return PyBytes_FromStringAndSize(buf, len);
>       }
>     }
> {code}
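
The failure in the stack trace can be reproduced at the Python level without Thrift. The sketch below is an illustration only, not the fix that went into 0.14.0: `decode_utf8` and `TRUNCATED` are hypothetical names, and the payload is an assumed example of a string cut off after the first byte (0xe6) of a three-byte sequence. It relies on the fact that the third argument of `PyUnicode_DecodeUTF8` is the `errors` handler name, and that passing `0` (NULL) selects strict decoding, which corresponds to `errors="strict"` here.

```python
# Assumed example payload: "abc" followed by the lead byte 0xe6 of a
# three-byte UTF-8 sequence whose two continuation bytes are missing.
TRUNCATED = b"abc\xe6"


def decode_utf8(data, errors="strict"):
    """Decode bytes as UTF-8 using the named error handler.

    Hypothetical helper for illustration; it is not part of the Thrift API.
    """
    return data.decode("utf-8", errors)


def describe_strict_failure(data):
    """Return (start offset, reason) of the strict-mode failure, or None."""
    try:
        decode_utf8(data)  # strict mode, like passing 0 as `errors` in C
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


if __name__ == "__main__":
    # Strict decoding raises, as in the reported traceback.
    print(describe_strict_failure(TRUNCATED))
    # A lenient handler substitutes U+FFFD for the incomplete sequence.
    print(repr(decode_utf8(TRUNCATED, "replace")))
```

With a lenient handler such as `"replace"`, the incomplete trailing sequence becomes U+FFFD instead of aborting the whole read. Which handler is appropriate, and whether it should be configurable, is a design choice this sketch leaves open.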